Local LLM Services

Doco Translate supports running translations entirely on your Mac using local AI models. This keeps your documents private — no data leaves your machine.

Why Use Local Models?

  • Privacy — Your documents never leave your Mac. No text is sent to external servers.
  • No API costs — Run translations without paying per-request fees.
  • Offline capability — Translate documents without an internet connection (after downloading models).
  • Full control — Choose exactly which model to use and configure its behavior.

Trade-off: Local models generally produce lower quality translations compared to cloud-based AI services, and translation speed depends on your Mac's hardware (CPU, GPU, and RAM).

Supported Local Services

Ollama

Ollama is a popular open-source tool for running large language models locally on macOS, Linux, and Windows.

  • Default Host: http://localhost:11434
  • API Key: Not required (unless authentication is enabled on your Ollama instance)
  • Website: https://ollama.com

Setup

  1. Install Ollama:
    • Download from ollama.com or install via Homebrew:
      brew install ollama
      
  2. Pull a model: Open Terminal and run:
    ollama pull qwen3.6
    

    Popular models for translation:
    • qwen3.6 — Strong multilingual support, especially for Chinese and Asian languages
    • llama3.2 — General-purpose, good balance of speed and quality
    • gemma4 — Google's open model, good for European languages
  3. Start Ollama: Ollama runs automatically after installation. If not, start it manually:
    ollama serve
    
  4. Configure in Doco Translate:
    • Go to Settings → Services → Ollama.
    • The default host (http://localhost:11434) should work out of the box.
    • Use Fetch model list to automatically detect models you've pulled.
    • Select a model from the dropdown.
    • Click Verify service to test the connection.

Tips for Ollama

  • Model size matters: Larger models (70B+) produce better translations but require more RAM and run slower. Start with 7B–8B models for a good balance.
  • GPU acceleration: Ollama automatically uses Apple Silicon GPU acceleration on M-series Macs.
  • Keep Ollama running: Make sure the Ollama service is running before using it in Doco Translate.

LM Studio

LM Studio is a desktop application for discovering, downloading, and running local LLMs with a graphical interface.

Setup

  1. Install LM Studio:
  2. Download a model:
    • Open LM Studio.
    • Use the search bar to find a model (e.g., qwen3.6, gemma4).
    • Click Download on your preferred model variant.
  3. Start the local server:
    • In LM Studio, go to the Local Server tab (left sidebar).
    • Select the model you downloaded.
    • Click Start Server.
  4. Configure in Doco Translate:
    • Go to Settings → Services → LM Studio.
    • The default host (http://localhost:1234) should work if LM Studio's server is running.
    • Use Fetch model list to detect the loaded model, or enter the model name manually.
    • Click Verify service to test the connection.

Tips for LM Studio

  • Only one model at a time: LM Studio loads one model into memory at a time. Switching models requires unloading the current one first.
  • Quantization: LM Studio supports various quantization levels (Q4, Q5, Q8). Lower quantization (Q4) uses less memory but may reduce quality.
  • Server must be running: The LM Studio local server must be active for Doco Translate to connect.

Configuring Local Services

Local service settings in Doco Translate are similar to cloud AI services, with a few differences:

  • No API key required — Local services don't need authentication by default. If you've configured authentication on your local service, you can enter the credentials.
  • Custom host — You can change the host if your local service runs on a different port or machine.
  • Model selection — Use Fetch model list to auto-detect available models, or add models manually.

Custom Local Services

If you run another local LLM server that's compatible with the OpenAI API format:

  1. Go to Settings → Services and click Custom Service.
  2. Enter a name for your service.
  3. Select the OpenAI protocol.
  4. Enter the host address of your local server.
  5. Configure the model name and other settings as needed.

This works with any OpenAI-compatible server, including:

  • vLLM
  • text-generation-webui
  • LocalAI
  • llama.cpp server
  • Any custom API server

Performance Considerations

Translation speed with local models depends on several factors:

FactorImpact
Model sizeSmaller models (7B) are faster; larger models (70B+) are slower but more accurate
QuantizationLower quantization = faster but less accurate
HardwareApple Silicon M-series chips provide the best performance
RAMLarger models require more RAM (8B ≈ 5GB, 70B ≈ 40GB)
ConcurrencyLower concurrency settings (1–2) work better for local models to avoid overloading

Recommendation: Start with a 7B–8B model and increase model size only if quality is insufficient. Set Max Concurrent Pages to 1 or 2 for local services to avoid overloading your Mac.


Previous: AI Services · Next: Custom Services