Local LLM Services
Doco Translate supports running translations entirely on your Mac using local AI models. This keeps your documents private — no data leaves your machine.
Why Use Local Models?
- Privacy — Your documents never leave your Mac. No text is sent to external servers.
- No API costs — Run translations without paying per-request fees.
- Offline capability — Translate documents without an internet connection (after downloading models).
- Full control — Choose exactly which model to use and configure its behavior.
Trade-off: Local models generally produce lower quality translations compared to cloud-based AI services, and translation speed depends on your Mac's hardware (CPU, GPU, and RAM).
Supported Local Services
Ollama
Ollama is a popular open-source tool for running large language models locally on macOS, Linux, and Windows.
- Default Host:
http://localhost:11434 - API Key: Not required (unless authentication is enabled on your Ollama instance)
- Website: https://ollama.com
Setup
- Install Ollama:
- Download from ollama.com or install via Homebrew:
brew install ollama
- Download from ollama.com or install via Homebrew:
- Pull a model:
Open Terminal and run:
ollama pull qwen3.6
Popular models for translation:qwen3.6— Strong multilingual support, especially for Chinese and Asian languagesllama3.2— General-purpose, good balance of speed and qualitygemma4— Google's open model, good for European languages
- Start Ollama:
Ollama runs automatically after installation. If not, start it manually:
ollama serve - Configure in Doco Translate:
- Go to Settings → Services → Ollama.
- The default host (
http://localhost:11434) should work out of the box. - Use Fetch model list to automatically detect models you've pulled.
- Select a model from the dropdown.
- Click Verify service to test the connection.
Tips for Ollama
- Model size matters: Larger models (70B+) produce better translations but require more RAM and run slower. Start with 7B–8B models for a good balance.
- GPU acceleration: Ollama automatically uses Apple Silicon GPU acceleration on M-series Macs.
- Keep Ollama running: Make sure the Ollama service is running before using it in Doco Translate.
LM Studio
LM Studio is a desktop application for discovering, downloading, and running local LLMs with a graphical interface.
- Default Host:
http://localhost:1234 - API Key: Not required
- Website: https://lmstudio.ai
Setup
- Install LM Studio:
- Download from lmstudio.ai.
- Download a model:
- Open LM Studio.
- Use the search bar to find a model (e.g.,
qwen3.6,gemma4). - Click Download on your preferred model variant.
- Start the local server:
- In LM Studio, go to the Local Server tab (left sidebar).
- Select the model you downloaded.
- Click Start Server.
- Configure in Doco Translate:
- Go to Settings → Services → LM Studio.
- The default host (
http://localhost:1234) should work if LM Studio's server is running. - Use Fetch model list to detect the loaded model, or enter the model name manually.
- Click Verify service to test the connection.
Tips for LM Studio
- Only one model at a time: LM Studio loads one model into memory at a time. Switching models requires unloading the current one first.
- Quantization: LM Studio supports various quantization levels (Q4, Q5, Q8). Lower quantization (Q4) uses less memory but may reduce quality.
- Server must be running: The LM Studio local server must be active for Doco Translate to connect.
Configuring Local Services
Local service settings in Doco Translate are similar to cloud AI services, with a few differences:
- No API key required — Local services don't need authentication by default. If you've configured authentication on your local service, you can enter the credentials.
- Custom host — You can change the host if your local service runs on a different port or machine.
- Model selection — Use Fetch model list to auto-detect available models, or add models manually.
Custom Local Services
If you run another local LLM server that's compatible with the OpenAI API format:
- Go to Settings → Services and click Custom Service.
- Enter a name for your service.
- Select the OpenAI protocol.
- Enter the host address of your local server.
- Configure the model name and other settings as needed.
This works with any OpenAI-compatible server, including:
- vLLM
- text-generation-webui
- LocalAI
- llama.cpp server
- Any custom API server
Performance Considerations
Translation speed with local models depends on several factors:
| Factor | Impact |
|---|---|
| Model size | Smaller models (7B) are faster; larger models (70B+) are slower but more accurate |
| Quantization | Lower quantization = faster but less accurate |
| Hardware | Apple Silicon M-series chips provide the best performance |
| RAM | Larger models require more RAM (8B ≈ 5GB, 70B ≈ 40GB) |
| Concurrency | Lower concurrency settings (1–2) work better for local models to avoid overloading |
Recommendation: Start with a 7B–8B model and increase model size only if quality is insufficient. Set Max Concurrent Pages to 1 or 2 for local services to avoid overloading your Mac.
Previous: AI Services · Next: Custom Services
