Local LLM Services

Doco Translate supports running translations entirely on your Mac using local AI models. This keeps your documents private — no data leaves your machine.

Why Use Local Models?

Privacy — Your documents never leave your Mac. No text is sent to external servers.
No API costs — Run translations without paying per-request fees.
Offline capability — Translate documents without an internet connection (after downloading models).
Full control — Choose exactly which model to use and configure its behavior.

Trade-off: Local models generally produce lower quality translations compared to cloud-based AI services, and translation speed depends on your Mac's hardware (CPU, GPU, and RAM).

Supported Local Services

Ollama

Ollama is a popular open-source tool for running large language models locally on macOS, Linux, and Windows.

Default Host: http://localhost:11434
API Key: Not required (unless authentication is enabled on your Ollama instance)
Website: https://ollama.com

Setup

Install Ollama:
- Download from ollama.com or install via Homebrew:
```
brew install ollama
```
Pull a model: Open Terminal and run:
```
ollama pull qwen3.6
```
Popular models for translation:
- qwen3.6 — Strong multilingual support, especially for Chinese and Asian languages
- llama3.2 — General-purpose, good balance of speed and quality
- gemma4 — Google's open model, good for European languages
Start Ollama: Ollama runs automatically after installation. If not, start it manually:
```
ollama serve
```
Configure in Doco Translate:
- Go to Settings → Services → Ollama.
- The default host (http://localhost:11434) should work out of the box.
- Use Fetch model list to automatically detect models you've pulled.
- Select a model from the dropdown.
- Click Verify service to test the connection.

Tips for Ollama

Model size matters: Larger models (70B+) produce better translations but require more RAM and run slower. Start with 7B–8B models for a good balance.
GPU acceleration: Ollama automatically uses Apple Silicon GPU acceleration on M-series Macs.
Keep Ollama running: Make sure the Ollama service is running before using it in Doco Translate.

LM Studio

LM Studio is a desktop application for discovering, downloading, and running local LLMs with a graphical interface.

Default Host: http://localhost:1234
API Key: Not required
Website: https://lmstudio.ai

Setup

Install LM Studio:
- Download from lmstudio.ai.
Download a model:
- Open LM Studio.
- Use the search bar to find a model (e.g., qwen3.6, gemma4).
- Click Download on your preferred model variant.
Start the local server:
- In LM Studio, go to the Local Server tab (left sidebar).
- Select the model you downloaded.
- Click Start Server.
Configure in Doco Translate:
- Go to Settings → Services → LM Studio.
- The default host (http://localhost:1234) should work if LM Studio's server is running.
- Use Fetch model list to detect the loaded model, or enter the model name manually.
- Click Verify service to test the connection.

Tips for LM Studio

Only one model at a time: LM Studio loads one model into memory at a time. Switching models requires unloading the current one first.
Quantization: LM Studio supports various quantization levels (Q4, Q5, Q8). Lower quantization (Q4) uses less memory but may reduce quality.
Server must be running: The LM Studio local server must be active for Doco Translate to connect.

Configuring Local Services

Local service settings in Doco Translate are similar to cloud AI services, with a few differences:

No API key required — Local services don't need authentication by default. If you've configured authentication on your local service, you can enter the credentials.
Custom host — You can change the host if your local service runs on a different port or machine.
Model selection — Use Fetch model list to auto-detect available models, or add models manually.

Custom Local Services

If you run another local LLM server that's compatible with the OpenAI API format:

Go to Settings → Services and click Custom Service.
Enter a name for your service.
Select the OpenAI protocol.
Enter the host address of your local server.
Configure the model name and other settings as needed.

This works with any OpenAI-compatible server, including:

vLLM
text-generation-webui
LocalAI
llama.cpp server
Any custom API server

Performance Considerations

Translation speed with local models depends on several factors:

Factor	Impact
Model size	Smaller models (7B) are faster; larger models (70B+) are slower but more accurate
Quantization	Lower quantization = faster but less accurate
Hardware	Apple Silicon M-series chips provide the best performance
RAM	Larger models require more RAM (8B ≈ 5GB, 70B ≈ 40GB)
Concurrency	Lower concurrency settings (1–2) work better for local models to avoid overloading

Recommendation: Start with a 7B–8B model and increase model size only if quality is insufficient. Set Max Concurrent Pages to 1 or 2 for local services to avoid overloading your Mac.

Previous: AI Services · Next: Custom Services