Model Guide

This reference covers common Ollama models and selection guidance.

Chat/General Models

Model                     Params  Best For                   Notes
qwen3:4b                  4B      Fast tasks, quick answers  Thinking-enabled, very fast
llama3.1:8b               8B      General chat, reasoning    Good all-rounder
gemma3:12b                12.2B   Creative, design tasks     Google model, good quality
phi4-reasoning:latest     14.7B   Complex reasoning          Thinking-enabled
mistral-small3.1:latest   24B     Technical tasks            May need CPU offload
deepseek-r1:8b            8.2B    Deep reasoning             Thinking-enabled, chain-of-thought
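Any of these chat models can be driven over Ollama's local REST API. A minimal sketch, assuming the default `localhost:11434` endpoint and the `/api/chat` route; `build_chat_request` is a hypothetical helper, not part of Ollama:

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete response instead of chunks
    }

body = build_chat_request("llama3.1:8b", "Summarize what a B-tree is.")

# Sending it requires a running Ollama server:
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_CHAT_URL, json.dumps(body).encode(),
#                                {"Content-Type": "application/json"})
#   reply = json.loads(urllib.request.urlopen(req).read())
#   print(reply["message"]["content"])
```

The same body shape works for every model in the table; only the `model` field changes.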

Coding Models

Model                 Params  Best For                 Notes
qwen2.5-coder:7b      7.6B    Code generation, review  Best local coding model
codellama:7b          7B      Code completion          Meta's code model
deepseek-coder:6.7b   6.7B    Code tasks               Good alternative

Embedding Models

Model               Params  Dimensions  Notes
bge-m3:latest       567M    1024        Multilingual, good quality
nomic-embed-text    137M    768         Fast, English-focused
mxbai-embed-large   335M    1024        High quality embeddings
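Embedding models are queried through a separate route. A sketch of the request body for Ollama's `/api/embeddings` endpoint, assuming the default local server; `build_embedding_request` is a hypothetical helper:

```python
import json

def build_embedding_request(model: str, text: str) -> dict:
    """Request body for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}

body = build_embedding_request("bge-m3:latest", "vector databases")

# Sending it (needs a running local Ollama server):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/embeddings",
#                                json.dumps(body).encode(),
#                                {"Content-Type": "application/json"})
#   vec = json.loads(urllib.request.urlopen(req).read())["embedding"]
#   # vec should have the dimension listed in the table (1024 for bge-m3)
```

The returned vector's length matches the Dimensions column above, which matters when sizing a vector store.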

Model Selection Guide

By Task Type

  • Quick questions: qwen3:4b (fastest)
  • General chat: llama3.1:8b
  • Coding: qwen2.5-coder:7b
  • Complex reasoning: phi4-reasoning or deepseek-r1:8b
  • Creative/design: gemma3:12b
  • Embeddings: bge-m3:latest
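The task-to-model mapping above can be encoded directly. A minimal sketch; the task keys and the `choose_model` helper are made up for illustration:

```python
# Hypothetical mapping mirroring the selection list above.
TASK_MODEL = {
    "quick": "qwen3:4b",
    "chat": "llama3.1:8b",
    "coding": "qwen2.5-coder:7b",
    "reasoning": "deepseek-r1:8b",
    "creative": "gemma3:12b",
    "embeddings": "bge-m3:latest",
}

def choose_model(task: str) -> str:
    """Fall back to the general-purpose all-rounder for unknown task types."""
    return TASK_MODEL.get(task, "llama3.1:8b")

print(choose_model("coding"))   # qwen2.5-coder:7b
print(choose_model("unknown"))  # llama3.1:8b (fallback)
```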

By Speed vs Quality

Fastest ←──────────────────────────────→ Best Quality
qwen3:4b → llama3.1:8b → gemma3:12b → mistral-small3.1

Tool Use Support

Models with good tool/function calling support:

  • qwen2.5-coder:7b - Excellent
  • qwen3:4b - Good
  • llama3.1:8b - Basic
  • mistral models - Good
  • ⚠️ Others - May not support tools natively

OpenClaw Integration

To use Ollama models in OpenClaw sub-agents, use these model paths:

ollama/qwen3:4b
ollama/llama3.1:8b
ollama/qwen2.5-coder:7b
ollama/gemma3:12b
ollama/mistral-small3.1:latest
ollama/phi4-reasoning:latest
ollama/deepseek-r1:8b

Auth Profile Required

OpenClaw requires an auth profile even for Ollama, although no real authentication takes place; the key value is just a placeholder. Add to auth-profiles.json:

"ollama:default": {
  "type": "api_key",
  "provider": "ollama",
  "key": "ollama"
}

Hardware Considerations

  • 8GB VRAM: Can run models up to ~13B comfortably
  • 16GB VRAM: Can run most models including 24B+
  • CPU offload: Ollama automatically offloads to CPU/RAM for larger models
  • Larger models may be slower due to partial CPU inference
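The VRAM figures above follow from simple arithmetic on model size. A rough sketch, assuming 4-bit quantization (about 0.5 bytes per parameter, a common Ollama default) and ignoring KV cache and runtime overhead:

```python
def approx_vram_gib(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Rough weight-memory estimate: parameter count times bytes per parameter.
    0.5 bytes/param corresponds to ~4-bit quantization; KV cache and runtime
    overhead (often another 1-2 GiB) are not included."""
    return params_billions * 1e9 * bytes_per_param / 2**30

print(round(approx_vram_gib(8), 1))    # ~3.7 GiB of weights: fits easily in 8 GB VRAM
print(round(approx_vram_gib(24), 1))   # ~11.2 GiB: likely needs CPU offload on 8 GB
```

This is why a 24B model like mistral-small3.1 can spill onto the CPU on an 8 GB card while 8B models stay fully on the GPU.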

Installing Models

# Pull a model
ollama pull llama3.1:8b

# Or via the skill script
python3 scripts/ollama.py pull llama3.1:8b

# List installed models
python3 scripts/ollama.py list