# Email Processor

Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.

## Prerequisites

- **uv** — Python package manager, handles venv and dependencies automatically.
- **Himalaya** — CLI email client, handles IMAP connection and auth.
- **Ollama** — local LLM server.

```bash
# Install uv (macOS)
brew install uv

# Install himalaya (macOS)
brew install himalaya

# Configure himalaya for your IMAP account (first time only)
himalaya account list   # should show your account after setup

# Install and start Ollama, pull the model
brew install ollama
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest
```

## How It Works

The system has two phases: a **learning phase** where it builds up knowledge from your decisions, and a **steady state** where it handles most emails automatically.

### Learning Phase (first ~20 decisions)

The confidence threshold is automatically raised to 95%. Most emails get queued.

1. **Cron runs `scan`.** For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most come back at 60-80% confidence — below the 95% threshold — so they get saved to `pending_emails.json` with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.
2. **You run `review list`.** It prints what's pending:

   ```
   1. [msg_f1d43ea3] Subject: New jobs matching your profile
      From: LinkedIn
      Suggested: delete (82%)
   2. [msg_60c56a87] Subject: Your order shipped
      From: Amazon
      Suggested: archive (78%)
   3. [msg_ebd24205] Subject: Meeting tomorrow at 3pm
      From: Coworker
      Suggested: keep (70%)
   ```

3. **You act on them.** Either individually or in bulk:

   ```bash
   ./email-processor.sh review 1 delete    # agree with suggestion
   ./email-processor.sh review 2 archive   # agree with suggestion
   ./email-processor.sh review accept      # accept all suggestions at once
   ```

   Each command executes via himalaya, appends to `decision_history.json`, and marks the pending entry as done.
4. **Next scan is smarter.** The classifier now has few-shot examples in the prompt:

   ```
   History for linkedin.com: delete 2 times
   --- Past decisions ---
   From: LinkedIn | Subject: New jobs matching your profile -> delete
   From: Amazon | Subject: Your package delivered -> archive
   ---
   ```

   Confidence scores climb. You keep reviewing. History grows.

### Steady State (20+ decisions)

The threshold drops to the configured 75%. The classifier has rich context.

- **Repeat senders** (LinkedIn, Amazon, Uber) get auto-acted at 85-95% confidence during `scan`. They never touch the pending queue.
- **New or ambiguous senders** may fall below 75% and get queued.
- **You occasionally run `review list`** to handle stragglers — each decision further improves future classifications.
- **`stats` shows your automation rate** climbing: 60%, 70%, 80%+.

The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.

## Usage

All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.
```bash
# Make the entry script executable (first time)
chmod +x email-processor.sh

# --- Scan ---
./email-processor.sh scan                        # classify unseen emails
./email-processor.sh scan --recent 30            # classify last 30 days
./email-processor.sh scan --dry-run              # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run   # combine both

# --- Review ---
./email-processor.sh review list                 # show pending queue
./email-processor.sh review 1 delete             # delete email #1
./email-processor.sh review msg_f1d43ea3 archive # archive by ID
./email-processor.sh review all delete           # delete all pending
./email-processor.sh review accept               # accept all suggestions

# --- Other ---
./email-processor.sh stats                       # show decision history
```

Or call Python directly: `python main.py scan --dry-run`

## Actions

| Action | Effect |
|---|---|
| `delete` | Move to Trash (`himalaya message delete`) |
| `archive` | Move to Archive folder |
| `keep` | Leave unread in inbox (no changes) |
| `mark_read` | Add `\Seen` flag, stays in inbox |
| `label:` | Move to named folder (created if needed) |

## Auto-Action Criteria

Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than `bootstrap_min_decisions` total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over.

This means on day one, only very obvious emails (spam, clear promotions) get auto-acted. As you review emails and build history, the system gradually handles more on its own.

## Configuration

`config.json` — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.

```json
{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}
```

| Key | Description |
|---|---|
| `ollama.host` | Ollama server URL. Default `http://localhost:11434`. |
| `ollama.model` | Ollama model to use for classification. |
| `rules.max_body_length` | Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down. |
| `automation.confidence_threshold` | Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation, higher = more manual review. |
| `automation.bootstrap_min_decisions` | Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely. |

## Testing

```bash
# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3

# 2. Verify Ollama is running with the model
ollama list   # should show kamekichi128/qwen3-4b-instruct-2507:latest

# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run

# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7

# 5. Check what got queued
./email-processor.sh review list

# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete

# 7. Check that the decision was recorded
./email-processor.sh stats
```

## File Structure

```
email_processor/
  main.py                  # Entry point — scan/review/stats subcommands
  classifier.py            # LLM prompt builder + response parser
  decision_store.py        # Decision history storage + few-shot retrieval
  config.json              # Ollama + automation settings
  email-processor.sh       # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json    # Queue of emails awaiting review
    decision_history.json  # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log         # Daily processing logs
```

## Design Decisions

### Himalaya instead of raw IMAP

All IMAP operations go through the `himalaya` CLI via subprocess calls. This means:

- No IMAP credentials stored in config.json — himalaya manages its own auth.
- No connection management, reconnect logic, or SSL setup in Python.
- Each action is a single himalaya command (e.g., `himalaya message delete 42`).

The tradeoff is a subprocess spawn per operation, but at these email volumes (tens per run, not thousands) this is negligible.

### Non-interactive design

Every command takes its full input as arguments, acts, and exits. No `input()` calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (`pending_emails.json`) is the shared state between scan and review invocations.

### decision_history.json as the "database"

`data/decision_history.json` is the only persistent state that matters for learning. It's a flat JSON array — every decision (user or auto) is appended as an entry. The classifier reads the whole file on each email to find relevant few-shot examples via relevance scoring. The pending queue (`pending_emails.json`) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.

A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at current scale.

### Few-shot learning via relevance scoring

Rather than sending the entire decision history to the LLM, `decision_store.get_relevant_examples()` scores each past decision against the current email using three signals:

- Exact sender domain match (+3 points)
- Recipient address match (+2 points)
- Subject keyword overlap (+1 per shared word, stop-words excluded)

The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
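The scoring scheme above fits in a few lines of Python. This is a minimal sketch, not the actual `decision_store.py` — field names like `from`, `to`, and `subject` and the exact stop-word list are assumptions:

```python
# Sketch of relevance scoring for few-shot example selection.
# Hypothetical record shape: {"from": ..., "to": ..., "subject": ..., "action": ...}

STOP_WORDS = {"the", "a", "an", "your", "for", "to", "of", "and", "re", "fwd"}

def score(decision: dict, email: dict) -> int:
    """Score one past decision against the current email."""
    points = 0
    # Exact sender domain match: +3
    if decision["from"].split("@")[-1] == email["from"].split("@")[-1]:
        points += 3
    # Recipient address match: +2
    if decision.get("to") == email.get("to"):
        points += 2
    # Subject keyword overlap: +1 per shared word, stop-words excluded
    past_words = set(decision["subject"].lower().split()) - STOP_WORDS
    curr_words = set(email["subject"].lower().split()) - STOP_WORDS
    points += len(past_words & curr_words)
    return points

def get_relevant_examples(history: list[dict], email: dict, k: int = 5) -> list[dict]:
    """Return the top-k most relevant past decisions for the few-shot prompt."""
    return sorted(history, key=lambda d: score(d, email), reverse=True)[:k]
```

A past LinkedIn "delete" decision scores 3 (domain) + 2 (recipient) + subject overlap against a new LinkedIn job alert, so it reliably outranks unrelated history and lands in the prompt.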
### Conservative auto-action

Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than `bootstrap_min_decisions` (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over. This lets the system start working from day one while being cautious until it has enough examples to learn from.

### `keep` means unread

The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them. During scan, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the `keep` action itself.

### Fail-safe classification

If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns `action="keep"` with `confidence=0`. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
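The fail-safe pattern can be sketched as a thin wrapper around the LLM call. Function names here are hypothetical; the real `classifier.py` may structure this differently:

```python
# Sketch of fail-safe classification: any failure in the LLM call
# degrades to keep/0, which always falls below the confidence
# threshold and routes the email to the pending queue.

def classify_safely(call_llm, email: dict) -> dict:
    """Classify an email, falling back to keep/0 on any LLM failure."""
    try:
        result = call_llm(email)  # e.g. an HTTP request to the Ollama server
        return {"action": result["action"], "confidence": result["confidence"]}
    except Exception:
        # Ollama down, model not loaded, timeout, unparseable response:
        # never auto-act on an email that could not be classified.
        return {"action": "keep", "confidence": 0}
```

Because `keep` with confidence 0 can never meet any threshold, an outage silently turns the whole system into "queue everything for review" rather than into a destructive failure mode.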