Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.

Prerequisites

Himalaya — CLI email client, handles IMAP connection and auth.
Ollama — local LLM server.
Python 3.8+

# Install himalaya (macOS)
brew install himalaya

# Configure himalaya for your IMAP account (first time only; see himalaya's own docs),
# then confirm the account is visible
himalaya account list  # should show your account after setup

# Install and start Ollama, then pull the model
brew install ollama
brew services start ollama   # or run `ollama serve` in a separate terminal
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest

# Set up Python venv and install dependencies
cd scripts/email_processor
python3 -m venv venv
source venv/bin/activate
pip install ollama

How It Works

The system has two phases: a learning phase where it builds up knowledge from your decisions, and a steady state where it handles most emails automatically.

Learning Phase (first ~20 decisions)

Until about 20 decisions are recorded, the confidence threshold is automatically raised to 95%, so most emails are queued for review instead of being auto-acted.

Cron runs scan. For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most come back at 60-80% confidence — below the 95% threshold — so they get saved to pending_emails.json with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.

You run review list. It prints what's pending:

  1. [msg_f1d43ea3]  Subject: New jobs matching your profile
     From: LinkedIn    Suggested: delete (82%)
  2. [msg_60c56a87]  Subject: Your order shipped
     From: Amazon      Suggested: archive (78%)
  3. [msg_ebd24205]  Subject: Meeting tomorrow at 3pm
     From: Coworker    Suggested: keep (70%)

You act on them. Either individually or in bulk:

./email-processor.sh review 1 delete     # agree with suggestion
./email-processor.sh review 2 archive    # agree with suggestion
./email-processor.sh review accept       # accept all suggestions at once

Each command executes via himalaya, appends to decision_history.json, and marks the pending entry as done.

Next scan is smarter. The classifier now has few-shot examples in the prompt:

History for linkedin.com: delete 2 times
--- Past decisions ---
From: LinkedIn | Subject: New jobs matching your profile -> delete
From: Amazon | Subject: Your package delivered -> archive
---

Confidence scores climb. You keep reviewing. History grows.
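
The history block shown above could be assembled along these lines. This is a minimal sketch, not the code in classifier.py: the function name build_history_block and the dictionary keys "sender", "subject" and "action" are assumptions chosen for illustration.

```python
from collections import Counter


def build_history_block(sender_domain, domain_actions, examples):
    """Format sender history and few-shot examples as prompt text.

    domain_actions: actions previously taken for this sender domain.
    examples: dicts with hypothetical keys "sender", "subject", "action".
    """
    lines = []
    counts = Counter(domain_actions)
    if counts:
        # One summary line per sender domain, most frequent action first.
        summary = ", ".join(
            f"{action} {n} time{'s' if n != 1 else ''}"
            for action, n in counts.most_common()
        )
        lines.append(f"History for {sender_domain}: {summary}")
    if examples:
        lines.append("--- Past decisions ---")
        for ex in examples:
            lines.append(
                f"From: {ex['sender']} | Subject: {ex['subject']} -> {ex['action']}"
            )
        lines.append("---")
    return "\n".join(lines)
```

With no history for a sender and no examples, the function returns an empty string, which matches the first-scan case where the classifier relies on the model's general knowledge only.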

Steady State (20+ decisions)

The threshold drops to the configured 75%. The classifier has rich context.

Repeat senders (LinkedIn, Amazon, Uber) get auto-acted at 85-95% confidence during scan. They never touch the pending queue.
New or ambiguous senders may fall below 75% and get queued.
You occasionally run review list to handle stragglers — each decision further improves future classifications.
stats shows your automation rate climbing: 60%, 70%, 80%+.

The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.

Usage

All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.

# Make the entry script executable (first time)
chmod +x email-processor.sh

# --- Scan ---
./email-processor.sh scan                         # classify unseen emails
./email-processor.sh scan --recent 30             # classify last 30 days
./email-processor.sh scan --dry-run               # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run    # combine both

# --- Review ---
./email-processor.sh review list                  # show pending queue
./email-processor.sh review 1 delete              # delete email #1
./email-processor.sh review msg_f1d43ea3 archive  # archive by ID
./email-processor.sh review all delete            # delete all pending
./email-processor.sh review accept                # accept all suggestions

# --- Other ---
./email-processor.sh stats                        # show decision history
./email-processor.sh migrate                      # import old decisions

Or call Python directly: python main.py scan --dry-run

Actions

Action	Effect
`delete`	Move to Trash (`himalaya message delete`)
`archive`	Move to Archive folder
`keep`	Leave unread in inbox (no changes)
`mark_read`	Add `\Seen` flag, stays in inbox
`label:<name>`	Move to named folder (created if needed)

Auto-Action Criteria

Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than bootstrap_min_decisions total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured confidence_threshold (default 75%) takes over.

This means on day one, only very obvious emails (spam, clear promotions) get auto-acted. As you review emails and build history, the system gradually handles more on its own.
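
The threshold rule can be sketched in a few lines of Python. The function names here are hypothetical, and the actual implementation may differ; only the 95%, 75% and 20-decision values come from this README.

```python
BOOTSTRAP_THRESHOLD = 95  # learning-phase threshold described above


def effective_threshold(num_decisions, confidence_threshold=75,
                        bootstrap_min_decisions=20):
    """Return the confidence (0-100) an email must reach to be auto-acted."""
    if num_decisions < bootstrap_min_decisions:
        return BOOTSTRAP_THRESHOLD  # learning phase
    return confidence_threshold     # steady state


def should_auto_act(confidence, num_decisions, **settings):
    """True when the confidence meets the threshold for the current phase."""
    return confidence >= effective_threshold(num_decisions, **settings)


# The same 82% suggestion is queued early on and auto-acted later.
print(should_auto_act(82, num_decisions=5))   # learning phase, needs 95
print(should_auto_act(82, num_decisions=40))  # steady state, needs 75
```

Because the comparison is `num_decisions < bootstrap_min_decisions`, setting bootstrap_min_decisions to 0 skips the learning phase in this sketch.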

Configuration

config.json — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.

{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}

Key	Description
`ollama.host`	Ollama server URL. Default `http://localhost:11434`.
`ollama.model`	Ollama model to use for classification.
`rules.max_body_length`	Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down.
`automation.confidence_threshold`	Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation, higher = more manual review.
`automation.bootstrap_min_decisions`	Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely.
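
One way to apply these settings is to merge the user's config.json over a set of defaults. The sketch below is illustrative: the helper names are made up, and the defaults for `ollama.model` and `rules.max_body_length` are copied from the sample config above rather than from documented defaults.

```python
import json

# Assumed defaults, taken from the sample config.json in this README.
DEFAULTS = {
    "ollama": {
        "host": "http://localhost:11434",
        "model": "kamekichi128/qwen3-4b-instruct-2507:latest",
    },
    "rules": {"max_body_length": 1000},
    "automation": {"confidence_threshold": 75, "bootstrap_min_decisions": 20},
}


def merge_config(user):
    """Overlay user settings on the defaults, section by section."""
    return {
        section: {**defaults, **user.get(section, {})}
        for section, defaults in DEFAULTS.items()
    }


def load_config(path="config.json"):
    """Read config.json and fill in any missing keys."""
    with open(path, encoding="utf-8") as fh:
        return merge_config(json.load(fh))


def truncate_body(body, config):
    """Cut the email body to rules.max_body_length characters."""
    return body[: config["rules"]["max_body_length"]]
```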

Testing

# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3

# 2. Verify Ollama is running with the model
ollama list  # should show kamekichi128/qwen3-4b-instruct-2507:latest

# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run

# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7

# 5. Check what got queued
./email-processor.sh review list

# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete

# 7. Check that the decision was recorded
./email-processor.sh stats

File Structure

email_processor/
  main.py              # Entry point — scan/review/stats/migrate subcommands
  classifier.py        # LLM prompt builder + response parser
  decision_store.py    # Decision history storage + few-shot retrieval
  config.json          # Ollama + automation settings
  email-processor.sh   # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json    # Queue of emails awaiting review
    decision_history.json  # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log         # Daily processing logs

Design Decisions

Himalaya instead of raw IMAP

All IMAP operations go through the himalaya CLI via subprocess calls. This means:

No IMAP credentials stored in config.json — himalaya manages its own auth.
No connection management, reconnect logic, or SSL setup in Python.
Each action is a single himalaya command (e.g., himalaya message delete 42).

The tradeoff is a subprocess spawn per operation, but for email volumes (tens per run, not thousands) this is negligible.
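
A thin subprocess wrapper is enough for this pattern. The following is a sketch under assumptions, not the project's code: run_himalaya and its `runner` parameter are hypothetical names, and the 60-second timeout is an arbitrary choice.

```python
import subprocess


def run_himalaya(*args, runner=subprocess.run):
    """Run one himalaya command and return (ok, output).

    `runner` is injectable so the wrapper can be tested without a mailbox.
    """
    argv = ["himalaya", *args]
    # A list argv (no shell=True) avoids quoting problems in IDs and folders.
    result = runner(argv, capture_output=True, text=True, timeout=60)
    ok = result.returncode == 0
    return ok, (result.stdout if ok else result.stderr)
```

For example, `run_himalaya("message", "delete", "42")` would spawn the `himalaya message delete 42` command mentioned above and report whether it succeeded.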

Non-interactive design

Every command takes its full input as arguments, acts, and exits. No input() calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (pending_emails.json) is the shared state between scan and review invocations.
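
A command-line layout like this maps naturally onto argparse subcommands. The sketch below only mirrors the commands listed under Usage; the parser structure and the argument names `target` and `action` are assumptions, not a copy of main.py.

```python
import argparse


def build_parser():
    """Build a parser for the scan/review/stats/migrate subcommands."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scan = sub.add_parser("scan", help="classify emails")
    scan.add_argument("--recent", type=int, metavar="DAYS",
                      help="classify the last DAYS days instead of unseen mail")
    scan.add_argument("--dry-run", action="store_true",
                      help="classify only, make no changes")

    review = sub.add_parser("review", help="act on the pending queue")
    # "list", "accept", "all", a queue number, or a msg_ ID
    review.add_argument("target")
    # Omitted for "review list" and "review accept"
    review.add_argument("action", nargs="?")

    sub.add_parser("stats", help="show decision history")
    sub.add_parser("migrate", help="import old decisions")
    return parser


args = build_parser().parse_args(["scan", "--recent", "7", "--dry-run"])
print(args.command, args.recent, args.dry_run)
```

Since everything arrives as arguments, the same parser serves cron jobs and manual invocations without any prompt handling.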

decision_history.json as the "database"

data/decision_history.json is the only persistent state that matters for learning. It's a flat JSON array: every decision (user or auto) is appended as an entry. The classifier reads the whole file for each email it classifies and selects relevant few-shot examples by relevance scoring.

The pending queue (pending_emails.json) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.

A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at current scale.
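
Appending to a flat JSON array means reading, extending and rewriting the file. A minimal sketch follows, assuming hypothetical helper names and an added "timestamp" field; the real entry schema is not shown in this README. Writing to a temporary file and renaming it keeps a crash from leaving a half-written history.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def load_history(path):
    """Return the list of past decisions, or [] if the file does not exist."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def append_decision(path, decision):
    """Append one decision and rewrite the file; return the new entry count."""
    history = load_history(path)
    entry = {**decision, "timestamp": datetime.now(timezone.utc).isoformat()}
    history.append(entry)

    # Write next to the target, then swap it in with a single rename.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(history, fh, indent=2)
    os.replace(tmp_path, path)
    return len(history)
```

This does not protect against two processes appending at the same time, which is the concurrent-write case where SQLite would be the better fit.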

Few-shot learning via relevance scoring

Rather than sending the entire decision history to the LLM, decision_store.get_relevant_examples() scores each past decision against the current email using three signals:

Exact sender domain match (+3 points)
Recipient address match (+2 points)
Subject keyword overlap (+1 per shared word, stop-words excluded)

The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
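
The three signals above can be sketched as follows. This is not the body of decision_store.get_relevant_examples(): the function names, the dictionary keys, the stop-word list, and the choice to drop zero-score entries are all assumptions made for illustration.

```python
import re

# Illustrative stop-word list; the real list is not given in this README.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "your"}


def _domain(address):
    """Lower-cased part after '@', or '' if there is none."""
    return address.rsplit("@", 1)[-1].lower() if "@" in address else ""


def _keywords(subject):
    """Lower-cased subject words with stop-words removed."""
    return set(re.findall(r"[a-z0-9]+", subject.lower())) - STOP_WORDS


def score(email, decision):
    """Score one past decision against the current email."""
    points = 0
    sender_domain = _domain(email["sender"])
    if sender_domain and sender_domain == _domain(decision["sender"]):
        points += 3  # exact sender domain match
    if email["recipient"].lower() == decision["recipient"].lower():
        points += 2  # recipient address match
    # +1 per shared subject keyword
    points += len(_keywords(email["subject"]) & _keywords(decision["subject"]))
    return points


def rank_examples(email, history, limit=5):
    """Return up to `limit` past decisions, most relevant first."""
    scored = [(score(email, d), d) for d in history]
    scored = [pair for pair in scored if pair[0] > 0]  # assumption: skip unrelated
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:limit]]
```

Each call scans the full history once, which is the linear cost that the flat-file design accepts at the current scale.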

Conservative auto-action

Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than bootstrap_min_decisions (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted. Once enough history accumulates, the configured confidence_threshold (default 75%) takes over. This lets the system start working from day one while being cautious until it has enough examples to learn from.

`keep` means unread

The keep action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from mark_read, which dismisses low-priority emails without moving them. During scan, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the keep action itself.

Fail-safe classification

If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns action="keep" with confidence=0. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
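
The fallback can be expressed as a small wrapper around whatever function performs the LLM call. The sketch below assumes a hypothetical `classify` callable that returns an (action, confidence) pair; it is not the code in classifier.py.

```python
def classify_safely(classify, email):
    """Run `classify(email)`; on any failure fall back to keep with 0 confidence."""
    try:
        action, confidence = classify(email)
        return {"action": action, "confidence": int(confidence)}
    except Exception as exc:  # Ollama down, timeout, unparsable reply, ...
        # Confidence 0 is below every threshold, so the email is always queued.
        return {"action": "keep", "confidence": 0, "error": str(exc)}


def _unreachable(email):
    raise ConnectionError("Ollama is not running")


print(classify_safely(_unreachable, {"subject": "Meeting tomorrow at 3pm"}))
```

Catching a broad Exception is deliberate in this sketch: a malformed model reply should queue the email just like a connection failure does.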
