# Email Processor

Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.
## Prerequisites
- Himalaya — CLI email client, handles IMAP connection and auth.
- Ollama — local LLM server.
- Python 3.8+
```bash
# Install himalaya (macOS)
brew install himalaya

# Configure himalaya for your IMAP account (first time only)
himalaya account list   # should show your account after setup

# Install and start Ollama, pull the model
brew install ollama
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest

# Set up Python venv and install dependencies
cd scripts/email_processor
python3 -m venv venv
source venv/bin/activate
pip install ollama
```
## How It Works
The system has two phases: a learning phase where it builds up knowledge from your decisions, and a steady state where it handles most emails automatically.
### Learning Phase (first ~20 decisions)
The confidence threshold is automatically raised to 95%. Most emails get queued.
- **Cron runs `scan`.** For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most come back at 60-80% confidence — below the 95% threshold — so they get saved to `pending_emails.json` with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.
- **You run `review list`.** It prints what's pending:

  ```
  1. [msg_f1d43ea3] Subject: New jobs matching your profile
     From: LinkedIn
     Suggested: delete (82%)
  2. [msg_60c56a87] Subject: Your order shipped
     From: Amazon
     Suggested: archive (78%)
  3. [msg_ebd24205] Subject: Meeting tomorrow at 3pm
     From: Coworker
     Suggested: keep (70%)
  ```

- **You act on them.** Either individually or in bulk:

  ```bash
  ./email-processor.sh review 1 delete    # agree with suggestion
  ./email-processor.sh review 2 archive   # agree with suggestion
  ./email-processor.sh review accept      # accept all suggestions at once
  ```

  Each command executes via himalaya, appends to `decision_history.json`, and marks the pending entry as done.
- **Next scan is smarter.** The classifier now has few-shot examples in the prompt:

  ```
  History for linkedin.com: delete 2 times
  --- Past decisions ---
  From: LinkedIn | Subject: New jobs matching your profile -> delete
  From: Amazon | Subject: Your package delivered -> archive
  ---
  ```

  Confidence scores climb. You keep reviewing. History grows.
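The few-shot prompt shown above can be assembled with a small helper. A minimal sketch, with illustrative names and prompt wording (not the actual `classifier.py` interface):

```python
def build_prompt(email, examples, max_body_length=1000):
    """Assemble a classification prompt with few-shot history.

    `email` is a dict with from/subject/body keys; `examples` are
    past decisions as (sender, subject, action) tuples. All names
    here are illustrative, not the real classifier.py interface.
    """
    lines = ["--- Past decisions ---"]
    for sender, subject, action in examples:
        lines.append(f"From: {sender} | Subject: {subject} -> {action}")
    lines.append("---")
    lines.append(f"From: {email['from']}")
    lines.append(f"Subject: {email['subject']}")
    # rules.max_body_length truncation keeps the prompt small
    lines.append(f"Body: {email['body'][:max_body_length]}")
    lines.append('Reply with JSON: {"action": ..., "confidence": 0-100}')
    return "\n".join(lines)

prompt = build_prompt(
    {"from": "jobs@linkedin.com", "subject": "New jobs for you", "body": "..."},
    [("LinkedIn", "New jobs matching your profile", "delete")],
)
```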
### Steady State (20+ decisions)
The threshold drops to the configured 75%. The classifier has rich context.
- Repeat senders (LinkedIn, Amazon, Uber) get auto-acted at 85-95% confidence during `scan`. They never touch the pending queue.
- New or ambiguous senders may fall below 75% and get queued.
- You occasionally run `review list` to handle stragglers — each decision further improves future classifications. `stats` shows your automation rate climbing: 60%, 70%, 80%+.
The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.
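The per-email decision at scan time reduces to one comparison: act if confident, queue otherwise. A minimal sketch of that branch, with illustrative names (not the actual `main.py` code):

```python
def handle(email_id, suggestion, confidence, threshold, pending):
    """Auto-act when confidence clears the threshold, else queue.

    `pending` stands in for pending_emails.json; all names are
    illustrative, not the actual main.py code.
    """
    if confidence >= threshold:
        return ("auto", suggestion)  # would be executed via himalaya
    pending.append({"id": email_id, "suggested": suggestion,
                    "confidence": confidence, "status": "pending"})
    return ("queued", suggestion)

pending = []
print(handle("msg_1", "delete", 96, 95, pending))   # ('auto', 'delete')
print(handle("msg_2", "archive", 78, 95, pending))  # ('queued', 'archive')
print(len(pending))                                 # 1
```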
## Usage
All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.
```bash
# Make the entry script executable (first time)
chmod +x email-processor.sh

# --- Scan ---
./email-processor.sh scan                        # classify unseen emails
./email-processor.sh scan --recent 30            # classify last 30 days
./email-processor.sh scan --dry-run              # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run   # combine both

# --- Review ---
./email-processor.sh review list                 # show pending queue
./email-processor.sh review 1 delete             # delete email #1
./email-processor.sh review msg_f1d43ea3 archive # archive by ID
./email-processor.sh review all delete           # delete all pending
./email-processor.sh review accept               # accept all suggestions

# --- Other ---
./email-processor.sh stats                       # show decision history
./email-processor.sh migrate                     # import old decisions
```

Or call Python directly: `python main.py scan --dry-run`
## Actions
| Action | Effect |
|---|---|
| `delete` | Move to Trash (`himalaya message delete`) |
| `archive` | Move to Archive folder |
| `keep` | Leave unread in inbox (no changes) |
| `mark_read` | Add `\Seen` flag, stays in inbox |
| `label:<name>` | Move to named folder (created if needed) |
## Auto-Action Criteria
Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than `bootstrap_min_decisions` total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over.
This means on day one, only very obvious emails (spam, clear promotions) get auto-acted. As you review emails and build history, the system gradually handles more on its own.
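The threshold selection described above is a two-branch function. A sketch, with defaults matching config.json (the function name is illustrative):

```python
def effective_threshold(history_len, confidence_threshold=75,
                        bootstrap_min_decisions=20):
    """Pick the auto-action threshold for the current history size."""
    if history_len < bootstrap_min_decisions:
        return 95  # learning phase: only near-certain classifications
    return confidence_threshold

print(effective_threshold(0))   # 95 (day one: very conservative)
print(effective_threshold(19))  # 95 (still learning)
print(effective_threshold(20))  # 75 (steady state)
```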
## Configuration

`config.json` — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.
```json
{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}
```
| Key | Description |
|---|---|
| `ollama.host` | Ollama server URL. Default `http://localhost:11434`. |
| `ollama.model` | Ollama model to use for classification. |
| `rules.max_body_length` | Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down. |
| `automation.confidence_threshold` | Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation, higher = more manual review. |
| `automation.bootstrap_min_decisions` | Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely. |
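For illustration, one plausible way to read this file and fall back to the defaults above. The section-by-section merge behavior and the function name are assumptions, not the documented main.py behavior:

```python
import json

# Defaults mirror the config.json example above.
DEFAULTS = {
    "ollama": {"host": "http://localhost:11434",
               "model": "kamekichi128/qwen3-4b-instruct-2507:latest"},
    "rules": {"max_body_length": 1000},
    "automation": {"confidence_threshold": 75, "bootstrap_min_decisions": 20},
}

def load_config(path="config.json"):
    """Overlay user settings on the defaults, section by section.

    Illustrative sketch: a missing file yields the defaults, and a
    partial file only overrides the keys it actually sets.
    """
    try:
        with open(path) as f:
            user = json.load(f)
    except FileNotFoundError:
        user = {}
    return {section: {**values, **user.get(section, {})}
            for section, values in DEFAULTS.items()}
```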
## Testing
```bash
# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3

# 2. Verify Ollama is running with the model
ollama list   # should show kamekichi128/qwen3-4b-instruct-2507:latest

# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run

# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7

# 5. Check what got queued
./email-processor.sh review list

# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete

# 7. Check that the decision was recorded
./email-processor.sh stats
```
## File Structure
```
email_processor/
  main.py                  # Entry point — scan/review/stats/migrate subcommands
  classifier.py            # LLM prompt builder + response parser
  decision_store.py        # Decision history storage + few-shot retrieval
  config.json              # Ollama + automation settings
  email-processor.sh       # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json    # Queue of emails awaiting review
    decision_history.json  # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log         # Daily processing logs
```
## Design Decisions
### Himalaya instead of raw IMAP
All IMAP operations go through the himalaya CLI via subprocess calls. This means:
- No IMAP credentials stored in `config.json` — himalaya manages its own auth.
- No connection management, reconnect logic, or SSL setup in Python.
- Each action is a single himalaya command (e.g., `himalaya message delete 42`).
The tradeoff is a subprocess spawn per operation, but for email volumes (tens per run, not thousands) this is negligible.
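The subprocess pattern fits in a few lines. `run_cli` and `himalaya` below are illustrative wrappers, not the project's actual code:

```python
import subprocess

def run_cli(argv):
    """Run one external command and return its stdout.

    check=True raises CalledProcessError on a non-zero exit, so a
    failed IMAP action is never silently ignored.
    """
    return subprocess.run(argv, capture_output=True, text=True,
                          check=True).stdout

def himalaya(*args):
    """Forward a subcommand to the himalaya CLI.

    e.g. himalaya("message", "delete", "42") moves message 42 to
    Trash, per the example in the text.
    """
    return run_cli(["himalaya", *args])
```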
### Non-interactive design
Every command takes its full input as arguments, acts, and exits. No `input()` calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (`pending_emails.json`) is the shared state between scan and review invocations.
### `decision_history.json` as the "database"
`data/decision_history.json` is the only persistent state that matters for learning. It's a flat JSON array — every decision (user or auto) is appended as an entry. The classifier reads the whole file on each email to find relevant few-shot examples via relevance scoring.

The pending queue (`pending_emails.json`) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.
A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at current scale.
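An append to such a flat JSON file is a whole-file read-modify-write, which is trivially cheap at this scale. A sketch, with an illustrative function name (not the real `decision_store.py` API):

```python
import json
import os

def append_decision(path, decision):
    """Append one decision to the flat JSON array on disk.

    Reads the full array, appends, writes it back. Fine for the
    hundreds-to-low-thousands of entries discussed above; this is
    an illustrative sketch, not the actual decision_store.py code.
    """
    history = []
    if os.path.exists(path):
        with open(path) as f:
            history = json.load(f)
    history.append(decision)
    with open(path, "w") as f:
        json.dump(history, f, indent=2)
    return len(history)
```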
### Few-shot learning via relevance scoring
Rather than sending the entire decision history to the LLM, `decision_store.get_relevant_examples()` scores each past decision against the current email using three signals:
- Exact sender domain match (+3 points)
- Recipient address match (+2 points)
- Subject keyword overlap (+1 per shared word, stop-words excluded)
The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
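Under those three signals, the scoring could look like the following sketch. The weights come from the list above; the dict field names, stop-word set, and other implementation details are assumptions, not the actual `decision_store.py` code:

```python
STOP_WORDS = {"the", "a", "an", "your", "to", "of", "for", "and", "at", "is"}

def relevance(email, past):
    """Score one past decision against the current email:
    +3 sender-domain match, +2 recipient match, +1 per shared
    subject keyword (stop-words excluded)."""
    score = 0
    if email["from"].split("@")[-1] == past["from"].split("@")[-1]:
        score += 3
    if email.get("to") and email.get("to") == past.get("to"):
        score += 2
    keywords = lambda s: set(s.lower().split()) - STOP_WORDS
    score += len(keywords(email["subject"]) & keywords(past["subject"]))
    return score

def get_relevant_examples(email, history, k=5):
    """Top-k most relevant past decisions for the few-shot prompt."""
    return sorted(history, key=lambda p: relevance(email, p),
                  reverse=True)[:k]
```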
### Conservative auto-action
Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than `bootstrap_min_decisions` (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over. This lets the system start working from day one while being cautious until it has enough examples to learn from.
### `keep` means unread
The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them. During `scan`, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the `keep` action itself.
### Fail-safe classification
If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns `action="keep"` with `confidence=0`. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
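That fail-safe is a thin try/except around the LLM call. A sketch with illustrative names:

```python
def classify_safe(email, classify):
    """Degrade any classifier failure to keep/confidence 0.

    `classify` is the LLM-backed call; on any exception (Ollama
    down, timeout, unparseable reply) the fallback confidence of 0
    is below every threshold, so the email is always queued and
    never auto-acted. Wrapper and parameter names are illustrative.
    """
    try:
        return classify(email)
    except Exception:
        return {"action": "keep", "confidence": 0}

def broken_classifier(_email):
    raise ConnectionError("Ollama is down")

print(classify_safe({}, broken_classifier))
# {'action': 'keep', 'confidence': 0}
```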