youlu-openclaw-workspace/scripts/email_processor/README.md
2026-02-26 21:05:27 -08:00

# Email Processor
Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.
## Prerequisites
- **Himalaya** — CLI email client, handles IMAP connection and auth.
- **Ollama** — local LLM server.
- **Python 3.8+**
```bash
# Install himalaya (macOS)
brew install himalaya
# Configure himalaya for your IMAP account first (see himalaya's documentation), then verify it
himalaya account list # should show your account after setup
# Install Ollama, start the server, and pull the model
brew install ollama
brew services start ollama
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest
# Set up Python venv and install dependencies
cd scripts/email_processor
python3 -m venv venv
source venv/bin/activate
pip install ollama
```
## How It Works
The system has two phases: a **learning phase** where it builds up knowledge from your decisions, and a **steady state** where it handles most emails automatically.
### Learning Phase (first ~20 decisions)
The confidence threshold is automatically raised to 95%. Most emails get queued.
1. **Cron runs `scan`.** For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most come back at 60-80% confidence — below the 95% threshold — so they get saved to `pending_emails.json` with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.
2. **You run `review list`.** It prints what's pending:
```
1. [msg_f1d43ea3] Subject: New jobs matching your profile
   From: LinkedIn    Suggested: delete (82%)
2. [msg_60c56a87] Subject: Your order shipped
   From: Amazon      Suggested: archive (78%)
3. [msg_ebd24205] Subject: Meeting tomorrow at 3pm
   From: Coworker    Suggested: keep (70%)
```
3. **You act on them.** Either individually or in bulk:
```bash
./email-processor.sh review 1 delete # agree with suggestion
./email-processor.sh review 2 archive # agree with suggestion
./email-processor.sh review accept # accept all suggestions at once
```
Each command executes via himalaya, appends to `decision_history.json`, and marks the pending entry as done.
4. **Next scan is smarter.** The classifier now has few-shot examples in the prompt:
```
History for linkedin.com: delete 2 times
--- Past decisions ---
From: LinkedIn | Subject: New jobs matching your profile -> delete
From: Amazon | Subject: Your package delivered -> archive
---
```
Confidence scores climb. You keep reviewing. History grows.
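
To make the prompt layout above concrete, here is a minimal sketch of how such a history block could be rendered. The function name `format_history_block` and the dict keys (`sender`, `domain`, `subject`, `action`) are assumptions for illustration, not the names used in `classifier.py`.

```python
from collections import Counter


def format_history_block(sender_domain, examples):
    """Render past decisions in the layout shown above.

    `examples` is a list of dicts with hypothetical keys
    "sender", "domain", "subject", and "action".
    """
    lines = []
    # Summarise how this sender domain was handled before.
    counts = Counter(e["action"] for e in examples if e["domain"] == sender_domain)
    for action, n in counts.most_common():
        lines.append(f"History for {sender_domain}: {action} {n} times")
    # List each example as one few-shot demonstration line.
    lines.append("--- Past decisions ---")
    for e in examples:
        lines.append(f"From: {e['sender']} | Subject: {e['subject']} -> {e['action']}")
    lines.append("---")
    return "\n".join(lines)
```

The block is plain text, so it can be prepended to the classification prompt as-is.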
### Steady State (20+ decisions)
The threshold drops to the configured 75%. The classifier has rich context.
- **Repeat senders** (LinkedIn, Amazon, Uber) get auto-acted at 85-95% confidence during `scan`. They never touch the pending queue.
- **New or ambiguous senders** may fall below 75% and get queued.
- **You occasionally run `review list`** to handle stragglers — each decision further improves future classifications.
- **`stats` shows your automation rate** climbing: 60%, 70%, 80%+.
The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.
## Usage
All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.
```bash
# Make the entry script executable (first time)
chmod +x email-processor.sh
# --- Scan ---
./email-processor.sh scan # classify unseen emails
./email-processor.sh scan --recent 30 # classify last 30 days
./email-processor.sh scan --dry-run # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run # combine both
# --- Review ---
./email-processor.sh review list # show pending queue
./email-processor.sh review 1 delete # delete email #1
./email-processor.sh review msg_f1d43ea3 archive # archive by ID
./email-processor.sh review all delete # delete all pending
./email-processor.sh review accept # accept all suggestions
# --- Other ---
./email-processor.sh stats # show decision history and automation rate
```
Or call Python directly (with the venv activated): `python main.py scan --dry-run`
## Actions
| Action | Effect |
|---|---|
| `delete` | Move to Trash (`himalaya message delete`) |
| `archive` | Move to Archive folder |
| `keep` | Leave unread in inbox (no changes) |
| `mark_read` | Add `\Seen` flag, stays in inbox |
| `label:<name>` | Move to named folder (created if needed) |
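
Because `label:<name>` carries a folder name while the other actions are bare keywords, the action string needs a small parsing step before it can be executed. The sketch below shows one way to do that; `parse_action` and `SIMPLE_ACTIONS` are hypothetical names, and the project may handle this differently.

```python
SIMPLE_ACTIONS = {"delete", "archive", "keep", "mark_read"}


def parse_action(action):
    """Split an action string into (kind, folder).

    Returns (action, None) for the bare keywords and ("label", name)
    for "label:<name>". Raises ValueError for anything else.
    """
    if action in SIMPLE_ACTIONS:
        return action, None
    if action.startswith("label:"):
        folder = action[len("label:"):].strip()
        if folder:  # reject an empty folder name such as "label:"
            return "label", folder
    raise ValueError(f"unknown action: {action!r}")
```

Rejecting unknown strings up front matters here because the action ultimately comes from LLM output, which can be malformed.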
## Auto-Action Criteria
Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than `bootstrap_min_decisions` total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over.
This means on day one, only very obvious emails (spam, clear promotions) get auto-acted. As you review emails and build history, the system gradually handles more on its own.
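
The rule above can be sketched in a few lines. The function names are illustrative assumptions; the 95% learning-phase value and the two defaults come from this README.

```python
BOOTSTRAP_THRESHOLD = 95  # learning-phase threshold described above


def effective_threshold(total_decisions, confidence_threshold=75,
                        bootstrap_min_decisions=20):
    """Return the threshold that applies for the current history size."""
    if total_decisions < bootstrap_min_decisions:
        return BOOTSTRAP_THRESHOLD
    return confidence_threshold


def should_auto_act(confidence, total_decisions, **settings):
    """True when the confidence meets the threshold currently in effect."""
    return confidence >= effective_threshold(total_decisions, **settings)
```

For example, an 82% suggestion is queued while only 5 decisions exist (`should_auto_act(82, 5)` is `False`) but is auto-acted once 20 decisions have accumulated (`should_auto_act(82, 20)` is `True`). With `bootstrap_min_decisions=0` the first branch can never trigger, which matches the "skip the learning phase" behaviour described under Configuration.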
## Configuration
`config.json` — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.
```json
{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}
```
| Key | Description |
|---|---|
| `ollama.host` | Ollama server URL. Default `http://localhost:11434`. |
| `ollama.model` | Ollama model to use for classification. |
| `rules.max_body_length` | Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down. |
| `automation.confidence_threshold` | Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation, higher = more manual review. |
| `automation.bootstrap_min_decisions` | Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely. |
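
As a rough illustration of how these keys might be consumed, the sketch below merges a user config over defaults and applies `rules.max_body_length`. The helper names are hypothetical, and treating 1000 as the default body length is an assumption based on the sample config above, not a documented default.

```python
import json

# Defaults taken from the table and sample config above.
# The max_body_length default is an assumption.
DEFAULTS = {
    "ollama": {"host": "http://localhost:11434"},
    "rules": {"max_body_length": 1000},
    "automation": {"confidence_threshold": 75, "bootstrap_min_decisions": 20},
}


def load_config(text):
    """Parse config JSON and fill in missing keys section by section."""
    user = json.loads(text)
    merged = dict(user)  # keep any sections we do not know about
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **user.get(section, {})}
    return merged


def truncate_body(body, config):
    """Cut the email body to the configured number of characters."""
    return body[: config["rules"]["max_body_length"]]
```

A config that only sets `automation.confidence_threshold` would then still get the default Ollama host and bootstrap size.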
## Testing
```bash
# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3
# 2. Verify Ollama is running with the model
ollama list # should show kamekichi128/qwen3-4b-instruct-2507:latest
# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run
# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7
# 5. Check what got queued
./email-processor.sh review list
# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete
# 7. Check that the decision was recorded
./email-processor.sh stats
```
## File Structure
```
email_processor/
  main.py                  # Entry point: scan/review/stats subcommands
  classifier.py            # LLM prompt builder + response parser
  decision_store.py        # Decision history storage + few-shot retrieval
  config.json              # Ollama + automation settings
  email-processor.sh       # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json    # Queue of emails awaiting review
    decision_history.json  # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log         # Daily processing logs
```
## Design Decisions
### Himalaya instead of raw IMAP
All IMAP operations go through the `himalaya` CLI via subprocess calls. This means:
- No IMAP credentials stored in config.json — himalaya manages its own auth.
- No connection management, reconnect logic, or SSL setup in Python.
- Each action is a single himalaya command (e.g., `himalaya message delete 42`).
The tradeoff is one subprocess spawn per operation, but at typical email volumes (tens per run, not thousands) the overhead is negligible.
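
A minimal sketch of the subprocess approach, assuming a thin wrapper of our own: `himalaya_delete_command` and `run_himalaya` are hypothetical helper names, and only the `himalaya message delete <id>` command mentioned above is used.

```python
import subprocess


def himalaya_delete_command(message_id):
    """Build the argv for the delete command quoted above."""
    return ["himalaya", "message", "delete", str(message_id)]


def run_himalaya(argv, timeout=60):
    """Run one command and return (ok, output).

    A missing binary or a timeout is reported as a failure instead of
    raising, so the caller can log it and move on to the next email.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    if result.returncode != 0:
        return False, result.stderr
    return True, result.stdout
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with message IDs or folder names.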
### Non-interactive design
Every command takes its full input as arguments, acts, and exits. No `input()` calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (`pending_emails.json`) is the shared state between scan and review invocations.
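
The command surface listed under Usage maps naturally onto `argparse` subcommands. The sketch below is one possible layout, not necessarily how `main.py` is written; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser for the scan/review/stats commands shown under Usage."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scan = sub.add_parser("scan")
    scan.add_argument("--recent", type=int, metavar="DAYS")
    scan.add_argument("--dry-run", action="store_true")

    review = sub.add_parser("review")
    # target is "list", "accept", "all", a queue number, or a msg_ ID
    review.add_argument("target")
    # action is only present for forms like "review 1 delete"
    review.add_argument("action", nargs="?")

    sub.add_parser("stats")
    return parser
```

Since everything arrives via `sys.argv`, the same entry point works from a terminal, from cron, or from another script.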
### decision_history.json as the "database"
`data/decision_history.json` is the only persistent state that matters for learning. It's a flat JSON array: every decision (user or auto) is appended as an entry. For each email, the classifier reads the whole file and picks relevant few-shot examples via relevance scoring.
The pending queue (`pending_emails.json`) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.
A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at current scale.
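
Appending to a flat JSON array means reading, extending, and rewriting the whole file. The sketch below shows that pattern; `append_decision` is a hypothetical name, the entry schema is left open, and the write-to-temp-then-rename step is our own precaution against a half-written file, not something the project is known to do.

```python
import json
import os
import tempfile


def append_decision(path, entry):
    """Append one entry to the JSON array at `path`; return the new length."""
    history = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            history = json.load(f)
    history.append(entry)

    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write cannot leave a truncated history file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(history, f, indent=2)
    os.replace(tmp_path, path)
    return len(history)
```

This is still a single-writer design: two processes appending at the same time could lose an entry, which is the concurrency limit mentioned above.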
### Few-shot learning via relevance scoring
Rather than sending the entire decision history to the LLM, `decision_store.get_relevant_examples()` scores each past decision against the current email using three signals:
- Exact sender domain match (+3 points)
- Recipient address match (+2 points)
- Subject keyword overlap (+1 per shared word, stop-words excluded)
The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
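
The scoring scheme above can be sketched as follows. This is an illustration under stated assumptions, not the code in `decision_store.py`: the function names, the dict keys (`sender`, `recipient`, `subject`), the stop-word list, the recency tie-break, and the choice to drop zero-score entries are all ours.

```python
import re

# Small illustrative stop-word list; the real list is not documented here.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "your"}


def _keywords(subject):
    """Lowercased subject words with stop-words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", subject.lower()) if w not in STOP_WORDS}


def _domain(address):
    """Part of the address after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].lower()


def score(email, past):
    """Apply the three signals: +3 domain, +2 recipient, +1 per shared word."""
    points = 0
    if _domain(email["sender"]) == _domain(past["sender"]):
        points += 3
    if email["recipient"].lower() == past["recipient"].lower():
        points += 2
    points += len(_keywords(email["subject"]) & _keywords(past["subject"]))
    return points


def pick_examples(email, history, limit=5):
    """Return up to `limit` past decisions, highest score first."""
    scored = [(score(email, past), i, past) for i, past in enumerate(history)]
    scored = [s for s in scored if s[0] > 0]  # assumption: skip irrelevant entries
    # Ties go to the more recent decision (higher index in the history).
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [past for _, _, past in scored[:limit]]
```

With these weights, a past decision from the same domain to the same recipient starts at 5 points before any subject overlap, so repeat senders dominate the few-shot examples.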
### Conservative auto-action
Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than `bootstrap_min_decisions` (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over. This lets the system start working from day one while being cautious until it has enough examples to learn from.
### `keep` means unread
The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them. During scan, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the `keep` action itself.
### Fail-safe classification
If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns `action="keep"` with `confidence=0`. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
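
A minimal sketch of this fallback, assuming the LLM call is wrapped in a callable that returns a dict with `action` and `confidence` keys. `classify_safely` and that return shape are assumptions for illustration.

```python
FALLBACK = {"action": "keep", "confidence": 0}


def classify_safely(classify, email):
    """Call classify(email); fall back to keep/0 on any failure.

    `classify` is any callable that returns a dict with "action" and
    "confidence" keys, for example a function that queries Ollama.
    """
    try:
        result = classify(email)
        action = result["action"]
        confidence = int(result["confidence"])
    except Exception:
        # Deliberately broad: connection errors, timeouts, and malformed
        # responses should all end in manual review, never in an auto-action.
        return dict(FALLBACK)
    return {"action": action, "confidence": confidence}
```

Because a confidence of 0 is below any threshold the scan would use, the fallback result always lands in the pending queue.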