email processor
scripts/email_processor/.gitignore (vendored): new file, 3 lines
@@ -0,0 +1,3 @@
__pycache__/
*.pyc
venv
scripts/email_processor/README.md: new file, 233 lines
@@ -0,0 +1,233 @@
# Email Processor

Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.

## Prerequisites

- **Himalaya** — CLI email client, handles IMAP connection and auth.
- **Ollama** — local LLM server.
- **Python 3.8+**

```bash
# Install himalaya (macOS)
brew install himalaya

# Configure himalaya for your IMAP account (first time only)
himalaya account list   # should show your account after setup

# Install and start Ollama, pull the model
brew install ollama
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest

# Set up Python venv and install dependencies
cd scripts/email_processor
python3 -m venv venv
source venv/bin/activate
pip install ollama
```

## How It Works

The system has two phases: a **learning phase** where it builds up knowledge from your decisions, and a **steady state** where it handles most emails automatically.

### Learning Phase (first ~20 decisions)

The confidence threshold is automatically raised to 95%. Most emails get queued.

1. **Cron runs `scan`.** For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most come back at 60-80% confidence — below the 95% threshold — so they get saved to `pending_emails.json` with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.

2. **You run `review list`.** It prints what's pending:
   ```
   1. [msg_f1d43ea3] Subject: New jobs matching your profile
      From: LinkedIn   Suggested: delete (82%)
   2. [msg_60c56a87] Subject: Your order shipped
      From: Amazon     Suggested: archive (78%)
   3. [msg_ebd24205] Subject: Meeting tomorrow at 3pm
      From: Coworker   Suggested: keep (70%)
   ```

3. **You act on them.** Either individually or in bulk:
   ```bash
   ./email-processor.sh review 1 delete    # agree with suggestion
   ./email-processor.sh review 2 archive   # agree with suggestion
   ./email-processor.sh review accept      # accept all suggestions at once
   ```
   Each command executes via himalaya, appends to `decision_history.json`, and marks the pending entry as done.

4. **Next scan is smarter.** The classifier now has few-shot examples in the prompt:
   ```
   History for linkedin.com: delete 2 times
   --- Past decisions ---
   From: LinkedIn | Subject: New jobs matching your profile -> delete
   From: Amazon | Subject: Your package delivered -> archive
   ---
   ```
   Confidence scores climb. You keep reviewing. History grows.

### Steady State (20+ decisions)

The threshold drops to the configured 75%. The classifier has rich context.

- **Repeat senders** (LinkedIn, Amazon, Uber) get auto-acted on at 85-95% confidence during `scan`. They never touch the pending queue.
- **New or ambiguous senders** may fall below 75% and get queued.
- **You occasionally run `review list`** to handle stragglers — each decision further improves future classifications.
- **`stats` shows your automation rate** climbing: 60%, 70%, 80%+.

The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.

## Usage

All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.

```bash
# Make the entry script executable (first time)
chmod +x email-processor.sh

# --- Scan ---
./email-processor.sh scan                          # classify unseen emails
./email-processor.sh scan --recent 30              # classify last 30 days
./email-processor.sh scan --dry-run                # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run     # combine both

# --- Review ---
./email-processor.sh review list                   # show pending queue
./email-processor.sh review 1 delete               # delete email #1
./email-processor.sh review msg_f1d43ea3 archive   # archive by ID
./email-processor.sh review all delete             # delete all pending
./email-processor.sh review accept                 # accept all suggestions

# --- Other ---
./email-processor.sh stats                         # show decision history
./email-processor.sh migrate                       # import old decisions
```

Or call Python directly: `python main.py scan --dry-run`

## Actions

| Action | Effect |
|---|---|
| `delete` | Move to Trash (`himalaya message delete`) |
| `archive` | Move to Archive folder |
| `keep` | Leave unread in inbox (no changes) |
| `mark_read` | Add `\Seen` flag, stays in inbox |
| `label:<name>` | Move to named folder (created if needed) |

## Auto-Action Criteria

Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than `bootstrap_min_decisions` total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over.

This means on day one, only very obvious emails (spam, clear promotions) get auto-acted on. As you review emails and build history, the system gradually handles more on its own.
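
The threshold selection can be sketched in a few lines. This is a minimal illustration of the rule described above, not the project's actual scan code; the function name and config access pattern are assumptions:

```python
def effective_threshold(total_decisions, config):
    """Pick the confidence threshold scan applies, per the rules above."""
    automation = config.get("automation", {})
    # Learning phase: fewer than bootstrap_min_decisions recorded -> 95%
    if total_decisions < automation.get("bootstrap_min_decisions", 20):
        return 95
    # Steady state: the configured threshold (default 75)
    return automation.get("confidence_threshold", 75)
```

With an empty history this returns 95; setting `bootstrap_min_decisions` to 0 makes it return the configured threshold from the first scan.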

## Configuration

`config.json` — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.

```json
{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}
```

| Key | Description |
|---|---|
| `ollama.host` | Ollama server URL. Default `http://localhost:11434`. |
| `ollama.model` | Ollama model to use for classification. |
| `rules.max_body_length` | Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down. |
| `automation.confidence_threshold` | Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation, higher = more manual review. |
| `automation.bootstrap_min_decisions` | Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely. |

## Testing

```bash
# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3

# 2. Verify Ollama is running with the model
ollama list   # should show kamekichi128/qwen3-4b-instruct-2507:latest

# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run

# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7

# 5. Check what got queued
./email-processor.sh review list

# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete

# 7. Check that the decision was recorded
./email-processor.sh stats
```

## File Structure

```
email_processor/
  main.py                  # Entry point — scan/review/stats/migrate subcommands
  classifier.py            # LLM prompt builder + response parser
  decision_store.py        # Decision history storage + few-shot retrieval
  config.json              # Ollama + automation settings
  email-processor.sh       # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json    # Queue of emails awaiting review
    decision_history.json  # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log         # Daily processing logs
```

## Design Decisions

### Himalaya instead of raw IMAP

All IMAP operations go through the `himalaya` CLI via subprocess calls. This means:
- No IMAP credentials stored in config.json — himalaya manages its own auth.
- No connection management, reconnect logic, or SSL setup in Python.
- Each action is a single himalaya command (e.g., `himalaya message delete 42`).

The tradeoff is a subprocess spawn per operation, but for email volumes (tens per run, not thousands) this is negligible.
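
The subprocess pattern looks roughly like this. A sketch only — the helper names are illustrative, not the project's actual code; the `message delete` subcommand is the one quoted above:

```python
import subprocess


def himalaya_argv(*args):
    """Build the argv list for a himalaya subcommand."""
    return ["himalaya", *args]


def run_himalaya(*args):
    """Run one himalaya command and return its stdout.

    check=True raises CalledProcessError on a non-zero exit, so a failed
    IMAP operation surfaces as a Python exception instead of being silent.
    """
    result = subprocess.run(
        himalaya_argv(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# e.g. run_himalaya("message", "delete", "42")
```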

### Non-interactive design

Every command takes its full input as arguments, acts, and exits. No `input()` calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (`pending_emails.json`) is the shared state between scan and review invocations.

### decision_history.json as the "database"

`data/decision_history.json` is the only persistent state that matters for learning. It's a flat JSON array — every decision (user or auto) is appended as an entry. The classifier reads the whole file for each email to find relevant few-shot examples via relevance scoring.

The pending queue (`pending_emails.json`) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.

A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at current scale.

### Few-shot learning via relevance scoring

Rather than sending the entire decision history to the LLM, `decision_store.get_relevant_examples()` scores each past decision against the current email using three signals:
- Exact sender domain match (+3 points)
- Recipient address match (+2 points)
- Subject keyword overlap (+1 per shared word, stop-words excluded)

The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
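
As a worked example of the scoring (hypothetical emails; the stop-word set mirrors `_STOP_WORDS` in decision_store.py, and the flat-dict fields here are simplified stand-ins for the real history entries):

```python
import re

STOP_WORDS = {"re", "fwd", "the", "a", "an", "is", "to", "for", "and", "or", "your", "you"}


def keywords(subject):
    """Subject words, lowercased, with stop-words removed."""
    return set(re.findall(r"\w+", subject.lower())) - STOP_WORDS


past = {"domain": "linkedin.com", "recipient": "acc.linkedin@ylu17.com",
        "subject": "New jobs matching your profile"}
current = {"domain": "linkedin.com", "recipient": "acc.linkedin@ylu17.com",
           "subject": "New Software Engineer jobs that match your profile"}

score = 0
score += 3 if past["domain"] == current["domain"] else 0        # +3: same sender domain
score += 2 if current["recipient"] in past["recipient"] else 0  # +2: same recipient
# +1 per shared keyword: "new", "jobs", "profile" ("matching" != "match")
score += len(keywords(past["subject"]) & keywords(current["subject"]))
# score == 8
```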

### Conservative auto-action

Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than `bootstrap_min_decisions` (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted on. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over. This lets the system start working from day one while staying cautious until it has enough examples to learn from.

### `keep` means unread

The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them. During scan, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the `keep` action itself.

### Fail-safe classification

If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns `action="keep"` with `confidence=0`. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
scripts/email_processor/classifier.py: new file, 191 lines
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
"""
Classifier - LLM-based email classification with learning.

This module builds a rich prompt for the local Ollama model (Qwen3) that
includes few-shot examples from past user decisions, per-sender statistics,
and a list of known labels. The model returns a structured response with
an action, confidence score, summary, and reason.

The prompt structure:
1. System instructions (action definitions)
2. Known labels (so the model reuses them)
3. Sender statistics ("linkedin.com: deleted 8 times, kept 2 times")
4. Few-shot examples (top 5 most relevant past decisions)
5. The email to classify (subject, sender, recipient, body preview)
6. Output format specification
"""

import time
from datetime import datetime
from pathlib import Path

import decision_store

LOGS_DIR = Path(__file__).parent / "logs"


def _build_prompt(email_data, config):
    """Assemble the full classification prompt with learning context.

    The prompt is built in sections, each providing different context to
    help the model make better decisions. Sections are omitted when there
    is no relevant data (e.g., no history yet for a new sender).
    """
    max_body = config.get("rules", {}).get("max_body_length", 1000)

    # Gather learning context from decision history
    examples = decision_store.get_relevant_examples(email_data, n=10)
    sender_domain = decision_store._extract_domain(email_data.get("sender", ""))
    sender_stats = decision_store.get_sender_stats(sender_domain) if sender_domain else {}
    known_labels = decision_store.get_known_labels()

    # /no_think disables Qwen3's chain-of-thought, giving faster + shorter output
    parts = ["/no_think\n"]

    # Section 1: Action definitions
    parts.append(
        "You are an email classifier. Classify the email into one of these actions:\n"
        "- delete: Spam, ads, promotions, unwanted notifications\n"
        "- archive: Informational emails worth keeping but not needing attention "
        "(receipts, shipping updates, automated confirmations)\n"
        "- keep: Important emails that need attention or action (left unread in inbox)\n"
        "- mark_read: Low-priority, leave in inbox but mark as read\n"
        "- label:<name>: Categorize with a specific label\n"
    )

    # Section 2: Known labels (helps the model reuse instead of inventing)
    if known_labels:
        parts.append(f"\nLabels used before: {', '.join(sorted(known_labels))}\n")

    # Section 3: Sender statistics (strong signal for repeat senders)
    if sender_stats:
        stats_str = ", ".join(
            f"{action} {count} times" for action, count in sender_stats.items()
        )
        parts.append(f"\nHistory for {sender_domain}: {stats_str}\n")

    # Section 4: Few-shot examples (top 5 most relevant past decisions)
    if examples:
        parts.append("\n--- Past decisions (learn from these) ---")
        for ex in examples[:5]:
            parts.append(
                f"From: {ex['sender'][:60]} | To: {ex['recipient'][:40]} | "
                f"Subject: {ex['subject'][:60]} -> {ex['action']}"
            )
        parts.append("--- End examples ---\n")

    # Section 5: The email being classified
    body_preview = email_data.get("body", "")[:max_body]
    parts.append(
        f"Now classify this email:\n"
        f"Subject: {email_data.get('subject', '(No Subject)')}\n"
        f"From: {email_data.get('sender', '(Unknown)')}\n"
        f"To: {email_data.get('recipient', '(Unknown)')}\n"
        f"Body: {body_preview}\n"
    )

    # Section 6: Required output format
    parts.append(
        "Respond in this exact format (nothing else):\n"
        "Action: [delete|archive|keep|mark_read|label:<name>]\n"
        "Confidence: [0-100]\n"
        "Summary: [one sentence summary of the email]\n"
        "Reason: [brief explanation for your classification]"
    )

    return "\n".join(parts)


def _log_llm(prompt, output, email_data, action, confidence, duration):
    """Log the full LLM prompt and response to logs/llm_YYYY-MM-DD.log."""
    LOGS_DIR.mkdir(exist_ok=True)
    log_file = LOGS_DIR / f"llm_{datetime.now().strftime('%Y-%m-%d')}.log"
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    subject = email_data.get("subject", "(No Subject)")[:60]
    sender = email_data.get("sender", "(Unknown)")[:60]

    with open(log_file, "a", encoding="utf-8") as f:
        f.write(f"{'=' * 70}\n")
        f.write(f"[{timestamp}] {subject}\n")
        f.write(f"From: {sender} | Result: {action} @ {confidence}% | {duration:.1f}s\n")
        f.write(f"{'-' * 70}\n")
        f.write(f"PROMPT:\n{prompt}\n")
        f.write(f"{'-' * 70}\n")
        f.write(f"RESPONSE:\n{output}\n")
        f.write(f"{'=' * 70}\n\n")


def _parse_response(output):
    """Parse the model's text response into structured fields.

    Expected format (one field per line):
        Action: delete
        Confidence: 92
        Summary: Promotional offer from retailer
        Reason: Clearly a marketing email with discount offer

    Falls back to safe defaults (keep, 50% confidence) on parse failure.
    """
    action = "keep"
    confidence = 50
    summary = "No summary"
    reason = "Unknown"

    for line in output.strip().split("\n"):
        line = line.strip()
        if line.startswith("Action:"):
            raw_action = line.replace("Action:", "").strip().lower()
            valid_actions = {"delete", "archive", "keep", "mark_read"}
            if raw_action in valid_actions or raw_action.startswith("label:"):
                action = raw_action
        elif line.startswith("Confidence:"):
            try:
                confidence = int(line.replace("Confidence:", "").strip().rstrip("%"))
                confidence = max(0, min(100, confidence))  # clamp to 0-100
            except ValueError:
                confidence = 50
        elif line.startswith("Summary:"):
            summary = line.replace("Summary:", "").strip()[:200]
        elif line.startswith("Reason:"):
            reason = line.replace("Reason:", "").strip()

    return action, confidence, summary, reason


def classify_email(email_data, config):
    """Classify an email using the local LLM with few-shot learning context.

    Connects to Ollama, sends the assembled prompt, and parses the response.
    On any error, falls back to "keep" with 0% confidence so the email
    gets queued for manual review rather than auto-acted upon.

    Args:
        email_data: dict with subject, sender, recipient, body keys.
        config: full config dict (needs ollama.model and rules.max_body_length).

    Returns:
        Tuple of (action, confidence, summary, reason, duration_seconds).
    """
    import ollama

    prompt = _build_prompt(email_data, config)
    model = config.get("ollama", {}).get("model", "kamekichi128/qwen3-4b-instruct-2507:latest")

    start_time = time.time()
    try:
        # Low temperature for consistent classification
        response = ollama.generate(model=model, prompt=prompt, options={"temperature": 0.1})
        output = response["response"]
        action, confidence, summary, reason = _parse_response(output)
    except Exception as e:
        # On failure, default to "keep" with 0 confidence -> always queued
        output = f"ERROR: {e}"
        action = "keep"
        confidence = 0
        summary = "Classification failed"
        reason = f"error - {str(e)[:100]}"

    duration = time.time() - start_time
    _log_llm(prompt, output, email_data, action, confidence, duration)
    return action, confidence, summary, reason, duration
scripts/email_processor/config.json
@@ -1,16 +1,14 @@
 {
-  "imap": {
-    "host": "imap.migadu.com",
-    "port": 993,
-    "email": "youlu@luyanxin.com",
-    "password": "kDkNau2r7m.hV!uk*D4Yr8mC7Dyjx9T"
-  },
   "ollama": {
     "host": "http://localhost:11434",
-    "model": "qwen3:4b"
+    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
   },
   "rules": {
     "max_body_length": 1000,
     "check_unseen_only": true
   },
+  "automation": {
+    "confidence_threshold": 75,
+    "bootstrap_min_decisions": 20
+  }
 }
scripts/email_processor/data/pending_emails.json (deleted)
@@ -1,52 +0,0 @@
{
  "msg_f1d43ea3": {
    "imap_uid": "2",
    "subject": "Delivered: \"Voikinfo Bottom Gusset Bags...\"",
    "sender": "\"Amazon.com - order-update(a)amazon.com\"\r\n <order-update_at_amazon_com_posyo@simplelogin.co>",
    "recipient": "sho.amazon@ylu17.com",
    "summary": "Your Amazon package (order #114-1496788-7649829) was delivered today to Argo, Los Angeles, CA and left near the front door or porch.",
    "email_date": "Wed, 18 Feb 2026 04:15:24 +0000",
    "status": "pending",
    "found_at": "2026-02-18T16:18:42.347538"
  },
  "msg_60c56a87": {
    "imap_uid": "3",
    "subject": "=?UTF-8?b?5L2V5LiN5ruh6Laz6Ieq5bex55qE5Y+j6IW55LmL5qyy?=",
    "sender": "\"Uber Eats - uber(a)uber.com\" <uber_at_uber_com_kjwzyhxn@simplelogin.co>",
    "recipient": "uber@ylu17.com",
    "summary": "Uber Eats has sent a notification that the user's order is ready for pickup.",
    "email_date": "Wed, 18 Feb 2026 11:36:59 +0000",
    "status": "pending",
    "found_at": "2026-02-18T08:05:56.594842"
  },
  "msg_ebd24205": {
    "imap_uid": "4",
    "subject": "Your order has been shipped (or closed if combined/delivered).",
    "sender": "\"cd(a)woodenswords.com\"\r\n <cd_at_woodenswords_com_xivwijojc@simplelogin.co>",
    "recipient": "mail@luyx.org",
    "summary": "This email confirms that your order has been shipped or closed (if combined/delivered).",
    "email_date": "Wed, 18 Feb 2026 16:07:58 +0000",
    "status": "pending",
    "found_at": "2026-02-18T12:01:19.048091"
  },
  "msg_fa73b3bd": {
    "imap_uid": "6",
    "subject": "=?UTF-8?Q?Yanxin,_I=E2=80=99m_still_waiting_for_your_response?=",
    "sender": "\"Arslan (via LinkedIn) - messages-noreply(a)linkedin.com\"\r\n <messages-noreply_at_linkedin_com_ajpnalmwp@simplelogin.co>",
    "recipient": "Yanxin Lu <acc.linkedin@ylu17.com>",
    "summary": "Arslan Ahmed, a Senior AI | ML | Full Stack Engineer from Ilford, invited you to connect on February 11, 2026 at 10:08 PM and is waiting for your response.",
    "email_date": "Wed, 18 Feb 2026 18:53:45 +0000 (UTC)",
    "status": "pending",
    "found_at": "2026-02-18T12:04:34.602407"
  },
  "msg_59f23736": {
    "imap_uid": "1",
    "subject": "New Software Engineer jobs that match your profile",
    "sender": "\"LinkedIn - jobs-noreply(a)linkedin.com\"\r\n <jobs-noreply_at_linkedin_com_zuwggfxh@simplelogin.co>",
    "recipient": "Yanxin Lu <acc.linkedin@ylu17.com>",
    "summary": "LinkedIn has notified the user of new software engineering jobs that match their profile and includes a link to update their top card.",
    "email_date": "Wed, 18 Feb 2026 02:07:12 +0000 (UTC)",
    "status": "pending",
    "found_at": "2026-02-18T16:16:00.784822"
  }
}
scripts/email_processor/decision_store.py: new file, 253 lines
@@ -0,0 +1,253 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Decision Store - Manages decision history for learning-based email classification.
|
||||
|
||||
This module persists every user and auto-made decision to a flat JSON file
|
||||
(data/decision_history.json). Past decisions serve as few-shot examples
|
||||
that are injected into the LLM prompt by classifier.py, enabling the
|
||||
system to learn from user behavior over time.
|
||||
|
||||
Storage format: a JSON array of decision entries, each containing sender,
|
||||
recipient, subject, summary, action taken, and whether it was a user or
|
||||
auto decision.
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from collections import Counter
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Paths
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
SCRIPT_DIR = Path(__file__).parent
|
||||
DATA_DIR = SCRIPT_DIR / "data"
|
||||
HISTORY_FILE = DATA_DIR / "decision_history.json"
|
||||
PENDING_FILE = DATA_DIR / "pending_emails.json"
|
||||
|
||||
# Stop-words excluded from subject keyword matching to reduce noise.
|
||||
_STOP_WORDS = {"re", "fwd", "the", "a", "an", "is", "to", "for", "and", "or", "your", "you"}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _load_history():
|
||||
"""Load the full decision history list from disk."""
|
||||
if not HISTORY_FILE.exists():
|
||||
return []
|
||||
with open(HISTORY_FILE, "r", encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
|
||||
|
||||
def _save_history(history):
|
||||
"""Write the full decision history list to disk."""
|
||||
DATA_DIR.mkdir(exist_ok=True)
|
||||
with open(HISTORY_FILE, "w", encoding="utf-8") as f:
|
||||
json.dump(history, f, indent=2, ensure_ascii=False)
|
||||
|
||||
|
||||
def _extract_domain(sender):
|
||||
"""Extract the domain part from a sender string.
|
||||
|
||||
Handles formats like:
|
||||
"Display Name <user@example.com>"
|
||||
user@example.com
|
||||
"""
|
||||
match = re.search(r"[\w.+-]+@([\w.-]+)", sender)
|
||||
return match.group(1).lower() if match else ""
|
||||
|
||||
|
||||
def _extract_email_address(sender):
|
||||
"""Extract the full email address from a sender string."""
|
||||
match = re.search(r"([\w.+-]+@[\w.-]+)", sender)
|
||||
return match.group(1).lower() if match else sender.lower()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def record_decision(email_data, action, source="user"):
|
||||
"""Append a decision to the history file.
|
||||
|
||||
Args:
|
||||
email_data: dict with keys: sender, recipient, subject, summary.
|
||||
action: one of "delete", "archive", "keep", "mark_read",
|
||||
or "label:<name>".
|
||||
source: "user" (manual review) or "auto" (high-confidence).
|
||||
"""
|
||||
history = _load_history()
|
||||
entry = {
|
||||
"timestamp": datetime.now().isoformat(timespec="seconds"),
|
||||
"sender": email_data.get("sender", ""),
|
||||
"sender_domain": _extract_domain(email_data.get("sender", "")),
|
||||
"recipient": email_data.get("recipient", ""),
|
||||
"subject": email_data.get("subject", ""),
|
||||
"summary": email_data.get("summary", ""),
|
||||
"action": action,
|
||||
"source": source,
|
||||
}
|
||||
history.append(entry)
|
||||
_save_history(history)
|
||||
return entry
|
||||
|
||||
|
||||
def get_relevant_examples(email_data, n=10):
|
||||
"""Find the N most relevant past decisions for a given email.
|
||||
|
||||
Relevance is scored by three signals:
|
||||
- Exact sender domain match: +3 points
|
||||
- Recipient string match: +2 points
|
||||
- Subject keyword overlap: +1 point per shared word
|
||||
|
||||
Only entries with score > 0 are considered. Results are returned
|
||||
sorted by descending relevance.
|
||||
"""
|
||||
history = _load_history()
|
||||
if not history:
|
||||
return []
|
||||
|
||||
target_domain = _extract_domain(email_data.get("sender", ""))
|
||||
target_recipient = email_data.get("recipient", "").lower()
|
||||
target_words = (
|
||||
set(re.findall(r"\w+", email_data.get("subject", "").lower())) - _STOP_WORDS
|
||||
)
|
||||
|
||||
scored = []
|
||||
for entry in history:
|
||||
score = 0
|
||||
|
||||
# Signal 1: sender domain match
|
||||
if target_domain and entry.get("sender_domain", "") == target_domain:
|
||||
score += 3
|
||||
|
||||
# Signal 2: recipient substring match
|
||||
if target_recipient and target_recipient in entry.get("recipient", "").lower():
|
||||
score += 2
|
||||
|
||||
# Signal 3: subject keyword overlap
|
||||
entry_words = (
|
||||
set(re.findall(r"\w+", entry.get("subject", "").lower())) - _STOP_WORDS
|
||||
)
|
||||
score += len(target_words & entry_words)
|
||||
|
||||
if score > 0:
|
||||
scored.append((score, entry))
|
||||
|
||||
scored.sort(key=lambda x: x[0], reverse=True)
|
||||
return [entry for _, entry in scored[:n]]
|
||||
|
||||
|
||||
def get_sender_stats(sender_domain):
|
||||
"""Get action distribution for a sender domain.
|
||||
|
||||
Returns a dict like {"delete": 5, "keep": 2, "archive": 1}.
|
||||
"""
|
||||
history = _load_history()
|
||||
actions = Counter()
|
||||
for entry in history:
|
||||
if entry.get("sender_domain", "") == sender_domain:
|
||||
actions[entry["action"]] += 1
|
||||
return dict(actions)
|
||||
|
||||
|
||||
def get_sender_history_count(sender_domain):
|
||||
"""Count total past decisions for a sender domain.
|
||||
|
||||
Used by the scan command to decide whether there is enough history
|
||||
    to trust auto-actions for this sender.
    """
    history = _load_history()
    return sum(1 for e in history if e.get("sender_domain", "") == sender_domain)


def get_known_labels():
    """Return the set of all label names used in past "label:<name>" decisions.

    These are offered to the LLM so it can reuse existing labels rather
    than inventing new ones.
    """
    history = _load_history()
    labels = set()
    for entry in history:
        action = entry.get("action", "")
        if action.startswith("label:"):
            labels.add(action[6:])
    return labels


def get_all_stats():
    """Compute aggregate statistics across the full decision history.

    Returns a dict with keys: total, by_action, by_source, top_domains.
    Returns None if history is empty.
    """
    history = _load_history()
    if not history:
        return None

    total = len(history)
    by_action = Counter(e["action"] for e in history)
    by_source = Counter(e["source"] for e in history)

    # Top 10 sender domains by decision count
    domain_counts = Counter(e.get("sender_domain", "") for e in history)
    top_domains = domain_counts.most_common(10)

    return {
        "total": total,
        "by_action": dict(by_action),
        "by_source": dict(by_source),
        "top_domains": top_domains,
    }


# ---------------------------------------------------------------------------
# Migration
# ---------------------------------------------------------------------------

def migrate_pending():
    """One-time migration: import 'done' entries from pending_emails.json.

    Converts old-style action names ("archived" -> "archive", etc.) and
    records them as user decisions in the history file. Not idempotent:
    running it again re-imports the same entries as duplicates, so run
    it once only.
    """
    if not PENDING_FILE.exists():
        print("No pending_emails.json found, nothing to migrate.")
        return 0

    with open(PENDING_FILE, "r", encoding="utf-8") as f:
        pending = json.load(f)

    # Map old action names to new ones
    action_map = {
        "archived": "archive",
        "kept": "keep",
        "deleted": "delete",
    }

    migrated = 0
    for msg_id, data in pending.items():
        if data.get("status") != "done":
            continue
        old_action = data.get("action", "")
        action = action_map.get(old_action, old_action)
        if not action:
            continue

        email_data = {
            "sender": data.get("sender", ""),
            "recipient": data.get("recipient", ""),
            "subject": data.get("subject", ""),
            "summary": data.get("summary", ""),
        }
        record_decision(email_data, action, source="user")
        migrated += 1

    print(f"Migrated {migrated} decisions from pending_emails.json")
    return migrated
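The aggregation in `get_all_stats` is plain `collections.Counter` work. A minimal, self-contained sketch over a hypothetical three-entry history (the field names match the entries this module stores; the data itself is made up):

```python
from collections import Counter

# Hypothetical miniature history, shaped like this module's entries.
history = [
    {"action": "delete", "source": "auto", "sender_domain": "news.example.com"},
    {"action": "delete", "source": "user", "sender_domain": "news.example.com"},
    {"action": "keep", "source": "user", "sender_domain": "github.com"},
]

# Count decisions per action, and rank sender domains by frequency.
by_action = Counter(e["action"] for e in history)
top_domains = Counter(e.get("sender_domain", "") for e in history).most_common(10)

print(dict(by_action))  # {'delete': 2, 'keep': 1}
print(top_domains[0])   # ('news.example.com', 2)
```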
27
scripts/email_processor/email-processor.sh
Executable file
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# email-processor — wrapper script for the email processor.
#
# Usage:
#   ./email-processor.sh scan                        # classify unseen emails
#   ./email-processor.sh scan --recent 30            # last 30 days
#   ./email-processor.sh scan --dry-run              # classify only, no changes
#   ./email-processor.sh scan --recent 7 --dry-run   # combine both
#   ./email-processor.sh review list                 # show pending queue
#   ./email-processor.sh review 1 delete             # act on email #1
#   ./email-processor.sh review all delete           # act on all pending
#   ./email-processor.sh review accept               # accept all suggestions
#   ./email-processor.sh stats                       # show history stats
#   ./email-processor.sh migrate                     # import old decisions
#
# Requires: Python 3.8+, himalaya, and a running Ollama server with the model pulled.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Activate the virtualenv if it exists
if [ -d "$SCRIPT_DIR/venv" ]; then
  source "$SCRIPT_DIR/venv/bin/activate"
fi

exec python3 "$SCRIPT_DIR/main.py" "$@"
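Since every subcommand is non-interactive, the wrapper drops straight into cron. A sketch of a crontab entry (the install path and schedule are placeholders, not from this commit):

```shell
# m h dom mon dow  command: scan hourly, append output to a cron log
15 * * * * /opt/email_processor/email-processor.sh scan >> /opt/email_processor/logs/cron.log 2>&1
```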
@@ -1,50 +0,0 @@
[2026-02-15 21:14:02] KEPT: Please confirm your mailbox youlu@luyanxin.com
  From: "noreply@simplelogin.io" <noreply@simplelogin.io>
  Analysis: KEEP: Legitimate service confirmation email for mailbox addition (not promotional)

[2026-02-15 21:15:04] KEPT: =?utf-8?B?RndkOiBHZXQgMTAlIG9mZiB5b3VyIG5leHQgb3JkZXIg4pyF?=
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: error - HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=60)

[2026-02-15 21:15:37] KEPT: =?utf-8?B?RndkOiDigJxzb2Z0d2FyZSBlbmdpbmVlcuKAnTogTWljcm9
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: LinkedIn job alert notification for subscribed job search (not promotional)

[2026-02-15 21:15:52] KEPT: Fwd: Your receipt from OpenRouter, Inc #2231-9732
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: This is a legitimate receipt for a payment made to OpenRouter, Inc (a known AI service provider), not promotional content.

[2026-02-15 21:16:10] KEPT: Fwd: Your ChatGPT code is 217237
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: Legitimate security verification code from OpenAI (standard login confirmation)

[2026-02-15 22:49:44] KEPT (69.0s): =?UTF-8?B?5rWL6K+V6YKu5Lu2?=
  From: Yanxin Lu <lyx@luyanxin.com>
  Analysis: KEEP: Test email for delivery verification

[2026-02-15 22:57:03] MOVED_TO_TRASH (68.5s): =?utf-8?B?RndkOiBHZXQgMTAlIG9mZiB5b3VyIG5leHQgb3JkZXIg4pyF?=
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: AD: Forwarded Uber promotional offer

[2026-02-15 23:00:09] KEPT (120.1s): Fwd: Your ChatGPT code is 217237
  From: "Yanxin Lu - crac1017(a)hotmail.com"
  <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: error - HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=120)
@@ -1,29 +0,0 @@
[2026-02-18 08:04:26] ADDED_TO_PENDING (msg_f1d43ea3) (108.6s): Delivered: "Voikinfo Bottom Gusset Bags..."
  From: "Amazon.com - order-update(a)amazon.com"
  <order-update_at_amazon_com_posyo@simplelogin.co>
  Analysis: KEEP: Standard delivery confirmation from Amazon

[2026-02-18 08:05:56] ADDED_TO_PENDING (msg_60c56a87) (88.0s): =?UTF-8?b?5L2V5LiN5ruh6Laz6Ieq5bex55qE5Y+j6IW55LmL5qyy?=
  From: "Uber Eats - uber(a)uber.com" <uber_at_uber_com_kjwzyhxn@simplelogin.co>
  Analysis: KEEP: The decoded subject line "Your Uber Eats order is ready!" indicates a transactional order update, not an advertisement.

[2026-02-18 12:01:19] ADDED_TO_PENDING (msg_ebd24205) (66.7s): Your order has been shipped (or closed if combined/delivered
  From: "cd(a)woodenswords.com"
  <cd_at_woodenswords_com_xivwijojc@simplelogin.co>
  Analysis: KEEP: System-generated shipping update notification from an e-commerce store, not promotional content.

[2026-02-18 12:03:36] MOVED_TO_TRASH (133.4s): =?UTF-8?Q?=E2=80=9Csoftware_engineer=E2=80=9D:_Snap_Inc._-_S
  From: "LinkedIn Job Alerts - jobalerts-noreply(a)linkedin.com"
  <jobalerts-noreply_at_linkedin_com_cnrlhok@simplelogin.co>
  Analysis: AD: This email is a promotional job alert notification from LinkedIn's service for users who have set up job preferences.

[2026-02-18 12:04:34] ADDED_TO_PENDING (msg_fa73b3bd) (57.3s): =?UTF-8?Q?Yanxin,_I=E2=80=99m_still_waiting_for_your_respons
  From: "Arslan (via LinkedIn) - messages-noreply(a)linkedin.com"
  <messages-noreply_at_linkedin_com_ajpnalmwp@simplelogin.co>
  Analysis: KEEP: This is a standard LinkedIn connection request notification with no promotional content, discounts, or advertisements, only a reminder of an existing invitation.

[2026-02-18 16:18:42] ADDED_TO_PENDING (msg_f1d43ea3) (102.1s): Delivered: "Voikinfo Bottom Gusset Bags..."
  From: "Amazon.com - order-update(a)amazon.com"
  <order-update_at_amazon_com_posyo@simplelogin.co>
  Analysis: KEEP: Standard delivery confirmation from Amazon, not a promotional message.
@@ -1,297 +1,704 @@
 #!/usr/bin/env python3
 """
-Email Processor - Auto filter ads using local Qwen3
-Moves ad emails to Trash folder (not permanently deleted)
+Email Processor - Learning-based mailbox cleanup using Himalaya + Ollama.
+
+Uses himalaya CLI for all IMAP operations (no raw imaplib, no stored
+credentials). Uses a local Qwen3 model via Ollama for classification,
+with few-shot learning from past user decisions.
+
+All commands are non-interactive — they take arguments, mutate files on
+disk, and exit. Suitable for cron (OpenClaw) and scripting.
+
+Subcommands:
+    python main.py scan                         # classify unseen emails
+    python main.py scan --recent 30             # classify last 30 days
+    python main.py scan --dry-run               # classify only, no changes
+    python main.py scan --recent 7 --dry-run    # combine both
+    python main.py review list                  # print pending queue
+    python main.py review <num-or-id> <action>  # act on one email
+    python main.py review all <action>          # act on all pending
+    python main.py review accept                # accept all suggestions
+    python main.py stats                        # show decision history
+    python main.py migrate                      # import old decisions
+
+Action mapping (what each classification does to the email):
+    delete    -> himalaya message delete <id>   (moves to Trash)
+    archive   -> himalaya message move Archive <id>
+    keep      -> no-op (leave unread in inbox)
+    mark_read -> himalaya flag add <id> seen
+    label:X   -> himalaya message move <X> <id>
 """
 
 import json
-import imaplib
-import email
-import os
+import subprocess
+import hashlib
+import sys
-from datetime import datetime
+from datetime import datetime, timedelta
+from pathlib import Path
 
-# Config
+import classifier
+import decision_store
+
+# ---------------------------------------------------------------------------
+# Paths — all relative to the script's own directory
+# ---------------------------------------------------------------------------
+
+SCRIPT_DIR = Path(__file__).parent
+CONFIG_FILE = SCRIPT_DIR / "config.json"
+LOGS_DIR = SCRIPT_DIR / "logs"
+DATA_DIR = SCRIPT_DIR / "data"
+PENDING_FILE = DATA_DIR / "pending_emails.json"
+
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
 
 def load_config():
-    """Load configuration"""
+    """Load config.json from the script directory.
+
+    Only ollama, rules, and automation settings are needed — himalaya
+    manages its own IMAP config separately.
+    """
     with open(CONFIG_FILE) as f:
         return json.load(f)
 
-def connect_imap(config):
-    """Connect to IMAP server"""
-    imap_config = config['imap']
-    mail = imaplib.IMAP4_SSL(imap_config['host'], imap_config['port'])
-    mail.login(imap_config['email'], imap_config['password'])
-    return mail
-
-def get_unseen_emails(mail):
-    """Get list of unseen email IDs"""
-    mail.select('INBOX')
-    _, search_data = mail.search(None, 'UNSEEN')
-    email_ids = search_data[0].split()
-    return email_ids
+# ---------------------------------------------------------------------------
+# Himalaya CLI wrappers
+#
+# All IMAP operations go through himalaya, which handles connection,
+# auth, and protocol details. We call it as a subprocess and parse
+# its JSON output.
+# ---------------------------------------------------------------------------
 
-def fetch_email(mail, email_id):
-    """Fetch email content"""
-    _, msg_data = mail.fetch(email_id, '(RFC822)')
-    raw_email = msg_data[0][1]
-    msg = email.message_from_bytes(raw_email)
-
-    # Extract subject
-    subject = msg['Subject'] or '(No Subject)'
-
-    # Extract sender
-    sender = msg['From'] or '(Unknown)'
-
-    # Extract recipient
-    recipient = msg['To'] or '(Unknown)'
-
-    # Extract date
-    date = msg['Date'] or datetime.now().isoformat()
-
-    # Extract body
-    body = ""
-    if msg.is_multipart():
-        for part in msg.walk():
-            if part.get_content_type() == "text/plain":
-                try:
-                    body = part.get_payload(decode=True).decode('utf-8', errors='ignore')
-                    break
-                except:
-                    pass
+def _himalaya(*args):
+    """Run a himalaya command and return its stdout.
+
+    Raises subprocess.CalledProcessError on failure.
+    """
+    result = subprocess.run(
+        ["himalaya", *args],
+        capture_output=True, text=True, check=True,
+    )
+    return result.stdout
+
+
+def _himalaya_json(*args):
+    """Run a himalaya command with JSON output and return parsed result."""
+    return json.loads(_himalaya("-o", "json", *args))
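The wrapper pair is just `subprocess.run` plus `json.loads`. A runnable sketch of the same pattern that substitutes `echo` for `himalaya`, so it can be tried without an IMAP account (the real wrappers above pass `-o json` to himalaya instead):

```python
import json
import subprocess

def _cli_json(*args):
    # Same shape as _himalaya_json: run a CLI, capture stdout, parse JSON.
    # `echo` stands in for `himalaya` here so the sketch runs anywhere POSIX.
    result = subprocess.run(
        ["echo", *args], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

envelope = _cli_json('{"id": "42", "subject": "Hello", "flags": []}')
print(envelope["subject"])  # Hello
```

With `check=True`, a nonzero exit status raises `subprocess.CalledProcessError`, which is why the action helpers further down catch exactly that exception.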
+# ---------------------------------------------------------------------------
+# Email fetching via himalaya
+# ---------------------------------------------------------------------------
+
+def get_unseen_envelopes():
+    """Fetch envelope metadata for all unseen emails in INBOX.
+
+    Returns a list of envelope dicts from himalaya's JSON output.
+    Each has keys like: id, subject, from, to, date, flags.
+    """
+    return _himalaya_json("envelope", "list", "not", "flag", "seen")
+
+
+def get_recent_envelopes(days):
+    """Fetch envelope metadata for all emails from the last N days.
+
+    Includes both read and unread emails — useful for testing and
+    bulk-classifying historical mail.
+    """
+    since = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
+    return _himalaya_json("envelope", "list", "after", since)
+
+
+def read_message(envelope_id):
+    """Read the full message body without marking it as seen.
+
+    The --preview flag prevents himalaya from adding the \\Seen flag,
+    so the email stays unread for the actual action to handle.
+    """
+    # Read plain text, no headers, without marking as seen
+    return _himalaya("message", "read", "--preview", "--no-headers", str(envelope_id))
+
+
+def build_email_data(envelope, body, config):
+    """Build the email_data dict expected by classifier and decision_store.
+
+    Combines envelope metadata (from himalaya envelope list) with the
+    message body (from himalaya message read).
+    """
+    max_body = config.get("rules", {}).get("max_body_length", 1000)
+
+    # himalaya envelope JSON uses "from" as a nested object or string
+    sender = envelope.get("from", {})
+    if isinstance(sender, dict):
+        # Format: {"name": "Display Name", "addr": "user@example.com"}
+        name = sender.get("name", "")
+        addr = sender.get("addr", "")
+        sender_str = f"{name} <{addr}>" if name else addr
+    elif isinstance(sender, list) and sender:
+        first = sender[0]
+        name = first.get("name", "")
+        addr = first.get("addr", "")
+        sender_str = f"{name} <{addr}>" if name else addr
     else:
-        try:
-            body = msg.get_payload(decode=True).decode('utf-8', errors='ignore')
-        except:
-            pass
+        sender_str = str(sender)
 
+    # Same for "to"
+    to = envelope.get("to", {})
+    if isinstance(to, dict):
+        name = to.get("name", "")
+        addr = to.get("addr", "")
+        to_str = f"{name} <{addr}>" if name else addr
+    elif isinstance(to, list) and to:
+        first = to[0]
+        name = first.get("name", "")
+        addr = first.get("addr", "")
+        to_str = f"{name} <{addr}>" if name else addr
+    else:
+        to_str = str(to)
 
     return {
-        'id': email_id,
-        'subject': subject,
-        'sender': sender,
-        'recipient': recipient,
-        'date': date,
-        'body': body[:300]  # Limit body length
+        "id": str(envelope.get("id", "")),
+        "subject": envelope.get("subject", "(No Subject)"),
+        "sender": sender_str,
+        "recipient": to_str,
+        "date": envelope.get("date", ""),
+        "body": body[:max_body],
     }
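The dict/list/string handling for `from` and `to` is duplicated in `build_email_data`; the same logic can be exercised (or factored out) as one small helper. A sketch covering the three envelope shapes the function handles (`format_addr` is a name introduced here, not part of the commit):

```python
def format_addr(field):
    # Normalize an address field that may be a dict, a list of dicts,
    # or a plain string: the three shapes build_email_data handles.
    if isinstance(field, dict):
        name, addr = field.get("name", ""), field.get("addr", "")
        return f"{name} <{addr}>" if name else addr
    if isinstance(field, list) and field:
        return format_addr(field[0])  # first recipient only, as above
    return str(field)

print(format_addr({"name": "Alice", "addr": "alice@example.com"}))  # Alice <alice@example.com>
print(format_addr([{"name": "", "addr": "bob@example.com"}]))       # bob@example.com
print(format_addr("plain@example.com"))                             # plain@example.com
```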
-def analyze_with_qwen3(email_data, config):
-    """Analyze email with local Qwen3 using official library"""
-    import ollama
-    import time
-
-    prompt = f"""/no_think
-
-Analyze this email and provide two pieces of information:
+# ---------------------------------------------------------------------------
+# IMAP actions via himalaya
+#
+# Each function executes one himalaya command. Returns True on success.
+# On failure, prints the error and returns False.
+# ---------------------------------------------------------------------------
 
-1. Is this an advertisement/promotional email?
-2. Summarize the email in one sentence
+def execute_action(envelope_id, action):
+    """Dispatch an action string to the appropriate himalaya command.
 
-Email details:
-Subject: {email_data['subject']}
-Sender: {email_data['sender']}
-Body: {email_data['body'][:300]}
+    Action mapping:
+        "delete"    -> himalaya message delete <id>
+        "archive"   -> himalaya message move Archive <id>
+        "keep"      -> no-op (leave unread in inbox)
+        "mark_read" -> himalaya flag add <id> seen
+        "label:X"   -> himalaya message move <X> <id>
 
-Respond in this exact format:
-IsAD: [YES or NO]
-Summary: [one sentence summary]
-Reason: [brief explanation]
-"""
-
-    start_time = time.time()
-    model = config['ollama'].get('model', 'qwen3:4b')
-
+    Returns True on success, False on failure.
+    """
+    eid = str(envelope_id)
     try:
-        response = ollama.generate(model=model, prompt=prompt, options={'temperature': 0.1})
-        output = response['response']
-
-        # Parse output
-        is_ad = False
-        summary = "No summary"
-        reason = "Unknown"
-
-        for line in output.strip().split('\n'):
-            if line.startswith('IsAD:'):
-                is_ad = 'YES' in line.upper()
-            elif line.startswith('Summary:'):
-                summary = line.replace('Summary:', '').strip()[:200]
-            elif line.startswith('Reason:'):
-                reason = line.replace('Reason:', '').strip()
-
-        if is_ad:
-            result = f"AD: {reason}"
+        if action == "delete":
+            _himalaya("message", "delete", eid)
+        elif action == "archive":
+            _himalaya("message", "move", "Archive", eid)
+        elif action == "keep":
+            pass  # leave unread in inbox — no IMAP changes
+        elif action == "mark_read":
+            _himalaya("flag", "add", eid, "seen")
+        elif action.startswith("label:"):
+            folder = action[6:]
+            _himalaya("message", "move", folder, eid)
         else:
-            result = f"KEEP: {reason}"
-
-    except Exception as e:
-        result = f"KEEP: error - {str(e)[:100]}"
-        summary = "Analysis failed"
-        is_ad = False
-
-    duration = time.time() - start_time
-    return result, summary, is_ad, duration
-
-def move_to_trash(mail, email_id):
-    """Move email to Trash folder"""
-    # Copy to Trash
-    result = mail.copy(email_id, 'Trash')
-    if result[0] == 'OK':
-        # Mark original as deleted
-        mail.store(email_id, '+FLAGS', '\\Deleted')
+            print(f"  Unknown action: {action}")
+            return False
         return True
-    return False
+    except subprocess.CalledProcessError as e:
+        print(f"  Himalaya error: {e.stderr.strip()}")
+        return False
 
-def log_result(log_file, email_data, analysis, action, duration=None):
-    """Log processing result with Qwen3 duration"""
-    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
-    duration_str = f" ({duration:.1f}s)" if duration else ""
-    with open(log_file, 'a') as f:
-        f.write(f"[{timestamp}] {action}{duration_str}: {email_data['subject'][:60]}\n")
-        f.write(f"  From: {email_data['sender']}\n")
-        f.write(f"  Analysis: {analysis}\n\n")
 
+# ---------------------------------------------------------------------------
+# Pending queue — emails awaiting manual review
+#
+# Stored as a JSON dict in data/pending_emails.json, keyed by msg_id.
+# Each entry tracks the envelope ID (for himalaya), classifier suggestion,
+# and status (pending/done).
+# ---------------------------------------------------------------------------
 
 def load_pending():
-    """Load pending emails from JSON file"""
+    """Load the pending queue from disk."""
     if not PENDING_FILE.exists():
         return {}
-    with open(PENDING_FILE, 'r', encoding='utf-8') as f:
+    with open(PENDING_FILE, "r", encoding="utf-8") as f:
         return json.load(f)
 
 
 def save_pending(pending):
-    """Save pending emails to JSON file"""
+    """Write the pending queue to disk."""
     DATA_DIR.mkdir(exist_ok=True)
-    with open(PENDING_FILE, 'w', encoding='utf-8') as f:
+    with open(PENDING_FILE, "w", encoding="utf-8") as f:
         json.dump(pending, f, indent=2, ensure_ascii=False)
 
-def add_to_pending(email_data, summary, imap_uid, recipient):
-    """Add email to pending queue"""
+
+def add_to_pending(email_data, summary, reason, action_suggestion, confidence):
+    """Add an email to the pending queue for manual review.
+
+    Stores the classifier's suggestion and confidence alongside the
+    email metadata so the user can see what the model thought.
+    """
     pending = load_pending()
 
-    # Generate unique ID
-    import hashlib
-    msg_id = f"msg_{hashlib.md5(f'{imap_uid}_{email_data['subject']}'.encode()).hexdigest()[:8]}"
-
-    # Extract date from email
-    email_date = email_data.get('date', datetime.now().isoformat())
-
+    # Generate a stable ID from envelope ID + subject
+    eid = str(email_data["id"])
+    key = f"{eid}_{email_data['subject']}"
+    msg_id = f"msg_{hashlib.md5(key.encode()).hexdigest()[:8]}"
 
     pending[msg_id] = {
-        "imap_uid": str(imap_uid),
-        "subject": email_data['subject'],
-        "sender": email_data['sender'],
-        "recipient": recipient,
+        "envelope_id": eid,
+        "subject": email_data["subject"],
+        "sender": email_data["sender"],
+        "recipient": email_data.get("recipient", ""),
         "summary": summary,
-        "email_date": email_date,
+        "reason": reason,
+        "suggested_action": action_suggestion,
+        "confidence": confidence,
+        "email_date": email_data.get("date", ""),
         "status": "pending",
-        "found_at": datetime.now().isoformat()
+        "found_at": datetime.now().isoformat(),
     }
 
     save_pending(pending)
     return msg_id
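The stable `msg_id` is what makes re-scans idempotent: the same envelope ID and subject always hash to the same queue key, so re-adding an email overwrites its entry instead of duplicating it (visible in the old logs, where `msg_f1d43ea3` was re-added hours apart). A quick sketch of the scheme:

```python
import hashlib

def make_msg_id(envelope_id, subject):
    # Same scheme as add_to_pending: md5 of "<id>_<subject>", first 8 hex chars.
    key = f"{envelope_id}_{subject}"
    return f"msg_{hashlib.md5(key.encode()).hexdigest()[:8]}"

a = make_msg_id("123", "Hello")
b = make_msg_id("123", "Hello")
print(a == b, len(a))  # True 12
```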
-def main():
-    """Main processing function"""
-    print("📧 Email Processor Starting...")
-
-    # Load config
-    config = load_config()
-
-    # Setup logging
 
+# ---------------------------------------------------------------------------
+# Logging
+# ---------------------------------------------------------------------------
+
+def log_result(log_file, email_data, action, detail, duration=None):
+    """Append a one-line log entry for a processed email."""
+    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    dur = f" ({duration:.1f}s)" if duration else ""
+    with open(log_file, "a") as f:
+        f.write(f"[{timestamp}] {action}{dur}: {email_data['subject'][:60]}\n")
+        f.write(f"  From: {email_data['sender']}\n")
+        f.write(f"  Detail: {detail}\n\n")
+
+
+# ---------------------------------------------------------------------------
+# Subcommand: scan
+# ---------------------------------------------------------------------------
+
+def cmd_scan(config, recent=None, dry_run=False):
+    """Fetch emails, classify each one, then auto-act or queue.
+
+    Auto-action is based on a single confidence threshold. When the
+    decision history has fewer than 20 entries, a higher threshold (95%)
+    is used to be conservative during the learning phase. Once enough
+    history accumulates, the configured threshold takes over.
+
+    Args:
+        config: full config dict.
+        recent: if set, fetch emails from last N days (not just unseen).
+        dry_run: if True, classify and print but skip all actions.
+    """
+    mode = "DRY RUN" if dry_run else "Scan"
+    print(f"Email Processor - {mode}")
+    print("=" * 50)
 
     LOGS_DIR.mkdir(exist_ok=True)
     log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"
 
-    try:
-        # Connect to IMAP
-        print("Connecting to IMAP...")
-        mail = connect_imap(config)
-        print("✅ Connected")
-
-        # Get unseen emails
-        email_ids = get_unseen_emails(mail)
-        print(f"Found {len(email_ids)} unread emails")
-
-        if not email_ids:
-            print("No new emails to process")
-            mail.logout()
-            return
-
-        # Process each email
-        processed = 0
-        moved_to_trash = 0
-        added_to_pending = 0
-
-        for email_id in email_ids:
-            print(f"\nProcessing email {email_id.decode()}...")
-
-            # Fetch email
-            email_data = fetch_email(mail, email_id)
-            print(f"  Subject: {email_data['subject'][:50]}")
-
-            # Analyze with Qwen3 (one call for both ad detection and summary)
-            analysis, summary, is_ad, duration = analyze_with_qwen3(email_data, config)
-            print(f"  Analysis: {analysis[:100]}")
-            print(f"  Summary: {summary[:60]}...")
-            print(f"  Qwen3 time: {duration:.1f}s")
-
-            # Check if analysis was successful (not an error)
-            if 'error -' in analysis.lower():
-                # Analysis failed - keep email unread for retry
-                print(f"  -> Analysis failed, keeping unread for retry")
-                log_result(log_file, email_data, analysis, "FAILED_RETRY", duration)
-                # Don't increment processed count - will retry next time
-                continue
-
-            # Analysis successful - determine action
-            if is_ad:
-                print("  -> Moving to Trash")
-                if move_to_trash(mail, email_id):
-                    log_result(log_file, email_data, analysis, "MOVED_TO_TRASH", duration)
-                    moved_to_trash += 1
-                else:
-                    log_result(log_file, email_data, analysis, "MOVE_FAILED", duration)
-
+    # Load automation threshold
+    automation = config.get("automation", {})
+    configured_threshold = automation.get("confidence_threshold", 75)
+
+    # Adaptive threshold: be conservative when history is thin
+    stats = decision_store.get_all_stats()
+    total_decisions = stats["total"] if stats else 0
+    bootstrap_min = automation.get("bootstrap_min_decisions", 20)
+    if total_decisions < bootstrap_min:
+        confidence_threshold = 95
+        print(f"Learning phase ({total_decisions}/{bootstrap_min} decisions) — threshold: 95%\n")
+    else:
+        confidence_threshold = configured_threshold
+
+    # Fetch envelopes via himalaya
+    if recent:
+        envelopes = get_recent_envelopes(recent)
+        print(f"Found {len(envelopes)} emails from last {recent} days\n")
+    else:
+        envelopes = get_unseen_envelopes()
+        print(f"Found {len(envelopes)} unread emails\n")
+
+    if not envelopes:
+        print("No new emails to process.")
+        return
+
+    auto_acted = 0
+    queued = 0
+
+    for envelope in envelopes:
+        eid = envelope.get("id", "?")
+        print(f"[{eid}] ", end="", flush=True)
+
+        # Read message body without marking as seen
+        try:
+            body = read_message(eid)
+        except subprocess.CalledProcessError:
+            body = ""
+
+        email_data = build_email_data(envelope, body, config)
+        print(f"{email_data['subject'][:55]}")
+
+        # Run the LLM classifier (includes few-shot examples from history)
+        action, confidence, summary, reason, duration = classifier.classify_email(
+            email_data, config
+        )
+
+        print(f"  -> {action} (confidence: {confidence}%, {duration:.1f}s)")
+        print(f"  {reason[:80]}")
+
+        # Auto-act if confidence meets threshold
+        can_auto = confidence >= confidence_threshold
+
+        if dry_run:
+            # Dry run: log what would happen, touch nothing
+            log_result(log_file, email_data, f"DRYRUN:{action}@{confidence}%", reason, duration)
+            if can_auto:
+                print(f"  -> Would AUTO-execute: {action}")
+                auto_acted += 1
             else:
-                # Non-ad email - add to pending queue
-                print("  -> Adding to pending queue")
-
-                # Add to pending
-                msg_internal_id = add_to_pending(
-                    email_data,
-                    summary,
-                    email_id.decode(),
-                    email_data.get('recipient', 'youlu@luyanxin.com')
+                print(f"  -> Would queue for review")
+                queued += 1
+        elif can_auto:
+            # Auto-execute the action via himalaya
+            success = execute_action(eid, action)
+            if success:
+                decision_store.record_decision(
+                    {**email_data, "summary": summary}, action, source="auto"
                 )
-
-                # Mark as read (so it won't be processed again)
-                mail.store(email_id, '+FLAGS', '\\Seen')
-
-                log_result(log_file, email_data, analysis, f"ADDED_TO_PENDING ({msg_internal_id})", duration)
-                added_to_pending += 1
-
-            processed += 1
-
-        # Expunge deleted emails
-        mail.expunge()
-        mail.logout()
-
-        # Summary
-        print(f"\n{'='*50}")
-        print(f"Total emails checked: {len(email_ids)}")
-        print(f"Successfully processed: {processed} emails")
-        print(f"  - Moved to trash (ads): {moved_to_trash}")
-        print(f"  - Added to pending queue: {added_to_pending}")
-        print(f"Failed (will retry next time): {len(email_ids) - processed}")
-        print(f"\n📁 Pending queue: {PENDING_FILE}")
-        print(f"📝 Log: {log_file}")
-        print(f"\n💡 Run 'python process_queue.py' to view and process pending emails")
-
-    except Exception as e:
-        print(f"❌ Error: {e}")
+                log_result(log_file, email_data, f"AUTO:{action}", reason, duration)
+                print(f"  ** AUTO-executed: {action}")
+                auto_acted += 1
+            else:
+                # Himalaya action failed — fall back to queuing
+                log_result(log_file, email_data, "AUTO_FAILED", reason, duration)
+                print(f"  !! Auto-action failed, queuing instead")
+                add_to_pending(email_data, summary, reason, action, confidence)
+                queued += 1
+        else:
+            # Not enough confidence or history — queue for manual review
+            add_to_pending(email_data, summary, reason, action, confidence)
+            # Mark as read to prevent re-processing on next scan
+            if not dry_run:
+                try:
+                    _himalaya("flag", "add", str(eid), "seen")
+                except subprocess.CalledProcessError:
+                    pass
+            log_result(log_file, email_data, f"QUEUED:{action}@{confidence}%", reason, duration)
+            print(f"  -> Queued (confidence {confidence}% < {confidence_threshold}%)")
+            queued += 1
+
+    # Print run summary
+    print(f"\n{'=' * 50}")
+    print(f"Processed: {len(envelopes)} emails")
+    print(f"  Auto-acted: {auto_acted}")
+    print(f"  Queued for review: {queued}")
+    print(f"\nRun 'python main.py review list' to see pending emails")
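The adaptive-threshold rule in `cmd_scan` is small enough to isolate. A sketch (the helper name is introduced here, not part of the commit), matching the defaults above: 75% configured threshold and a 20-decision bootstrap:

```python
def effective_threshold(total_decisions, configured=75, bootstrap_min=20):
    # Mirror of cmd_scan's rule: stay ultra-conservative (95%) until the
    # decision history reaches bootstrap_min entries, then trust the config.
    return 95 if total_decisions < bootstrap_min else configured

print(effective_threshold(5))                  # 95  (learning phase)
print(effective_threshold(40))                 # 75  (steady state)
print(effective_threshold(40, configured=85))  # 85
```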
# ---------------------------------------------------------------------------
|
||||
# Subcommand: review
|
||||
#
|
||||
# Non-interactive: each invocation takes arguments, acts, and exits.
|
||||
# No input() calls. Compatible with cron and scripting.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _get_pending_items():
|
||||
"""Return only pending (not done) items, sorted by found_at."""
|
||||
pending = load_pending()
|
||||
items = {k: v for k, v in pending.items() if v.get("status") == "pending"}
|
||||
sorted_items = sorted(items.items(), key=lambda x: x[1].get("found_at", ""))
|
||||
return sorted_items
|
||||
|
||||
|
||||
def cmd_review_list():
    """Print the pending queue and exit.

    Shows each email with its number, ID, subject, sender, summary,
    and the classifier's suggested action with confidence.
    """
    sorted_items = _get_pending_items()

    if not sorted_items:
        print("No pending emails to review.")
        return

    print(f"Pending emails: {len(sorted_items)}")
    print("=" * 60)

    for i, (msg_id, data) in enumerate(sorted_items, 1):
        suggested = data.get("suggested_action", "?")
        conf = data.get("confidence", "?")
        print(f"\n {i}. [{msg_id}]")
        print(f" Subject: {data.get('subject', 'N/A')[:55]}")
        print(f" From: {data.get('sender', 'N/A')[:55]}")
        print(f" To: {data.get('recipient', 'N/A')[:40]}")
        print(f" Summary: {data.get('summary', 'N/A')[:70]}")
        print(f" Suggested: {suggested} ({conf}% confidence)")

    print(f"\n{'=' * 60}")
    print("Usage:")
    print(" python main.py review <number> <action>")
    print(" python main.py review all <action>")
    print(" python main.py review accept")
    print("Actions: delete / archive / keep / mark_read / label:<name>")


def cmd_review_act(selector, action):
    """Execute an action on one or more pending emails.

    Args:
        selector: a 1-based number, a msg_id string, or "all".
        action: one of delete/archive/keep/mark_read/label:<name>.
    """
    # Validate action
    valid_actions = {"delete", "archive", "keep", "mark_read"}
    if action not in valid_actions and not action.startswith("label:"):
        print(f"Invalid action: {action}")
        print(f"Valid: {', '.join(sorted(valid_actions))}, label:<name>")
        sys.exit(1)

    sorted_items = _get_pending_items()
    if not sorted_items:
        print("No pending emails to review.")
        return

    # Resolve targets
    if selector == "all":
        targets = sorted_items
    else:
        target = _resolve_target(selector, sorted_items)
        if target is None:
            sys.exit(1)
        targets = [target]

    LOGS_DIR.mkdir(exist_ok=True)
    log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"

    # Execute action on each target
    for msg_id, data in targets:
        eid = data.get("envelope_id") or data.get("imap_uid")
        if not eid:
            print(f" {msg_id}: No envelope ID, skipping")
            continue

        success = execute_action(eid, action)
        if success:
            # Record decision for future learning
            decision_store.record_decision(data, action, source="user")

            # Mark as done in pending queue
            pending = load_pending()
            pending[msg_id]["status"] = "done"
            pending[msg_id]["action"] = action
            pending[msg_id]["processed_at"] = datetime.now().isoformat()
            save_pending(pending)

            log_result(log_file, data, f"REVIEW:{action}", data.get("reason", ""))
            print(f" {msg_id}: {action} -> OK ({data['subject'][:40]})")
        else:
            log_result(log_file, data, f"REVIEW_FAILED:{action}", data.get("reason", ""))
            print(f" {msg_id}: {action} -> FAILED")


def cmd_review_accept():
    """Accept all classifier suggestions for pending emails.

    For each pending email, executes the suggested_action that the
    classifier assigned during scan. Records each as a "user" decision
    since the user explicitly chose to accept.
    """
    sorted_items = _get_pending_items()
    if not sorted_items:
        print("No pending emails to review.")
        return

    LOGS_DIR.mkdir(exist_ok=True)
    log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"

    for msg_id, data in sorted_items:
        action = data.get("suggested_action")
        if not action:
            print(f" {msg_id}: No suggestion, skipping")
            continue

        eid = data.get("envelope_id") or data.get("imap_uid")
        if not eid:
            print(f" {msg_id}: No envelope ID, skipping")
            continue

        success = execute_action(eid, action)
        if success:
            decision_store.record_decision(data, action, source="user")

            pending = load_pending()
            pending[msg_id]["status"] = "done"
            pending[msg_id]["action"] = action
            pending[msg_id]["processed_at"] = datetime.now().isoformat()
            save_pending(pending)

            log_result(log_file, data, f"ACCEPT:{action}", data.get("reason", ""))
            print(f" {msg_id}: {action} -> OK ({data['subject'][:40]})")
        else:
            log_result(log_file, data, f"ACCEPT_FAILED:{action}", data.get("reason", ""))
            print(f" {msg_id}: {action} -> FAILED")


def _resolve_target(selector, sorted_items):
    """Resolve a selector (number or msg_id) to a (msg_id, data) tuple.

    Returns None and prints an error if the selector is invalid.
    """
    # Try as 1-based index
    try:
        idx = int(selector) - 1
        if 0 <= idx < len(sorted_items):
            return sorted_items[idx]
        else:
            print(f"Invalid number. Range: 1-{len(sorted_items)}")
            return None
    except ValueError:
        pass

    # Try as msg_id
    for msg_id, data in sorted_items:
        if msg_id == selector:
            return (msg_id, data)

    print(f"Not found: {selector}")
    return None


# ---------------------------------------------------------------------------
# Subcommand: stats
# ---------------------------------------------------------------------------

def cmd_stats():
    """Print a summary of the decision history.

    Shows total decisions, user vs. auto breakdown, action distribution,
    top sender domains, and custom labels.
    """
    stats = decision_store.get_all_stats()

    if not stats:
        print("No decision history yet.")
        print("Run 'python main.py scan' and 'python main.py review' to build history.")
        return

    print("Decision History Stats")
    print("=" * 50)
    print(f"Total decisions: {stats['total']}")

    # User vs. auto breakdown
    print(f"\nBy source:")
    for source, count in sorted(stats["by_source"].items()):
        pct = count / stats["total"] * 100
        print(f" {source}: {count} ({pct:.0f}%)")

    auto = stats["by_source"].get("auto", 0)
    if stats["total"] > 0:
        print(f" Automation rate: {auto / stats['total'] * 100:.0f}%")

    # Action distribution
    print(f"\nBy action:")
    for action, count in sorted(stats["by_action"].items(), key=lambda x: -x[1]):
        print(f" {action}: {count}")

    # Top sender domains with per-domain action counts
    print(f"\nTop sender domains:")
    for domain, count in stats["top_domains"]:
        domain_stats = decision_store.get_sender_stats(domain)
        detail = ", ".join(
            f"{a}:{c}" for a, c in sorted(domain_stats.items(), key=lambda x: -x[1])
        )
        print(f" {domain}: {count} ({detail})")

    # Custom labels
    labels = decision_store.get_known_labels()
    if labels:
        print(f"\nKnown labels: {', '.join(sorted(labels))}")


# ---------------------------------------------------------------------------
# Subcommand: migrate
# ---------------------------------------------------------------------------

def cmd_migrate():
    """Import old pending_emails.json 'done' entries into decision history.

    Run once after upgrading from the old system. Converts old action
    names (archived/kept/deleted) to new ones (archive/keep/delete).
    """
    decision_store.migrate_pending()


# ---------------------------------------------------------------------------
# Entry point & argument parsing
#
# Simple hand-rolled parser — no external dependencies. Supports:
#   main.py [subcommand] [--recent N] [--dry-run] [review-args...]
# ---------------------------------------------------------------------------

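# Example command lines the parser below accepts (illustrative, built
# from the flags and subcommands handled in this block):
#   python main.py                              # defaults to 'scan'
#   python main.py scan --recent 50 --dry-run
#   python main.py review 3 label:receipts
#   python main.py stats
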
if __name__ == "__main__":
    args = sys.argv[1:]
    subcommand = "scan"
    recent = None
    dry_run = False
    extra_args = []  # for review subcommand arguments

    # Parse args
    i = 0
    while i < len(args):
        if args[i] == "--recent" and i + 1 < len(args):
            recent = int(args[i + 1])
            i += 2
        elif args[i] == "--dry-run":
            dry_run = True
            i += 1
        elif not args[i].startswith("--") and subcommand == "scan" and not extra_args:
            # First positional arg is the subcommand
            subcommand = args[i]
            i += 1
        elif not args[i].startswith("--"):
            # Remaining positional args go to the subcommand
            extra_args.append(args[i])
            i += 1
        else:
            print(f"Unknown flag: {args[i]}")
            sys.exit(1)

    config = load_config()

    if subcommand == "scan":
        cmd_scan(config, recent=recent, dry_run=dry_run)

    elif subcommand == "review":
        if not extra_args or extra_args[0] == "list":
            cmd_review_list()
        elif extra_args[0] == "accept":
            cmd_review_accept()
        elif len(extra_args) == 2:
            cmd_review_act(extra_args[0], extra_args[1])
        else:
            print("Usage:")
            print(" python main.py review list")
            print(" python main.py review <number-or-id> <action>")
            print(" python main.py review all <action>")
            print(" python main.py review accept")
            sys.exit(1)

    elif subcommand == "stats":
        cmd_stats()

    elif subcommand == "migrate":
        cmd_migrate()

    else:
        print(f"Unknown subcommand: {subcommand}")
        print("Usage: python main.py [scan|review|stats|migrate] [--recent N] [--dry-run]")
        sys.exit(1)

@@ -1,28 +0,0 @@
#!/usr/bin/env python3
"""Move specific email to trash"""
import imaplib
import email

# Connect
mail = imaplib.IMAP4_SSL('imap.migadu.com', 993)
mail.login('youlu@luyanxin.com', 'kDkNau2r7m.hV!uk*D4Yr8mC7Dyjx9T')
mail.select('INBOX')

# Search for the email with "10% off" in subject
_, search_data = mail.search(None, 'SUBJECT', '"10% off"')
email_ids = search_data[0].split()

print(f"Found {len(email_ids)} emails with '10% off' in subject")

for email_id in email_ids:
    # Copy to Trash
    result = mail.copy(email_id, 'Trash')
    if result[0] == 'OK':
        mail.store(email_id, '+FLAGS', '\\Deleted')
        print(f"✅ Moved email {email_id.decode()} to Trash")
    else:
        print(f"❌ Failed to move email {email_id.decode()}")

mail.expunge()
mail.logout()
print("Done!")
@@ -1,214 +0,0 @@
#!/usr/bin/env python3
"""
Email Queue Processor - Handle user commands for pending emails
Reads pending_emails.json and executes user commands (archive/keep/reply)
"""

import json
import imaplib
import os
import sys
from datetime import datetime
from pathlib import Path

SCRIPT_DIR = Path(__file__).parent
DATA_FILE = SCRIPT_DIR / "data" / "pending_emails.json"

def load_pending():
    """Load pending emails from JSON file"""
    if not DATA_FILE.exists():
        return {}
    with open(DATA_FILE, 'r', encoding='utf-8') as f:
        return json.load(f)

def save_pending(pending):
    """Save pending emails to JSON file"""
    DATA_FILE.parent.mkdir(exist_ok=True)
    with open(DATA_FILE, 'w', encoding='utf-8') as f:
        json.dump(pending, f, indent=2, ensure_ascii=False)

def connect_imap(config):
    """Connect to IMAP server"""
    mail = imaplib.IMAP4_SSL(config['imap']['host'], config['imap']['port'])
    mail.login(config['imap']['email'], config['imap']['password'])
    return mail

def show_pending_list():
    """Display all pending emails"""
    pending = load_pending()

    if not pending:
        print("📭 No pending emails")
        return

    print(f"\n📧 Pending emails ({len(pending)})")
    print("=" * 60)

    # Sort by email_date
    sorted_items = sorted(
        pending.items(),
        key=lambda x: x[1].get('email_date', '')
    )

    for msg_id, data in sorted_items:
        if data.get('status') == 'pending':
            print(f"\n🆔 {msg_id}")
            print(f" Subject: {data.get('subject', 'N/A')[:50]}")
            print(f" From: {data.get('sender', 'N/A')}")
            print(f" To: {data.get('recipient', 'N/A')}")
            print(f" Date: {data.get('email_date', 'N/A')}")
            print(f" Summary: {data.get('summary', 'N/A')[:80]}")

    print("\n" + "=" * 60)
    print("\nAvailable commands:")
    print(" • archive [ID] - move to the Archive folder")
    print(" • keep [ID] - mark as read, leave in inbox")
    print(" • delete [ID] - move to Trash")
    print(" • process all - list everything and act in batch")

def archive_email(config, msg_id):
    """Archive a specific email by ID"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Copy to Archive
        result = mail.copy(uid, 'Archive')
        if result[0] == 'OK':
            # Mark original as deleted
            mail.store(uid, '+FLAGS', '\\Deleted')
            mail.expunge()

            # Update status
            pending[msg_id]['status'] = 'done'
            pending[msg_id]['action'] = 'archived'
            pending[msg_id]['processed_at'] = datetime.now().isoformat()
            save_pending(pending)

            print(f"✅ Archived: {email_data.get('subject', 'N/A')[:40]}")
            return True
        else:
            print(f"❌ Archive failed: {result}")
            return False

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except Exception:
            pass

def keep_email(config, msg_id):
    """Keep email in inbox, mark as read"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Mark as read (Seen)
        mail.store(uid, '+FLAGS', '\\Seen')

        # Update status
        pending[msg_id]['status'] = 'done'
        pending[msg_id]['action'] = 'kept'
        pending[msg_id]['processed_at'] = datetime.now().isoformat()
        save_pending(pending)

        print(f"✅ Kept: {email_data.get('subject', 'N/A')[:40]}")
        return True

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except Exception:
            pass

def delete_email(config, msg_id):
    """Move email to Trash"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Copy to Trash
        result = mail.copy(uid, 'Trash')
        if result[0] == 'OK':
            mail.store(uid, '+FLAGS', '\\Deleted')
            mail.expunge()

            # Update status
            pending[msg_id]['status'] = 'done'
            pending[msg_id]['action'] = 'deleted'
            pending[msg_id]['processed_at'] = datetime.now().isoformat()
            save_pending(pending)

            print(f"✅ Deleted: {email_data.get('subject', 'N/A')[:40]}")
            return True
        else:
            print(f"❌ Delete failed: {result}")
            return False

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except Exception:
            pass

def main():
    """Main function - show pending list"""
    import json

    # Load config
    config_file = Path(__file__).parent / "config.json"
    with open(config_file) as f:
        config = json.load(f)

    show_pending_list()

if __name__ == "__main__":
    main()
@@ -1,38 +0,0 @@
#!/usr/bin/env python3
"""Test single email analysis"""
import requests
import json

email_data = {
    "subject": "Fwd: Get 10% off your next order 🎉",
    "sender": "crac1017@hotmail.com",
    "body": "Get 10% off your next order! Limited time offer. Shop now and save!"
}

prompt = f"""Analyze this email and determine if it's an advertisement/promotional email.

Subject: {email_data['subject']}
Sender: {email_data['sender']}
Body preview: {email_data['body'][:200]}

Is this an advertisement or promotional email? Answer with ONLY:
- "AD: [brief reason]" if it's an ad/promo
- "KEEP: [brief reason]" if it's important/legitimate

Be conservative - only mark as AD if clearly promotional."""

print("Sending to Qwen3...")
try:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3:4b",
            "prompt": prompt,
            "stream": False
        },
        timeout=120
    )
    result = response.json()
    print(f"Result: {result.get('response', 'error')}")
except Exception as e:
    print(f"Error: {e}")
@@ -1 +1 @@
-python3
+python3.13
@@ -1 +1 @@
-/usr/bin/python3
+python3.13
@@ -1 +0,0 @@
-python3
@@ -1 +0,0 @@
-lib
@@ -1,5 +1,5 @@
-home = /usr/bin
+home = /opt/homebrew/opt/python@3.13/bin
 include-system-site-packages = false
-version = 3.12.3
-executable = /usr/bin/python3.12
-command = /usr/bin/python3 -m venv /home/lyx/.openclaw/workspace/scripts/email_processor/venv
+version = 3.13.0
+executable = /opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/bin/python3.13
+command = /opt/homebrew/opt/python@3.13/bin/python3.13 -m venv /Users/ylu/Documents/me/youlu-openclaw-workspace/scripts/email_processor/venv