email processor

 3  scripts/email_processor/.gitignore  (vendored, new file)
@@ -0,0 +1,3 @@
__pycache__/
*.pyc
venv
 233  scripts/email_processor/README.md  (new file)
@@ -0,0 +1,233 @@
# Email Processor

Learning-based mailbox cleanup using Himalaya (IMAP) + Ollama (local LLM). Classifies emails, learns from your decisions over time, and gradually automates common actions.

## Prerequisites

- **Himalaya** — CLI email client; handles the IMAP connection and auth.
- **Ollama** — local LLM server.
- **Python 3.8+**

```bash
# Install himalaya (macOS)
brew install himalaya

# Configure himalaya for your IMAP account (first time only)
himalaya account list   # should show your account after setup

# Install and start Ollama, pull the model
brew install ollama
ollama pull kamekichi128/qwen3-4b-instruct-2507:latest

# Set up Python venv and install dependencies
cd scripts/email_processor
python3 -m venv venv
source venv/bin/activate
pip install ollama
```

## How It Works

The system has two phases: a **learning phase** where it builds up knowledge from your decisions, and a **steady state** where it handles most emails automatically.

### Learning Phase (first ~20 decisions)

The confidence threshold is automatically raised to 95%, so most emails get queued for review.

1. **Cron runs `scan`.** For each unseen email, the classifier uses Qwen3's general knowledge (no history yet) to suggest an action. Most suggestions come back at 60-80% confidence — below the 95% threshold — so they get saved to `pending_emails.json` with the suggestion attached. A few obvious spam emails might hit 95%+ and get auto-deleted.

2. **You run `review list`.** It prints what's pending:

   ```
   1. [msg_f1d43ea3] Subject: New jobs matching your profile
      From: LinkedIn        Suggested: delete (82%)
   2. [msg_60c56a87] Subject: Your order shipped
      From: Amazon          Suggested: archive (78%)
   3. [msg_ebd24205] Subject: Meeting tomorrow at 3pm
      From: Coworker        Suggested: keep (70%)
   ```

3. **You act on them**, either individually or in bulk:

   ```bash
   ./email-processor.sh review 1 delete    # agree with suggestion
   ./email-processor.sh review 2 archive   # agree with suggestion
   ./email-processor.sh review accept      # accept all suggestions at once
   ```

   Each command executes via himalaya, appends to `decision_history.json`, and marks the pending entry as done.

4. **The next scan is smarter.** The classifier now has few-shot examples in the prompt:

   ```
   History for linkedin.com: delete 2 times

   --- Past decisions ---
   From: LinkedIn | Subject: New jobs matching your profile -> delete
   From: Amazon | Subject: Your package delivered -> archive
   ---
   ```

   Confidence scores climb. You keep reviewing. History grows.

### Steady State (20+ decisions)

The threshold drops to the configured 75%, and the classifier has rich context.

- **Repeat senders** (LinkedIn, Amazon, Uber) get auto-acted on at 85-95% confidence during `scan`. They never touch the pending queue.
- **New or ambiguous senders** may fall below 75% and get queued.
- **You occasionally run `review list`** to handle stragglers — each decision further improves future classifications.
- **`stats` shows your automation rate** climbing: 60%, 70%, 80%+.

The pending queue shrinks over time. It's not a backlog — it's an ever-narrowing set of emails the system hasn't learned to handle yet.

## Usage

All commands are non-interactive — they take arguments, act, and exit. Compatible with cron/OpenClaw.

```bash
# Make the entry script executable (first time)
chmod +x email-processor.sh

# --- Scan ---
./email-processor.sh scan                         # classify unseen emails
./email-processor.sh scan --recent 30             # classify last 30 days
./email-processor.sh scan --dry-run               # classify only, no changes
./email-processor.sh scan --recent 7 --dry-run    # combine both

# --- Review ---
./email-processor.sh review list                  # show pending queue
./email-processor.sh review 1 delete              # delete email #1
./email-processor.sh review msg_f1d43ea3 archive  # archive by ID
./email-processor.sh review all delete            # delete all pending
./email-processor.sh review accept                # accept all suggestions

# --- Other ---
./email-processor.sh stats                        # show decision history
./email-processor.sh migrate                      # import old decisions
```

Or call Python directly: `python main.py scan --dry-run`
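Because every command is non-interactive, a typical deployment just drops `scan` into cron. A sketch only — the install path and log file name are illustrative, not part of this repo:

```bash
# crontab -e — run scan every 30 minutes; adjust the path to your checkout
*/30 * * * * cd "$HOME/scripts/email_processor" && ./email-processor.sh scan >> data/cron.log 2>&1
```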
## Actions

| Action | Effect |
|---|---|
| `delete` | Move to Trash (`himalaya message delete`) |
| `archive` | Move to Archive folder |
| `keep` | Leave unread in inbox (no changes) |
| `mark_read` | Add `\Seen` flag; stays in inbox |
| `label:<name>` | Move to named folder (created if needed) |

## Auto-Action Criteria

Scan auto-acts when the classifier's confidence meets the threshold. During the learning phase (fewer than `bootstrap_min_decisions` total decisions, default 20), a higher threshold of 95% is used automatically. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over.

This means on day one, only very obvious emails (spam, clear promotions) get auto-acted on. As you review emails and build history, the system gradually handles more on its own.
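The threshold selection described above is simple enough to sketch. A minimal illustration — the function name `effective_threshold` is hypothetical, not from this codebase:

```python
def effective_threshold(config, total_decisions):
    """Pick the auto-action threshold: 95 during the learning phase,
    the configured value afterwards. Hypothetical helper for illustration."""
    automation = config.get("automation", {})
    if total_decisions < automation.get("bootstrap_min_decisions", 20):
        return 95  # learning phase: only near-certain classifications auto-act
    return automation.get("confidence_threshold", 75)

# Day one, empty history -> strict threshold
print(effective_threshold({}, 0))   # 95
# After 20 reviewed decisions -> configured threshold
print(effective_threshold({"automation": {"confidence_threshold": 75}}, 20))   # 75
```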
## Configuration

`config.json` — only Ollama and automation settings. IMAP auth is managed by himalaya's own config.

```json
{
  "ollama": {
    "host": "http://localhost:11434",
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 75,
    "bootstrap_min_decisions": 20
  }
}
```

| Key | Description |
|---|---|
| `ollama.host` | Ollama server URL. Default `http://localhost:11434`. |
| `ollama.model` | Ollama model to use for classification. |
| `rules.max_body_length` | Max characters of email body sent to the LLM. Longer bodies are truncated. Keeps prompt size and latency down. |
| `automation.confidence_threshold` | Minimum confidence (0-100) for auto-action in steady state. Emails below this get queued for review. Lower = more automation; higher = more manual review. |
| `automation.bootstrap_min_decisions` | Number of decisions needed before leaving the learning phase. During the learning phase, the threshold is raised to 95% regardless of `confidence_threshold`. Set to 0 to skip the learning phase entirely. |
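Every key above is effectively optional, since the code reads settings with fallback defaults via `config.get(...)`. A sketch of that pattern — the loader name `load_settings` is illustrative:

```python
import json
from pathlib import Path

def load_settings(path="config.json"):
    """Read config.json, falling back to the documented defaults for any
    missing key. Illustrative sketch of the config.get(...) pattern."""
    p = Path(path)
    cfg = json.loads(p.read_text()) if p.exists() else {}
    return {
        "host": cfg.get("ollama", {}).get("host", "http://localhost:11434"),
        "model": cfg.get("ollama", {}).get("model", "kamekichi128/qwen3-4b-instruct-2507:latest"),
        "max_body_length": cfg.get("rules", {}).get("max_body_length", 1000),
        "confidence_threshold": cfg.get("automation", {}).get("confidence_threshold", 75),
        "bootstrap_min_decisions": cfg.get("automation", {}).get("bootstrap_min_decisions", 20),
    }
```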
## Testing

```bash
# 1. Verify himalaya can reach your mailbox
himalaya envelope list --page-size 3

# 2. Verify Ollama is running with the model
ollama list   # should show kamekichi128/qwen3-4b-instruct-2507:latest

# 3. Dry run — classify recent emails without touching anything
./email-processor.sh scan --recent 7 --dry-run

# 4. Live run — classify and act (auto-act or queue)
./email-processor.sh scan --recent 7

# 5. Check what got queued
./email-processor.sh review list

# 6. Act on a queued email to seed decision history
./email-processor.sh review 1 delete

# 7. Check that the decision was recorded
./email-processor.sh stats
```

## File Structure

```
email_processor/
  main.py               # Entry point — scan/review/stats/migrate subcommands
  classifier.py         # LLM prompt builder + response parser
  decision_store.py     # Decision history storage + few-shot retrieval
  config.json           # Ollama + automation settings
  email-processor.sh    # Shell wrapper (activates venv, forwards args)
  data/
    pending_emails.json     # Queue of emails awaiting review
    decision_history.json   # Past decisions (few-shot learning data)
  logs/
    YYYY-MM-DD.log          # Daily processing logs
```

## Design Decisions

### Himalaya instead of raw IMAP

All IMAP operations go through the `himalaya` CLI via subprocess calls. This means:

- No IMAP credentials stored in config.json — himalaya manages its own auth.
- No connection management, reconnect logic, or SSL setup in Python.
- Each action is a single himalaya command (e.g., `himalaya message delete 42`).

The tradeoff is a subprocess spawn per operation, but at email volumes (tens per run, not thousands) this is negligible.
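As a sketch of what "one himalaya command per action" looks like from Python — the helper name and `dry_run` flag are illustrative; only `himalaya message delete` appears verbatim in this codebase:

```python
import subprocess

def himalaya_delete(uid, dry_run=False):
    """Move a message to Trash via the himalaya CLI.
    Sketch only: assumes himalaya is installed and configured (see Prerequisites)."""
    cmd = ["himalaya", "message", "delete", str(uid)]
    if dry_run:
        return cmd  # let callers inspect the command without running it
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    return cmd
```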
### Non-interactive design

Every command takes its full input as arguments, acts, and exits. No `input()` calls, no interactive loops. This makes the system compatible with cron/OpenClaw and composable with other scripts. The pending queue on disk (`pending_emails.json`) is the shared state between scan and review invocations.

### decision_history.json as the "database"

`data/decision_history.json` is the only persistent state that matters for learning. It's a flat JSON array — every decision (user or auto) is appended as an entry. The classifier reads the whole file for each email to find relevant few-shot examples via relevance scoring.

The pending queue (`pending_emails.json`) is transient — emails pass through it and get marked "done". Logs are for debugging. The decision history is what the system learns from.

A flat JSON file works fine for hundreds or low thousands of decisions. SQLite would make sense if the history grows past ~10k entries and the linear scan becomes noticeable, or if concurrent writes from multiple processes become necessary. Neither applies at the current scale.

### Few-shot learning via relevance scoring

Rather than sending the entire decision history to the LLM, `decision_store.get_relevant_examples()` scores each past decision against the current email using three signals:

- Exact sender domain match (+3 points)
- Recipient address match (+2 points)
- Subject keyword overlap (+1 per shared word, stop-words excluded)

The top 5 most relevant examples are injected into the prompt as few-shot demonstrations. This keeps the prompt small while giving the model the most useful context.
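The three signals can be sketched in a few lines. This is a simplified stand-alone version for illustration (it expects pre-extracted `sender_domain` fields); the real implementation lives in `decision_store.get_relevant_examples()`:

```python
import re

STOP_WORDS = {"re", "fwd", "the", "a", "an", "is", "to", "for", "and", "or", "your", "you"}

def relevance(entry, email):
    """Score one past decision against the current email (simplified sketch)."""
    score = 0
    if email["sender_domain"] and entry["sender_domain"] == email["sender_domain"]:
        score += 3  # same sender domain is the strongest signal
    if email["recipient"] and email["recipient"].lower() in entry["recipient"].lower():
        score += 2  # delivered to the same address/alias
    words = lambda s: set(re.findall(r"\w+", s.lower())) - STOP_WORDS
    score += len(words(entry["subject"]) & words(email["subject"]))  # +1 per shared keyword
    return score

past = {"sender_domain": "linkedin.com", "recipient": "acc.linkedin@ylu17.com",
        "subject": "New jobs matching your profile"}
now = {"sender_domain": "linkedin.com", "recipient": "acc.linkedin@ylu17.com",
       "subject": "New Software Engineer jobs that match your profile"}
print(relevance(past, now))   # 3 (domain) + 2 (recipient) + 3 (new, jobs, profile) = 8
```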
### Conservative auto-action

Auto-action uses a single confidence threshold with an adaptive learning phase. When the decision history has fewer than `bootstrap_min_decisions` (default 20) entries, the threshold is raised to 95% — only very obvious classifications get auto-acted on. Once enough history accumulates, the configured `confidence_threshold` (default 75%) takes over. This lets the system start working from day one while staying cautious until it has enough examples to learn from.

### `keep` means unread

The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, signaling that it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them. During scan, queued emails are marked as read to prevent re-processing, but that's a scan-level concern separate from the `keep` action itself.

### Fail-safe classification

If the LLM call fails (Ollama down, model not loaded, timeout), the classifier returns `action="keep"` with `confidence=0`. This guarantees the email gets queued for manual review rather than being auto-acted upon. The system never auto-trashes an email it couldn't classify.
 191  scripts/email_processor/classifier.py  (new file)
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
"""
Classifier - LLM-based email classification with learning.

This module builds a rich prompt for the local Ollama model (Qwen3) that
includes few-shot examples from past user decisions, per-sender statistics,
and a list of known labels. The model returns a structured response with
an action, confidence score, summary, and reason.

The prompt structure:
1. System instructions (action definitions)
2. Known labels (so the model reuses them)
3. Sender statistics ("linkedin.com: deleted 8 times, kept 2 times")
4. Few-shot examples (top 5 most relevant past decisions)
5. The email to classify (subject, sender, recipient, body preview)
6. Output format specification
"""

import time
from datetime import datetime
from pathlib import Path

import decision_store

LOGS_DIR = Path(__file__).parent / "logs"


def _build_prompt(email_data, config):
    """Assemble the full classification prompt with learning context.

    The prompt is built in sections, each providing different context to
    help the model make better decisions. Sections are omitted when there
    is no relevant data (e.g., no history yet for a new sender).
    """
    max_body = config.get("rules", {}).get("max_body_length", 1000)

    # Gather learning context from decision history
    examples = decision_store.get_relevant_examples(email_data, n=10)
    sender_domain = decision_store._extract_domain(email_data.get("sender", ""))
    sender_stats = decision_store.get_sender_stats(sender_domain) if sender_domain else {}
    known_labels = decision_store.get_known_labels()

    # /no_think disables Qwen3's chain-of-thought, giving faster + shorter output
    parts = ["/no_think\n"]

    # Section 1: Action definitions
    parts.append(
        "You are an email classifier. Classify the email into one of these actions:\n"
        "- delete: Spam, ads, promotions, unwanted notifications\n"
        "- archive: Informational emails worth keeping but not needing attention "
        "(receipts, shipping updates, automated confirmations)\n"
        "- keep: Important emails that need attention or action (left unread in inbox)\n"
        "- mark_read: Low-priority, leave in inbox but mark as read\n"
        "- label:<name>: Categorize with a specific label\n"
    )

    # Section 2: Known labels (helps model reuse instead of inventing)
    if known_labels:
        parts.append(f"\nLabels used before: {', '.join(sorted(known_labels))}\n")

    # Section 3: Sender statistics (strong signal for repeat senders)
    if sender_stats:
        stats_str = ", ".join(
            f"{action} {count} times" for action, count in sender_stats.items()
        )
        parts.append(f"\nHistory for {sender_domain}: {stats_str}\n")

    # Section 4: Few-shot examples (top 5 most relevant past decisions)
    if examples:
        parts.append("\n--- Past decisions (learn from these) ---")
        for ex in examples[:5]:
            parts.append(
                f"From: {ex['sender'][:60]} | To: {ex['recipient'][:40]} | "
                f"Subject: {ex['subject'][:60]} -> {ex['action']}"
            )
        parts.append("--- End examples ---\n")

    # Section 5: The email being classified
    body_preview = email_data.get("body", "")[:max_body]
    parts.append(
        f"Now classify this email:\n"
        f"Subject: {email_data.get('subject', '(No Subject)')}\n"
        f"From: {email_data.get('sender', '(Unknown)')}\n"
        f"To: {email_data.get('recipient', '(Unknown)')}\n"
        f"Body: {body_preview}\n"
    )

    # Section 6: Required output format
    parts.append(
        "Respond in this exact format (nothing else):\n"
        "Action: [delete|archive|keep|mark_read|label:<name>]\n"
        "Confidence: [0-100]\n"
        "Summary: [one sentence summary of the email]\n"
        "Reason: [brief explanation for your classification]"
    )

    return "\n".join(parts)


def _log_llm(prompt, output, email_data, action, confidence, duration):
    """Log the full LLM prompt and response to logs/llm_YYYY-MM-DD.log."""
    LOGS_DIR.mkdir(exist_ok=True)
    log_file = LOGS_DIR / f"llm_{datetime.now().strftime('%Y-%m-%d')}.log"
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    subject = email_data.get("subject", "(No Subject)")[:60]
    sender = email_data.get("sender", "(Unknown)")[:60]

    with open(log_file, "a", encoding="utf-8") as f:
        f.write(f"{'=' * 70}\n")
        f.write(f"[{timestamp}] {subject}\n")
        f.write(f"From: {sender} | Result: {action} @ {confidence}% | {duration:.1f}s\n")
        f.write(f"{'-' * 70}\n")
        f.write(f"PROMPT:\n{prompt}\n")
        f.write(f"{'-' * 70}\n")
        f.write(f"RESPONSE:\n{output}\n")
        f.write(f"{'=' * 70}\n\n")


def _parse_response(output):
    """Parse the model's text response into structured fields.

    Expected format (one per line):
        Action: delete
        Confidence: 92
        Summary: Promotional offer from retailer
        Reason: Clearly a marketing email with discount offer

    Falls back to safe defaults (keep, 50% confidence) on parse failure.
    """
    action = "keep"
    confidence = 50
    summary = "No summary"
    reason = "Unknown"

    for line in output.strip().split("\n"):
        line = line.strip()
        if line.startswith("Action:"):
            raw_action = line.replace("Action:", "").strip().lower()
            valid_actions = {"delete", "archive", "keep", "mark_read"}
            if raw_action in valid_actions or raw_action.startswith("label:"):
                action = raw_action
        elif line.startswith("Confidence:"):
            try:
                confidence = int(line.replace("Confidence:", "").strip().rstrip("%"))
                confidence = max(0, min(100, confidence))  # clamp to 0-100
            except ValueError:
                confidence = 50
        elif line.startswith("Summary:"):
            summary = line.replace("Summary:", "").strip()[:200]
        elif line.startswith("Reason:"):
            reason = line.replace("Reason:", "").strip()

    return action, confidence, summary, reason


def classify_email(email_data, config):
    """Classify an email using the local LLM with few-shot learning context.

    Connects to Ollama, sends the assembled prompt, and parses the response.
    On any error, falls back to "keep" with 0% confidence so the email
    gets queued for manual review rather than auto-acted upon.

    Args:
        email_data: dict with subject, sender, recipient, body keys.
        config: full config dict (needs ollama.model and rules.max_body_length).

    Returns:
        Tuple of (action, confidence, summary, reason, duration_seconds).
    """
    import ollama

    prompt = _build_prompt(email_data, config)
    model = config.get("ollama", {}).get("model", "kamekichi128/qwen3-4b-instruct-2507:latest")

    start_time = time.time()
    try:
        # Low temperature for consistent classification
        response = ollama.generate(model=model, prompt=prompt, options={"temperature": 0.1})
        output = response["response"]
        action, confidence, summary, reason = _parse_response(output)
    except Exception as e:
        # On failure, default to "keep" with 0 confidence -> always queued
        output = f"ERROR: {e}"
        action = "keep"
        confidence = 0
        summary = "Classification failed"
        reason = f"error - {str(e)[:100]}"

    duration = time.time() - start_time
    _log_llm(prompt, output, email_data, action, confidence, duration)
    return action, confidence, summary, reason, duration
@@ -1,16 +1,14 @@
 {
-  "imap": {
-    "host": "imap.migadu.com",
-    "port": 993,
-    "email": "youlu@luyanxin.com",
-    "password": "kDkNau2r7m.hV!uk*D4Yr8mC7Dyjx9T"
-  },
   "ollama": {
     "host": "http://localhost:11434",
-    "model": "qwen3:4b"
+    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
   },
   "rules": {
     "max_body_length": 1000,
     "check_unseen_only": true
+  },
+  "automation": {
+    "confidence_threshold": 75,
+    "bootstrap_min_decisions": 20
   }
 }
@@ -1,52 +0,0 @@
-{
-  "msg_f1d43ea3": {
-    "imap_uid": "2",
-    "subject": "Delivered: \"Voikinfo Bottom Gusset Bags...\"",
-    "sender": "\"Amazon.com - order-update(a)amazon.com\"\r\n <order-update_at_amazon_com_posyo@simplelogin.co>",
-    "recipient": "sho.amazon@ylu17.com",
-    "summary": "Your Amazon package (order #114-1496788-7649829) was delivered today to Argo, Los Angeles, CA and left near the front door or porch.",
-    "email_date": "Wed, 18 Feb 2026 04:15:24 +0000",
-    "status": "pending",
-    "found_at": "2026-02-18T16:18:42.347538"
-  },
-  "msg_60c56a87": {
-    "imap_uid": "3",
-    "subject": "=?UTF-8?b?5L2V5LiN5ruh6Laz6Ieq5bex55qE5Y+j6IW55LmL5qyy?=",
-    "sender": "\"Uber Eats - uber(a)uber.com\" <uber_at_uber_com_kjwzyhxn@simplelogin.co>",
-    "recipient": "uber@ylu17.com",
-    "summary": "Uber Eats has sent a notification that the user's order is ready for pickup.",
-    "email_date": "Wed, 18 Feb 2026 11:36:59 +0000",
-    "status": "pending",
-    "found_at": "2026-02-18T08:05:56.594842"
-  },
-  "msg_ebd24205": {
-    "imap_uid": "4",
-    "subject": "Your order has been shipped (or closed if combined/delivered).",
-    "sender": "\"cd(a)woodenswords.com\"\r\n <cd_at_woodenswords_com_xivwijojc@simplelogin.co>",
-    "recipient": "mail@luyx.org",
-    "summary": "This email confirms that your order has been shipped or closed (if combined/delivered).",
-    "email_date": "Wed, 18 Feb 2026 16:07:58 +0000",
-    "status": "pending",
-    "found_at": "2026-02-18T12:01:19.048091"
-  },
-  "msg_fa73b3bd": {
-    "imap_uid": "6",
-    "subject": "=?UTF-8?Q?Yanxin,_I=E2=80=99m_still_waiting_for_your_response?=",
-    "sender": "\"Arslan (via LinkedIn) - messages-noreply(a)linkedin.com\"\r\n <messages-noreply_at_linkedin_com_ajpnalmwp@simplelogin.co>",
-    "recipient": "Yanxin Lu <acc.linkedin@ylu17.com>",
-    "summary": "Arslan Ahmed, a Senior AI | ML | Full Stack Engineer from Ilford, invited you to connect on February 11, 2026 at 10:08 PM and is waiting for your response.",
-    "email_date": "Wed, 18 Feb 2026 18:53:45 +0000 (UTC)",
-    "status": "pending",
-    "found_at": "2026-02-18T12:04:34.602407"
-  },
-  "msg_59f23736": {
-    "imap_uid": "1",
-    "subject": "New Software Engineer jobs that match your profile",
-    "sender": "\"LinkedIn - jobs-noreply(a)linkedin.com\"\r\n <jobs-noreply_at_linkedin_com_zuwggfxh@simplelogin.co>",
-    "recipient": "Yanxin Lu <acc.linkedin@ylu17.com>",
-    "summary": "LinkedIn has notified the user of new software engineering jobs that match their profile and includes a link to update their top card.",
-    "email_date": "Wed, 18 Feb 2026 02:07:12 +0000 (UTC)",
-    "status": "pending",
-    "found_at": "2026-02-18T16:16:00.784822"
-  }
-}
 253  scripts/email_processor/decision_store.py  (new file)
@@ -0,0 +1,253 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Decision Store - Manages decision history for learning-based email classification.
|
||||||
|
|
||||||
|
This module persists every user and auto-made decision to a flat JSON file
|
||||||
|
(data/decision_history.json). Past decisions serve as few-shot examples
|
||||||
|
that are injected into the LLM prompt by classifier.py, enabling the
|
||||||
|
system to learn from user behavior over time.
|
||||||
|
|
||||||
|
Storage format: a JSON array of decision entries, each containing sender,
|
||||||
|
recipient, subject, summary, action taken, and whether it was a user or
|
||||||
|
auto decision.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import re
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from collections import Counter
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Paths
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
SCRIPT_DIR = Path(__file__).parent
|
||||||
|
DATA_DIR = SCRIPT_DIR / "data"
|
||||||
|
HISTORY_FILE = DATA_DIR / "decision_history.json"
|
||||||
|
PENDING_FILE = DATA_DIR / "pending_emails.json"
|
||||||
|
|
||||||
|
# Stop-words excluded from subject keyword matching to reduce noise.
|
||||||
|
_STOP_WORDS = {"re", "fwd", "the", "a", "an", "is", "to", "for", "and", "or", "your", "you"}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Internal helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _load_history():
|
||||||
|
"""Load the full decision history list from disk."""
|
||||||
|
if not HISTORY_FILE.exists():
|
||||||
|
return []
|
||||||
|
with open(HISTORY_FILE, "r", encoding="utf-8") as f:
|
||||||
|
return json.load(f)
|
||||||
|
|
||||||
|
|
||||||
|
def _save_history(history):
|
||||||
|
"""Write the full decision history list to disk."""
|
||||||
|
DATA_DIR.mkdir(exist_ok=True)
|
||||||
|
with open(HISTORY_FILE, "w", encoding="utf-8") as f:
|
||||||
|
json.dump(history, f, indent=2, ensure_ascii=False)
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_domain(sender):
|
||||||
|
"""Extract the domain part from a sender string.
|
||||||
|
|
||||||
|
Handles formats like:
|
||||||
|
"Display Name <user@example.com>"
|
||||||
|
user@example.com
|
||||||
|
"""
|
||||||
|
match = re.search(r"[\w.+-]+@([\w.-]+)", sender)
|
||||||
|
return match.group(1).lower() if match else ""
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_email_address(sender):
|
||||||
|
"""Extract the full email address from a sender string."""
|
||||||
|
match = re.search(r"([\w.+-]+@[\w.-]+)", sender)
|
||||||
|
return match.group(1).lower() if match else sender.lower()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Public API
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def record_decision(email_data, action, source="user"):
|
||||||
|
"""Append a decision to the history file.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
email_data: dict with keys: sender, recipient, subject, summary.
|
||||||
|
action: one of "delete", "archive", "keep", "mark_read",
|
||||||
|
or "label:<name>".
|
||||||
|
source: "user" (manual review) or "auto" (high-confidence).
|
||||||
|
"""
|
||||||
|
history = _load_history()
|
||||||
|
entry = {
|
||||||
|
"timestamp": datetime.now().isoformat(timespec="seconds"),
|
||||||
|
"sender": email_data.get("sender", ""),
|
||||||
|
"sender_domain": _extract_domain(email_data.get("sender", "")),
|
||||||
|
"recipient": email_data.get("recipient", ""),
|
||||||
|
"subject": email_data.get("subject", ""),
|
||||||
|
"summary": email_data.get("summary", ""),
|
||||||
|
"action": action,
|
||||||
|
"source": source,
|
||||||
|
}
|
||||||
|
history.append(entry)
|
||||||
|
_save_history(history)
|
||||||
|
return entry
|
||||||
|
|
||||||
|
|
||||||
|
def get_relevant_examples(email_data, n=10):
    """Find the N most relevant past decisions for a given email.

    Relevance is scored by three signals:
    - Exact sender domain match: +3 points
    - Recipient string match: +2 points
    - Subject keyword overlap: +1 point per shared word

    Only entries with score > 0 are considered. Results are returned
    sorted by descending relevance.
    """
    history = _load_history()
    if not history:
        return []

    target_domain = _extract_domain(email_data.get("sender", ""))
    target_recipient = email_data.get("recipient", "").lower()
    target_words = (
        set(re.findall(r"\w+", email_data.get("subject", "").lower())) - _STOP_WORDS
    )

    scored = []
    for entry in history:
        score = 0

        # Signal 1: sender domain match
        if target_domain and entry.get("sender_domain", "") == target_domain:
            score += 3

        # Signal 2: recipient substring match
        if target_recipient and target_recipient in entry.get("recipient", "").lower():
            score += 2

        # Signal 3: subject keyword overlap
        entry_words = (
            set(re.findall(r"\w+", entry.get("subject", "").lower())) - _STOP_WORDS
        )
        score += len(target_words & entry_words)

        if score > 0:
            scored.append((score, entry))

    scored.sort(key=lambda x: x[0], reverse=True)
    return [entry for _, entry in scored[:n]]


def get_sender_stats(sender_domain):
    """Get action distribution for a sender domain.

    Returns a dict like {"delete": 5, "keep": 2, "archive": 1}.
    """
    history = _load_history()
    actions = Counter()
    for entry in history:
        if entry.get("sender_domain", "") == sender_domain:
            actions[entry["action"]] += 1
    return dict(actions)


def get_sender_history_count(sender_domain):
    """Count total past decisions for a sender domain.

    Used by the scan command to decide whether there is enough history
    to trust auto-actions for this sender.
    """
    history = _load_history()
    return sum(1 for e in history if e.get("sender_domain", "") == sender_domain)


def get_known_labels():
    """Return the set of all label names used in past "label:<name>" decisions.

    These are offered to the LLM so it can reuse existing labels rather
    than inventing new ones.
    """
    history = _load_history()
    labels = set()
    for entry in history:
        action = entry.get("action", "")
        if action.startswith("label:"):
            labels.add(action[6:])
    return labels


def get_all_stats():
    """Compute aggregate statistics across the full decision history.

    Returns a dict with keys: total, by_action, by_source, top_domains.
    Returns None if history is empty.
    """
    history = _load_history()
    if not history:
        return None

    total = len(history)
    by_action = Counter(e["action"] for e in history)
    by_source = Counter(e["source"] for e in history)

    # Top 10 sender domains by decision count
    domain_counts = Counter(e.get("sender_domain", "") for e in history)
    top_domains = domain_counts.most_common(10)

    return {
        "total": total,
        "by_action": dict(by_action),
        "by_source": dict(by_source),
        "top_domains": top_domains,
    }


# ---------------------------------------------------------------------------
# Migration
# ---------------------------------------------------------------------------

def migrate_pending():
    """One-time migration: import 'done' entries from pending_emails.json.

    Converts old-style action names ("archived" -> "archive", etc.) and
    records them as user decisions in the history file. Not idempotent:
    running it more than once will create duplicate history entries, so
    run it only once.
    """
    if not PENDING_FILE.exists():
        print("No pending_emails.json found, nothing to migrate.")
        return 0

    with open(PENDING_FILE, "r", encoding="utf-8") as f:
        pending = json.load(f)

    # Map old action names to new ones
    action_map = {
        "archived": "archive",
        "kept": "keep",
        "deleted": "delete",
    }

    migrated = 0
    for msg_id, data in pending.items():
        if data.get("status") != "done":
            continue
        old_action = data.get("action", "")
        action = action_map.get(old_action, old_action)
        if not action:
            continue

        email_data = {
            "sender": data.get("sender", ""),
            "recipient": data.get("recipient", ""),
            "subject": data.get("subject", ""),
            "summary": data.get("summary", ""),
        }
        record_decision(email_data, action, source="user")
        migrated += 1

    print(f"Migrated {migrated} decisions from pending_emails.json")
    return migrated
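As a quick sanity check, the scoring scheme in `get_relevant_examples` can be exercised standalone. This is a minimal sketch with hypothetical history entries; the trimmed `STOP_WORDS` set here merely stands in for the module's `_STOP_WORDS`:

```python
import re

STOP_WORDS = {"the", "your", "a", "for", "is"}  # stand-in for _STOP_WORDS

def score(entry, target_domain, target_recipient, target_words):
    # Mirrors the three signals: domain +3, recipient +2, +1 per shared subject word
    s = 0
    if target_domain and entry["sender_domain"] == target_domain:
        s += 3
    if target_recipient and target_recipient in entry["recipient"].lower():
        s += 2
    entry_words = set(re.findall(r"\w+", entry["subject"].lower())) - STOP_WORDS
    return s + len(target_words & entry_words)

history = [
    {"sender_domain": "linkedin.com", "recipient": "me@example.com",
     "subject": "New job alert for software engineer", "action": "delete"},
    {"sender_domain": "amazon.com", "recipient": "me@example.com",
     "subject": "Your order has shipped", "action": "keep"},
]
target_words = set(re.findall(r"\w+", "job alert: software engineer roles")) - STOP_WORDS
ranked = sorted(
    ((score(e, "linkedin.com", "me@example.com", target_words), e) for e in history),
    key=lambda x: x[0], reverse=True,
)
print(ranked[0][0], ranked[0][1]["action"])  # highest-scoring past decision first
```

With real history, `get_relevant_examples` returns the top N of exactly this ranking (score > 0 only).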
27
scripts/email_processor/email-processor.sh
Executable file
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# email-processor — wrapper script for the email processor.
#
# Usage:
#   ./email-processor.sh scan                       # classify unseen emails
#   ./email-processor.sh scan --recent 30           # last 30 days
#   ./email-processor.sh scan --dry-run             # classify only, no changes
#   ./email-processor.sh scan --recent 7 --dry-run  # combine both
#   ./email-processor.sh review list                # show pending queue
#   ./email-processor.sh review 1 delete            # act on email #1
#   ./email-processor.sh review all delete          # act on all pending
#   ./email-processor.sh review accept              # accept all suggestions
#   ./email-processor.sh stats                      # show history stats
#   ./email-processor.sh migrate                    # import old decisions
#
# Requires: Python 3.8+, himalaya, Ollama running with the configured model.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Activate the virtualenv if it exists
if [ -d "$SCRIPT_DIR/venv" ]; then
    source "$SCRIPT_DIR/venv/bin/activate"
fi

exec python3 "$SCRIPT_DIR/main.py" "$@"
@@ -1,50 +0,0 @@
[2026-02-15 21:14:02] KEPT: Please confirm your mailbox youlu@luyanxin.com
  From: "noreply@simplelogin.io" <noreply@simplelogin.io>
  Analysis: KEEP: Legitimate service confirmation email for mailbox addition (not promotional)

[2026-02-15 21:15:04] KEPT: =?utf-8?B?RndkOiBHZXQgMTAlIG9mZiB5b3VyIG5leHQgb3JkZXIg4pyF?=
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: error - HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=60)

[2026-02-15 21:15:37] KEPT: =?utf-8?B?RndkOiDigJxzb2Z0d2FyZSBlbmdpbmVlcuKAnTogTWljcm9
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: LinkedIn job alert notification for subscribed job search (not promotional)

[2026-02-15 21:15:52] KEPT: Fwd: Your receipt from OpenRouter, Inc #2231-9732
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: This is a legitimate receipt for a payment made to OpenRouter, Inc (a known AI service provider), not promotional content.

[2026-02-15 21:16:10] KEPT: Fwd: Your ChatGPT code is 217237
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: Legitimate security verification code from OpenAI (standard login confirmation)

[2026-02-15 22:49:44] KEPT (69.0s): =?UTF-8?B?5rWL6K+V6YKu5Lu2?=
  From: Yanxin Lu <lyx@luyanxin.com>
  Analysis: KEEP: Test email for delivery verification

  From: Yanxin Lu <lyx@luyanxin.com>
  Analysis: KEEP: Test email for delivery verification

[2026-02-15 22:57:03] MOVED_TO_TRASH (68.5s): =?utf-8?B?RndkOiBHZXQgMTAlIG9mZiB5b3VyIG5leHQgb3JkZXIg4pyF?=
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: AD: Forwarded Uber promotional offer

  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: AD: Forwarded Uber promotional offer

[2026-02-15 23:00:09] KEPT (120.1s): Fwd: Your ChatGPT code is 217237
  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: error - HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=120)

  From: "Yanxin Lu - crac1017(a)hotmail.com"
        <crac1017_at_hotmail_com_fndbbu@simplelogin.co>
  Analysis: KEEP: error - HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=120)
@@ -1,29 +0,0 @@
[2026-02-18 08:04:26] ADDED_TO_PENDING (msg_f1d43ea3) (108.6s): Delivered: "Voikinfo Bottom Gusset Bags..."
  From: "Amazon.com - order-update(a)amazon.com"
        <order-update_at_amazon_com_posyo@simplelogin.co>
  Analysis: KEEP: Standard delivery confirmation from Amazon

[2026-02-18 08:05:56] ADDED_TO_PENDING (msg_60c56a87) (88.0s): =?UTF-8?b?5L2V5LiN5ruh6Laz6Ieq5bex55qE5Y+j6IW55LmL5qyy?=
  From: "Uber Eats - uber(a)uber.com" <uber_at_uber_com_kjwzyhxn@simplelogin.co>
  Analysis: KEEP: The decoded subject line "Your Uber Eats order is ready!" indicates a transactional order update, not an advertisement.

[2026-02-18 12:01:19] ADDED_TO_PENDING (msg_ebd24205) (66.7s): Your order has been shipped (or closed if combined/delivered
  From: "cd(a)woodenswords.com"
        <cd_at_woodenswords_com_xivwijojc@simplelogin.co>
  Analysis: KEEP: System-generated shipping update notification from an e-commerce store, not promotional content.

[2026-02-18 12:03:36] MOVED_TO_TRASH (133.4s): =?UTF-8?Q?=E2=80=9Csoftware_engineer=E2=80=9D:_Snap_Inc._-_S
  From: "LinkedIn Job Alerts - jobalerts-noreply(a)linkedin.com"
        <jobalerts-noreply_at_linkedin_com_cnrlhok@simplelogin.co>
  Analysis: AD: This email is a promotional job alert notification from LinkedIn's service for users who have set up job preferences.

[2026-02-18 12:04:34] ADDED_TO_PENDING (msg_fa73b3bd) (57.3s): =?UTF-8?Q?Yanxin,_I=E2=80=99m_still_waiting_for_your_respons
  From: "Arslan (via LinkedIn) - messages-noreply(a)linkedin.com"
        <messages-noreply_at_linkedin_com_ajpnalmwp@simplelogin.co>
  Analysis: KEEP: This is a standard LinkedIn connection request notification with no promotional content, discounts, or advertisements—only a reminder of an existing invitation.

[2026-02-18 16:18:42] ADDED_TO_PENDING (msg_f1d43ea3) (102.1s): Delivered: "Voikinfo Bottom Gusset Bags..."
  From: "Amazon.com - order-update(a)amazon.com"
        <order-update_at_amazon_com_posyo@simplelogin.co>
  Analysis: KEEP: Standard delivery confirmation from Amazon, not a promotional message.
@@ -1,297 +1,704 @@
 #!/usr/bin/env python3
 """
-Email Processor - Auto filter ads using local Qwen3
-Moves ad emails to Trash folder (not permanently deleted)
+Email Processor - Learning-based mailbox cleanup using Himalaya + Ollama.
+
+Uses himalaya CLI for all IMAP operations (no raw imaplib, no stored
+credentials). Uses a local Qwen3 model via Ollama for classification,
+with few-shot learning from past user decisions.
+
+All commands are non-interactive — they take arguments, mutate files on
+disk, and exit. Suitable for cron (OpenClaw) and scripting.
+
+Subcommands:
+    python main.py scan                         # classify unseen emails
+    python main.py scan --recent 30             # classify last 30 days
+    python main.py scan --dry-run               # classify only, no changes
+    python main.py scan --recent 7 --dry-run    # combine both
+    python main.py review list                  # print pending queue
+    python main.py review <num-or-id> <action>  # act on one email
+    python main.py review all <action>          # act on all pending
+    python main.py review accept                # accept all suggestions
+    python main.py stats                        # show decision history
+    python main.py migrate                      # import old decisions
+
+Action mapping (what each classification does to the email):
+    delete    -> himalaya message delete <id>  (moves to Trash)
+    archive   -> himalaya message move Archive <id>
+    keep      -> no-op (leave unread in inbox)
+    mark_read -> himalaya flag add <id> seen
+    label:X   -> himalaya message move <X> <id>
 """
 
 import json
-import imaplib
-import email
-import os
+import subprocess
+import hashlib
 import sys
-from datetime import datetime
+from datetime import datetime, timedelta
 from pathlib import Path
 
-# Config
+import classifier
+import decision_store
+
+# ---------------------------------------------------------------------------
+# Paths — all relative to the script's own directory
+# ---------------------------------------------------------------------------
+
 SCRIPT_DIR = Path(__file__).parent
 CONFIG_FILE = SCRIPT_DIR / "config.json"
 LOGS_DIR = SCRIPT_DIR / "logs"
 DATA_DIR = SCRIPT_DIR / "data"
 PENDING_FILE = DATA_DIR / "pending_emails.json"
+
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
 
 def load_config():
-    """Load configuration"""
+    """Load config.json from the script directory.
+
+    Only ollama, rules, and automation settings are needed — himalaya
+    manages its own IMAP config separately.
+    """
     with open(CONFIG_FILE) as f:
         return json.load(f)
 
-def connect_imap(config):
-    """Connect to IMAP server"""
-    imap_config = config['imap']
-    mail = imaplib.IMAP4_SSL(imap_config['host'], imap_config['port'])
-    mail.login(imap_config['email'], imap_config['password'])
-    return mail
-
-def get_unseen_emails(mail):
-    """Get list of unseen email IDs"""
-    mail.select('INBOX')
-    _, search_data = mail.search(None, 'UNSEEN')
-    email_ids = search_data[0].split()
-    return email_ids
-
-def fetch_email(mail, email_id):
-    """Fetch email content"""
-    _, msg_data = mail.fetch(email_id, '(RFC822)')
-    raw_email = msg_data[0][1]
-    msg = email.message_from_bytes(raw_email)
-
-    # Extract subject
-    subject = msg['Subject'] or '(No Subject)'
+# ---------------------------------------------------------------------------
+# Himalaya CLI wrappers
+#
+# All IMAP operations go through himalaya, which handles connection,
+# auth, and protocol details. We call it as a subprocess and parse
+# its JSON output.
+# ---------------------------------------------------------------------------
+
+def _himalaya(*args):
+    """Run a himalaya command and return its stdout.
+
+    Raises subprocess.CalledProcessError on failure.
+    """
+    result = subprocess.run(
+        ["himalaya", *args],
+        capture_output=True, text=True, check=True,
+    )
+    return result.stdout
-    # Extract sender
-    sender = msg['From'] or '(Unknown)'
-
-    # Extract recipient
-    recipient = msg['To'] or '(Unknown)'
-
-    # Extract date
-    date = msg['Date'] or datetime.now().isoformat()
-
-    # Extract body
-    body = ""
-    if msg.is_multipart():
-        for part in msg.walk():
-            if part.get_content_type() == "text/plain":
-                try:
-                    body = part.get_payload(decode=True).decode('utf-8', errors='ignore')
-                    break
-                except:
-                    pass
+
+def _himalaya_json(*args):
+    """Run a himalaya command with JSON output and return parsed result."""
+    return json.loads(_himalaya("-o", "json", *args))
+
+
+# ---------------------------------------------------------------------------
+# Email fetching via himalaya
+# ---------------------------------------------------------------------------
+
+def get_unseen_envelopes():
+    """Fetch envelope metadata for all unseen emails in INBOX.
+
+    Returns a list of envelope dicts from himalaya's JSON output.
+    Each has keys like: id, subject, from, to, date, flags.
+    """
+    return _himalaya_json("envelope", "list", "not", "flag", "seen")
+
+
+def get_recent_envelopes(days):
+    """Fetch envelope metadata for all emails from the last N days.
+
+    Includes both read and unread emails — useful for testing and
+    bulk-classifying historical mail.
+    """
+    since = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
+    return _himalaya_json("envelope", "list", "after", since)
+
+
+def read_message(envelope_id):
+    """Read the full message body without marking it as seen.
+
+    The --preview flag prevents himalaya from adding the \\Seen flag,
+    so the email stays unread for the actual action to handle.
+    """
+    # Read plain text, no headers, without marking as seen
+    return _himalaya("message", "read", "--preview", "--no-headers", str(envelope_id))
+
+
+def build_email_data(envelope, body, config):
+    """Build the email_data dict expected by classifier and decision_store.
+
+    Combines envelope metadata (from himalaya envelope list) with the
+    message body (from himalaya message read).
+    """
+    max_body = config.get("rules", {}).get("max_body_length", 1000)
+
+    # himalaya envelope JSON uses "from" as a nested object or string
+    sender = envelope.get("from", {})
+    if isinstance(sender, dict):
+        # Format: {"name": "Display Name", "addr": "user@example.com"}
+        name = sender.get("name", "")
+        addr = sender.get("addr", "")
+        sender_str = f"{name} <{addr}>" if name else addr
+    elif isinstance(sender, list) and sender:
+        first = sender[0]
+        name = first.get("name", "")
+        addr = first.get("addr", "")
+        sender_str = f"{name} <{addr}>" if name else addr
     else:
-        try:
-            body = msg.get_payload(decode=True).decode('utf-8', errors='ignore')
-        except:
-            pass
+        sender_str = str(sender)
+
+    # Same for "to"
+    to = envelope.get("to", {})
+    if isinstance(to, dict):
+        name = to.get("name", "")
+        addr = to.get("addr", "")
+        to_str = f"{name} <{addr}>" if name else addr
+    elif isinstance(to, list) and to:
+        first = to[0]
+        name = first.get("name", "")
+        addr = first.get("addr", "")
+        to_str = f"{name} <{addr}>" if name else addr
+    else:
+        to_str = str(to)
 
     return {
-        'id': email_id,
-        'subject': subject,
-        'sender': sender,
-        'recipient': recipient,
-        'date': date,
-        'body': body[:300]  # Limit body length
+        "id": str(envelope.get("id", "")),
+        "subject": envelope.get("subject", "(No Subject)"),
+        "sender": sender_str,
+        "recipient": to_str,
+        "date": envelope.get("date", ""),
+        "body": body[:max_body],
     }
 
-def analyze_with_qwen3(email_data, config):
-    """Analyze email with local Qwen3 using official library"""
-    import ollama
-    import time
-
-    prompt = f"""/no_think
-
-Analyze this email and provide two pieces of information:
-
-1. Is this an advertisement/promotional email?
-2. Summarize the email in one sentence
-
-Email details:
-Subject: {email_data['subject']}
-Sender: {email_data['sender']}
-Body: {email_data['body'][:300]}
-
-Respond in this exact format:
-IsAD: [YES or NO]
-Summary: [one sentence summary]
-Reason: [brief explanation]
-"""
-
-    start_time = time.time()
-    model = config['ollama'].get('model', 'qwen3:4b')
-
+# ---------------------------------------------------------------------------
+# IMAP actions via himalaya
+#
+# Each function executes one himalaya command. Returns True on success.
+# On failure, prints the error and returns False.
+# ---------------------------------------------------------------------------
+
+def execute_action(envelope_id, action):
+    """Dispatch an action string to the appropriate himalaya command.
+
+    Action mapping:
+        "delete"    -> himalaya message delete <id>
+        "archive"   -> himalaya message move Archive <id>
+        "keep"      -> no-op (leave unread in inbox)
+        "mark_read" -> himalaya flag add <id> seen
+        "label:X"   -> himalaya message move <X> <id>
+
+    Returns True on success, False on failure.
+    """
+    eid = str(envelope_id)
     try:
-        response = ollama.generate(model=model, prompt=prompt, options={'temperature': 0.1})
-        output = response['response']
-
-        # Parse output
-        is_ad = False
-        summary = "No summary"
-        reason = "Unknown"
-
-        for line in output.strip().split('\n'):
-            if line.startswith('IsAD:'):
-                is_ad = 'YES' in line.upper()
-            elif line.startswith('Summary:'):
-                summary = line.replace('Summary:', '').strip()[:200]
-            elif line.startswith('Reason:'):
-                reason = line.replace('Reason:', '').strip()
-
-        if is_ad:
-            result = f"AD: {reason}"
+        if action == "delete":
+            _himalaya("message", "delete", eid)
+        elif action == "archive":
+            _himalaya("message", "move", "Archive", eid)
+        elif action == "keep":
+            pass  # leave unread in inbox — no IMAP changes
+        elif action == "mark_read":
+            _himalaya("flag", "add", eid, "seen")
+        elif action.startswith("label:"):
+            folder = action[6:]
+            _himalaya("message", "move", folder, eid)
         else:
-            result = f"KEEP: {reason}"
-
-    except Exception as e:
-        result = f"KEEP: error - {str(e)[:100]}"
-        summary = "Analysis failed"
-        is_ad = False
-
-    duration = time.time() - start_time
-    return result, summary, is_ad, duration
-
-def move_to_trash(mail, email_id):
-    """Move email to Trash folder"""
-    # Copy to Trash
-    result = mail.copy(email_id, 'Trash')
-    if result[0] == 'OK':
-        # Mark original as deleted
-        mail.store(email_id, '+FLAGS', '\\Deleted')
+            print(f"  Unknown action: {action}")
+            return False
         return True
-    return False
+    except subprocess.CalledProcessError as e:
+        print(f"  Himalaya error: {e.stderr.strip()}")
+        return False
 
-def log_result(log_file, email_data, analysis, action, duration=None):
-    """Log processing result with Qwen3 duration"""
-    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
-    duration_str = f" ({duration:.1f}s)" if duration else ""
-    with open(log_file, 'a') as f:
-        f.write(f"[{timestamp}] {action}{duration_str}: {email_data['subject'][:60]}\n")
-        f.write(f"  From: {email_data['sender']}\n")
-        f.write(f"  Analysis: {analysis}\n\n")
+
+# ---------------------------------------------------------------------------
+# Pending queue — emails awaiting manual review
+#
+# Stored as a JSON dict in data/pending_emails.json, keyed by msg_id.
+# Each entry tracks the envelope ID (for himalaya), classifier suggestion,
+# and status (pending/done).
+# ---------------------------------------------------------------------------
 
 def load_pending():
-    """Load pending emails from JSON file"""
+    """Load the pending queue from disk."""
     if not PENDING_FILE.exists():
         return {}
-    with open(PENDING_FILE, 'r', encoding='utf-8') as f:
+    with open(PENDING_FILE, "r", encoding="utf-8") as f:
         return json.load(f)
 
 def save_pending(pending):
-    """Save pending emails to JSON file"""
+    """Write the pending queue to disk."""
     DATA_DIR.mkdir(exist_ok=True)
-    with open(PENDING_FILE, 'w', encoding='utf-8') as f:
+    with open(PENDING_FILE, "w", encoding="utf-8") as f:
         json.dump(pending, f, indent=2, ensure_ascii=False)
 
-def add_to_pending(email_data, summary, imap_uid, recipient):
-    """Add email to pending queue"""
+def add_to_pending(email_data, summary, reason, action_suggestion, confidence):
+    """Add an email to the pending queue for manual review.
+
+    Stores the classifier's suggestion and confidence alongside the
+    email metadata so the user can see what the model thought.
+    """
     pending = load_pending()
 
-    # Generate unique ID
-    import hashlib
-    msg_id = f"msg_{hashlib.md5(f'{imap_uid}_{email_data['subject']}'.encode()).hexdigest()[:8]}"
-
-    # Extract date from email
-    email_date = email_data.get('date', datetime.now().isoformat())
+    # Generate a stable ID from envelope ID + subject
+    eid = str(email_data["id"])
+    key = f"{eid}_{email_data['subject']}"
+    msg_id = f"msg_{hashlib.md5(key.encode()).hexdigest()[:8]}"
 
     pending[msg_id] = {
-        "imap_uid": str(imap_uid),
-        "subject": email_data['subject'],
-        "sender": email_data['sender'],
-        "recipient": recipient,
+        "envelope_id": eid,
+        "subject": email_data["subject"],
+        "sender": email_data["sender"],
+        "recipient": email_data.get("recipient", ""),
         "summary": summary,
-        "email_date": email_date,
+        "reason": reason,
+        "suggested_action": action_suggestion,
+        "confidence": confidence,
+        "email_date": email_data.get("date", ""),
         "status": "pending",
-        "found_at": datetime.now().isoformat()
+        "found_at": datetime.now().isoformat(),
     }
 
     save_pending(pending)
     return msg_id
 
-def main():
-    """Main processing function"""
-    print("📧 Email Processor Starting...")
-
-    # Load config
-    config = load_config()
-
-    # Setup logging
+# ---------------------------------------------------------------------------
+# Logging
+# ---------------------------------------------------------------------------
+
+def log_result(log_file, email_data, action, detail, duration=None):
+    """Append a one-line log entry for a processed email."""
+    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    dur = f" ({duration:.1f}s)" if duration else ""
+    with open(log_file, "a") as f:
+        f.write(f"[{timestamp}] {action}{dur}: {email_data['subject'][:60]}\n")
+        f.write(f"  From: {email_data['sender']}\n")
+        f.write(f"  Detail: {detail}\n\n")
+
+
+# ---------------------------------------------------------------------------
+# Subcommand: scan
+# ---------------------------------------------------------------------------
+
+def cmd_scan(config, recent=None, dry_run=False):
+    """Fetch emails, classify each one, then auto-act or queue.
+
+    Auto-action is based on a single confidence threshold. When the
+    decision history has fewer than 20 entries, a higher threshold (95%)
+    is used to be conservative during the learning phase. Once enough
+    history accumulates, the configured threshold takes over.
+
+    Args:
+        config: full config dict.
+        recent: if set, fetch emails from last N days (not just unseen).
+        dry_run: if True, classify and print but skip all actions.
+    """
+    mode = "DRY RUN" if dry_run else "Scan"
+    print(f"Email Processor - {mode}")
+    print("=" * 50)
+
     LOGS_DIR.mkdir(exist_ok=True)
     log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"
 
-    try:
-        # Connect to IMAP
-        print("Connecting to IMAP...")
-        mail = connect_imap(config)
-        print("✅ Connected")
-
-        # Get unseen emails
-        email_ids = get_unseen_emails(mail)
-        print(f"Found {len(email_ids)} unread emails")
-
-        if not email_ids:
-            print("No new emails to process")
-            mail.logout()
-            return
-
-        # Process each email
-        processed = 0
-        moved_to_trash = 0
-        added_to_pending = 0
-
-        for email_id in email_ids:
-            print(f"\nProcessing email {email_id.decode()}...")
-
-            # Fetch email
-            email_data = fetch_email(mail, email_id)
-            print(f"  Subject: {email_data['subject'][:50]}")
-
-            # Analyze with Qwen3 (one call for both ad detection and summary)
-            analysis, summary, is_ad, duration = analyze_with_qwen3(email_data, config)
-            print(f"  Analysis: {analysis[:100]}")
-            print(f"  Summary: {summary[:60]}...")
-            print(f"  Qwen3 time: {duration:.1f}s")
-
-            # Check if analysis was successful (not an error)
-            if 'error -' in analysis.lower():
-                # Analysis failed - keep email unread for retry
-                print(f"  -> Analysis failed, keeping unread for retry")
-                log_result(log_file, email_data, analysis, "FAILED_RETRY", duration)
-                # Don't increment processed count - will retry next time
-                continue
-
-            # Analysis successful - determine action
-            if is_ad:
-                print("  -> Moving to Trash")
-                if move_to_trash(mail, email_id):
-                    log_result(log_file, email_data, analysis, "MOVED_TO_TRASH", duration)
-                    moved_to_trash += 1
-                else:
-                    log_result(log_file, email_data, analysis, "MOVE_FAILED", duration)
+    # Load automation threshold
+    automation = config.get("automation", {})
+    configured_threshold = automation.get("confidence_threshold", 75)
+
+    # Adaptive threshold: be conservative when history is thin
+    stats = decision_store.get_all_stats()
+    total_decisions = stats["total"] if stats else 0
+    bootstrap_min = automation.get("bootstrap_min_decisions", 20)
+    if total_decisions < bootstrap_min:
+        confidence_threshold = 95
+        print(f"Learning phase ({total_decisions}/{bootstrap_min} decisions) — threshold: 95%\n")
+    else:
+        confidence_threshold = configured_threshold
+
+    # Fetch envelopes via himalaya
+    if recent:
+        envelopes = get_recent_envelopes(recent)
+        print(f"Found {len(envelopes)} emails from last {recent} days\n")
+    else:
+        envelopes = get_unseen_envelopes()
+        print(f"Found {len(envelopes)} unread emails\n")
+
+    if not envelopes:
+        print("No new emails to process.")
+        return
+
+    auto_acted = 0
+    queued = 0
+
+    for envelope in envelopes:
+        eid = envelope.get("id", "?")
+        print(f"[{eid}] ", end="", flush=True)
+
+        # Read message body without marking as seen
+        try:
+            body = read_message(eid)
+        except subprocess.CalledProcessError:
+            body = ""
+
+        email_data = build_email_data(envelope, body, config)
+        print(f"{email_data['subject'][:55]}")
+
+        # Run the LLM classifier (includes few-shot examples from history)
+        action, confidence, summary, reason, duration = classifier.classify_email(
+            email_data, config
+        )
+
+        print(f"  -> {action} (confidence: {confidence}%, {duration:.1f}s)")
+        print(f"  {reason[:80]}")
+
+        # Auto-act if confidence meets threshold
+        can_auto = confidence >= confidence_threshold
+
+        if dry_run:
+            # Dry run: log what would happen, touch nothing
+            log_result(log_file, email_data, f"DRYRUN:{action}@{confidence}%", reason, duration)
+            if can_auto:
+                print(f"  -> Would AUTO-execute: {action}")
+                auto_acted += 1
else:
|
else:
|
||||||
# Non-ad email - add to pending queue
|
print(f" -> Would queue for review")
|
||||||
print(" -> Adding to pending queue")
|
queued += 1
|
||||||
|
elif can_auto:
|
||||||
# Add to pending
|
# Auto-execute the action via himalaya
|
||||||
msg_internal_id = add_to_pending(
|
success = execute_action(eid, action)
|
||||||
email_data,
|
if success:
|
||||||
summary,
|
decision_store.record_decision(
|
||||||
email_id.decode(),
|
{**email_data, "summary": summary}, action, source="auto"
|
||||||
email_data.get('recipient', 'youlu@luyanxin.com')
|
|
||||||
)
|
)
|
||||||
|
log_result(log_file, email_data, f"AUTO:{action}", reason, duration)
|
||||||
|
print(f" ** AUTO-executed: {action}")
|
||||||
|
auto_acted += 1
|
||||||
|
else:
|
||||||
|
# Himalaya action failed — fall back to queuing
|
||||||
|
log_result(log_file, email_data, "AUTO_FAILED", reason, duration)
|
||||||
|
print(f" !! Auto-action failed, queuing instead")
|
||||||
|
add_to_pending(email_data, summary, reason, action, confidence)
|
||||||
|
queued += 1
|
||||||
|
else:
|
||||||
|
# Not enough confidence or history — queue for manual review
|
||||||
|
add_to_pending(email_data, summary, reason, action, confidence)
|
||||||
|
# Mark as read to prevent re-processing on next scan
|
||||||
|
if not dry_run:
|
||||||
|
try:
|
||||||
|
_himalaya("flag", "add", str(eid), "seen")
|
||||||
|
except subprocess.CalledProcessError:
|
||||||
|
pass
|
||||||
|
log_result(log_file, email_data, f"QUEUED:{action}@{confidence}%", reason, duration)
|
||||||
|
print(f" -> Queued (confidence {confidence}% < {confidence_threshold}%)")
|
||||||
|
queued += 1
|
||||||
|
|
||||||
# Mark as read (so it won't be processed again)
|
# Print run summary
|
||||||
mail.store(email_id, '+FLAGS', '\\Seen')
|
print(f"\n{'=' * 50}")
|
||||||
|
print(f"Processed: {len(envelopes)} emails")
|
||||||
|
print(f" Auto-acted: {auto_acted}")
|
||||||
|
print(f" Queued for review: {queued}")
|
||||||
|
print(f"\nRun 'python main.py review list' to see pending emails")
|
||||||
|
|
||||||
log_result(log_file, email_data, analysis, f"ADDED_TO_PENDING ({msg_internal_id})", duration)
|
|
||||||
added_to_pending += 1
|
|
||||||
|
|
||||||
processed += 1
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: review
|
||||||
|
#
|
||||||
|
# Non-interactive: each invocation takes arguments, acts, and exits.
|
||||||
|
# No input() calls. Compatible with cron and scripting.
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
# Expunge deleted emails
|
def _get_pending_items():
|
||||||
mail.expunge()
|
"""Return only pending (not done) items, sorted by found_at."""
|
||||||
mail.logout()
|
pending = load_pending()
|
||||||
|
items = {k: v for k, v in pending.items() if v.get("status") == "pending"}
|
||||||
|
sorted_items = sorted(items.items(), key=lambda x: x[1].get("found_at", ""))
|
||||||
|
return sorted_items
|
||||||
|
|
||||||
# Summary
|
|
||||||
print(f"\n{'='*50}")
|
|
||||||
print(f"Total emails checked: {len(email_ids)}")
|
|
||||||
print(f"Successfully processed: {processed} emails")
|
|
||||||
print(f" - Moved to trash (ads): {moved_to_trash}")
|
|
||||||
print(f" - Added to pending queue: {added_to_pending}")
|
|
||||||
print(f"Failed (will retry next time): {len(email_ids) - processed}")
|
|
||||||
print(f"\n📁 Pending queue: {PENDING_FILE}")
|
|
||||||
print(f"📝 Log: {log_file}")
|
|
||||||
print(f"\n💡 Run 'python process_queue.py' to view and process pending emails")
|
|
||||||
|
|
||||||
except Exception as e:
|
def cmd_review_list():
|
||||||
print(f"❌ Error: {e}")
|
"""Print the pending queue and exit.
|
||||||
|
|
||||||
|
Shows each email with its number, ID, subject, sender, summary,
|
||||||
|
and the classifier's suggested action with confidence.
|
||||||
|
"""
|
||||||
|
sorted_items = _get_pending_items()
|
||||||
|
|
||||||
|
if not sorted_items:
|
||||||
|
print("No pending emails to review.")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Pending emails: {len(sorted_items)}")
|
||||||
|
print("=" * 60)
|
||||||
|
|
||||||
|
for i, (msg_id, data) in enumerate(sorted_items, 1):
|
||||||
|
suggested = data.get("suggested_action", "?")
|
||||||
|
conf = data.get("confidence", "?")
|
||||||
|
print(f"\n {i}. [{msg_id}]")
|
||||||
|
print(f" Subject: {data.get('subject', 'N/A')[:55]}")
|
||||||
|
print(f" From: {data.get('sender', 'N/A')[:55]}")
|
||||||
|
print(f" To: {data.get('recipient', 'N/A')[:40]}")
|
||||||
|
print(f" Summary: {data.get('summary', 'N/A')[:70]}")
|
||||||
|
print(f" Suggested: {suggested} ({conf}% confidence)")
|
||||||
|
|
||||||
|
print(f"\n{'=' * 60}")
|
||||||
|
print("Usage:")
|
||||||
|
print(" python main.py review <number> <action>")
|
||||||
|
print(" python main.py review all <action>")
|
||||||
|
print(" python main.py review accept")
|
||||||
|
print("Actions: delete / archive / keep / mark_read / label:<name>")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_review_act(selector, action):
|
||||||
|
"""Execute an action on one or more pending emails.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
selector: a 1-based number, a msg_id string, or "all".
|
||||||
|
action: one of delete/archive/keep/mark_read/label:<name>.
|
||||||
|
"""
|
||||||
|
# Validate action
|
||||||
|
valid_actions = {"delete", "archive", "keep", "mark_read"}
|
||||||
|
if action not in valid_actions and not action.startswith("label:"):
|
||||||
|
print(f"Invalid action: {action}")
|
||||||
|
print(f"Valid: {', '.join(sorted(valid_actions))}, label:<name>")
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
|
sorted_items = _get_pending_items()
|
||||||
|
if not sorted_items:
|
||||||
|
print("No pending emails to review.")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Resolve targets
|
||||||
|
if selector == "all":
|
||||||
|
targets = sorted_items
|
||||||
|
else:
|
||||||
|
target = _resolve_target(selector, sorted_items)
|
||||||
|
if target is None:
|
||||||
|
sys.exit(1)
|
||||||
|
targets = [target]
|
||||||
|
|
||||||
|
LOGS_DIR.mkdir(exist_ok=True)
|
||||||
|
log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"
|
||||||
|
|
||||||
|
# Execute action on each target
|
||||||
|
for msg_id, data in targets:
|
||||||
|
eid = data.get("envelope_id") or data.get("imap_uid")
|
||||||
|
if not eid:
|
||||||
|
print(f" {msg_id}: No envelope ID, skipping")
|
||||||
|
continue
|
||||||
|
|
||||||
|
success = execute_action(eid, action)
|
||||||
|
if success:
|
||||||
|
# Record decision for future learning
|
||||||
|
decision_store.record_decision(data, action, source="user")
|
||||||
|
|
||||||
|
# Mark as done in pending queue
|
||||||
|
pending = load_pending()
|
||||||
|
pending[msg_id]["status"] = "done"
|
||||||
|
pending[msg_id]["action"] = action
|
||||||
|
pending[msg_id]["processed_at"] = datetime.now().isoformat()
|
||||||
|
save_pending(pending)
|
||||||
|
|
||||||
|
log_result(log_file, data, f"REVIEW:{action}", data.get("reason", ""))
|
||||||
|
print(f" {msg_id}: {action} -> OK ({data['subject'][:40]})")
|
||||||
|
else:
|
||||||
|
log_result(log_file, data, f"REVIEW_FAILED:{action}", data.get("reason", ""))
|
||||||
|
print(f" {msg_id}: {action} -> FAILED")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_review_accept():
|
||||||
|
"""Accept all classifier suggestions for pending emails.
|
||||||
|
|
||||||
|
For each pending email, executes the suggested_action that the
|
||||||
|
classifier assigned during scan. Records each as a "user" decision
|
||||||
|
since the user explicitly chose to accept.
|
||||||
|
"""
|
||||||
|
sorted_items = _get_pending_items()
|
||||||
|
if not sorted_items:
|
||||||
|
print("No pending emails to review.")
|
||||||
|
return
|
||||||
|
|
||||||
|
LOGS_DIR.mkdir(exist_ok=True)
|
||||||
|
log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"
|
||||||
|
|
||||||
|
for msg_id, data in sorted_items:
|
||||||
|
action = data.get("suggested_action")
|
||||||
|
if not action:
|
||||||
|
print(f" {msg_id}: No suggestion, skipping")
|
||||||
|
continue
|
||||||
|
|
||||||
|
eid = data.get("envelope_id") or data.get("imap_uid")
|
||||||
|
if not eid:
|
||||||
|
print(f" {msg_id}: No envelope ID, skipping")
|
||||||
|
continue
|
||||||
|
|
||||||
|
success = execute_action(eid, action)
|
||||||
|
if success:
|
||||||
|
decision_store.record_decision(data, action, source="user")
|
||||||
|
|
||||||
|
pending = load_pending()
|
||||||
|
pending[msg_id]["status"] = "done"
|
||||||
|
pending[msg_id]["action"] = action
|
||||||
|
pending[msg_id]["processed_at"] = datetime.now().isoformat()
|
||||||
|
save_pending(pending)
|
||||||
|
|
||||||
|
log_result(log_file, data, f"ACCEPT:{action}", data.get("reason", ""))
|
||||||
|
print(f" {msg_id}: {action} -> OK ({data['subject'][:40]})")
|
||||||
|
else:
|
||||||
|
log_result(log_file, data, f"ACCEPT_FAILED:{action}", data.get("reason", ""))
|
||||||
|
print(f" {msg_id}: {action} -> FAILED")
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_target(selector, sorted_items):
|
||||||
|
"""Resolve a selector (number or msg_id) to a (msg_id, data) tuple.
|
||||||
|
|
||||||
|
Returns None and prints an error if the selector is invalid.
|
||||||
|
"""
|
||||||
|
# Try as 1-based index
|
||||||
|
try:
|
||||||
|
idx = int(selector) - 1
|
||||||
|
if 0 <= idx < len(sorted_items):
|
||||||
|
return sorted_items[idx]
|
||||||
|
else:
|
||||||
|
print(f"Invalid number. Range: 1-{len(sorted_items)}")
|
||||||
|
return None
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Try as msg_id
|
||||||
|
for msg_id, data in sorted_items:
|
||||||
|
if msg_id == selector:
|
||||||
|
return (msg_id, data)
|
||||||
|
|
||||||
|
print(f"Not found: {selector}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: stats
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_stats():
|
||||||
|
"""Print a summary of the decision history.
|
||||||
|
|
||||||
|
Shows total decisions, user vs. auto breakdown, action distribution,
|
||||||
|
top sender domains, and custom labels.
|
||||||
|
"""
|
||||||
|
stats = decision_store.get_all_stats()
|
||||||
|
|
||||||
|
if not stats:
|
||||||
|
print("No decision history yet.")
|
||||||
|
print("Run 'python main.py scan' and 'python main.py review' to build history.")
|
||||||
|
return
|
||||||
|
|
||||||
|
print("Decision History Stats")
|
||||||
|
print("=" * 50)
|
||||||
|
print(f"Total decisions: {stats['total']}")
|
||||||
|
|
||||||
|
# User vs. auto breakdown
|
||||||
|
print(f"\nBy source:")
|
||||||
|
for source, count in sorted(stats["by_source"].items()):
|
||||||
|
pct = count / stats["total"] * 100
|
||||||
|
print(f" {source}: {count} ({pct:.0f}%)")
|
||||||
|
|
||||||
|
auto = stats["by_source"].get("auto", 0)
|
||||||
|
if stats["total"] > 0:
|
||||||
|
print(f" Automation rate: {auto / stats['total'] * 100:.0f}%")
|
||||||
|
|
||||||
|
# Action distribution
|
||||||
|
print(f"\nBy action:")
|
||||||
|
for action, count in sorted(stats["by_action"].items(), key=lambda x: -x[1]):
|
||||||
|
print(f" {action}: {count}")
|
||||||
|
|
||||||
|
# Top sender domains with per-domain action counts
|
||||||
|
print(f"\nTop sender domains:")
|
||||||
|
for domain, count in stats["top_domains"]:
|
||||||
|
domain_stats = decision_store.get_sender_stats(domain)
|
||||||
|
detail = ", ".join(
|
||||||
|
f"{a}:{c}" for a, c in sorted(domain_stats.items(), key=lambda x: -x[1])
|
||||||
|
)
|
||||||
|
print(f" {domain}: {count} ({detail})")
|
||||||
|
|
||||||
|
# Custom labels
|
||||||
|
labels = decision_store.get_known_labels()
|
||||||
|
if labels:
|
||||||
|
print(f"\nKnown labels: {', '.join(sorted(labels))}")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: migrate
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_migrate():
|
||||||
|
"""Import old pending_emails.json 'done' entries into decision history.
|
||||||
|
|
||||||
|
Run once after upgrading from the old system. Converts old action
|
||||||
|
names (archived/kept/deleted) to new ones (archive/keep/delete).
|
||||||
|
"""
|
||||||
|
decision_store.migrate_pending()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Entry point & argument parsing
|
||||||
|
#
|
||||||
|
# Simple hand-rolled parser — no external dependencies. Supports:
|
||||||
|
# main.py [subcommand] [--recent N] [--dry-run] [review-args...]
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
args = sys.argv[1:]
|
||||||
|
subcommand = "scan"
|
||||||
|
recent = None
|
||||||
|
dry_run = False
|
||||||
|
extra_args = [] # for review subcommand arguments
|
||||||
|
|
||||||
|
# Parse args
|
||||||
|
i = 0
|
||||||
|
while i < len(args):
|
||||||
|
if args[i] == "--recent" and i + 1 < len(args):
|
||||||
|
recent = int(args[i + 1])
|
||||||
|
i += 2
|
||||||
|
elif args[i] == "--dry-run":
|
||||||
|
dry_run = True
|
||||||
|
i += 1
|
||||||
|
elif not args[i].startswith("--") and subcommand == "scan" and not extra_args:
|
||||||
|
# First positional arg is the subcommand
|
||||||
|
subcommand = args[i]
|
||||||
|
i += 1
|
||||||
|
elif not args[i].startswith("--"):
|
||||||
|
# Remaining positional args go to the subcommand
|
||||||
|
extra_args.append(args[i])
|
||||||
|
i += 1
|
||||||
|
else:
|
||||||
|
print(f"Unknown flag: {args[i]}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
config = load_config()
|
||||||
|
|
||||||
|
if subcommand == "scan":
|
||||||
|
cmd_scan(config, recent=recent, dry_run=dry_run)
|
||||||
|
|
||||||
|
elif subcommand == "review":
|
||||||
|
if not extra_args or extra_args[0] == "list":
|
||||||
|
cmd_review_list()
|
||||||
|
elif extra_args[0] == "accept":
|
||||||
|
cmd_review_accept()
|
||||||
|
elif len(extra_args) == 2:
|
||||||
|
cmd_review_act(extra_args[0], extra_args[1])
|
||||||
|
else:
|
||||||
|
print("Usage:")
|
||||||
|
print(" python main.py review list")
|
||||||
|
print(" python main.py review <number-or-id> <action>")
|
||||||
|
print(" python main.py review all <action>")
|
||||||
|
print(" python main.py review accept")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
elif subcommand == "stats":
|
||||||
|
cmd_stats()
|
||||||
|
|
||||||
|
elif subcommand == "migrate":
|
||||||
|
cmd_migrate()
|
||||||
|
|
||||||
|
else:
|
||||||
|
print(f"Unknown subcommand: {subcommand}")
|
||||||
|
print("Usage: python main.py [scan|review|stats|migrate] [--recent N] [--dry-run]")
|
||||||
|
sys.exit(1)
|
||||||
|
|||||||
@@ -1,28 +0,0 @@
#!/usr/bin/env python3
"""Move specific email to trash"""
import imaplib
import email

# Connect
mail = imaplib.IMAP4_SSL('imap.migadu.com', 993)
mail.login('youlu@luyanxin.com', 'kDkNau2r7m.hV!uk*D4Yr8mC7Dyjx9T')
mail.select('INBOX')

# Search for the email with "10% off" in subject
_, search_data = mail.search(None, 'SUBJECT', '"10% off"')
email_ids = search_data[0].split()

print(f"Found {len(email_ids)} emails with '10% off' in subject")

for email_id in email_ids:
    # Copy to Trash
    result = mail.copy(email_id, 'Trash')
    if result[0] == 'OK':
        mail.store(email_id, '+FLAGS', '\\Deleted')
        print(f"✅ Moved email {email_id.decode()} to Trash")
    else:
        print(f"❌ Failed to move email {email_id.decode()}")

mail.expunge()
mail.logout()
print("Done!")
@@ -1,214 +0,0 @@
#!/usr/bin/env python3
"""
Email Queue Processor - Handle user commands for pending emails
Reads pending_emails.json and executes user commands (archive/keep/reply)
"""

import json
import imaplib
import os
import sys
from datetime import datetime
from pathlib import Path

SCRIPT_DIR = Path(__file__).parent
DATA_FILE = SCRIPT_DIR / "data" / "pending_emails.json"

def load_pending():
    """Load pending emails from JSON file"""
    if not DATA_FILE.exists():
        return {}
    with open(DATA_FILE, 'r', encoding='utf-8') as f:
        return json.load(f)

def save_pending(pending):
    """Save pending emails to JSON file"""
    DATA_FILE.parent.mkdir(exist_ok=True)
    with open(DATA_FILE, 'w', encoding='utf-8') as f:
        json.dump(pending, f, indent=2, ensure_ascii=False)

def connect_imap(config):
    """Connect to IMAP server"""
    mail = imaplib.IMAP4_SSL(config['imap']['host'], config['imap']['port'])
    mail.login(config['imap']['email'], config['imap']['password'])
    return mail

def show_pending_list():
    """Display all pending emails"""
    pending = load_pending()

    if not pending:
        print("📭 No pending emails")
        return

    print(f"\n📧 Pending emails ({len(pending)})")
    print("=" * 60)

    # Sort by email_date
    sorted_items = sorted(
        pending.items(),
        key=lambda x: x[1].get('email_date', '')
    )

    for msg_id, data in sorted_items:
        if data.get('status') == 'pending':
            print(f"\n🆔 {msg_id}")
            print(f"  Subject: {data.get('subject', 'N/A')[:50]}")
            print(f"  From: {data.get('sender', 'N/A')}")
            print(f"  To: {data.get('recipient', 'N/A')}")
            print(f"  Date: {data.get('email_date', 'N/A')}")
            print(f"  Summary: {data.get('summary', 'N/A')[:80]}")

    print("\n" + "=" * 60)
    print("\nAvailable commands:")
    print("  • archive [ID] - move to the Archive folder")
    print("  • keep [ID]    - mark as read, leave in inbox")
    print("  • delete [ID]  - move to Trash")
    print("  • process all  - list everything and act in bulk")

def archive_email(config, msg_id):
    """Archive a specific email by ID"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Copy to Archive
        result = mail.copy(uid, 'Archive')
        if result[0] == 'OK':
            # Mark original as deleted
            mail.store(uid, '+FLAGS', '\\Deleted')
            mail.expunge()

            # Update status
            pending[msg_id]['status'] = 'done'
            pending[msg_id]['action'] = 'archived'
            pending[msg_id]['processed_at'] = datetime.now().isoformat()
            save_pending(pending)

            print(f"✅ Archived: {email_data.get('subject', 'N/A')[:40]}")
            return True
        else:
            print(f"❌ Archive failed: {result}")
            return False

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except:
            pass

def keep_email(config, msg_id):
    """Keep email in inbox, mark as read"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Mark as read (Seen)
        mail.store(uid, '+FLAGS', '\\Seen')

        # Update status
        pending[msg_id]['status'] = 'done'
        pending[msg_id]['action'] = 'kept'
        pending[msg_id]['processed_at'] = datetime.now().isoformat()
        save_pending(pending)

        print(f"✅ Kept: {email_data.get('subject', 'N/A')[:40]}")
        return True

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except:
            pass

def delete_email(config, msg_id):
    """Move email to Trash"""
    pending = load_pending()

    if msg_id not in pending:
        print(f"❌ Email ID not found: {msg_id}")
        return False

    email_data = pending[msg_id]
    uid = email_data.get('imap_uid')

    if not uid:
        print(f"❌ Email {msg_id} has no UID")
        return False

    try:
        mail = connect_imap(config)
        mail.select('INBOX')

        # Copy to Trash
        result = mail.copy(uid, 'Trash')
        if result[0] == 'OK':
            mail.store(uid, '+FLAGS', '\\Deleted')
            mail.expunge()

            # Update status
            pending[msg_id]['status'] = 'done'
            pending[msg_id]['action'] = 'deleted'
            pending[msg_id]['processed_at'] = datetime.now().isoformat()
            save_pending(pending)

            print(f"✅ Deleted: {email_data.get('subject', 'N/A')[:40]}")
            return True
        else:
            print(f"❌ Delete failed: {result}")
            return False

    except Exception as e:
        print(f"❌ Error: {e}")
        return False
    finally:
        try:
            mail.logout()
        except:
            pass

def main():
    """Main function - show pending list"""
    import json

    # Load config
    config_file = Path(__file__).parent / "config.json"
    with open(config_file) as f:
        config = json.load(f)

    show_pending_list()

if __name__ == "__main__":
    main()
@@ -1,38 +0,0 @@
#!/usr/bin/env python3
"""Test single email analysis"""
import requests
import json

email_data = {
    "subject": "Fwd: Get 10% off your next order 🎉",
    "sender": "crac1017@hotmail.com",
    "body": "Get 10% off your next order! Limited time offer. Shop now and save!"
}

prompt = f"""Analyze this email and determine if it's an advertisement/promotional email.

Subject: {email_data['subject']}
Sender: {email_data['sender']}
Body preview: {email_data['body'][:200]}

Is this an advertisement or promotional email? Answer with ONLY:
- "AD: [brief reason]" if it's an ad/promo
- "KEEP: [brief reason]" if it's important/legitimate

Be conservative - only mark as AD if clearly promotional."""

print("Sending to Qwen3...")
try:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3:4b",
            "prompt": prompt,
            "stream": False
        },
        timeout=120
    )
    result = response.json()
    print(f"Result: {result.get('response', 'error')}")
except Exception as e:
    print(f"Error: {e}")
@@ -1 +1 @@
-python3
+python3.13
@@ -1 +1 @@
-/usr/bin/python3
+python3.13
@@ -1 +0,0 @@
-python3
@@ -1 +0,0 @@
-lib
@@ -1,5 +1,5 @@
-home = /usr/bin
+home = /opt/homebrew/opt/python@3.13/bin
 include-system-site-packages = false
-version = 3.12.3
+version = 3.13.0
-executable = /usr/bin/python3.12
+executable = /opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/bin/python3.13
-command = /usr/bin/python3 -m venv /home/lyx/.openclaw/workspace/scripts/email_processor/venv
+command = /opt/homebrew/opt/python@3.13/bin/python3.13 -m venv /Users/ylu/Documents/me/youlu-openclaw-workspace/scripts/email_processor/venv