Clean up stale comments, dead code, and code quality issues

- Remove dead code: unused PENDING_FILE, _extract_domain(), sender_domain
  field, imap_uid fallback, check_unseen_only config key
- Fix stale comments: removed tag references in README and docstrings,
  top_domains -> top_senders, 1-based number -> scan_index number
- Make _extract_email_address public (used by 3 modules)
- Extract _format_address helper to deduplicate from/to parsing
- Batch pending queue disk I/O in review act/accept (load once, save once)
- Reuse cleared pending dict in scan instead of redundant disk load
- Track envelope IDs during scan loop to catch duplicates
- Fix default confidence_threshold 75 -> 85 to match config and docs
- Update get_relevant_examples default n=10 -> n=5 to match caller
- Add graceful error for --recent with non-numeric value
Author: Yanxin Lu
Date: 2026-03-05 15:28:05 -08:00
Commit: 723c47bbb3 (parent: 361e983b0f)
5 changed files with 70 additions and 83 deletions


@@ -35,11 +35,11 @@ The system separates **classification** (what the LLM does) from **confidence**
```
1. [msg_f1d43ea3] Subject: New jobs matching your profile
   From: LinkedIn
-  Tags: [promotion, social, notification]
+  Tags: [promotion, social, newsletter]
   Suggested: delete (50%)
2. [msg_60c56a87] Subject: Your order shipped
   From: Amazon
-  Tags: [shipping, confirmation, notification]
+  Tags: [shipping, confirmation, receipt]
   Suggested: archive (50%)
```
@@ -147,16 +147,16 @@ Example: sender `noreply@example.com` has 8 entries with action `delete` and 4 e
Look at the subject lines, summaries, and current tags of the entries that got different actions. Identify the pattern — what makes the "delete" emails different from the "keep" emails?

Example:

-- Deleted emails: subjects like "50% off sale", "Weekly deals" → tags: `[promotion, notification, newsletter]`
+- Deleted emails: subjects like "50% off sale", "Weekly deals" → tags: `[promotion, account, newsletter]`
-- Kept emails: subjects like "Your password was changed", "New login from Chrome" → tags: `[security, notification, update]`
+- Kept emails: subjects like "Your password was changed", "New login from Chrome" → tags: `[security, account, alert]`

-The shared tag `notification` is causing these to match as the same signature, dragging confidence down.
+The shared tag `account` is causing these to match as the same signature, dragging confidence down.

**Step 3: Determine if a new tag would fix it.**

-Ask: is there a category that applies to one group but not the other? In the example above, an `account` tag would distinguish password/login emails from promotional emails. Check if the tag already exists in `TAG_TAXONOMY` in `classifier.py` — it might just be that the LLM isn't using it consistently.
+Ask: is there a category that applies to one group but not the other? In the example above, the LLM is assigning `account` to both promotional and security emails from the same service. Check if the problem is LLM consistency (the tag exists but the model uses it too broadly) or a missing tag (no existing tag can distinguish the two types).

-If the tag already exists: the problem is LLM consistency, not the taxonomy. Consider adjusting the prompt or few-shot examples.
+If the tag exists but is overused: the problem is LLM consistency, not the taxonomy. Consider adjusting the prompt or few-shot examples.

If the tag doesn't exist: propose a new tag.
@@ -169,7 +169,7 @@ Before adding, check that the new tag:
**Step 5: Add the tag to `TAG_TAXONOMY` in `classifier.py`.**

-Add the new tag to the `TAG_TAXONOMY` list in `classifier.py:30-38`. Keep the list organized by category. The LLM prompt automatically picks up the updated list on the next scan.
+Add the new tag to the `TAG_TAXONOMY` list in `classifier.py:30-37`. Keep the list organized by category. The LLM prompt automatically picks up the updated list on the next scan.

**Step 6: Decide whether to wipe history.**
@@ -191,7 +191,7 @@ Check the logs for the affected senders:
- **Only add tags, never rename.** Renaming `billing` to `finance` means old entries with `billing` never match new entries with `finance`. If you must rename, keep both in the taxonomy.
- **Avoid deleting tags.** Old entries with deleted tags become slightly less useful (fewer matching tags) but don't cause incorrect matches. Only delete a tag if it's actively causing confusion (e.g., the LLM uses it inconsistently and it's hurting overlap calculations).
-- **Keep the taxonomy small.** More tags = more choices for the LLM = more inconsistency. The taxonomy should have the minimum number of tags needed to distinguish email types that deserve different actions. 20-30 tags is a reasonable range.
+- **Keep the taxonomy small.** More tags = more choices for the LLM = more inconsistency. The taxonomy should have the minimum number of tags needed to distinguish email types that deserve different actions. 10-20 tags is a reasonable range.
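To make the "small, flat, category-organized list" guidance concrete, here is a hypothetical sketch of what a `TAG_TAXONOMY` of this shape could look like. The tag names are drawn only from examples mentioned in this document; the real list in `classifier.py` has 14 entries and may differ.

```python
# Hypothetical sketch of a TAG_TAXONOMY list, organized by category.
# Tag names taken from examples in this document; not the actual taxonomy.
TAG_TAXONOMY = [
    # Commerce
    "promotion", "newsletter", "receipt", "shipping", "confirmation",
    # Account & security
    "account", "security", "alert",
    # Social & system
    "social", "notification",
]

# Keep it small: every tag should help distinguish email types that
# deserve different actions (10-20 tags is the suggested range).
print(len(TAG_TAXONOMY))  # → 10
```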
## Configuration
@@ -298,7 +298,7 @@ The top 5 most relevant examples are injected into the prompt as few-shot demons
### Fixed tag taxonomy

-Tags are defined in `classifier.py` as `TAG_TAXONOMY` — a manually curated list of 21 categories. The LLM must pick from this list (invalid tags are silently dropped). The taxonomy should stay fixed to keep history matching stable. See "Refining the Tag Taxonomy" above for when and how to update it.
+Tags are defined in `classifier.py` as `TAG_TAXONOMY` — a manually curated list of 14 categories. The LLM must pick from this list (invalid tags are silently dropped). The taxonomy should stay fixed to keep history matching stable. See "Refining the Tag Taxonomy" above for when and how to update it.

### `keep` means unread


@@ -47,8 +47,8 @@ def _build_prompt(email_data, config):
    max_body = config.get("rules", {}).get("max_body_length", 1000)

    # Gather learning context from decision history
-    examples = decision_store.get_relevant_examples(email_data, n=10)
-    sender_email = decision_store._extract_email_address(email_data.get("sender", ""))
+    examples = decision_store.get_relevant_examples(email_data, n=5)
+    sender_email = decision_store.extract_email_address(email_data.get("sender", ""))
    sender_stats = decision_store.get_sender_stats(sender_email) if sender_email else {}
    known_labels = decision_store.get_known_labels()
@@ -80,7 +80,7 @@ def _build_prompt(email_data, config):
    # Section 4: Few-shot examples (top 5 most relevant past decisions)
    if examples:
        parts.append("\n--- Past decisions (learn from these) ---")
-        for ex in examples[:5]:
+        for ex in examples:
            parts.append(
                f"From: {ex['sender'][:60]} | To: {ex['recipient'][:40]} | "
                f"Subject: {ex['subject'][:60]} -> {ex['action']}"
@@ -135,9 +135,9 @@ def _parse_response(output):
    Expected format (one per line):
        Action: delete
-        Tags: promotion, marketing, newsletter
+        Tags: promotion, newsletter, social
        Summary: Promotional offer from retailer
-        Reason: Clearly a marketing email with discount offer
+        Reason: Clearly a promotional email with discount offer

    Falls back to safe defaults (keep, empty tags) on parse failure.
    """


@@ -4,8 +4,7 @@
    "model": "kamekichi128/qwen3-4b-instruct-2507:latest"
  },
  "rules": {
-    "max_body_length": 1000,
-    "check_unseen_only": true
+    "max_body_length": 1000
  },
  "automation": {
    "confidence_threshold": 85


@@ -25,7 +25,6 @@ from collections import Counter
SCRIPT_DIR = Path(__file__).parent
DATA_DIR = SCRIPT_DIR / "data"
HISTORY_FILE = DATA_DIR / "decision_history.json"
-PENDING_FILE = DATA_DIR / "pending_emails.json"

# Stop-words excluded from subject keyword matching to reduce noise.
_STOP_WORDS = {"re", "fwd", "the", "a", "an", "is", "to", "for", "and", "or", "your", "you"}
@@ -50,18 +49,7 @@ def _save_history(history):
        json.dump(history, f, indent=2, ensure_ascii=False)

-def _extract_domain(sender):
-    """Extract the domain part from a sender string.
-
-    Handles formats like:
-        "Display Name <user@example.com>"
-        user@example.com
-    """
-    match = re.search(r"[\w.+-]+@([\w.-]+)", sender)
-    return match.group(1).lower() if match else ""
-
-def _extract_email_address(sender):
+def extract_email_address(sender):
    """Extract the full email address from a sender string."""
    match = re.search(r"([\w.+-]+@[\w.-]+)", sender)
    return match.group(1).lower() if match else sender.lower()
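The renamed helper is small enough to check in isolation. This is the function exactly as it appears in the hunk above; the sample inputs are made up for illustration:

```python
import re

def extract_email_address(sender):
    """Extract the full email address from a sender string."""
    match = re.search(r"([\w.+-]+@[\w.-]+)", sender)
    # Falls back to the lowercased input when no address is present.
    return match.group(1).lower() if match else sender.lower()

print(extract_email_address("Display Name <User@Example.com>"))  # → user@example.com
print(extract_email_address("user@example.com"))                 # → user@example.com
print(extract_email_address("no address here"))                  # → no address here
```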
@@ -85,7 +73,6 @@ def record_decision(email_data, action, source="user", tags=None):
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "sender": email_data.get("sender", ""),
-        "sender_domain": _extract_domain(email_data.get("sender", "")),
        "recipient": email_data.get("recipient", ""),
        "subject": email_data.get("subject", ""),
        "summary": email_data.get("summary", ""),
@@ -98,7 +85,7 @@ def record_decision(email_data, action, source="user", tags=None):
    return entry

-def get_relevant_examples(email_data, n=10):
+def get_relevant_examples(email_data, n=5):
    """Find the N most relevant past decisions for a given email.

    Relevance is scored by two signals:
@@ -112,7 +99,7 @@ def get_relevant_examples(email_data, n=10):
    if not history:
        return []

-    target_email = _extract_email_address(email_data.get("sender", ""))
+    target_email = extract_email_address(email_data.get("sender", ""))
    target_words = (
        set(re.findall(r"\w+", email_data.get("subject", "").lower())) - _STOP_WORDS
    )
@@ -122,7 +109,7 @@ def get_relevant_examples(email_data, n=10):
        score = 0
        # Signal 1: sender email match
-        if target_email and _extract_email_address(entry.get("sender", "")) == target_email:
+        if target_email and extract_email_address(entry.get("sender", "")) == target_email:
            score += 3

        # Signal 2: subject keyword overlap
@@ -146,7 +133,7 @@ def get_sender_stats(sender_email):
    history = _load_history()
    actions = Counter()
    for entry in history:
-        if _extract_email_address(entry.get("sender", "")) == sender_email:
+        if extract_email_address(entry.get("sender", "")) == sender_email:
            actions[entry["action"]] += 1
    return dict(actions)
@@ -171,7 +158,7 @@ def compute_confidence(sender_email, action, tags):
    # Find past decisions with same sender and sufficient tag overlap
    matches = []
    for entry in history:
-        entry_email = _extract_email_address(entry.get("sender", ""))
+        entry_email = extract_email_address(entry.get("sender", ""))
        if entry_email != sender_email:
            continue
@@ -216,7 +203,7 @@ def get_known_labels():
def get_all_stats():
    """Compute aggregate statistics across the full decision history.

-    Returns a dict with keys: total, by_action, by_source, top_domains.
+    Returns a dict with keys: total, by_action, by_source, top_senders.
    Returns None if history is empty.
    """
    history = _load_history()
@@ -228,7 +215,7 @@ def get_all_stats():
    by_source = Counter(e["source"] for e in history)

    # Top 10 sender addresses by decision count
-    sender_counts = Counter(_extract_email_address(e.get("sender", "")) for e in history)
+    sender_counts = Counter(extract_email_address(e.get("sender", "")) for e in history)
    top_senders = sender_counts.most_common(10)

    return {


@@ -121,6 +121,20 @@ def read_message(envelope_id):
    return _himalaya("message", "read", "--preview", "--no-headers", str(envelope_id))

+def _format_address(addr_field):
+    """Format a himalaya address field (dict, list, or string) into a display string."""
+    if isinstance(addr_field, dict):
+        name = addr_field.get("name", "")
+        addr = addr_field.get("addr", "")
+        return f"{name} <{addr}>" if name else addr
+    elif isinstance(addr_field, list) and addr_field:
+        first = addr_field[0]
+        name = first.get("name", "")
+        addr = first.get("addr", "")
+        return f"{name} <{addr}>" if name else addr
+    return str(addr_field)

def build_email_data(envelope, body, config):
    """Build the email_data dict expected by classifier and decision_store.
@@ -129,40 +143,11 @@ def build_email_data(envelope, body, config):
    """
    max_body = config.get("rules", {}).get("max_body_length", 1000)

-    # himalaya envelope JSON uses "from" as a nested object or string
-    sender = envelope.get("from", {})
-    if isinstance(sender, dict):
-        # Format: {"name": "Display Name", "addr": "user@example.com"}
-        name = sender.get("name", "")
-        addr = sender.get("addr", "")
-        sender_str = f"{name} <{addr}>" if name else addr
-    elif isinstance(sender, list) and sender:
-        first = sender[0]
-        name = first.get("name", "")
-        addr = first.get("addr", "")
-        sender_str = f"{name} <{addr}>" if name else addr
-    else:
-        sender_str = str(sender)
-
-    # Same for "to"
-    to = envelope.get("to", {})
-    if isinstance(to, dict):
-        name = to.get("name", "")
-        addr = to.get("addr", "")
-        to_str = f"{name} <{addr}>" if name else addr
-    elif isinstance(to, list) and to:
-        first = to[0]
-        name = first.get("name", "")
-        addr = first.get("addr", "")
-        to_str = f"{name} <{addr}>" if name else addr
-    else:
-        to_str = str(to)
-
    return {
        "id": str(envelope.get("id", "")),
        "subject": envelope.get("subject", "(No Subject)"),
-        "sender": sender_str,
-        "recipient": to_str,
+        "sender": _format_address(envelope.get("from", {})),
+        "recipient": _format_address(envelope.get("to", {})),
        "date": envelope.get("date", ""),
        "body": body[:max_body],
    }
@@ -322,7 +307,7 @@ def cmd_scan(config, recent=None, dry_run=False):
    # Load automation threshold
    automation = config.get("automation", {})
-    confidence_threshold = automation.get("confidence_threshold", 75)
+    confidence_threshold = automation.get("confidence_threshold", 85)

    # Fetch envelopes via himalaya
    if recent:
@@ -340,9 +325,8 @@ def cmd_scan(config, recent=None, dry_run=False):
    queued = 0
    skipped = 0

-    # Load pending queue once to skip already-queued emails
-    pending = load_pending()
-    pending_eids = {v.get("envelope_id") for v in pending.values() if v.get("status") == "pending"}
+    # Reuse the cleared pending dict from above to skip already-queued emails
+    pending_eids = {v.get("envelope_id") for v in cleared.values() if v.get("status") == "pending"}

    for envelope in envelopes:
        eid = envelope.get("id", "?")
@@ -353,6 +337,9 @@ def cmd_scan(config, recent=None, dry_run=False):
            skipped += 1
            continue

+        # Track this eid so duplicates within the same envelope list are caught
+        pending_eids.add(str(eid))
+
        print(f"[{eid}] ", end="", flush=True)

        # Read message body without marking as seen
@@ -370,7 +357,7 @@ def cmd_scan(config, recent=None, dry_run=False):
        )

        # Compute confidence from decision history
-        sender_email = decision_store._extract_email_address(email_data.get("sender", ""))
+        sender_email = decision_store.extract_email_address(email_data.get("sender", ""))
        confidence = decision_store.compute_confidence(sender_email, action, tags)

        tags_str = ", ".join(tags) if tags else "(none)"
@@ -479,7 +466,7 @@ def cmd_review_act(selector, action):
    """Execute an action on one or more pending emails.

    Args:
-        selector: a 1-based number, a msg_id string, or "all".
+        selector: a scan_index number, a msg_id string, or "all".
        action: one of delete/archive/keep/mark_read/label:<name>.
    """
    # Validate action
@@ -507,8 +494,11 @@ def cmd_review_act(selector, action):
    log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"

    # Execute action on each target
+    pending = load_pending()
+    pending_dirty = False
    for msg_id, data in targets:
-        eid = data.get("envelope_id") or data.get("imap_uid")
+        eid = data.get("envelope_id")
        if not eid:
            print(f"  {msg_id}: No envelope ID, skipping")
            continue
@@ -519,11 +509,10 @@ def cmd_review_act(selector, action):
            decision_store.record_decision(data, action, source="user", tags=data.get("tags", []))

            # Mark as done in pending queue
-            pending = load_pending()
            pending[msg_id]["status"] = "done"
            pending[msg_id]["action"] = action
            pending[msg_id]["processed_at"] = datetime.now().isoformat()
-            save_pending(pending)
+            pending_dirty = True

            log_result(log_file, data, f"REVIEW:{action}", data.get("reason", ""))
            print(f"  {msg_id}: {action} -> OK ({data['subject'][:40]})")
@@ -531,6 +520,9 @@ def cmd_review_act(selector, action):
            log_result(log_file, data, f"REVIEW_FAILED:{action}", data.get("reason", ""))
            print(f"  {msg_id}: {action} -> FAILED")

+    if pending_dirty:
+        save_pending(pending)

def cmd_review_accept():
    """Accept all classifier suggestions for pending emails.
@@ -547,13 +539,16 @@ def cmd_review_accept():
    LOGS_DIR.mkdir(exist_ok=True)
    log_file = LOGS_DIR / f"{datetime.now().strftime('%Y-%m-%d')}.log"

+    pending = load_pending()
+    pending_dirty = False
    for msg_id, data in sorted_items:
        action = data.get("suggested_action")
        if not action:
            print(f"  {msg_id}: No suggestion, skipping")
            continue

-        eid = data.get("envelope_id") or data.get("imap_uid")
+        eid = data.get("envelope_id")
        if not eid:
            print(f"  {msg_id}: No envelope ID, skipping")
            continue
@@ -562,11 +557,10 @@ def cmd_review_accept():
        if success:
            decision_store.record_decision(data, action, source="user", tags=data.get("tags", []))

-            pending = load_pending()
            pending[msg_id]["status"] = "done"
            pending[msg_id]["action"] = action
            pending[msg_id]["processed_at"] = datetime.now().isoformat()
-            save_pending(pending)
+            pending_dirty = True

            log_result(log_file, data, f"ACCEPT:{action}", data.get("reason", ""))
            print(f"  {msg_id}: {action} -> OK ({data['subject'][:40]})")
@@ -574,6 +568,9 @@ def cmd_review_accept():
            log_result(log_file, data, f"ACCEPT_FAILED:{action}", data.get("reason", ""))
            print(f"  {msg_id}: {action} -> FAILED")

+    if pending_dirty:
+        save_pending(pending)

def _resolve_target(selector, sorted_items):
    """Resolve a selector (scan_index number or msg_id) to a (msg_id, data) tuple.
@@ -611,7 +608,7 @@ def cmd_stats():
    """Print a summary of the decision history.

    Shows total decisions, user vs. auto breakdown, action distribution,
-    top sender domains, and custom labels.
+    top senders, and custom labels.
    """
    stats = decision_store.get_all_stats()
@@ -672,7 +669,11 @@ if __name__ == "__main__":
    i = 0
    while i < len(args):
        if args[i] == "--recent" and i + 1 < len(args):
+            try:
                recent = int(args[i + 1])
+            except ValueError:
+                print(f"--recent requires a number, got: {args[i + 1]}")
+                sys.exit(1)
            i += 2
        elif args[i] == "--dry-run":
            dry_run = True