Clean up stale comments, dead code, and code quality issues

- Remove dead code: unused PENDING_FILE, _extract_domain(), sender_domain field, imap_uid fallback, check_unseen_only config key
- Fix stale comments: removed tag references in README and docstrings, top_domains -> top_senders, 1-based number -> scan_index number
- Make _extract_email_address public (used by 3 modules)
- Extract _format_address helper to deduplicate from/to parsing
- Batch pending queue disk I/O in review act/accept (load once, save once)
- Reuse cleared pending dict in scan instead of redundant disk load
- Track envelope IDs during scan loop to catch duplicates
- Fix default confidence_threshold 75 -> 85 to match config and docs
- Update get_relevant_examples default n=10 -> n=5 to match caller
- Add graceful error for --recent with non-numeric value
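
The "load once, save once" batching item can be illustrated with a minimal sketch. It assumes the pending queue is a JSON file mapping message IDs to entries; `load_pending`, `save_pending`, and `apply_actions` are hypothetical names chosen for illustration, not the repository's actual functions.

```python
import json
from pathlib import Path


def load_pending(path: Path) -> dict:
    """Read the pending queue from disk; a missing file means an empty queue."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_pending(path: Path, pending: dict) -> None:
    """Write the whole pending queue back to disk."""
    path.write_text(json.dumps(pending, indent=2))


def apply_actions(path: Path, actions: dict) -> list:
    """Resolve several pending entries with one disk read and one disk write.

    `actions` maps message ID -> action name (hypothetical shape).
    Returns the message IDs that were found in the queue and removed.
    """
    pending = load_pending(path)  # single read for the whole batch
    resolved = []
    for msg_id in actions:
        # Unknown IDs are skipped rather than treated as errors.
        if pending.pop(msg_id, None) is not None:
            resolved.append(msg_id)
    if resolved:
        save_pending(path, pending)  # single write for the whole batch
    return resolved
```

The alternative, loading and saving inside the loop, does one read and one write per message; batching keeps the disk traffic constant regardless of how many entries are reviewed at once.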
2026-03-05 15:28:05 -08:00
parent 361e983b0f
commit 723c47bbb3
5 changed files with 70 additions and 83 deletions
--- a/scripts/email_processor/README.md
+++ b/scripts/email_processor/README.md
@@ -35,11 +35,11 @@ The system separates **classification** (what the LLM does) from **confidence**
   ```
     1. [msg_f1d43ea3]  Subject: New jobs matching your profile
        From: LinkedIn
-        Tags: [promotion, social, notification]
+        Tags: [promotion, social, newsletter]
        Suggested: delete (50%)
     2. [msg_60c56a87]  Subject: Your order shipped
        From: Amazon
-        Tags: [shipping, confirmation, notification]
+        Tags: [shipping, confirmation, receipt]
        Suggested: archive (50%)
   ```

@@ -147,16 +147,16 @@ Example: sender `noreply@example.com` has 8 entries with action `delete` and 4 e
 Look at the subject lines, summaries, and current tags of the entries that got different actions. Identify the pattern — what makes the "delete" emails different from the "keep" emails?

 Example:
-- Deleted emails: subjects like "50% off sale", "Weekly deals" → tags: `[promotion, notification, newsletter]`
-- Kept emails: subjects like "Your password was changed", "New login from Chrome" → tags: `[security, notification, update]`
+- Deleted emails: subjects like "50% off sale", "Weekly deals" → tags: `[promotion, account, newsletter]`
+- Kept emails: subjects like "Your password was changed", "New login from Chrome" → tags: `[security, account, alert]`

-The shared tag `notification` is causing these to match as the same signature, dragging confidence down.
+The shared tag `account` is causing these to match as the same signature, dragging confidence down.

 **Step 3: Determine if a new tag would fix it.**

-Ask: is there a category that applies to one group but not the other? In the example above, an `account` tag would distinguish password/login emails from promotional emails. Check if the tag already exists in `TAG_TAXONOMY` in `classifier.py` — it might just be that the LLM isn't using it consistently.
+Ask: is there a category that applies to one group but not the other? In the example above, the LLM is assigning `account` to both promotional and security emails from the same service. Check if the problem is LLM consistency (the tag exists but the model uses it too broadly) or a missing tag (no existing tag can distinguish the two types).

-If the tag already exists: the problem is LLM consistency, not the taxonomy. Consider adjusting the prompt or few-shot examples.
+If the tag exists but is overused: the problem is LLM consistency, not the taxonomy. Consider adjusting the prompt or few-shot examples.

 If the tag doesn't exist: propose a new tag.

@@ -169,7 +169,7 @@ Before adding, check that the new tag:

 **Step 5: Add the tag to `TAG_TAXONOMY` in `classifier.py`.**

-Add the new tag to the `TAG_TAXONOMY` list in `classifier.py:30-38`. Keep the list organized by category. The LLM prompt automatically picks up the updated list on the next scan.
+Add the new tag to the `TAG_TAXONOMY` list in `classifier.py:30-37`. Keep the list organized by category. The LLM prompt automatically picks up the updated list on the next scan.

 **Step 6: Decide whether to wipe history.**

@@ -191,7 +191,7 @@ Check the logs for the affected senders:

 - **Only add tags, never rename.** Renaming `billing` to `finance` means old entries with `billing` never match new entries with `finance`. If you must rename, keep both in the taxonomy.
 - **Avoid deleting tags.** Old entries with deleted tags become slightly less useful (fewer matching tags) but don't cause incorrect matches. Only delete a tag if it's actively causing confusion (e.g., the LLM uses it inconsistently and it's hurting overlap calculations).
-- **Keep the taxonomy small.** More tags = more choices for the LLM = more inconsistency. The taxonomy should have the minimum number of tags needed to distinguish email types that deserve different actions. 20-30 tags is a reasonable range.
+- **Keep the taxonomy small.** More tags = more choices for the LLM = more inconsistency. The taxonomy should have the minimum number of tags needed to distinguish email types that deserve different actions. 10-20 tags is a reasonable range.

 ## Configuration

@@ -298,7 +298,7 @@ The top 5 most relevant examples are injected into the prompt as few-shot demons

 ### Fixed tag taxonomy

-Tags are defined in `classifier.py` as `TAG_TAXONOMY` — a manually curated list of 21 categories. The LLM must pick from this list (invalid tags are silently dropped). The taxonomy should stay fixed to keep history matching stable. See "Refining the Tag Taxonomy" above for when and how to update it.
+Tags are defined in `classifier.py` as `TAG_TAXONOMY` — a manually curated list of 14 categories. The LLM must pick from this list (invalid tags are silently dropped). The taxonomy should stay fixed to keep history matching stable. See "Refining the Tag Taxonomy" above for when and how to update it.

 ### `keep` means unread