Remove scan_index, use envelope_id (IMAP UID) as single identifier

scan_index created confusion for the OpenClaw agent, which would
sometimes reference emails by scan_index and sometimes by envelope_id.
Since himalaya's envelope ID is an IMAP UID (stable, never recycled),
it works as the sole identifier for review commands.
Yanxin Lu
2026-03-07 22:01:02 -08:00
parent 2c00649488
commit 3c54098b1d
3 changed files with 37 additions and 53 deletions
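To make the confusion concrete: before this change, a numeric selector was always tried as a `scan_index` first, so an agent that passed an envelope ID could hit a different email whenever some item happened to carry that number as its `scan_index`. The sketch below models only the two lookup orders. The queue contents, IDs, and the function names `resolve_old` and `resolve_new` are hypothetical simplifications of `_resolve_target` (no printing, no file I/O).

```python
# Hypothetical pending queue keyed by msg_id. Envelope IDs are modeled as
# strings (as they arrive from the command line); scan_index as ints.
PENDING = {
    "msg_aaaa0001": {"envelope_id": "2", "scan_index": 1, "subject": "Newsletter"},
    "msg_aaaa0002": {"envelope_id": "1", "scan_index": 2, "subject": "Invoice"},
}


def resolve_old(selector):
    """Pre-commit order: any integer selector is treated as a scan_index."""
    try:
        idx = int(selector)
    except ValueError:
        # Not a number: fall back to an exact msg_id match.
        return selector if selector in PENDING else None
    for msg_id, data in PENDING.items():
        if data["scan_index"] == idx:
            return msg_id
    return None  # numeric but no matching scan_index: no msg_id fallback


def resolve_new(selector):
    """Post-commit order: envelope_id first, then msg_id."""
    for msg_id, data in PENDING.items():
        if data["envelope_id"] == selector:
            return msg_id
    return selector if selector in PENDING else None


# The agent means envelope 2 (the newsletter) and runs `review 2 delete`.
print(PENDING[resolve_old("2")]["subject"])  # Invoice: scan_index 2, wrong email
print(PENDING[resolve_new("2")]["subject"])  # Newsletter: envelope 2
```

With a single identifier space, "2" can only mean one thing, which is the point of the commit.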

@@ -31,22 +31,24 @@ The system separates **classification** (what the LLM does) from **confidence**
 1. **Cron runs `scan`.** For each email, the LLM suggests an action and assigns tags from a fixed taxonomy. Since there's no history yet, `compute_confidence` returns 50% (below the 85% threshold), so everything gets queued.
-2. **You run `review list`.** It prints what's pending. Item numbers are stable within a scan cycle — they don't shift when earlier items are resolved:
+2. **You run `review list`.** It prints what's pending, identified by envelope ID (himalaya's IMAP UID):
 ```
-1. [msg_f1d43ea3] Subject: New jobs matching your profile
+[42] msg_f1d43ea3
+Subject: New jobs matching your profile
 From: LinkedIn
 Tags: [promotion, social, newsletter]
 Suggested: delete (50%)
-2. [msg_60c56a87] Subject: Your order shipped
+[43] msg_60c56a87
+Subject: Your order shipped
 From: Amazon
 Tags: [shipping, confirmation, receipt]
 Suggested: archive (50%)
 ```
-3. **You act on them.** Either individually or in bulk. Numbers stay stable — after deleting item 1, item 2 is still 2:
+3. **You act on them.** Either individually or in bulk, using the envelope ID:
 ```bash
-./email-processor.sh review 1 delete # agree with suggestion
-./email-processor.sh review 2 archive # still #2, not renumbered
+./email-processor.sh review 42 delete # agree with suggestion
+./email-processor.sh review 43 archive # archive by envelope ID
 ./email-processor.sh review accept # accept all suggestions at once
 ```
 Each command executes via himalaya and appends to `decision_history.json` with tags.
@@ -88,9 +90,9 @@ chmod +x email-processor.sh
 # --- Review ---
 ./email-processor.sh review list # show pending queue
-./email-processor.sh review 1 delete # delete item #1
-./email-processor.sh review 3 archive # #3 is still #3 even after #1 was deleted
-./email-processor.sh review msg_f1d43ea3 archive # archive by ID
+./email-processor.sh review 42 delete # delete envelope 42
+./email-processor.sh review 43 archive # archive envelope 43
+./email-processor.sh review msg_f1d43ea3 archive # archive by msg_id
 ./email-processor.sh review all delete # delete all pending
 ./email-processor.sh review accept # accept all suggestions
@@ -238,7 +240,7 @@ ollama list # should show kamekichi128/qwen3-4b-instruct-2507:latest
 ./email-processor.sh review list
 # 6. Act on a queued email to seed decision history
-./email-processor.sh review 1 delete
+./email-processor.sh review 42 delete
 # 7. Check that the decision was recorded
 ./email-processor.sh stats
@@ -304,9 +306,9 @@ Tags are defined in `classifier.py` as `TAG_TAXONOMY` — a manually curated lis
 The `keep` action is a deliberate no-op — it leaves the email unread in the inbox, meaning it needs human attention. This is distinct from `mark_read`, which dismisses low-priority emails without moving them.
-### Stable item numbers during review
+### Envelope IDs
-Each pending item gets a `scan_index` assigned sequentially during `scan`. These numbers are stable within a scan cycle — resolving item 1 doesn't renumber item 2 to 1. This matters when an agent (like OpenClaw) issues multiple `review <n> <action>` commands in sequence: without stable indices, the queue renumbers after each action, causing later commands to target the wrong emails. Indices reset to 1 on each new `scan` (done items from the previous cycle are cleared at scan start).
+Emails are identified by their envelope ID, which is himalaya's IMAP UID — a stable, unique identifier assigned by the mail server. UIDs don't shift when other messages are deleted or moved, so the same envelope ID always refers to the same email. Review commands use envelope IDs directly (e.g., `review 93 delete`). The `msg_id` hash (e.g., `msg_f1d43ea3`) is an internal key for the pending queue and can also be used as a selector.
 ### Fail-safe classification
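The property the new "Envelope IDs" section relies on can be checked with a toy queue: numbers derived from list position move as soon as an earlier item is resolved, while a lookup keyed on the UID keeps pointing at the same message. The data and the helper names `number_by_position` and `find_by_uid` are hypothetical. The sketch takes as given that a UID does not change while its message stays in the mailbox; IMAP (RFC 3501) ties that guarantee to the mailbox name together with its UIDVALIDITY value.

```python
# Hypothetical pending queue in listing order; "uid" stands in for the envelope ID.
pending = [
    {"uid": "42", "subject": "New jobs matching your profile"},
    {"uid": "43", "subject": "Your order shipped"},
    {"uid": "57", "subject": "Weekly digest"},
]


def number_by_position(items):
    """Number items 1..n in their current order, as a renumbering list would."""
    return {n: item["subject"] for n, item in enumerate(items, start=1)}


def find_by_uid(items, uid):
    """Look an item up by UID; the result does not depend on list order."""
    return next((item["subject"] for item in items if item["uid"] == uid), None)


before = number_by_position(pending)

# Resolve the first email (UID 42), as `review 42 delete` would, then renumber.
pending = [item for item in pending if item["uid"] != "42"]
after = number_by_position(pending)

print(before[2], "->", after[2])   # "#2" now names a different email
print(find_by_uid(pending, "43"))  # UID 43 still names the order email
```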

@@ -7,7 +7,7 @@
 # ./email-processor.sh scan --dry-run # classify only, no changes
 # ./email-processor.sh scan --recent 7 --dry-run # combine both
 # ./email-processor.sh review list # show pending queue
-# ./email-processor.sh review 1 delete # act on email #1
+# ./email-processor.sh review 93 delete # act on envelope 93
 # ./email-processor.sh review all delete # act on all pending
 # ./email-processor.sh review accept # accept all suggestions
 # ./email-processor.sh stats # show history stats

@@ -15,7 +15,7 @@ Subcommands:
 python main.py scan --dry-run # classify only, no changes
 python main.py scan --recent 7 --dry-run # combine both
 python main.py review list # print pending queue
-python main.py review <num-or-id> <action> # act on one email
+python main.py review <envelope_id> <action> # act on one email
 python main.py review all <action> # act on all pending
 python main.py review accept # accept all suggestions
 python main.py stats # show decision history
@@ -222,9 +222,7 @@ def add_to_pending(email_data, summary, reason, action_suggestion, confidence, t
     Stores the classifier's suggestion, computed confidence, and tags
     alongside the email metadata so the user can see what the model thought.
-    Each item gets a stable scan_index (assigned sequentially within a scan
-    cycle) so that review commands can reference items by number without
-    indices shifting after earlier items are resolved.
+    Uses envelope_id as the primary identifier for review commands.
     """
     pending = load_pending()
@@ -233,14 +231,6 @@ def add_to_pending(email_data, summary, reason, action_suggestion, confidence, t
     key = f"{eid}_{email_data['subject']}"
     msg_id = f"msg_{hashlib.md5(key.encode()).hexdigest()[:8]}"
-    # Assign the next scan_index: max of existing pending items + 1
-    existing_indices = [
-        v.get("scan_index", 0)
-        for v in pending.values()
-        if v.get("status") == "pending"
-    ]
-    next_index = max(existing_indices, default=0) + 1
     pending[msg_id] = {
         "envelope_id": eid,
         "subject": email_data["subject"],
@@ -254,7 +244,6 @@ def add_to_pending(email_data, summary, reason, action_suggestion, confidence, t
         "email_date": email_data.get("date", ""),
         "status": "pending",
         "found_at": datetime.now().isoformat(),
-        "scan_index": next_index,
     }
     save_pending(pending)
     return msg_id
@@ -295,8 +284,7 @@ def cmd_scan(config, recent=None, dry_run=False):
     print(f"Email Processor - {mode}")
     print("=" * 50)
-    # Clear done items from previous scan cycles so new items get
-    # scan_index values starting from 1.
+    # Clear done items from previous scan cycles
     pending = load_pending()
     cleared = {k: v for k, v in pending.items() if v.get("status") != "done"}
     if len(cleared) < len(pending):
@@ -418,17 +406,17 @@ def cmd_scan(config, recent=None, dry_run=False):
 # ---------------------------------------------------------------------------
 def _get_pending_items():
-    """Return only pending (not done) items, sorted by scan_index."""
+    """Return only pending (not done) items, sorted by envelope_id."""
     pending = load_pending()
     items = {k: v for k, v in pending.items() if v.get("status") == "pending"}
-    sorted_items = sorted(items.items(), key=lambda x: x[1].get("scan_index", 0))
+    sorted_items = sorted(items.items(), key=lambda x: int(x[1].get("envelope_id", 0)))
     return sorted_items
 def cmd_review_list():
     """Print the pending queue and exit.
-    Shows each email with its number, ID, subject, sender, summary,
+    Shows each email with its envelope ID, subject, sender, summary,
     and the classifier's suggested action with confidence.
     """
     sorted_items = _get_pending_items()
@@ -441,12 +429,12 @@ def cmd_review_list():
     print("=" * 60)
     for msg_id, data in sorted_items:
-        num = data.get("scan_index", "?")
+        eid = data.get("envelope_id", "?")
         suggested = data.get("suggested_action", "?")
         conf = data.get("confidence", "?")
         tags = data.get("tags", [])
         tags_str = ", ".join(tags) if tags else "(none)"
-        print(f"\n {num}. [{msg_id}]")
+        print(f"\n [{eid}] {msg_id}")
         print(f" Subject: {data.get('subject', 'N/A')[:55]}")
         print(f" From: {data.get('sender', 'N/A')[:55]}")
         print(f" To: {data.get('recipient', 'N/A')[:40]}")
@@ -456,7 +444,7 @@ def cmd_review_list():
     print(f"\n{'=' * 60}")
     print("Usage:")
-    print(" python main.py review <number> <action>")
+    print(" python main.py review <envelope_id> <action>")
     print(" python main.py review all <action>")
     print(" python main.py review accept")
     print("Actions: delete / archive / keep / mark_read / label:<name>")
@@ -466,7 +454,7 @@ def cmd_review_act(selector, action):
     """Execute an action on one or more pending emails.
     Args:
-        selector: a scan_index number, a msg_id string, or "all".
+        selector: an envelope_id, a msg_id string, or "all".
         action: one of delete/archive/keep/mark_read/label:<name>.
     """
     # Validate action
@@ -573,23 +561,15 @@ def cmd_review_accept():
 def _resolve_target(selector, sorted_items):
-    """Resolve a selector (scan_index number or msg_id) to a (msg_id, data) tuple.
+    """Resolve a selector (envelope_id or msg_id) to a (msg_id, data) tuple.
-    When given a number, looks up the pending item whose scan_index matches
-    (stable across deletions). When given a string, looks up by msg_id.
+    Looks up by envelope_id first, then by msg_id string.
     Returns None and prints an error if the selector is invalid.
     """
-    # Try as scan_index number
-    try:
-        idx = int(selector)
-        for msg_id, data in sorted_items:
-            if data.get("scan_index") == idx:
-                return (msg_id, data)
-        valid = [str(d.get("scan_index")) for _, d in sorted_items]
-        print(f"No item with number {idx}. Valid numbers: {', '.join(valid)}")
-        return None
-    except ValueError:
-        pass
+    # Try as envelope_id
+    for msg_id, data in sorted_items:
+        if data.get("envelope_id") == selector:
+            return (msg_id, data)
     # Try as msg_id
     for msg_id, data in sorted_items:
@@ -597,6 +577,8 @@ def _resolve_target(selector, sorted_items):
             return (msg_id, data)
     print(f"Not found: {selector}")
+    valid = [d.get("envelope_id") for _, d in sorted_items]
+    print(f"Valid envelope IDs: {', '.join(valid)}")
     return None
@@ -705,7 +687,7 @@ if __name__ == "__main__":
 else:
 print("Usage:")
 print(" python main.py review list")
-print(" python main.py review <number-or-id> <action>")
+print(" python main.py review <envelope_id> <action>")
 print(" python main.py review all <action>")
 print(" python main.py review accept")
 sys.exit(1)
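Taken together, the post-commit lookup path in this file amounts to three steps: derive `msg_id` from the envelope ID and subject, list pending items in numeric envelope-ID order, and resolve a selector by envelope ID before falling back to `msg_id`. The standalone sketch below reproduces those steps with simplified, hypothetical function names and an in-memory dict instead of `load_pending()`. The `int()` sort key matters because, compared as strings, `"100"` sorts before `"42"`. The `str()` calls in the resolver are an addition of this sketch, not part of the diff: the diff's `== selector` comparison and `', '.join(valid)` only work if envelope IDs are stored as strings, so the sketch normalizes defensively in case they are stored as integers.

```python
import hashlib


def make_msg_id(envelope_id, subject):
    """msg_id as derived in add_to_pending: first 8 hex chars of md5("<eid>_<subject>")."""
    key = f"{envelope_id}_{subject}"
    return f"msg_{hashlib.md5(key.encode()).hexdigest()[:8]}"


def get_pending_items(pending):
    """Pending-only items, ordered numerically by envelope ID."""
    items = [(k, v) for k, v in pending.items() if v.get("status") == "pending"]
    # int() keeps "42" ahead of "100"; a plain string sort would not.
    return sorted(items, key=lambda kv: int(kv[1].get("envelope_id", 0)))


def resolve_target(selector, sorted_items):
    """Resolve an envelope ID or msg_id to (msg_id, data), else print help and return None."""
    for msg_id, data in sorted_items:
        # str() is this sketch's addition: tolerate IDs stored as ints.
        if str(data.get("envelope_id")) == selector:
            return (msg_id, data)
    for msg_id, data in sorted_items:
        if msg_id == selector:
            return (msg_id, data)
    print(f"Not found: {selector}")
    valid = [str(d.get("envelope_id")) for _, d in sorted_items]
    print(f"Valid envelope IDs: {', '.join(valid)}")
    return None


# Hypothetical queue: one item already done, envelope IDs stored as strings.
pending = {}
for eid, subject, status in [
    ("100", "Weekly digest", "pending"),
    ("42", "New jobs matching your profile", "pending"),
    ("7", "Old receipt", "done"),
]:
    pending[make_msg_id(eid, subject)] = {
        "envelope_id": eid,
        "subject": subject,
        "status": status,
    }

items = get_pending_items(pending)
print([data["envelope_id"] for _, data in items])  # ['42', '100']
print(resolve_target("100", items)[1]["subject"])  # Weekly digest
resolve_target("7", items)  # done items are not selectable; prints valid IDs
```

Because `msg_id` values always start with `msg_` and envelope IDs are numeric, trying the envelope ID first cannot shadow a `msg_id` selector.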