comment from youlu
@@ -159,3 +159,11 @@ HTTP requests are made through a shared `requests.Session` with:
 - **Rate limiting** (Ollama latency between fetches when configured; 1-second fallback otherwise)
 
 Some sites (e.g. paywalled or bot-protected) may still return errors — in those cases the content field is left empty and the RSS description is used as a fallback for summaries.
+
+## Design notes
+
+- **Articles without dates are included by default.** `is_within_lookback` returns `True` when an article has no published date, and the query uses `OR published_date IS NULL`. This is intentional — silently dropping articles just because the feed omits a date would be worse than including them. If you only want dated articles, filter on `published_date` in the output.
+
+- **`generate_summary` accepts both `description` and `content`.** The `description` parameter is not redundant — `body = content or description` uses the RSS description as a fallback when `fetch_content()` fails and returns `None`. This ensures articles still get summarized even when the full page can't be fetched.
+
+- **`fetch_content` uses a chained ternary for element selection.** The expression `article if article else soup.body if soup.body else soup` picks the most specific container available. This is a common Python pattern and reads top-to-bottom as a priority list.
@@ -369,7 +369,7 @@ def main():
     max_per_feed = settings.get("max_articles_per_feed", 0)
 
     conn = init_db(args.database)
-
-    # Purge old articles
-    deleted = purge_old_articles(conn, retention_days)
-    if deleted:
+    try:
+        # Purge old articles
+        deleted = purge_old_articles(conn, retention_days)
+        if deleted:
@@ -377,7 +377,6 @@ def main():
 
-    if args.purge_only:
-        logger.info("Purge-only mode; exiting")
-        conn.close()
-        return
+        if args.purge_only:
+            logger.info("Purge-only mode; exiting")
+            return
 
-    # Fetch feeds
+        # Fetch feeds
@@ -466,7 +465,7 @@ def main():
-        fields = [f.strip() for f in args.fields.split(",")]
-        output = [{k: article[k] for k in fields if k in article} for article in recent]
-        print(json.dumps(output, ensure_ascii=False, indent=2))
+            fields = [f.strip() for f in args.fields.split(",")]
+            output = [{k: article[k] for k in fields if k in article} for article in recent]
+            print(json.dumps(output, ensure_ascii=False, indent=2))
 
-
-    conn.close()
+    finally:
+        conn.close()
 
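The rate-limiting rule in the docs hunk above ("Ollama latency between fetches when configured; 1-second fallback otherwise") could look roughly like this. The helper name `fetch_delay` is hypothetical; only the fallback value comes from the documented behaviour:

```python
def fetch_delay(ollama_latency=None):
    # Hypothetical helper: reuse the measured Ollama call latency as the
    # pause between fetches when one is recorded; otherwise fall back to
    # the documented 1-second delay.
    return ollama_latency if ollama_latency else 1.0
```

The caller would then `time.sleep(fetch_delay(latency))` between feed fetches.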
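The first design note (undated articles pass the lookback filter, mirrored by `OR published_date IS NULL` in the query) can be sketched as follows. The signature and cutoff arithmetic are assumptions; only the `None → True` behaviour is stated in the note:

```python
from datetime import datetime, timedelta, timezone

def is_within_lookback(published_date, lookback_days=7):
    # Undated articles are deliberately kept: a feed that omits dates
    # should not have its articles silently dropped.
    if published_date is None:
        return True
    cutoff = datetime.now(timezone.utc) - timedelta(days=lookback_days)
    return published_date >= cutoff
```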
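The second design note (`body = content or description` as the fallback when `fetch_content()` returns `None`) works as in this minimal sketch; the truncation stands in for the real summarization step, and everything beyond the two documented parameters is assumed:

```python
def generate_summary(description, content):
    # `content` is the fetched page text, or None when fetch_content()
    # failed; the RSS description then serves as the fallback body.
    body = content or description
    if not body:
        return None
    # Stand-in for the real summarization step (e.g. an LLM call).
    return body[:200]
```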
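The chained ternary from the third design note reads as a priority list. In the real code the operands are BeautifulSoup nodes; the strings and the `pick_container` wrapper here are stand-ins for illustration:

```python
def pick_container(article, body, soup):
    # Priority list, read top to bottom: the <article> element if the
    # page has one, else <body>, else the whole parsed document.
    return article if article else body if body else soup
```

Python's conditional expression associates to the right, so the chain groups as `article if article else (body if body else soup)`.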