Refocus classifier on rich extraction and local dedupe only
61
README.md
@@ -1,6 +1,6 @@
 # email-classifier
 
-FastAPI service that classifies email using a configurable LLM backend, enriches the output for human review, and can upsert Todoist tasks without creating duplicates.
+FastAPI service that classifies email using a configurable LLM backend, returns richer structured extraction, and tracks duplicate classifications using fingerprint-based dedupe.
 
 ## Environment configuration
 
@@ -25,11 +25,9 @@ export LLM_API_KEY=your_minimax_key
 export LLM_MODEL=MiniMax-M2.7
 ```
 
-Optional Todoist sync:
+Optional local dedupe store path:
 
 ```bash
-export TODOIST_API_KEY=your_todoist_token
-export TODOIST_PROJECT_ID=optional_project_id
 export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
 ```
 
@@ -37,9 +35,9 @@ export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
 
 ### POST /classify
 
-Backward-compatible top-level response fields are preserved.
+This overhaul is intended to return richer extraction. Top-level compatibility is not required.
 
-Optional request metadata for dedupe and richer sync:
+Request example:
 
 ```json
 {
@@ -47,8 +45,6 @@ Optional request metadata for dedupe and richer sync:
 "subject": "Can you review this by Friday?",
 "body": "Hi Daniel, please review the attached budget proposal."
 },
-"message_id": "<abc123@example.com>",
-"thread_id": "thread-789",
 "from_address": "sender@example.com",
 "received_at": "2026-04-09T12:55:00Z",
 "provider": "anthropic",
@@ -57,7 +53,7 @@ Optional request metadata for dedupe and richer sync:
 }
 ```
 
-Response now includes optional enrichment and Todoist sync info:
+Response example:
 
 ```json
 {
@@ -80,42 +76,43 @@ Response now includes optional enrichment and Todoist sync info:
 "source_signals": ["request", "deadline"],
 "dedupe_key": "..."
 },
-"todoist": {
-"status": "created",
-"task_id": "1234567890",
-"comment_added": false,
-"dedupe_match": "none",
-"message": null
+"dedupe": {
+"status": "new",
+"seen_count": 1,
+"matched_on": "none",
+"subject_key": "...",
+"fingerprint": "..."
 }
 }
 ```
 
 ## Dedupe behavior
 
-When Todoist sync is enabled and `needs_action=true`:
-- first match by `message_id`
-- then by `thread_id`
-- then by normalized content fingerprint fallback
+The API does not create or update Todoist tasks.
+It only returns richer extraction and local dedupe metadata for downstream automation like n8n.
 
-Behavior:
-- no existing task: create Todoist task
-- existing task, same classification: do not duplicate, mark `unchanged`
-- existing task, changed classification/context: update task in place
-- add a Todoist comment only when material context changed
+Matching strategy:
+- normalized subject plus sender-derived `subject_key`
+- full content fingerprint fallback based on sender + normalized subject + cleaned body
+
+Statuses:
+- `new`: no prior similar email seen
+- `duplicate`: same dedupe target and same extracted result as before
+- `updated`: matched prior email, but extracted result changed
+
+This is intentionally heuristic, not perfect.
 
 ## Architecture
 
-- `app/classifier.py`: classification orchestration and Todoist sync handoff
+- `app/classifier.py`: classification orchestration and dedupe handoff
 - `app/prompts.py`: richer extraction prompt
-- `app/sync.py`: dedupe, task rendering, Todoist upsert logic
-- `app/dedupe_store.py`: SQLite-backed mapping store
-- `app/todoist.py`: Todoist REST client
+- `app/sync.py`: subject normalization, fingerprinting, dedupe application
+- `app/dedupe_store.py`: SQLite-backed dedupe store
 - `app/llm_adapters.py`: provider adapters
 - `app/config.py`: LLM settings
 
 ## Notes
 
-- `/classify` remains backward compatible at the top level.
-- New request metadata fields are optional.
-- Todoist sync safely no-ops when `TODOIST_API_KEY` is not configured.
-- SQLite is used for lightweight production-safe dedupe tracking.
+- No Todoist integration lives in this API.
+- Dedupe is best-effort and designed to help downstream workflows avoid obvious duplicates.
+- SQLite is used for lightweight local dedupe tracking.
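Note on the matching strategy: the README text in this commit names a `subject_key` (normalized subject plus sender) and a full content fingerprint (sender + normalized subject + cleaned body), but the diff does not show how `app/sync.py` computes them. The following is only a minimal sketch of one way such keys could be derived. The helper names, the reply-prefix regex, the quoted-line stripping, and the use of SHA-256 are all assumptions for illustration, not the repository's code.

```python
import hashlib
import re

# Hypothetical helpers: names and normalization rules are assumptions,
# not the actual contents of app/sync.py.
_PREFIX_RE = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)
_WS_RE = re.compile(r"\s+")


def normalize_subject(subject: str) -> str:
    """Strip leading Re:/Fw:/Fwd: prefixes, collapse whitespace, lowercase."""
    stripped = _PREFIX_RE.sub("", subject)
    return _WS_RE.sub(" ", stripped).strip().lower()


def clean_body(body: str) -> str:
    """Drop quoted reply lines (starting with '>') and collapse whitespace."""
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    return _WS_RE.sub(" ", " ".join(lines)).strip().lower()


def _digest(*parts: str) -> str:
    # Join with a separator that will not occur in normal text so that
    # ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def subject_key(from_address: str, subject: str) -> str:
    """Key built from the sender and the normalized subject."""
    return _digest(from_address.strip().lower(), normalize_subject(subject))


def fingerprint(from_address: str, subject: str, body: str) -> str:
    """Fallback key built from sender, normalized subject, and cleaned body."""
    return _digest(
        from_address.strip().lower(),
        normalize_subject(subject),
        clean_body(body),
    )
```

Under these assumptions, a reply such as `Re: Can you review this by Friday?` from the same sender maps to the same `subject_key` as the original message, while the fingerprint changes only when the cleaned body differs.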
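Note on the statuses: the new README describes `new`, `duplicate`, and `updated` results backed by a SQLite store at `EMAIL_CLASSIFIER_DB_PATH`. The sketch below shows one way that bookkeeping could work with the standard-library `sqlite3` module. The table layout, the function names `open_store` and `apply_dedupe`, the result hashing, and the `matched_on` values other than `"none"` are assumptions; the real `app/dedupe_store.py` is not part of this diff.

```python
import hashlib
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical dedupe table. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "id INTEGER PRIMARY KEY, subject_key TEXT, fingerprint TEXT, "
        "result_hash TEXT, seen_count INTEGER)"
    )
    return conn


def apply_dedupe(conn: sqlite3.Connection, subject_key: str,
                 fingerprint: str, result: dict) -> dict:
    """Return dedupe metadata shaped like the README's response example."""
    result_hash = hashlib.sha256(
        json.dumps(result, sort_keys=True).encode("utf-8")
    ).hexdigest()

    # Try subject_key first, then fall back to the full fingerprint.
    row, matched_on = None, "none"
    for column, value in (("subject_key", subject_key),
                          ("fingerprint", fingerprint)):
        # Column names come from the fixed tuple above, never from user input.
        row = conn.execute(
            f"SELECT id, result_hash, seen_count FROM seen WHERE {column} = ?",
            (value,),
        ).fetchone()
        if row is not None:
            matched_on = column
            break

    if row is None:
        conn.execute(
            "INSERT INTO seen (subject_key, fingerprint, result_hash, seen_count) "
            "VALUES (?, ?, ?, 1)",
            (subject_key, fingerprint, result_hash),
        )
        status, seen_count = "new", 1
    else:
        row_id, old_hash, old_count = row
        seen_count = old_count + 1
        # Same extracted result as before: duplicate. Otherwise: updated.
        status = "duplicate" if old_hash == result_hash else "updated"
        conn.execute(
            "UPDATE seen SET result_hash = ?, fingerprint = ?, seen_count = ? "
            "WHERE id = ?",
            (result_hash, fingerprint, seen_count, row_id),
        )
    conn.commit()

    return {
        "status": status,
        "seen_count": seen_count,
        "matched_on": matched_on,
        "subject_key": subject_key,
        "fingerprint": fingerprint,
    }
```

With an in-memory store, calling `apply_dedupe` twice with identical arguments would report `new` and then `duplicate`; a third call with a changed `result` would report `updated`. A downstream workflow such as n8n could branch on `status` to decide whether to create or skip a task.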