# email-classifier

FastAPI service that classifies email using a configurable LLM backend, returns structured extraction (summary, deadline, people, next steps, and related fields), and tracks duplicate classifications using fingerprint-based dedupe.

## Environment configuration

LLM defaults:

```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3
```

MiniMax via Anthropic-compatible API:

```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
```

Optional local dedupe store path:

```bash
export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
```
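
As a rough illustration of how these variables could be read at startup, here is a minimal sketch. The `LLMSettings` dataclass and `load_llm_settings` function are hypothetical names, not the contents of `app/config.py`, and the sketch assumes the values listed under "LLM defaults" are the fallbacks used when a variable is unset.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the LLM_* environment variables."""
    provider: str
    base_url: str
    api_key: str
    model: str
    temperature: float
    timeout_seconds: int
    max_retries: int


def load_llm_settings(env=None):
    """Read LLM settings from a mapping (os.environ by default).

    Assumption: the README's "LLM defaults" values are used as fallbacks.
    """
    env = os.environ if env is None else env
    return LLMSettings(
        provider=env.get("LLM_PROVIDER", "openai"),
        base_url=env.get("LLM_BASE_URL", "http://ollama.internal.henryhosted.com:9292/v1"),
        api_key=env.get("LLM_API_KEY", "none"),
        model=env.get("LLM_MODEL", "qwen2.5-7b-instruct.q4_k_m"),
        # Numeric settings arrive as strings and are converted explicitly.
        temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        timeout_seconds=int(env.get("LLM_TIMEOUT_SECONDS", "60")),
        max_retries=int(env.get("LLM_MAX_RETRIES", "3")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the real environment.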

## API

### POST /classify

The response format was overhauled to return richer extraction. Compatibility with the previous top-level response shape is not required.

Request example:

```json
{
  "email_data": {
    "subject": "Can you review this by Friday?",
    "body": "Hi Daniel, please review the attached budget proposal."
  },
  "from_address": "sender@example.com",
  "received_at": "2026-04-09T12:55:00Z",
  "provider": "anthropic",
  "base_url": "https://api.minimax.io/anthropic",
  "model": "MiniMax-M2.7"
}
```
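
A client could send the request above along these lines. This is a sketch using only the standard library; `build_classify_request` is a hypothetical helper, and the service address `http://localhost:8000` is an assumption, not something this README specifies.

```python
import json
import urllib.request


def build_classify_request(service_url, payload):
    """Return a POST request for /classify with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload copied from the request example above.
payload = {
    "email_data": {
        "subject": "Can you review this by Friday?",
        "body": "Hi Daniel, please review the attached budget proposal.",
    },
    "from_address": "sender@example.com",
    "received_at": "2026-04-09T12:55:00Z",
    "provider": "anthropic",
    "base_url": "https://api.minimax.io/anthropic",
    "model": "MiniMax-M2.7",
}

# Assumed local address; adjust to wherever the service is running.
request = build_classify_request("http://localhost:8000", payload)

# Sending needs a running service, so it is left commented out here:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())
```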

Response example:

```json
{
  "needs_action": true,
  "category": "question",
  "priority": "high",
  "task_description": "Review the budget proposal and respond by Friday",
  "reasoning": "Direct request with a deadline requires follow-up",
  "confidence": 0.91,
  "details": {
    "summary": "Budget proposal review requested with Friday deadline.",
    "suggested_title": "Review budget proposal and respond by Friday",
    "suggested_notes": "Requester asked for feedback on attached budget proposal before Friday.",
    "deadline": "Friday",
    "people": ["Daniel"],
    "organizations": [],
    "attachments_referenced": ["budget proposal"],
    "next_steps": ["Review attachment", "Reply with feedback"],
    "key_points": ["Deadline is Friday"],
    "source_signals": ["request", "deadline"],
    "dedupe_key": "..."
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "subject_key": "...",
    "fingerprint": "..."
  }
}
```
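
Because the service only returns metadata, a downstream workflow has to decide what to do with it. The sketch below shows one possible policy based on `needs_action` and `dedupe.status`; the function name `plan_downstream_action` and the mapping to `skip`, `create`, and `update` are illustrative assumptions, not behavior of this API.

```python
def plan_downstream_action(result):
    """Map a /classify response dict to 'skip', 'create', or 'update'.

    Hypothetical policy: ignore emails that need no action or that are
    exact duplicates, update on changed results, create otherwise.
    """
    if not result.get("needs_action", False):
        return "skip"
    status = result.get("dedupe", {}).get("status", "new")
    if status == "duplicate":
        return "skip"
    if status == "updated":
        return "update"
    return "create"


# Trimmed version of the response example above.
example = {"needs_action": True, "dedupe": {"status": "new", "seen_count": 1}}
action = plan_downstream_action(example)
```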

## Dedupe behavior

The API does not create or update Todoist tasks. It only returns the extraction and local dedupe metadata for downstream automation such as n8n.

Matching strategy:

- `subject_key`: derived from the sender and the normalized subject
- `fingerprint`: full-content fallback based on sender, normalized subject, and cleaned body
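
The two keys above could be computed roughly as follows. This is a sketch, not the code in `app/sync.py`: the reply/forward prefix stripping, the use of the full lowercased sender address, whitespace collapsing as "cleaning", and SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
import re

# Leading "Re:", "Fw:", "Fwd:" prefixes, possibly repeated.
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, collapse whitespace, lowercase."""
    stripped = _REPLY_PREFIX.sub("", subject)
    return re.sub(r"\s+", " ", stripped).strip().lower()


def _digest(*parts):
    # A separator that is unlikely to occur in the inputs keeps
    # ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def subject_key(sender, subject):
    """Key over sender + normalized subject."""
    return _digest(sender.strip().lower(), normalize_subject(subject))


def fingerprint(sender, subject, body):
    """Fallback key over sender + normalized subject + cleaned body."""
    cleaned_body = re.sub(r"\s+", " ", body).strip().lower()
    return _digest(sender.strip().lower(), normalize_subject(subject), cleaned_body)
```

With this scheme a reply such as `Re: Budget proposal` from the same sender shares a `subject_key` with the original message, while the `fingerprint` changes whenever the body does.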

Statuses:

- `new`: no prior similar email seen
- `duplicate`: same dedupe target and same extracted result as before
- `updated`: matched prior email, but extracted result changed

This is intentionally heuristic, not perfect.
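
The three statuses can be illustrated with a small SQLite-backed store. The table layout, the `open_store` and `record` helpers, and the idea of comparing a hash of the extracted result are hypothetical; the real schema in `app/dedupe_store.py` is not shown in this README.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a minimal dedupe table. Schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "dedupe_target TEXT PRIMARY KEY, "
        "result_hash TEXT NOT NULL, "
        "seen_count INTEGER NOT NULL)"
    )
    return conn


def record(conn, dedupe_target, result_hash):
    """Persist one sighting and return (status, seen_count)."""
    row = conn.execute(
        "SELECT result_hash, seen_count FROM seen WHERE dedupe_target = ?",
        (dedupe_target,),
    ).fetchone()
    if row is None:
        # First time this target is seen.
        conn.execute(
            "INSERT INTO seen (dedupe_target, result_hash, seen_count) VALUES (?, ?, 1)",
            (dedupe_target, result_hash),
        )
        conn.commit()
        return "new", 1
    previous_hash, count = row
    # Same target: identical result is a duplicate, a changed one is an update.
    status = "duplicate" if previous_hash == result_hash else "updated"
    conn.execute(
        "UPDATE seen SET result_hash = ?, seen_count = ? WHERE dedupe_target = ?",
        (result_hash, count + 1, dedupe_target),
    )
    conn.commit()
    return status, count + 1
```

Calling `record` three times for the same target, with the result hash changing on the third call, yields `new`, then `duplicate`, then `updated`.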

## Architecture

- `app/classifier.py`: classification orchestration and dedupe handoff
- `app/prompts.py`: richer extraction prompt
- `app/sync.py`: subject normalization, fingerprinting, dedupe application
- `app/dedupe_store.py`: SQLite-backed dedupe store
- `app/llm_adapters.py`: provider adapters
- `app/config.py`: LLM settings

## Notes

- No Todoist integration lives in this API.
- Dedupe is best-effort and designed to help downstream workflows avoid obvious duplicates.
- SQLite is used for lightweight local dedupe tracking.