Files
email-classifier/README.md

122 lines
3.4 KiB
Markdown

# email-classifier
FastAPI service that classifies email using a configurable LLM backend, enriches the output for human review, and can upsert Todoist tasks without creating duplicates.
## Environment configuration
LLM defaults:
```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3
```
MiniMax via Anthropic-compatible API:
```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
```
Optional Todoist sync:
```bash
export TODOIST_API_KEY=your_todoist_token
export TODOIST_PROJECT_ID=optional_project_id
export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
```
## API
### POST /classify
Backward-compatible top-level response fields are preserved.
Optional request metadata for dedupe and richer sync:
```json
{
"email_data": {
"subject": "Can you review this by Friday?",
"body": "Hi Daniel, please review the attached budget proposal."
},
"message_id": "<abc123@example.com>",
"thread_id": "thread-789",
"from_address": "sender@example.com",
"received_at": "2026-04-09T12:55:00Z",
"provider": "anthropic",
"base_url": "https://api.minimax.io/anthropic",
"model": "MiniMax-M2.7"
}
```
Response now includes optional enrichment and Todoist sync info:
```json
{
"needs_action": true,
"category": "question",
"priority": "high",
"task_description": "Review the budget proposal and respond by Friday",
"reasoning": "Direct request with a deadline requires follow-up",
"confidence": 0.91,
"details": {
"summary": "Budget proposal review requested with Friday deadline.",
"suggested_title": "Review budget proposal and respond by Friday",
"suggested_notes": "Requester asked for feedback on attached budget proposal before Friday.",
"deadline": "Friday",
"people": ["Daniel"],
"organizations": [],
"attachments_referenced": ["budget proposal"],
"next_steps": ["Review attachment", "Reply with feedback"],
"key_points": ["Deadline is Friday"],
"source_signals": ["request", "deadline"],
"dedupe_key": "..."
},
"todoist": {
"status": "created",
"task_id": "1234567890",
"comment_added": false,
"dedupe_match": "none",
"message": null
}
}
```
## Dedupe behavior
When Todoist sync is enabled and `needs_action=true`:
- first match by `message_id`
- then by `thread_id`
- then by normalized content fingerprint fallback
Behavior:
- no existing task: create Todoist task
- existing task, same classification: do not duplicate, mark `unchanged`
- existing task, changed classification/context: update task in place
- add a Todoist comment only when material context changed
## Architecture
- `app/classifier.py`: classification orchestration and Todoist sync handoff
- `app/prompts.py`: richer extraction prompt
- `app/sync.py`: dedupe, task rendering, Todoist upsert logic
- `app/dedupe_store.py`: SQLite-backed mapping store
- `app/todoist.py`: Todoist REST client
- `app/llm_adapters.py`: provider adapters
- `app/config.py`: LLM settings
## Notes
- `/classify` remains backward compatible at the top level.
- New request metadata fields are optional.
- Todoist sync safely no-ops when `TODOIST_API_KEY` is not configured.
- SQLite is used for lightweight production-safe dedupe tracking.