# email-classifier FastAPI service that classifies email using a configurable LLM backend, returns richer structured extraction, and tracks duplicate classifications using fingerprint-based dedupe. ## Environment configuration LLM defaults: ```bash export LLM_PROVIDER=openai export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1 export LLM_API_KEY=none export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m export LLM_TEMPERATURE=0.1 export LLM_TIMEOUT_SECONDS=60 export LLM_MAX_RETRIES=3 ``` MiniMax via Anthropic-compatible API: ```bash export LLM_PROVIDER=anthropic export LLM_BASE_URL=https://api.minimax.io/anthropic export LLM_API_KEY=your_minimax_key export LLM_MODEL=MiniMax-M2.7 ``` Optional local dedupe store path: ```bash export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db ``` ## API ### POST /classify This overhaul is intended to return richer extraction. Top-level compatibility is not required. Request example: ```json { "email_data": { "subject": "Can you review this by Friday?", "body": "Hi Daniel, please review the attached budget proposal." }, "from_address": "sender@example.com", "received_at": "2026-04-09T12:55:00Z", "provider": "anthropic", "base_url": "https://api.minimax.io/anthropic", "model": "MiniMax-M2.7" } ``` Response example: ```json { "needs_action": true, "category": "question", "priority": "high", "task_description": "Review the budget proposal and respond by Friday", "reasoning": "Direct request with a deadline requires follow-up", "confidence": 0.91, "details": { "summary": "Budget proposal review requested with Friday deadline.", "suggested_title": "Review budget proposal and respond by Friday", "suggested_notes": "Requester asked for feedback on attached budget proposal before Friday.", "deadline": "Friday", "people": ["Daniel"], "organizations": [], "attachments_referenced": ["budget proposal"], "next_steps": ["Review attachment", "Reply with feedback"], "key_points": ["Deadline is Friday"], "source_signals": ["request", "deadline"], "dedupe_key": "..." }, "dedupe": { "status": "new", "seen_count": 1, "matched_on": "none", "subject_key": "...", "fingerprint": "..." } } ``` ## Dedupe behavior The API does not create or update Todoist tasks. It only returns richer extraction and local dedupe metadata for downstream automation like n8n. Matching strategy: - normalized subject plus sender-derived `subject_key` - full content fingerprint fallback based on sender + normalized subject + cleaned body Statuses: - `new`: no prior similar email seen - `duplicate`: same dedupe target and same extracted result as before - `updated`: matched prior email, but extracted result changed This is intentionally heuristic, not perfect. ## Architecture - `app/classifier.py`: classification orchestration and dedupe handoff - `app/prompts.py`: richer extraction prompt - `app/sync.py`: subject normalization, fingerprinting, dedupe application - `app/dedupe_store.py`: SQLite-backed dedupe store - `app/llm_adapters.py`: provider adapters - `app/config.py`: LLM settings ## Notes - No Todoist integration lives in this API. - Dedupe is best-effort and designed to help downstream workflows avoid obvious duplicates. - SQLite is used for lightweight local dedupe tracking.