# email-classifier

FastAPI service that classifies email using a configurable LLM backend, returns richer structured extraction, and tracks duplicate classifications using Outlook-aware dedupe.

## Environment configuration

LLM defaults:

```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3
```

MiniMax via Anthropic-compatible API:

```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
```

Optional local dedupe store path:

```bash
export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
```

## Input shape

Designed around real Outlook message payloads. Relevant fields:

```json
{
  "id": "AAMk...",
  "internetMessageId": "<...@...>",
  "conversationId": "AAQk...",
  "subject": "MB Printer",
  "bodyPreview": "Good morning, ...",
  "receivedDateTime": "2026-02-19T15:27:35Z",
  "sentDateTime": "2026-02-19T15:27:32Z",
  "hasAttachments": false,
  "importance": "normal",
  "isRead": false,
  "body": {
    "contentType": "html",
    "content": "..."
  }
}
```

API request example:

```json
{
  "id": "AAMk...",
  "internetMessageId": "<...@...>",
  "conversationId": "AAQk...",
  "bodyPreview": "Good morning, ...",
  "receivedDateTime": "2026-02-19T15:27:35Z",
  "sentDateTime": "2026-02-19T15:27:32Z",
  "hasAttachments": false,
  "importance": "normal",
  "isRead": false,
  "email_data": {
    "subject": "MB Printer",
    "body": "..."
  },
  "provider": "anthropic",
  "base_url": "https://api.minimax.io/anthropic",
  "model": "MiniMax-M2.7"
}
```

## Response example

```json
{
  "needs_action": true,
  "category": "question",
  "priority": "high",
  "task_description": "Investigate MB Printer issue and reply",
  "reasoning": "The email appears to describe an issue requiring action.",
  "confidence": 0.91,
  "details": {
    "summary": "Printer issue reported in the MB area.",
    "suggested_title": "Handle MB Printer issue",
    "suggested_notes": "Review the printer problem, identify urgency, and reply with next steps.",
    "deadline": null,
    "people": [],
    "organizations": [],
    "attachments_referenced": [],
    "next_steps": ["Review issue", "Respond to sender"],
    "key_points": ["Printer issue reported"],
    "source_signals": ["request"],
    "dedupe_key": "..."
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "message_id": "AAMk...",
    "conversation_id": "AAQk...",
    "fingerprint": "..."
  }
}
```

## Dedupe precedence

1. `id` for exact Outlook message match
2. `conversationId` for thread grouping
3. normalized subject + preview/body fingerprint fallback

Statuses:

- `new`: no prior similar email seen
- `duplicate`: same dedupe target and same extracted result as before
- `updated`: matched prior email, but extracted result changed

This is intentionally heuristic for the fallback path.

## Notes

- No Todoist integration lives in this API.
- Dedupe is local and intended to help downstream workflows avoid obvious duplicates.
- SQLite is used for lightweight local dedupe tracking.
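
The dedupe precedence and statuses described above can be illustrated with a minimal sketch. This is not the service's actual implementation: the table layout, the helper names (`normalize_subject`, `fingerprint`, `check_dedupe`, `open_store`), the subject normalization rules, and the `matched_on` values other than `none` are all assumptions made for illustration. It takes a flat dict with `id`, `conversationId`, `subject`, and `bodyPreview` keys, plus the extracted result.

```python
import hashlib
import json
import re
import sqlite3


def normalize_subject(subject: str) -> str:
    """Drop leading Re:/Fw:/Fwd: prefixes, collapse whitespace, lowercase."""
    s = re.sub(r"^((re|fw|fwd)\s*:\s*)+", "", subject.strip(), flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", s).strip().lower()


def fingerprint(subject: str, preview: str) -> str:
    """Fallback key: hash of normalized subject plus normalized preview."""
    body = re.sub(r"\s+", " ", preview).strip().lower()
    text = normalize_subject(subject) + "\n" + body
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def result_hash(result: dict) -> str:
    """Stable hash of the extracted result, used to tell duplicate from updated."""
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical dedupe table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS seen (
               row_id INTEGER PRIMARY KEY,
               message_id TEXT,
               conversation_id TEXT,
               fingerprint TEXT,
               result_hash TEXT,
               seen_count INTEGER
           )"""
    )
    return conn


def check_dedupe(conn: sqlite3.Connection, email: dict, result: dict) -> dict:
    fp = fingerprint(email.get("subject", ""), email.get("bodyPreview", ""))
    rh = result_hash(result)
    # Precedence: exact message id, then conversation, then fingerprint.
    # Column names come from this fixed list, never from user input.
    candidates = [
        ("id", "message_id", email.get("id")),
        ("conversationId", "conversation_id", email.get("conversationId")),
        ("fingerprint", "fingerprint", fp),
    ]
    for matched_on, column, value in candidates:
        if not value:
            continue
        row = conn.execute(
            f"SELECT row_id, result_hash, seen_count FROM seen "
            f"WHERE {column} = ? ORDER BY row_id LIMIT 1",
            (value,),
        ).fetchone()
        if row is None:
            continue
        row_id, old_hash, count = row
        status = "duplicate" if old_hash == rh else "updated"
        conn.execute(
            "UPDATE seen SET result_hash = ?, seen_count = ? WHERE row_id = ?",
            (rh, count + 1, row_id),
        )
        conn.commit()
        return {"status": status, "seen_count": count + 1, "matched_on": matched_on}

    # Nothing matched: record the email as new.
    conn.execute(
        "INSERT INTO seen (message_id, conversation_id, fingerprint, result_hash, seen_count) "
        "VALUES (?, ?, ?, ?, 1)",
        (email.get("id"), email.get("conversationId"), fp, rh),
    )
    conn.commit()
    return {"status": "new", "seen_count": 1, "matched_on": "none"}
```

Calling `check_dedupe` twice with the same email and the same result yields `new` and then `duplicate`; a later message in the same conversation with a different extracted result yields `updated`. Hashing the result rather than storing it keeps the comparison cheap, at the cost of not being able to show what changed.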
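
The variables listed under Environment configuration could be read into a typed settings object along the following lines. This is a sketch under assumptions, not the service's own loader: the `Settings` class and `load_settings` function are hypothetical names, the fallback values are the LLM defaults shown earlier, and using `.data/email_classifier.db` as the fallback store path is an assumption based on the example value.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    provider: str
    base_url: str
    api_key: str
    model: str
    temperature: float
    timeout_seconds: float
    max_retries: int
    db_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to the README defaults."""
    env = os.environ if env is None else env
    return Settings(
        provider=env.get("LLM_PROVIDER", "openai"),
        base_url=env.get("LLM_BASE_URL", "http://ollama.internal.henryhosted.com:9292/v1"),
        api_key=env.get("LLM_API_KEY", "none"),
        model=env.get("LLM_MODEL", "qwen2.5-7b-instruct.q4_k_m"),
        # Numeric values arrive as strings and are converted here.
        temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        timeout_seconds=float(env.get("LLM_TIMEOUT_SECONDS", "60")),
        max_retries=int(env.get("LLM_MAX_RETRIES", "3")),
        # Assumed fallback path, taken from the example export.
        db_path=env.get("EMAIL_CLASSIFIER_DB_PATH", ".data/email_classifier.db"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"LLM_PROVIDER": "anthropic", "LLM_MODEL": "MiniMax-M2.7"})`.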