# email-classifier

FastAPI service that classifies email with a configurable LLM backend, returns structured extraction alongside the classification, and tracks repeat classifications with Outlook-aware dedupe.

## Environment configuration

LLM defaults:

```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3
```

MiniMax via Anthropic-compatible API:

```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
```

Optional local dedupe store path:

```bash
export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
```

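
The variables above could be read into a typed settings object along these lines. This is a minimal sketch, not the service's actual loader: the `LLMSettings` class and `load_settings` function are hypothetical names, and falling back to the values shown above when a variable is unset is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the documented environment variables."""
    provider: str
    base_url: str
    api_key: str
    model: str
    temperature: float
    timeout_seconds: int
    max_retries: int
    db_path: str


def load_settings(env=None) -> LLMSettings:
    """Read the documented variables; unset ones fall back to the values
    listed under "LLM defaults" (an assumption about the real service)."""
    env = os.environ if env is None else env
    return LLMSettings(
        provider=env.get("LLM_PROVIDER", "openai"),
        base_url=env.get(
            "LLM_BASE_URL", "http://ollama.internal.henryhosted.com:9292/v1"
        ),
        api_key=env.get("LLM_API_KEY", "none"),
        model=env.get("LLM_MODEL", "qwen2.5-7b-instruct.q4_k_m"),
        # Environment values are strings, so numeric settings are converted.
        temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        timeout_seconds=int(env.get("LLM_TIMEOUT_SECONDS", "60")),
        max_retries=int(env.get("LLM_MAX_RETRIES", "3")),
        db_path=env.get("EMAIL_CLASSIFIER_DB_PATH", ".data/email_classifier.db"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"LLM_PROVIDER": "anthropic"})`.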
## Input shape

Designed around real Outlook message payloads. Relevant fields:

```json
{
  "id": "AAMk...",
  "internetMessageId": "<...@...>",
  "conversationId": "AAQk...",
  "subject": "MB Printer",
  "bodyPreview": "Good morning, ...",
  "receivedDateTime": "2026-02-19T15:27:35Z",
  "sentDateTime": "2026-02-19T15:27:32Z",
  "hasAttachments": false,
  "importance": "normal",
  "isRead": false,
  "body": {
    "contentType": "html",
    "content": "..."
  }
}
```

API request example. Subject and body are nested under `email_data`, and the request carries its own `provider`, `base_url`, and `model` values:

```json
{
  "id": "AAMk...",
  "internetMessageId": "<...@...>",
  "conversationId": "AAQk...",
  "bodyPreview": "Good morning, ...",
  "receivedDateTime": "2026-02-19T15:27:35Z",
  "sentDateTime": "2026-02-19T15:27:32Z",
  "hasAttachments": false,
  "importance": "normal",
  "isRead": false,
  "email_data": {
    "subject": "MB Printer",
    "body": "<html>...</html>"
  },
  "provider": "anthropic",
  "base_url": "https://api.minimax.io/anthropic",
  "model": "MiniMax-M2.7"
}
```

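
Going from the Outlook payload to the request shape might look like the sketch below. `build_request` and `PASSTHROUGH_FIELDS` are hypothetical names, and the mapping (top-level fields copied as-is, `subject` and `body.content` moved under `email_data`) is inferred from the two examples above rather than taken from the service code.

```python
# Top-level Outlook fields that appear unchanged in the request example.
PASSTHROUGH_FIELDS = (
    "id",
    "internetMessageId",
    "conversationId",
    "bodyPreview",
    "receivedDateTime",
    "sentDateTime",
    "hasAttachments",
    "importance",
    "isRead",
)


def build_request(message: dict, provider=None, base_url=None, model=None) -> dict:
    """Map an Outlook message dict onto the request shape shown above."""
    request = {k: message[k] for k in PASSTHROUGH_FIELDS if k in message}

    # Subject and body content are nested under `email_data`.
    body = message.get("body") or {}
    request["email_data"] = {
        "subject": message.get("subject", ""),
        "body": body.get("content", ""),
    }

    # Include the LLM fields only when the caller supplies them.
    overrides = {"provider": provider, "base_url": base_url, "model": model}
    request.update({k: v for k, v in overrides.items() if v is not None})
    return request
```

The result is a plain dict, so it can be serialized with `json.dumps` and posted with any HTTP client.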
## Response example

```json
{
  "needs_action": true,
  "category": "question",
  "priority": "high",
  "task_description": "Investigate MB Printer issue and reply",
  "reasoning": "The email appears to describe an issue requiring action.",
  "confidence": 0.91,
  "details": {
    "summary": "Printer issue reported in the MB area.",
    "suggested_title": "Handle MB Printer issue",
    "suggested_notes": "Review the printer problem, identify urgency, and reply with next steps.",
    "deadline": null,
    "people": [],
    "organizations": [],
    "attachments_referenced": [],
    "next_steps": ["Review issue", "Respond to sender"],
    "key_points": ["Printer issue reported"],
    "source_signals": ["request"],
    "dedupe_key": "..."
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "message_id": "AAMk...",
    "conversation_id": "AAQk...",
    "fingerprint": "..."
  }
}
```

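
A downstream workflow could use this response to decide whether to create a task. The sketch below is one possible consumer, not part of this API: `should_create_task`, `task_fields`, and the `min_confidence` threshold of 0.5 are all hypothetical.

```python
def should_create_task(response: dict, min_confidence: float = 0.5) -> bool:
    """Return True when the email needs action, the model is reasonably
    confident, and the dedupe status is not `duplicate`."""
    if not response.get("needs_action"):
        return False
    if response.get("confidence", 0.0) < min_confidence:
        return False
    return response.get("dedupe", {}).get("status") != "duplicate"


def task_fields(response: dict) -> dict:
    """Collect the fields a task tracker would typically need."""
    details = response.get("details", {})
    return {
        # Prefer the suggested title; fall back to the task description.
        "title": details.get("suggested_title")
        or response.get("task_description", ""),
        "notes": details.get("suggested_notes", ""),
        "priority": response.get("priority", "normal"),
        "due": details.get("deadline"),
    }
```

Treating `updated` like `new` here is a design choice of the sketch; a real workflow might update the existing task instead of creating another one.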
## Dedupe precedence

1. `id` for exact Outlook message match
2. `conversationId` for thread grouping
3. normalized subject + preview/body fingerprint fallback

Statuses:

- `new`: no prior similar email seen
- `duplicate`: same dedupe target and same extracted result as before
- `updated`: matched a prior email, but the extracted result changed

The fallback path (step 3) is intentionally heuristic.

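
The precedence above can be sketched as an ordered lookup. Everything here is illustrative and assumed: the normalization rules (lowercasing, stripping `Re:`/`Fwd:` prefixes, collapsing whitespace), the SHA-256 fingerprint, the in-memory `seen` index, and the `matched_on` labels other than `none` are not taken from the service code.

```python
import hashlib
import re


def normalize_subject(subject: str) -> str:
    """Lowercase, drop leading reply/forward prefixes, collapse whitespace."""
    s = subject.strip().lower()
    s = re.sub(r"^((re|fw|fwd)\s*:\s*)+", "", s)
    return re.sub(r"\s+", " ", s).strip()


def fingerprint(subject: str, text: str) -> str:
    """Hash the normalized subject together with normalized preview/body text."""
    normalized_text = re.sub(r"\s+", " ", text).strip().lower()
    payload = normalize_subject(subject) + "\n" + normalized_text
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def candidate_keys(request: dict):
    """Yield (matched_on, key) pairs in precedence order."""
    if request.get("id"):
        yield "id", request["id"]
    if request.get("conversationId"):
        yield "conversationId", request["conversationId"]
    email = request.get("email_data") or {}
    text = request.get("bodyPreview") or email.get("body", "")
    yield "fingerprint", fingerprint(email.get("subject", ""), text)


def find_match(request: dict, seen: dict):
    """Return (matched_on, record) for the first key present in `seen`,
    or ("none", None) when nothing matches."""
    for pair in candidate_keys(request):
        if pair in seen:
            return pair[0], seen[pair]
    return "none", None
```

Because the keys are tried in order, a new message in an already-seen thread matches on `conversationId` even though its `id` is unknown.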
## Notes

- No Todoist integration lives in this API.
- Dedupe is local and intended to help downstream workflows avoid obvious duplicates.
- SQLite is used for lightweight local dedupe tracking.

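
To make the SQLite note concrete, the sketch below tracks one row per dedupe key and derives the `new`, `duplicate`, and `updated` statuses from a hash of the extracted result. The `seen` table, its columns, and the `record` function are hypothetical; the real schema may differ.

```python
import hashlib
import json
import sqlite3

# Hypothetical schema: one row per dedupe key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS seen (
    dedupe_key  TEXT PRIMARY KEY,
    result_hash TEXT NOT NULL,
    seen_count  INTEGER NOT NULL
)
"""


def record(conn: sqlite3.Connection, dedupe_key: str, result: dict):
    """Store a classification and return (status, seen_count)."""
    conn.execute(SCHEMA)
    # Hash a canonical JSON form so key order does not affect the comparison.
    result_hash = hashlib.sha256(
        json.dumps(result, sort_keys=True).encode("utf-8")
    ).hexdigest()

    row = conn.execute(
        "SELECT result_hash, seen_count FROM seen WHERE dedupe_key = ?",
        (dedupe_key,),
    ).fetchone()

    if row is None:
        conn.execute(
            "INSERT INTO seen (dedupe_key, result_hash, seen_count) VALUES (?, ?, 1)",
            (dedupe_key, result_hash),
        )
        conn.commit()
        return "new", 1

    status = "duplicate" if row[0] == result_hash else "updated"
    count = row[1] + 1
    conn.execute(
        "UPDATE seen SET result_hash = ?, seen_count = ? WHERE dedupe_key = ?",
        (result_hash, count, dedupe_key),
    )
    conn.commit()
    return status, count
```

For a quick local check, `sqlite3.connect(":memory:")` gives a throwaway database; a persistent store would open the path from `EMAIL_CLASSIFIER_DB_PATH` instead.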