email-classifier/README.md

# email-classifier

FastAPI service that classifies email using a configurable LLM backend, returns richer structured extraction, and tracks duplicate classifications using fingerprint-based dedupe.

## Environment configuration

LLM defaults:

```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3
```

MiniMax via Anthropic-compatible API:

```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
```

Optional local dedupe store path:

```bash
export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db
```

## API

### POST /classify

This overhaul is intended to return richer extraction. Top-level compatibility is not required.

Request example:

```json
{
  "email_data": {
    "subject": "Can you review this by Friday?",
    "body": "Hi Daniel, please review the attached budget proposal."
  },
  "from_address": "sender@example.com",
  "received_at": "2026-04-09T12:55:00Z",
  "provider": "anthropic",
  "base_url": "https://api.minimax.io/anthropic",
  "model": "MiniMax-M2.7"
}
```

Response example:

```json
{
  "needs_action": true,
  "category": "question",
  "priority": "high",
  "task_description": "Review the budget proposal and respond by Friday",
  "reasoning": "Direct request with a deadline requires follow-up",
  "confidence": 0.91,
  "details": {
    "summary": "Budget proposal review requested with Friday deadline.",
    "suggested_title": "Review budget proposal and respond by Friday",
    "suggested_notes": "Requester asked for feedback on attached budget proposal before Friday.",
    "deadline": "Friday",
    "people": ["Daniel"],
    "organizations": [],
    "attachments_referenced": ["budget proposal"],
    "next_steps": ["Review attachment", "Reply with feedback"],
    "key_points": ["Deadline is Friday"],
    "source_signals": ["request", "deadline"],
    "dedupe_key": "..."
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "subject_key": "...",
    "fingerprint": "..."
  }
}
```

## Dedupe behavior

The API does not create or update Todoist tasks.
It only returns richer extraction and local dedupe metadata for downstream automation like n8n.

Matching strategy:
- normalized subject plus sender-derived `subject_key`
- full content fingerprint fallback based on sender + normalized subject + cleaned body

Statuses:
- `new`: no prior similar email seen
- `duplicate`: same dedupe target and same extracted result as before
- `updated`: matched prior email, but extracted result changed

This is intentionally heuristic, not perfect.

## Architecture

- `app/classifier.py`: classification orchestration and dedupe handoff
- `app/prompts.py`: richer extraction prompt
- `app/sync.py`: subject normalization, fingerprinting, dedupe application
- `app/dedupe_store.py`: SQLite-backed dedupe store
- `app/llm_adapters.py`: provider adapters
- `app/config.py`: LLM settings

## Notes

- No Todoist integration lives in this API.
- Dedupe is best-effort and designed to help downstream workflows avoid obvious duplicates.
- SQLite is used for lightweight local dedupe tracking.