email-classifier

FastAPI service that classifies email using a configurable LLM backend, returns richer structured extraction, and tracks duplicate classifications using fingerprint-based dedupe.

Environment configuration

LLM defaults:

export LLM_PROVIDER=openai
export LLM_BASE_URL=http://ollama.internal.henryhosted.com:9292/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
export LLM_TIMEOUT_SECONDS=60
export LLM_MAX_RETRIES=3

MiniMax via Anthropic-compatible API:

export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7

Optional local dedupe store path:

export EMAIL_CLASSIFIER_DB_PATH=.data/email_classifier.db

API

POST /classify

This overhaul is intended to return richer extraction. Top-level compatibility is not required.

Request example:

{
  "email_data": {
    "subject": "Can you review this by Friday?",
    "body": "Hi Daniel, please review the attached budget proposal."
  },
  "from_address": "sender@example.com",
  "received_at": "2026-04-09T12:55:00Z",
  "provider": "anthropic",
  "base_url": "https://api.minimax.io/anthropic",
  "model": "MiniMax-M2.7"
}

Response example:

{
  "needs_action": true,
  "category": "question",
  "priority": "high",
  "task_description": "Review the budget proposal and respond by Friday",
  "reasoning": "Direct request with a deadline requires follow-up",
  "confidence": 0.91,
  "details": {
    "summary": "Budget proposal review requested with Friday deadline.",
    "suggested_title": "Review budget proposal and respond by Friday",
    "suggested_notes": "Requester asked for feedback on attached budget proposal before Friday.",
    "deadline": "Friday",
    "people": ["Daniel"],
    "organizations": [],
    "attachments_referenced": ["budget proposal"],
    "next_steps": ["Review attachment", "Reply with feedback"],
    "key_points": ["Deadline is Friday"],
    "source_signals": ["request", "deadline"],
    "dedupe_key": "..."
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "subject_key": "...",
    "fingerprint": "..."
  }
}

Dedupe behavior

The API does not create or update Todoist tasks. It only returns richer extraction and local dedupe metadata for downstream automation like n8n.

Matching strategy:

  • normalized subject plus sender-derived subject_key
  • full content fingerprint fallback based on sender + normalized subject + cleaned body

Statuses:

  • new: no prior similar email seen
  • duplicate: same dedupe target and same extracted result as before
  • updated: matched prior email, but extracted result changed

This is intentionally heuristic, not perfect.

Architecture

  • app/classifier.py: classification orchestration and dedupe handoff
  • app/prompts.py: richer extraction prompt
  • app/sync.py: subject normalization, fingerprinting, dedupe application
  • app/dedupe_store.py: SQLite-backed dedupe store
  • app/llm_adapters.py: provider adapters
  • app/config.py: LLM settings

Notes

  • No Todoist integration lives in this API.
  • Dedupe is best-effort and designed to help downstream workflows avoid obvious duplicates.
  • SQLite is used for lightweight local dedupe tracking.
Description
No description provided
Readme 124 KiB
Languages
Python 97.4%
Dockerfile 2.6%