Covers: overview, setup, API reference, configuration, testing, deployment, and known quirks.
# email-classifier

FastAPI service that classifies emails using a configurable LLM backend. It accepts Outlook-shaped email JSON payloads, extracts structured classification data, and tracks duplicate classifications using a local SQLite dedupe store.
## Purpose

This service is designed to help workflow systems (e.g., Todoist ticket creation) automatically process incoming emails by:

- Determining whether an email requires action
- Extracting priority, category, suggested task title/notes, people, organizations, and deadlines
- Deduplicating repeated emails based on Outlook message ID, conversation ID, or content fingerprinting
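The deduplication step listed above could be sketched roughly as follows. This is a minimal illustration, not the code in `app/sync.py`: the function names `content_fingerprint` and `dedupe_key`, the precedence order (message ID, then conversation ID, then fingerprint), and the normalization rules are all assumptions. The `id`, `conversationId`, `subject`, and `body.content` keys follow the Microsoft Graph message resource.

```python
import hashlib
import re


def content_fingerprint(subject: str, body: str) -> str:
    """Hash a whitespace- and case-normalized subject + body (assumed scheme)."""
    normalized = re.sub(r"\s+", " ", f"{subject}\n{body}").strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_key(email: dict) -> tuple:
    """Return (kind, value) for the strongest identifier present.

    Hypothetical precedence: message ID, then conversation ID,
    then a fingerprint of the content.
    """
    if email.get("id"):
        return ("message_id", email["id"])
    if email.get("conversationId"):
        return ("conversation_id", email["conversationId"])
    subject = email.get("subject") or ""
    body = (email.get("body") or {}).get("content") or ""
    return ("fingerprint", content_fingerprint(subject, body))
```

With this sketch, two payloads that lack both IDs and differ only in letter case or whitespace produce the same fingerprint key, so the second one would be treated as a duplicate.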
## Key Features

- **Configurable LLM providers**: OpenAI-compatible (Ollama, LM Studio, OpenAI) or Anthropic-compatible (MiniMax, Anthropic API)
- **Outlook-shaped input**: Accepts native Microsoft Graph API email payloads with no transformation required
- **Simplified input**: Also accepts a minimal `email_data` shape with just `subject` and `body`
- **Deduplication**: Local SQLite store tracks seen emails by message ID, conversation ID, or content fingerprint
- **Structured extraction**: Returns classification, priority, suggested task title/notes, people, organizations, deadlines, and more
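To make the deduplication feature above more concrete, here is one way a local SQLite "seen" store might look. It is a sketch under stated assumptions: the `DedupeStore` class, the `seen` table, and its `kind`/`value` columns are hypothetical and are not taken from `app/dedupe_store.py`. Only the standard-library `sqlite3` module is used.

```python
import sqlite3


class DedupeStore:
    """Tiny seen-set backed by SQLite; the schema is illustrative only."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " kind TEXT NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (kind, value))"
        )

    def mark_seen(self, kind: str, value: str) -> bool:
        """Record a key; return True if it was new, False if a duplicate."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO seen (kind, value) VALUES (?, ?)",
            (kind, value),
        )
        self.conn.commit()
        # rowcount is 1 when the row was inserted, 0 when it was ignored.
        return cur.rowcount == 1
```

Using a composite primary key with `INSERT OR IGNORE` keeps the check-and-record step to a single statement, which avoids a separate lookup before each insert.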
## Project Structure

```
email-classifier/
├── app/
│   ├── main.py                 # FastAPI app entry point
│   ├── config.py               # Pydantic settings from environment variables
│   ├── classifier.py           # Core classification orchestration
│   ├── llm_adapters.py         # OpenAI- and Anthropic-compatible adapter layer
│   ├── models.py               # Pydantic request/response models
│   ├── prompts.py              # System prompt sent to the LLM
│   ├── sync.py                 # Deduplication logic and content fingerprinting
│   ├── dedupe_store.py         # SQLite persistence for dedupe tracking
│   ├── routers/
│   │   └── classify_email.py   # /classify POST endpoint
│   └── helpers/
│       ├── clean_email_html.py
│       ├── extract_latest_message.py
│       └── remove_disclaimer.py
├── docs/                       # MkDocs documentation (this site)
├── Dockerfile
├── pyproject.toml
└── uv.lock
```
## Output Classification Schema

Emails are classified into one of these categories:

| Category | Description |
|---|---|
| `action_required` | Direct request requiring user action |
| `question` | Question needing a response |
| `fyi` | Informational, no reply needed |
| `newsletter` | Newsletter or publication |
| `promotional` | Marketing or sales outreach |
| `automated` | Automated system notification |
| `alert` | I.T. or security alert |
| `uncategorized` | Fallback when classification fails |

Priority is one of: `high`, `medium`, `low`.
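A consumer of the service might mirror the categories and priorities above as enums. The sketch below is an assumption about how that could look, not the definitions in `app/models.py`: the `Category` and `Priority` class names and the `parse_category` helper are hypothetical. The string values are copied from the table and the priority list.

```python
from enum import Enum
from typing import Optional


class Category(str, Enum):
    ACTION_REQUIRED = "action_required"
    QUESTION = "question"
    FYI = "fyi"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    AUTOMATED = "automated"
    ALERT = "alert"
    UNCATEGORIZED = "uncategorized"


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def parse_category(raw: Optional[str]) -> Category:
    """Map a raw label to a Category, falling back to UNCATEGORIZED."""
    try:
        return Category((raw or "").strip().lower())
    except ValueError:
        # Unknown or missing labels use the documented fallback category.
        return Category.UNCATEGORIZED
```

Falling back to `uncategorized` instead of raising keeps downstream workflow code simple when the LLM returns an unexpected label.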