
# API Reference

## `POST /classify`

Classifies a single email and returns structured extraction results.

**Endpoint:** `POST /classify`

**Content-Type:** `application/json`

---

## Request

The endpoint accepts **two input shapes**: a full Outlook-shaped payload (native Microsoft Graph API format) or a simplified `email_data` object.

### Simplified Shape

Use this for lightweight clients or testing:

```json
{
  "email_data": {
    "subject": "Printer issue in MB",
    "body": "<html>...</html>"
  },
  "id": "AAMk...",
  "conversationId": "AAQk..."
}
```
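
As a minimal client-side sketch of the simplified shape, the Python below builds the payload and posts it with the standard library. `BASE_URL`, `build_simplified_payload`, and `classify` are hypothetical names chosen for illustration, and treating `id` and `conversationId` as optional is an assumption based on the example above.

```python
import json
import urllib.request

# Hypothetical base URL; replace with wherever the service is deployed.
BASE_URL = "http://localhost:8000"


def build_simplified_payload(subject, body, message_id=None, conversation_id=None):
    """Build the simplified request shape shown above."""
    payload = {"email_data": {"subject": subject, "body": body}}
    # `id` and `conversationId` sit beside `email_data`, as in the example.
    # Treating them as optional is an assumption.
    if message_id is not None:
        payload["id"] = message_id
    if conversation_id is not None:
        payload["conversationId"] = conversation_id
    return payload


def classify(payload, base_url=BASE_URL, timeout=30):
    """POST a payload to /classify and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `classify(build_simplified_payload("Printer issue in MB", "<html>...</html>"))` would send the request. Nothing is sent until `classify` is called.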

### Full Outlook Shape

Pass through an email directly from Microsoft Graph API:

```json
{
  "id": "AAMk...",
  "internetMessageId": "<abc123@mail.example.com>",
  "conversationId": "AAQk...",
  "subject": "MB Printer",
  "bodyPreview": "Good morning, ...",
  "body": {
    "contentType": "html",
    "content": "<html>...(full HTML body)</html>"
  },
  "sender": {
    "emailAddress": {
      "name": "Bobbi Johnson",
      "address": "bobbi.johnson@grandportage.com"
    }
  },
  "from": {
    "emailAddress": {
      "name": "Bobbi Johnson",
      "address": "bobbi.johnson@grandportage.com"
    }
  },
  "toRecipients": [
    {
      "emailAddress": {
        "name": "IT Helpdesk Mail",
        "address": "helpdeskmail@grandportage.com"
      }
    }
  ],
  "ccRecipients": [],
  "bccRecipients": [],
  "replyTo": [],
  "receivedDateTime": "2026-02-19T15:27:35Z",
  "sentDateTime": "2026-02-19T15:27:32Z",
  "hasAttachments": false,
  "importance": "normal",
  "isRead": false,
  "flag": { "flagStatus": "notFlagged" }
}
```
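
If a client holds an Outlook-shaped message but prefers to send less data, the two shapes can be bridged on the client. The sketch below maps the fields that appear in both examples. `outlook_to_simplified` is a hypothetical helper, not part of the service, and it assumes the simplified `body` corresponds to Outlook's `body.content`.

```python
def outlook_to_simplified(message):
    """Reduce an Outlook-shaped message dict to the simplified shape."""
    body = message.get("body") or {}
    payload = {
        "email_data": {
            "subject": message.get("subject", ""),
            # Assumption: the simplified `body` is the raw Outlook body content.
            "body": body.get("content", ""),
        }
    }
    # Carry over the identifiers shown alongside `email_data` in the simplified example.
    for key in ("id", "conversationId"):
        if message.get(key) is not None:
            payload[key] = message[key]
    return payload
```

Sender, recipient, and timestamp fields are dropped by this conversion, so pass the full shape through when that context may matter.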

### Per-Request LLM Overrides

You can override the global LLM settings for individual requests:

| Field | Type | Description |
|---|---|---|
| `provider` | `openai` \| `anthropic` | Override the global LLM provider |
| `model` | `string` | Override the model name |
| `base_url` | `string` | Override the API base URL |
| `api_key` | `string` | Override the API key (excluded from logs) |
| `temperature` | `float` | Override the temperature (0.0–1.0) |
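
A client might validate overrides before sending them. The sketch below checks the provider and temperature against the table above and merges the fields into a request payload. It assumes the override fields sit at the top level of the request body, which this reference does not state explicitly, and `with_llm_overrides` is a hypothetical helper.

```python
ALLOWED_PROVIDERS = {"openai", "anthropic"}
OVERRIDE_FIELDS = ("provider", "model", "base_url", "api_key", "temperature")


def with_llm_overrides(payload, **overrides):
    """Return a copy of `payload` with validated per-request LLM overrides."""
    unknown = set(overrides) - set(OVERRIDE_FIELDS)
    if unknown:
        raise ValueError(f"unknown override field(s): {sorted(unknown)}")

    provider = overrides.get("provider")
    if provider is not None and provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")

    temperature = overrides.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    # Assumption: overrides are top-level fields of the request body.
    merged = dict(payload)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged
```

For example, `with_llm_overrides(payload, provider="anthropic", temperature=0.2)` leaves `payload` untouched and returns a new dict carrying both overrides.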

---

## Response

```json
{
  "needs_action": true,
  "category": "action_required",
  "priority": "high",
  "task_description": "Investigate MB Printer issue and reply",
  "reasoning": "The email describes an active problem requiring I.T. attention.",
  "confidence": 0.91,
  "details": {
    "summary": "Printer issue reported in the MB area requiring investigation.",
    "suggested_title": "Handle MB Printer issue",
    "suggested_notes": "Review the printer problem, identify urgency, and reply with next steps.",
    "deadline": null,
    "people": ["Bobbi Johnson"],
    "organizations": ["Grand Portage"],
    "attachments_referenced": [],
    "next_steps": ["Review printer status", "Reply to Bobbi Johnson"],
    "key_points": ["Printer issue in MB", "Needs on-site investigation"],
    "source_signals": ["request", "problem_report"]
  },
  "dedupe": {
    "status": "new",
    "seen_count": 1,
    "matched_on": "none",
    "message_id": "AAMk...",
    "conversation_id": "AAQk...",
    "fingerprint": "a3f8b..."
  }
}
```

### Response Fields

| Field | Type | Description |
|---|---|---|
| `needs_action` | `bool` | Whether the email requires user action |
| `category` | `string` | One of the eight classification categories (for example, `action_required` or `uncategorized`) |
| `priority` | `string` | `high`, `medium`, or `low` |
| `task_description` | `string\|null` | Short action-oriented description |
| `reasoning` | `string` | One-sentence explanation of the classification |
| `confidence` | `float` | Model confidence score (0.0–1.0) |
| `details` | `object` | Structured extraction (see below) |
| `dedupe` | `object` | Deduplication result (see below) |

### `details` Object

| Field | Type | Description |
|---|---|---|
| `summary` | `string\|null` | Brief human-readable summary |
| `suggested_title` | `string\|null` | Suggested title for a task (for example, in Todoist) |
| `suggested_notes` | `string\|null` | Multiline notes for a human reviewer |
| `deadline` | `string\|null` | Any date/time deadline mentioned |
| `people` | `string[]` | People involved or referenced |
| `organizations` | `string[]` | Organizations, departments, vendors, teams |
| `attachments_referenced` | `string[]` | Attachment names mentioned in the email |
| `next_steps` | `string[]` | Specific recommended next actions |
| `key_points` | `string[]` | Important context bullets |
| `source_signals` | `string[]` | Signals that triggered the classification |
| `dedupe_key` | `string\|null` | Content fingerprint (SHA-256) |

### `dedupe` Object

| Field | Type | Description |
|---|---|---|
| `status` | `new \| duplicate \| updated` | Whether the email is new, a duplicate, or an update to one seen before |
| `seen_count` | `int` | Number of times this email thread has been seen |
| `matched_on` | `none \| id \| conversation \| fingerprint` | Which dedupe mechanism matched |
| `message_id` | `string\|null` | Outlook `id` field if available |
| `conversation_id` | `string\|null` | Outlook `conversationId` if available |
| `fingerprint` | `string` | SHA-256 content fingerprint |
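
One way a consumer might act on these fields is sketched below: create a task only when the email needs action, the confidence clears a threshold, and the dedupe status is not `duplicate`. The threshold value, the policy itself, and the names `should_create_task` and `task_from_result` are illustrative assumptions, not behavior of the service.

```python
def should_create_task(result, min_confidence=0.5):
    """Decide whether a classification result warrants a new task."""
    if not result.get("needs_action"):
        return False
    # `min_confidence` is a hypothetical client-side threshold.
    if result.get("confidence", 0.0) < min_confidence:
        return False
    # Skip emails the service reports as already seen.
    dedupe = result.get("dedupe") or {}
    return dedupe.get("status") != "duplicate"


def task_from_result(result):
    """Map a classification result onto a simple task dict."""
    details = result.get("details") or {}
    return {
        # Prefer the suggested title, falling back to the task description.
        "title": details.get("suggested_title") or result.get("task_description"),
        "notes": details.get("suggested_notes"),
        "priority": result.get("priority"),
        "deadline": details.get("deadline"),
    }
```

Whether an `updated` status should create a new task or amend an existing one depends on the consumer. This sketch treats it like `new`.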

---

## Error Responses

If the request is missing both `email_data` and Outlook body fields, the API returns a `422 Unprocessable Entity` with a validation error.

If classification fails after all retries, the service returns a `200` with an `uncategorized` result and `confidence: 0.0`.
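
Because a failed classification still returns `200`, clients may want to check the body as well as the status code. The sketch below separates the two cases described above. `interpret_response` and its return labels are hypothetical, and treating `uncategorized` with `confidence: 0.0` as the failure marker is an assumption drawn from the description above.

```python
def interpret_response(status_code, body):
    """Classify a /classify response as ok, a validation error, or a fallback."""
    if status_code == 422:
        # Request had neither `email_data` nor Outlook body fields.
        return "validation_error"
    if status_code == 200:
        # Assumption: this pair of values marks the post-retry fallback result.
        if body.get("category") == "uncategorized" and body.get("confidence") == 0.0:
            return "classification_failed"
        return "ok"
    return "unexpected"
```

A caller could queue `classification_failed` results for a later retry or manual review instead of acting on them.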