9 Commits

Author SHA1 Message Date
c0970c066e Merge pull request 'ci: update build-publish workflow for multi-branch and tag support' (#8) from ci/update-workflow into main
All checks were successful
Build and Publish Docker Image / build-only (push) Has been skipped
Build and Publish Docker Image / build-and-push-main (push) Successful in 4m22s
Build and Publish Docker Image / build-and-push-tag (push) Has been skipped
Reviewed-on: #8
2026-04-09 23:10:37 +00:00
Lennie S.
97abc74297 ci: fix if-expression syntax and restructure into separate jobs
All checks were successful
Build and Publish Docker Image / build-only (push) Successful in 3m23s
Build and Publish Docker Image / build-and-push-main (push) Has been skipped
Build and Publish Docker Image / build-and-push-tag (push) Has been skipped
Build and Publish Docker Image / build-only (pull_request) Successful in 2m7s
Build and Publish Docker Image / build-and-push-main (pull_request) Has been skipped
Build and Publish Docker Image / build-and-push-tag (pull_request) Has been skipped
- Replace complex single-job conditional steps with 3 separate jobs:
  build-only, build-and-push-main, build-and-push-tag
- Use Gitea Actions-compatible startsWith() function instead of
  .startsWith() method call
- Remove nested parentheses that Gitea Actions cannot parse
- Push only on refs/heads/main and refs/tags/v* (not all branches)
2026-04-09 23:05:25 +00:00
ab14d55824 Merge pull request 'Add YAML config support and Compose deployment example' (#6) from docs/mkdocs into main
All checks were successful
Build and Publish Docker Image / build-and-push (push) Successful in 3m1s
Reviewed-on: #6
2026-04-09 21:14:23 +00:00
Steve W
3e9904576f Add YAML config support and Compose deployment example 2026-04-09 21:06:52 +00:00
9f259a299f Merge pull request 'Add MkDocs documentation' (#5) from docs/mkdocs into main
All checks were successful
Build and Publish Docker Image / build-and-push (push) Successful in 1m58s
Reviewed-on: #5
2026-04-09 20:57:49 +00:00
Lennie S.
8d1109c309 Fix pipe escaping in API docs tables 2026-04-09 20:55:52 +00:00
Steve W
39c0d787fc Fix MkDocs docs_dir and ignore docs virtualenv 2026-04-09 20:52:17 +00:00
Lennie S.
bcf660f222 Fix mkdocs.yml: remove invalid pymdownx.snippets autoindent option 2026-04-09 20:48:03 +00:00
Lennie S.
760b56bfd6 Add MkDocs documentation
Covers: overview, setup, API reference, configuration,
testing, deployment, and known quirks.
2026-04-09 20:24:49 +00:00
12 changed files with 885 additions and 20 deletions

View File

@@ -3,11 +3,38 @@ name: Build and Publish Docker Image
 on:
   push:
     branches:
-      - main  # Trigger on pushes to main
+      - '**'
+  pull_request:
+    types: [opened, synchronize, reopened]
+  create:
+    refs/tags/v*
 jobs:
-  build-and-push:
-    runs-on: ubuntu-latest  # Ensure your Gitea runner has this label
+  build-only:
+    runs-on: ubuntu-latest
+    # All branches, all PRs, and anything that's not a push to main or a version tag
+    if: github.event_name != 'push' || (github.event_name == 'push' && !startsWith(gitea.ref, 'refs/tags/v') && gitea.ref != 'refs/heads/main')
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Build (no push)
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: Dockerfile
+          push: false
+          tags: |
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:build-test
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
+  build-and-push-main:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'push' && gitea.ref == 'refs/heads/main'
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
@@ -15,27 +42,58 @@ jobs:
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3
 
-      # Login to your registry (Docker Hub, Gitea Package Registry, or Harbor)
       - name: Login to Docker Registry
         uses: docker/login-action@v3
         with:
-          registry: ${{ secrets.DOCKER_REGISTRY }}  # Remove if using Docker Hub
+          registry: ${{ secrets.DOCKER_REGISTRY }}
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
 
-      - name: Build and push
+      - name: Build and push (main branch)
         uses: docker/build-push-action@v5
         with:
           context: .
           file: Dockerfile
           push: true
-          # Tags the image as 'latest' and also uses the git SHA for versioning
           tags: |
-            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:${{ gitea.sha }}
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:main
             ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:latest
-          # Caching speeds up builds by reusing layers (crucial for 'uv' installs)
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:${{ gitea.sha }}
           labels: |
             org.opencontainers.image.source=${{ gitea.server_url }}/${{ gitea.repository }}
             org.opencontainers.image.description=Email Classifier Service
           cache-from: type=gha
           cache-to: type=gha,mode=max
+
+  build-and-push-tag:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'push' && startsWith(gitea.ref, 'refs/tags/v')
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Login to Docker Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ secrets.DOCKER_REGISTRY }}
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_PASSWORD }}
+      - name: Build and push (tagged release)
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: Dockerfile
+          push: true
+          tags: |
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:${{ gitea.ref_name }}
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:latest
+            ${{ secrets.DOCKER_REGISTRY }}/${{ secrets.DOCKER_USERNAME }}/email-classifier:${{ gitea.sha }}
+          labels: |
+            org.opencontainers.image.source=${{ gitea.server_url }}/${{ gitea.repository }}
+            org.opencontainers.image.description=Email Classifier Service
+          cache-from: type=gha
+          cache-to: type=gha,mode=max

3
.gitignore vendored
View File

@@ -8,3 +8,6 @@ wheels/
 # Virtual environments
 .venv
+venv/
+docs/.venv/
+docs/venv/

View File

@@ -2,26 +2,66 @@ from __future__ import annotations
 import os
 from functools import lru_cache
-from typing import Literal
+from pathlib import Path
+from typing import Any, Literal
 
-from pydantic import BaseModel, Field
+import yaml
+from pydantic import BaseModel
 
 Provider = Literal["openai", "anthropic"]
 
+DEFAULT_CONFIG_PATHS = ["config.yml", "config.yaml", "/config/config.yml", "/config/config.yaml"]
+
 class LLMSettings(BaseModel):
-    provider: Provider = Field(default=os.getenv("LLM_PROVIDER", "openai"))
-    api_key: str = Field(default=os.getenv("LLM_API_KEY", "none"))
-    model: str = Field(default=os.getenv("LLM_MODEL", "qwen2.5-7b-instruct.q4_k_m"))
-    base_url: str = Field(default=os.getenv("LLM_BASE_URL", "http://ollama.internal.henryhosted.com:9292/v1"))
-    temperature: float = Field(default=float(os.getenv("LLM_TEMPERATURE", "0.1")))
-    timeout_seconds: float = Field(default=float(os.getenv("LLM_TIMEOUT_SECONDS", "60")))
-    max_retries: int = Field(default=int(os.getenv("LLM_MAX_RETRIES", "3")))
+    provider: Provider = "openai"
+    api_key: str = "none"
+    model: str = "qwen2.5-7b-instruct.q4_k_m"
+    base_url: str = "http://ollama.internal.henryhosted.com:9292/v1"
+    temperature: float = 0.1
+    timeout_seconds: float = 60
+    max_retries: int = 3
+
+def _load_yaml_config() -> dict[str, Any]:
+    explicit = os.getenv("EMAIL_CLASSIFIER_CONFIG") or os.getenv("APP_CONFIG_FILE")
+    candidates = [explicit] if explicit else DEFAULT_CONFIG_PATHS
+    for candidate in candidates:
+        if not candidate:
+            continue
+        path = Path(candidate)
+        if not path.exists() or not path.is_file():
+            continue
+        data = yaml.safe_load(path.read_text()) or {}
+        if not isinstance(data, dict):
+            raise ValueError(f"Config file must contain a mapping/object: {path}")
+        llm = data.get("llm", data)
+        if not isinstance(llm, dict):
+            raise ValueError(f"LLM config must be a mapping/object: {path}")
+        return llm
+    return {}
+
+def _env_or_yaml(env_name: str, yaml_data: dict[str, Any], yaml_key: str, default: Any) -> Any:
+    value = os.getenv(env_name)
+    if value is not None:
+        return value
+    if yaml_key in yaml_data and yaml_data[yaml_key] is not None:
+        return yaml_data[yaml_key]
+    return default
+
 @lru_cache(maxsize=1)
 def get_settings() -> LLMSettings:
-    return LLMSettings()
+    yaml_data = _load_yaml_config()
+    return LLMSettings(
+        provider=_env_or_yaml("LLM_PROVIDER", yaml_data, "provider", "openai"),
+        api_key=_env_or_yaml("LLM_API_KEY", yaml_data, "api_key", "none"),
+        model=_env_or_yaml("LLM_MODEL", yaml_data, "model", "qwen2.5-7b-instruct.q4_k_m"),
+        base_url=_env_or_yaml("LLM_BASE_URL", yaml_data, "base_url", "http://ollama.internal.henryhosted.com:9292/v1"),
+        temperature=float(_env_or_yaml("LLM_TEMPERATURE", yaml_data, "temperature", 0.1)),
+        timeout_seconds=float(_env_or_yaml("LLM_TIMEOUT_SECONDS", yaml_data, "timeout_seconds", 60)),
+        max_retries=int(_env_or_yaml("LLM_MAX_RETRIES", yaml_data, "max_retries", 3)),
+    )
 
 def get_request_settings(

172
docs/api.md Normal file
View File

@@ -0,0 +1,172 @@
# API Reference
## `POST /classify`
Classifies a single email and returns structured extraction results.
**Endpoint:** `POST /classify`
**Content-Type:** `application/json`
---
## Request
The endpoint accepts **two input shapes**: a full Outlook-shaped payload (native Microsoft Graph API format) or a simplified `email_data` object.
### Simplified Shape
Use this for lightweight clients or testing:
```json
{
"email_data": {
"subject": "Printer issue in MB",
"body": "<html>...</html>"
},
"id": "AAMk...",
"conversationId": "AAQk..."
}
```
### Full Outlook Shape
Pass through an email directly from Microsoft Graph API:
```json
{
"id": "AAMk...",
"internetMessageId": "<abc123@mail.example.com>",
"conversationId": "AAQk...",
"subject": "MB Printer",
"bodyPreview": "Good morning, ...",
"body": {
"contentType": "html",
"content": "<html>...(full HTML body)</html>"
},
"sender": {
"emailAddress": {
"name": "Bobbi Johnson",
"address": "bobbi.johnson@grandportage.com"
}
},
"from": {
"emailAddress": {
"name": "Bobbi Johnson",
"address": "bobbi.johnson@grandportage.com"
}
},
"toRecipients": [
{
"emailAddress": {
"name": "IT Helpdesk Mail",
"address": "helpdeskmail@grandportage.com"
}
}
],
"ccRecipients": [],
"bccRecipients": [],
"replyTo": [],
"receivedDateTime": "2026-02-19T15:27:35Z",
"sentDateTime": "2026-02-19T15:27:32Z",
"hasAttachments": false,
"importance": "normal",
"isRead": false,
"flag": { "flagStatus": "notFlagged" }
}
```
### Per-Request LLM Overrides
You can override the global LLM settings for individual requests:
| Field | Type | Description |
|---|---|---|
| `provider` | `openai` \| `anthropic` | Override the global LLM provider |
| `model` | `string` | Override the model name |
| `base_url` | `string` | Override the API base URL |
| `api_key` | `string` | Override the API key (excluded from logs) |
| `temperature` | `float` | Override the temperature (0.0–1.0) |
---
## Response
```json
{
"needs_action": true,
"category": "action_required",
"priority": "high",
"task_description": "Investigate MB Printer issue and reply",
"reasoning": "The email describes an active problem requiring I.T. attention.",
"confidence": 0.91,
"details": {
"summary": "Printer issue reported in the MB area requiring investigation.",
"suggested_title": "Handle MB Printer issue",
"suggested_notes": "Review the printer problem, identify urgency, and reply with next steps.",
"deadline": null,
"people": ["Bobbi Johnson"],
"organizations": ["Grand Portage"],
"attachments_referenced": [],
"next_steps": ["Review printer status", "Reply to Bobbi Johnson"],
"key_points": ["Printer issue in MB", "Needs on-site investigation"],
"source_signals": ["request", "problem_report"]
},
"dedupe": {
"status": "new",
"seen_count": 1,
"matched_on": "none",
"message_id": "AAMk...",
"conversation_id": "AAQk...",
"fingerprint": "a3f8b..."
}
}
```
### Response Fields
| Field | Type | Description |
|---|---|---|
| `needs_action` | `bool` | Whether the email requires user action |
| `category` | `string` | One of the 8 classification categories |
| `priority` | `string` | `high`, `medium`, or `low` |
| `task_description` | `string\|null` | Short action-oriented description |
| `reasoning` | `string` | One-sentence explanation of the classification |
| `confidence` | `float` | Model confidence score (0.0–1.0) |
| `details` | `object` | Structured extraction (see below) |
| `dedupe` | `object` | Deduplication result (see below) |
### `details` Object
| Field | Type | Description |
|---|---|---|
| `summary` | `string\|null` | Brief human-readable summary |
| `suggested_title` | `string\|null` | Good task/Todoist title |
| `suggested_notes` | `string\|null` | Multiline notes for a human reviewer |
| `deadline` | `string\|null` | Any date/time deadline mentioned |
| `people` | `string[]` | People involved or referenced |
| `organizations` | `string[]` | Organizations, departments, vendors, teams |
| `attachments_referenced` | `string[]` | Attachment names mentioned in the email |
| `next_steps` | `string[]` | Specific recommended next actions |
| `key_points` | `string[]` | Important context bullets |
| `source_signals` | `string[]` | Signals that triggered the classification |
| `dedupe_key` | `string\|null` | Content fingerprint (SHA-256) |
### `dedupe` Object
| Field | Type | Description |
|---|---|---|
| `status` | `new` \| `duplicate` \| `updated` | Whether this is new, a duplicate, or updated |
| `seen_count` | `int` | Number of times this email thread has been seen |
| `matched_on` | `none` \| `id` \| `conversation` \| `fingerprint` | Which dedupe mechanism matched |
| `message_id` | `string\|null` | Outlook `id` field if available |
| `conversation_id` | `string\|null` | Outlook `conversationId` if available |
| `fingerprint` | `string` | SHA-256 content fingerprint |
---
## Error Responses
If the request is missing both `email_data` and Outlook body fields, the API returns a `422 Unprocessable Entity` with a validation error.
If classification fails after all retries, the service returns a `200` with an `uncategorized` result and `confidence: 0.0`.
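A minimal client sketch covering both outcomes (`httpx` and the `localhost:7999` address are assumptions from the setup docs, not part of the service itself):
```python
# Hedged sketch of a /classify client; httpx is an assumed dependency.
import httpx

payload = {
    "email_data": {"subject": "Printer issue in MB", "body": "<html>...</html>"},
    "id": "AAMk...",
    "conversationId": "AAQk...",
}

resp = httpx.post("http://localhost:7999/classify", json=payload, timeout=90)
if resp.status_code == 422:
    print("validation error:", resp.json())  # missing email_data and body fields
else:
    result = resp.json()
    if result["category"] == "uncategorized" and result["confidence"] == 0.0:
        print("classification failed after retries")
    elif result["dedupe"]["status"] == "duplicate":
        print("duplicate thread, skipping task creation")
    else:
        print("suggested task:", result["details"]["suggested_title"])
```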

108
docs/configuration.md Normal file
View File

@@ -0,0 +1,108 @@
# Configuration
Configuration is driven by environment variables, with optional YAML config file support (see [Deployment](deployment.md)). Environment variables take precedence over file values.
## LLM Provider Settings
### `LLM_PROVIDER`
- **Values:** `openai` | `anthropic`
- **Default:** `openai`
- Determines which adapter to use for API calls. Use `openai` for Ollama, LM Studio, and any OpenAI-compatible API. Use `anthropic` for MiniMax or any Anthropic-compatible API.
### `LLM_BASE_URL`
- **Default:** `http://ollama.internal.henryhosted.com:9292/v1`
- The base URL for the LLM API. Must include the `/v1` (OpenAI format) or `/anthropic` (Anthropic format) suffix as appropriate.
### `LLM_API_KEY`
- **Default:** `none`
- API key for the LLM provider. Set to `none` for local Ollama instances that don't require authentication.
### `LLM_MODEL`
- **Default:** `qwen2.5-7b-instruct.q4_k_m`
- The model name. Must match a model available on the target LLM backend.
### `LLM_TEMPERATURE`
- **Default:** `0.1`
- Sampling temperature (0.0–1.0). Lower values produce more deterministic outputs. A value around `0.1` is recommended for classification tasks.
### `LLM_TIMEOUT_SECONDS`
- **Default:** `60`
- Request timeout in seconds.
### `LLM_MAX_RETRIES`
- **Default:** `3`
- Maximum number of retries when a classification attempt fails to parse or returns an invalid result.
## Deduplication Settings
### `EMAIL_CLASSIFIER_DB_PATH`
- **Default:** `.data/email_classifier.db`
- Path to the SQLite database used for deduplication tracking. The directory will be created automatically.
---
## Provider-Specific Examples
### Ollama (local, OpenAI-compatible)
```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://localhost:11434/v1
export LLM_API_KEY=none
export LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
export LLM_TEMPERATURE=0.1
```
### MiniMax (Anthropic-compatible API)
```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_API_KEY=your_minimax_key
export LLM_MODEL=MiniMax-M2.7
export LLM_TEMPERATURE=0.1
```
### LM Studio (local, OpenAI-compatible)
```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=http://localhost:1234/v1
export LLM_API_KEY=none
export LLM_MODEL=your-loaded-model-name
export LLM_TEMPERATURE=0.1
```
### OpenAI (cloud)
```bash
export LLM_PROVIDER=openai
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY=sk-...
export LLM_MODEL=gpt-4o-mini
export LLM_TEMPERATURE=0.1
```
---
## Per-Request Overrides
Any LLM setting can be overridden per-request by passing the field in the request body. This is useful when a single client needs to route to different providers dynamically (e.g., different email accounts with different LLM backends).
```json
{
"email_data": { "subject": "...", "body": "..." },
"provider": "anthropic",
"base_url": "https://api.minimax.io/anthropic",
"api_key": "minimax_key_here",
"model": "MiniMax-M2.7"
}
```
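The same request expressed as a Python call (a sketch; `httpx` is an assumed client library). Only this request is routed to the Anthropic-compatible backend; global settings and other requests are unaffected:
```python
# Per-request override sketch, assuming httpx and the default port.
import httpx

resp = httpx.post(
    "http://localhost:7999/classify",
    json={
        "email_data": {"subject": "VPN is down", "body": "Users report VPN issues."},
        "provider": "anthropic",
        "base_url": "https://api.minimax.io/anthropic",
        "api_key": "minimax_key_here",
        "model": "MiniMax-M2.7",
    },
    timeout=90,
)
resp.raise_for_status()
print(resp.json()["category"])
```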

140
docs/deployment.md Normal file
View File

@@ -0,0 +1,140 @@
# Deployment
## Docker
The service ships with a `Dockerfile` based on `python:3.12-slim-bookworm` using [uv](https://astral.sh/uv/) for fast dependency installation.
### Configuration sources
The application now supports two configuration sources:
- environment variables
- a YAML config file
Load order:
1. per-request overrides
2. environment variables
3. YAML config file
4. built-in defaults
Supported config file locations:
- `config.yml`
- `config.yaml`
- `/config/config.yml`
- `/config/config.yaml`
You can also set an explicit config path with:
```bash
export EMAIL_CLASSIFIER_CONFIG=/path/to/config.yml
```
Example `config.yml`:
```yaml
llm:
provider: anthropic
base_url: https://api.minimax.io/anthropic
api_key: your_api_key_here
model: MiniMax-M2.7
temperature: 0.1
timeout_seconds: 60
max_retries: 3
```
### Building
```bash
docker build -t email-classifier .
```
### Running
```bash
docker run -d --name email-classifier \
-p 7999:7999 \
-e EMAIL_CLASSIFIER_CONFIG=/config/config.yml \
-e EMAIL_CLASSIFIER_DB_PATH=/data/email_classifier.db \
-v /path/to/config.yml:/config/config.yml:ro \
-v /path/to/local/data:/data \
email-classifier
```
Mount a persistent volume for `/data` (or wherever `EMAIL_CLASSIFIER_DB_PATH` points) to preserve the dedupe database across container restarts.
Environment variables still override file-based config, so you can keep most settings in YAML and override just one or two values at deploy time.
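A quick way to verify that precedence (a minimal sketch, assuming the `app.config` loader from this change and a mounted `config.yml` whose `llm.model` is set to something else):
```python
# Sketch: env vars beat YAML values in the settings loader.
import os

# Set env before the first get_settings() call: the result is lru_cached.
os.environ["EMAIL_CLASSIFIER_CONFIG"] = "/config/config.yml"
os.environ["LLM_MODEL"] = "gpt-4o-mini"

from app.config import get_settings

print(get_settings().model)  # -> "gpt-4o-mini", regardless of the YAML value
```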
## Docker Compose example
```yaml
services:
email-classifier:
image: your-registry.example.com/your-org/email-classifier:latest
container_name: email-classifier
ports:
- "7999:7999"
environment:
EMAIL_CLASSIFIER_CONFIG: /config/config.yml
EMAIL_CLASSIFIER_DB_PATH: /data/email_classifier.db
# Optional overrides. Env vars win over YAML values.
# LLM_MODEL: MiniMax-M2.7
# LLM_TIMEOUT_SECONDS: "90"
volumes:
- ./config.yml:/config/config.yml:ro
- ./data:/data
restart: unless-stopped
# If your LLM backend runs on the Docker host, one option is:
# extra_hosts:
# - "host.docker.internal:host-gateway"
```
### Compose notes
- Mount the YAML config read-only into the container, typically at `/config/config.yml`
- Mount a writable volume for `/data` so dedupe state survives restarts
- Override specific values with environment variables when needed
- If the LLM backend is another container on the same Compose network, use its service name in `base_url`
- If the LLM backend runs on the host, use `host.docker.internal` or a host-gateway mapping where appropriate
## Building for a Remote Registry
```bash
docker build -t \
your-registry.example.com/your-org/email-classifier:latest \
.
docker push your-registry.example.com/your-org/email-classifier:latest
```
## Gitea Actions CI/CD
The repository includes a workflow at `.github/workflows/build-publish.yaml` that builds the Docker image on every push and pull request, and pushes it to the registry on pushes to `main` and on `v*` tags.
### Required Secrets
Configure these in your GitHub/Gitea Actions secrets:
| Secret | Description |
|---|---|
| `DOCKER_REGISTRY` | Registry hostname (e.g., `ghcr.io` or your custom registry) |
| `DOCKER_USERNAME` | Registry username |
| `DOCKER_PASSWORD` | Registry password or access token |
The workflow tags the image as:
- `:latest` — the most recent build from `main` or a version tag
- `:main` — the latest commit on `main`
- `:<sha>` — the full git SHA of the triggering commit (useful for rollbacks)
- `:<tag>` — the git tag name (e.g., `v1.2.0`) for tagged releases
### Deployment Considerations
- **Network access** — The container needs to reach your LLM backend. If using Ollama or another service on the host, use `host.docker.internal` or an explicit host-gateway mapping.
- **Dedupe persistence** — Mount a volume for the SQLite database to persist dedupe state across deploys.
- **Port** — The container exposes port `7999`. Map it to any host port you prefer.
- **Health check** — The service does not currently expose a dedicated `/health` endpoint. Use `GET /docs` as a liveness probe.
## Production Checklist
- [ ] Provide either a YAML config file or the required `LLM_*` environment variables
- [ ] Use HTTPS for remote `LLM_BASE_URL` values in production
- [ ] Mount a persistent volume for `EMAIL_CLASSIFIER_DB_PATH`
- [ ] Set appropriate resource limits (CPU/memory) on the container
- [ ] Configure `LLM_MAX_RETRIES` and `LLM_TIMEOUT_SECONDS` to suit your LLM backend's reliability
- [ ] Keep `LLM_TEMPERATURE` low for consistent classification results

61
docs/index.md Normal file
View File

@@ -0,0 +1,61 @@
# email-classifier
FastAPI service that classifies emails using a configurable LLM backend. It accepts Outlook-shaped email JSON payloads, extracts structured classification data, and tracks duplicate classifications using a local SQLite dedupe store.
## Purpose
This service is designed to help workflow systems (e.g., Todoist ticket creation) automatically process incoming emails by:
- Determining whether an email requires action
- Extracting priority, category, suggested task title/notes, people, organizations, and deadlines
- Deduplicating repeated emails based on Outlook message ID, conversation ID, or content fingerprinting
## Key Features
- **Configurable LLM providers** — OpenAI-compatible (Ollama, LM Studio, OpenAI) or Anthropic-compatible (MiniMax, Anthropic API)
- **Outlook-shaped input** — Accepts native Microsoft Graph API email payloads with no transformation required
- **Simplified input** — Also accepts a minimal `email_data` shape with just `subject` and `body`
- **Deduplication** — Local SQLite store tracks seen emails by message ID, conversation ID, or content fingerprint
- **Structured extraction** — Returns classification, priority, suggested task title/notes, people, organizations, deadlines, and more
## Project Structure
```
email-classifier/
├── app/
│ ├── main.py # FastAPI app entry point
│ ├── config.py # Pydantic settings from env vars / optional YAML config
│ ├── classifier.py # Core classification orchestration
│ ├── llm_adapters.py # OpenAI- and Anthropic-compatible adapter layer
│ ├── models.py # Pydantic request/response models
│ ├── prompts.py # System prompt sent to the LLM
│ ├── sync.py # Deduplication logic and content fingerprinting
│ ├── dedupe_store.py # SQLite persistence for dedupe tracking
│ ├── routers/
│ │ └── classify_email.py # /classify POST endpoint
│ └── helpers/
│ ├── clean_email_html.py
│ ├── extract_latest_message.py
│ └── remove_disclaimer.py
├── docs/ # MkDocs documentation (this site)
├── Dockerfile
├── pyproject.toml
└── uv.lock
```
## Output Classification Schema
Emails are classified into one of these categories:
| Category | Description |
|---|---|
| `action_required` | Direct request requiring user action |
| `question` | Question needing a response |
| `fyi` | Informational, no reply needed |
| `newsletter` | Newsletter or publication |
| `promotional` | Marketing or sales outreach |
| `automated` | Automated system notification |
| `alert` | I.T. or security alert |
| `uncategorized` | Fallback when classification fails |
Priority is one of: `high`, `medium`, `low`.
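Expressed as types, the vocabulary is two closed string sets (an illustrative sketch; the real definitions live in `app/models.py` and these aliases are assumptions):
```python
# Illustrative types for the classification vocabulary above.
from typing import Literal

Category = Literal[
    "action_required", "question", "fyi", "newsletter",
    "promotional", "automated", "alert", "uncategorized",
]
Priority = Literal["high", "medium", "low"]
```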

52
docs/quirks.md Normal file
View File

@@ -0,0 +1,52 @@
# Known Quirks
## MiniMax Base URL
MiniMax uses an **Anthropic-compatible** API endpoint that is **different** from the standard OpenAI-compatible path. Using the wrong URL will result in silent failures or 404 errors.
**Correct MiniMax configuration:**
```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_MODEL=MiniMax-M2.7
```
**Incorrect (common mistake):**
```bash
# Wrong — this is the OpenAI-compatible path, not the Anthropic path
export LLM_BASE_URL=https://api.minimax.io/v1
```
MiniMax's Anthropic-compatible endpoint is at `/anthropic`, not `/v1`. Always verify the correct endpoint in your provider's documentation.
## Per-Request `api_key` Exclusion
The `api_key` field in a request body is excluded from all logging and dedupe storage (`exclude=True` in the Pydantic model). However, it is still sent to the LLM adapter in plaintext while the request is handled, so avoid transmitting real API keys over untrusted networks.
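The mechanism is Pydantic's field-level serialization exclusion, roughly as in this sketch (the surrounding model is a hypothetical stand-in for the real request model):
```python
# Excluded fields never appear in model_dump()/JSON output, but remain
# readable on the instance itself.
from pydantic import BaseModel, Field

class ClassifyRequest(BaseModel):  # hypothetical stand-in
    model: str | None = None
    api_key: str | None = Field(default=None, exclude=True)

req = ClassifyRequest(model="gpt-4o-mini", api_key="secret")
print(req.model_dump())  # {'model': 'gpt-4o-mini'} -- api_key is omitted
print(req.api_key)       # 'secret' -- still available to the adapter
```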
## SQLite Dedupe Database Path
The dedupe database path is relative to the **working directory** where the process starts, not relative to the application code. If you run the service from different directories, you may end up with multiple databases.
Always set `EMAIL_CLASSIFIER_DB_PATH` to an absolute path when running in production:
```bash
export EMAIL_CLASSIFIER_DB_PATH=/data/email_classifier.db
```
## Classification Retries
The classifier **retries** when `needs_action=true` but `task_description` is missing (an invalid state). This means a flaky LLM that sometimes omits `task_description` will be called multiple times. If this causes issues (e.g., rate limiting), set `LLM_MAX_RETRIES=1`.
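In Python terms, the loop behaves roughly like this sketch (function names are assumptions, not the actual `app/classifier.py` internals):
```python
# Sketch of the validity check that drives retries: needs_action without a
# task_description is treated as invalid output and triggers another call.
def is_valid(result: dict) -> bool:
    if result.get("needs_action") and not result.get("task_description"):
        return False
    return True

def classify_with_retries(call_llm, max_retries: int = 3) -> dict:
    for _ in range(max_retries):
        result = call_llm()
        if is_valid(result):
            return result
    # Fallback mirrors the documented behavior: uncategorized, confidence 0.0
    return {"needs_action": False, "category": "uncategorized", "confidence": 0.0}
```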
## HTML Body Processing
The service strips disclaimers and cleans HTML from email bodies before sending them to the LLM. This cleaning is aggressive and may remove legitimate content from messages produced by some email clients. There is currently no way to disable this cleaning step.
## No Authentication
The service has **no built-in authentication**. It is designed to run behind a reverse proxy (nginx, Caddy, etc.) that handles auth. Do not expose port `7999` directly to the internet.
## Dedupe Fingerprinting Limitations
The fingerprint-based dedupe fallback is **heuristic**, not exact. It uses a normalized subject + body preview + first 2000 characters of the cleaned body. Minor edits to an email (rewording, adding a signature line) can produce a different fingerprint and cause the email to be treated as `new` rather than `duplicate`. Conversely, very similar emails from different senders may collide.
For strict deduplication, rely on `message_id` (exact Outlook message ID match) or `conversation_id` (thread grouping) rather than fingerprint.
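For reference, a hedged sketch of the kind of fingerprint described above (the exact normalization lives in `app/sync.py` and may differ in detail):
```python
# Illustrative content fingerprint: normalized subject + body preview +
# first 2000 chars of the cleaned body, hashed with SHA-256.
import hashlib
import re

def fingerprint(subject: str, body_preview: str, cleaned_body: str) -> str:
    def norm(s: str) -> str:
        # Collapse whitespace and lowercase so cosmetic edits hash the same
        return re.sub(r"\s+", " ", s).strip().lower()

    material = "\n".join(
        [norm(subject), norm(body_preview), norm(cleaned_body)[:2000]]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```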

67
docs/setup.md Normal file
View File

@@ -0,0 +1,67 @@
# Setup & Installation
## Prerequisites
- Python 3.12+
- [uv](https://astral.sh/uv/) package manager
- An LLM backend (Ollama, LM Studio, MiniMax, OpenAI, or any OpenAI/Anthropic-compatible API)
## Quick Start
```bash
# Clone the repository
git clone https://git.danhenry.dev/daniel/email-classifier.git
cd email-classifier
# Install dependencies
uv sync
# Start the server
uv run uvicorn app.main:app --host 0.0.0.0 --port 7999
```
The API will be available at `http://localhost:7999`. Auto-generated API docs are at `http://localhost:7999/docs` (Swagger UI) and `http://localhost:7999/redoc`.
## Environment Variables
The service is configured through environment variables, with optional YAML config file support. See [Configuration](configuration.md) for the full reference.
A minimal `.env` file for local development with Ollama:
```bash
LLM_PROVIDER=openai
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=none
LLM_MODEL=qwen2.5-7b-instruct.q4_k_m
LLM_TEMPERATURE=0.1
```
## Using Docker
```bash
# Build the image
docker build -t email-classifier .
# Run the container
docker run -p 7999:7999 \
-e LLM_PROVIDER=openai \
-e LLM_BASE_URL=http://host.docker.internal:11434/v1 \
-e LLM_API_KEY=none \
-e LLM_MODEL=qwen2.5-7b-instruct.q4_k_m \
email-classifier
```
## Dependency Management
This project uses [uv](https://astral.sh/uv/) for dependency management. Do not use `pip` directly.
```bash
# Add a new dependency
uv add <package>
# Sync dependencies (after pulling changes)
uv sync
# Run with uv (recommended)
uv run uvicorn app.main:app --reload
```

114
docs/testing.md Normal file
View File

@@ -0,0 +1,114 @@
# Testing Locally
## Running the Server
```bash
cd email-classifier
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 7999 --reload
```
The server starts on port **7999** by default. Access the API docs at:
- Swagger UI: `http://localhost:7999/docs`
- ReDoc: `http://localhost:7999/redoc`
## Sending Test Requests
### With `curl`
**Simplified request:**
```bash
curl -X POST http://localhost:7999/classify \
-H "Content-Type: application/json" \
-d '{
"email_data": {
"subject": "Printer issue in MB building",
"body": "Hi, the printer on floor 2 is not working. Can someone take a look?"
},
"id": "test-001",
"conversationId": "test-conv-001"
}'
```
**Full Outlook-shaped request:**
```bash
curl -X POST http://localhost:7999/classify \
-H "Content-Type: application/json" \
-d '{
"id": "AAMkAD...",
"conversationId": "AAQkAD...",
"subject": "VPN is down",
"body": {
"contentType": "html",
"content": "<html><body>Users are reporting VPN connectivity issues.</body></html>"
},
"sender": {
"emailAddress": {
"name": "Jane Smith",
"address": "jane.smith@grandportage.com"
}
},
"from": {
"emailAddress": {
"name": "Jane Smith",
"address": "jane.smith@grandportage.com"
}
},
"toRecipients": [
{
"emailAddress": {
"name": "IT Helpdesk",
"address": "helpdesk@grandportage.com"
}
}
],
"ccRecipients": [],
"receivedDateTime": "2026-04-09T10:00:00Z",
"sentDateTime": "2026-04-09T09:55:00Z",
"importance": "high"
}'
```
### With the Swagger UI
Open `http://localhost:7999/docs`, click **POST /classify**, click **Try it out**, paste your JSON payload, and click **Execute**.
## Running Tests
This project does not currently include a test suite. To add tests, use `pytest`:
```bash
uv add --dev pytest pytest-asyncio httpx
uv run pytest
```
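A starting point could look like the sketch below. It assumes `app.main:app` imports cleanly and exercises only the validation path, so no LLM backend (or mock) is needed:
```python
# tests/test_classify.py -- minimal smoke-test sketch (hypothetical file).
# The happy path needs a reachable or monkeypatched LLM backend; the 422
# validation path documented in the API reference needs neither.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_missing_email_body_returns_422():
    # Neither email_data nor Outlook body fields are present
    response = client.post("/classify", json={"id": "test-001"})
    assert response.status_code == 422
```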
## Verifying Deduplication
The dedupe store is a SQLite database at `.data/email_classifier.db`. You can inspect it directly:
```bash
sqlite3 .data/email_classifier.db ".schema classification_dedupe"
sqlite3 .data/email_classifier.db "SELECT * FROM classification_dedupe LIMIT 10;"
```
To reset deduplication state between tests:
```bash
rm .data/email_classifier.db
```
## Testing with Different LLM Providers
Start the server with a specific provider:
```bash
LLM_PROVIDER=anthropic \
LLM_BASE_URL=https://api.minimax.io/anthropic \
LLM_API_KEY=your_key \
LLM_MODEL=MiniMax-M2.7 \
uv run uvicorn app.main:app --reload
```
Or override per-request by including `provider`, `base_url`, `model`, and `api_key` in the request body.

49
mkdocs.yml Normal file
View File

@@ -0,0 +1,49 @@
site_name: email-classifier
site_description: FastAPI service that classifies email using a configurable LLM backend
site_url: https://git.danhenry.dev/daniel/email-classifier
docs_dir: docs
exclude_docs: |
venv/
.venv/
__pycache__/
repo_name: daniel/email-classifier
repo_url: https://git.danhenry.dev/daniel/email-classifier
nav:
- Home: index.md
- Setup: setup.md
- API Reference: api.md
- Configuration: configuration.md
- Testing Locally: testing.md
- Deployment: deployment.md
- Known Quirks: quirks.md
theme:
name: material
palette:
- scheme: default
primary: indigo
accent: indigo
toggle:
icon: material/brightness-7
name: Switch to dark mode
- scheme: slate
primary: indigo
accent: indigo
toggle:
icon: material/brightness-4
name: Switch to light mode
features:
- navigation.instant
- navigation.tracking
- content.code.copy
markdown_extensions:
- pymdownx.highlight:
anchor_linenums: true
- pymdownx.superfences
- admonition
- toc:
permalink: true

View File

@@ -9,5 +9,6 @@ dependencies = [
     "beautifulsoup4>=4.14.3",
     "fastapi>=0.128.0",
     "openai>=2.16.0",
+    "PyYAML>=6.0.2",
     "uvicorn>=0.40.0",
 ]