Add MkDocs documentation
Covers: overview, setup, API reference, configuration, testing, deployment, and known quirks.
docs/quirks.md
# Known Quirks

## MiniMax Base URL

MiniMax uses an **Anthropic-compatible** API endpoint that is **different** from its standard OpenAI-compatible path. Using the wrong URL results in silent failures or 404 errors.

**Correct MiniMax configuration:**
```bash
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.minimax.io/anthropic
export LLM_MODEL=MiniMax-M2.7
```

**Incorrect (common mistake):**
```bash
# Wrong: this is the OpenAI-compatible path, not the Anthropic-compatible one
export LLM_BASE_URL=https://api.minimax.io/v1
```

MiniMax's Anthropic-compatible endpoint is at `/anthropic`, not `/v1`. Always verify the correct endpoint in your provider's documentation.
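A startup check can catch this mistake before the first request fails. The sketch below uses a hypothetical helper, `check_llm_base_url`, which is not part of the service. It assumes the provider and URL come from `LLM_PROVIDER` and `LLM_BASE_URL` as shown above, and it encodes only the MiniMax rule described in this section.

```python
import os
from urllib.parse import urlparse


def check_llm_base_url(provider: str, base_url: str) -> list[str]:
    """Return problems with an LLM base URL (hypothetical helper).

    Only encodes the MiniMax rule from this page: with the Anthropic provider,
    MiniMax must be reached via /anthropic, not the OpenAI-compatible /v1.
    """
    problems = []
    parsed = urlparse(base_url)
    path = parsed.path.rstrip("/")  # tolerate a trailing slash
    if provider == "anthropic" and parsed.hostname == "api.minimax.io":
        if path == "/v1":
            problems.append("MiniMax /v1 is the OpenAI-compatible path; use /anthropic")
        elif path != "/anthropic":
            problems.append(f"unexpected MiniMax path {path!r}; expected /anthropic")
    return problems


if __name__ == "__main__":
    # Check the current environment, e.g. at service startup.
    for issue in check_llm_base_url(
        os.environ.get("LLM_PROVIDER", ""), os.environ.get("LLM_BASE_URL", "")
    ):
        print(f"config warning: {issue}")
```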
## Per-Request `api_key` Exclusion

The `api_key` field in a request body is excluded from all logging and dedupe storage (`exclude=True` in the Pydantic model). However, it is still passed to the LLM adapter in plaintext during the request. Avoid sending request bodies that contain an `api_key` over untrusted networks.
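The mechanism can be illustrated with Pydantic v2's `Field(exclude=True)`. This is a sketch assuming Pydantic v2. The model name `ClassifyRequest` and the `subject`/`body` fields are hypothetical; only `api_key` and `exclude=True` come from this page. Serializing with `model_dump()` or `model_dump_json()` omits the field, but the attribute still holds the plaintext key, which is why the adapter can still read it.

```python
from typing import Optional

from pydantic import BaseModel, Field


class ClassifyRequest(BaseModel):
    # Hypothetical request model (assumes Pydantic v2).
    subject: str
    body: str
    # Omitted from model_dump()/model_dump_json(), i.e. from logs and stored records.
    api_key: Optional[str] = Field(default=None, exclude=True)


req = ClassifyRequest(subject="Invoice", body="Please pay", api_key="sk-secret")

# What would be logged or written to dedupe storage: no api_key.
logged = req.model_dump()
print(logged)

# What the LLM adapter can still read: the plaintext key.
adapter_key = req.api_key
```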
## SQLite Dedupe Database Path

The dedupe database path is resolved against the **working directory** where the process starts, not against the location of the application code. If you start the service from different directories, you may end up with multiple databases.

Always set `EMAIL_CLASSIFIER_DB_PATH` to an absolute path when running in production:

```bash
export EMAIL_CLASSIFIER_DB_PATH=/data/email_classifier.db
```
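The effect is easy to reproduce with the standard library: the same relative path resolves to a different file depending on the current directory. The helper `resolve_db_path` and its default filename are hypothetical; only the `EMAIL_CLASSIFIER_DB_PATH` variable comes from this page.

```python
import os
import tempfile


def resolve_db_path(default: str = "email_classifier.db") -> str:
    """Hypothetical resolver: use the env var if set, else a relative default."""
    path = os.environ.get("EMAIL_CLASSIFIER_DB_PATH", default)
    if not os.path.isabs(path):
        # A relative path is resolved against the *current* working directory.
        print(f"warning: relative DB path {path!r} depends on the working directory")
    return os.path.abspath(path)


# Simulate starting the service from two directories without the env var set.
os.environ.pop("EMAIL_CLASSIFIER_DB_PATH", None)
original_cwd = os.getcwd()
resolved = []
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    for directory in (dir_a, dir_b):
        os.chdir(directory)
        resolved.append(resolve_db_path())
    os.chdir(original_cwd)  # leave the temp dirs before they are deleted

# Two different database files for the same service.
print(resolved[0] != resolved[1])

# With an absolute path, the working directory no longer matters.
os.environ["EMAIL_CLASSIFIER_DB_PATH"] = "/data/email_classifier.db"
absolute = resolve_db_path()
```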
## Classification Retries

The classifier **retries** when `needs_action=true` but `task_description` is missing (an invalid state). As a result, a flaky LLM that sometimes omits `task_description` may be called several times for a single email. If this causes problems (e.g., rate limiting), set `LLM_MAX_RETRIES=1`.
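A minimal sketch of this retry rule follows. It assumes the LLM call returns a dict, and that `LLM_MAX_RETRIES` caps the *total* number of calls, so `1` means a single attempt (consistent with the advice in this section). `classify_with_retries`, `call_llm`, and `InvalidClassification` are hypothetical names, not the service's API.

```python
from typing import Callable, Dict, Optional


class InvalidClassification(Exception):
    """Raised when every attempt returned an invalid result."""


def is_valid(result: Dict) -> bool:
    # The invalid state described on this page:
    # needs_action is true but task_description is missing or empty.
    return not (result.get("needs_action") and not result.get("task_description"))


def classify_with_retries(call_llm: Callable[[], Dict], max_retries: int) -> Dict:
    """Call the LLM until it returns a valid result, at most max_retries times in total."""
    attempts = max(1, max_retries)
    last: Optional[Dict] = None
    for _ in range(attempts):
        last = call_llm()  # each attempt is a separate, rate-limited LLM call
        if is_valid(last):
            return last
    raise InvalidClassification(f"invalid result after {attempts} call(s): {last}")


# A flaky fake LLM that omits task_description on its first response only.
responses = iter([
    {"needs_action": True},
    {"needs_action": True, "task_description": "Reply to the invoice"},
])
calls = []


def flaky_llm() -> Dict:
    calls.append(1)
    return next(responses)


result = classify_with_retries(flaky_llm, max_retries=3)
print(f"{len(calls)} LLM calls for one email")
```

Here one email costs two LLM calls. With `max_retries=1`, the first invalid response raises `InvalidClassification` instead of triggering another call.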
## HTML Body Processing

The service strips disclaimers and cleans HTML from email bodies before sending them to the LLM. This cleaning is aggressive and may also remove legitimate content from emails produced by some email clients. There is currently no way to disable this step.
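To see why cleaning can drop legitimate content, here is a deliberately simple HTML-to-text pass built on the standard library's `html.parser`. It illustrates the general technique only; it is not the service's cleaner, whose exact rules are not documented here. It discards `<script>`/`<style>` contents and all markup, so anything carried only by markup (link targets, images, table layout) is lost.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text; skips <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


email_html = (
    "<style>p {color: red}</style>"
    "<p>Please review the <a href='https://example.com/q3'>Q3 report</a>.</p>"
    "<img src='chart.png' alt=''>"
)
text = html_to_text(email_html)
```

The link text survives but its target and the image do not, which is the kind of loss described in this section.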
## No Authentication

The service has **no built-in authentication**. It is designed to run behind a reverse proxy (nginx, Caddy, etc.) that handles authentication. Do not expose port `7999` directly to the internet.
## Dedupe Fingerprinting Limitations

The fingerprint-based dedupe fallback is **heuristic**, not exact. It combines a normalized subject, a body preview, and the first 2000 characters of the cleaned body. Minor edits to an email (rewording, adding a signature line) can produce a different fingerprint, so the email is treated as `new` rather than `duplicate`. Conversely, very similar emails from different senders may collide.

For strict deduplication, rely on `message_id` (exact Outlook message ID match) or `conversation_id` (thread grouping) rather than the fingerprint.
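The dedupe key selection can be sketched as follows, under several assumptions:

- The precedence order `message_id`, then `conversation_id`, then fingerprint.
- The subject normalization rules (strip `Re:`/`Fw:`/`Fwd:` prefixes, collapse whitespace, lowercase).
- The use of SHA-256.

Only the three fingerprint inputs and the 2000-character cutoff come from this page. The function names are hypothetical.

```python
import hashlib
import re
from typing import Optional


def normalize_subject(subject: str) -> str:
    # Assumed normalization: drop reply/forward prefixes, collapse whitespace, lowercase.
    subject = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(subject.split()).lower()


def fingerprint(subject: str, body_preview: str, cleaned_body: str) -> str:
    # Heuristic: anything past the first 2000 characters of the body is ignored.
    material = "\n".join([
        normalize_subject(subject),
        " ".join(body_preview.split()),
        cleaned_body[:2000],
    ])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def dedupe_key(message_id: Optional[str], conversation_id: Optional[str],
               subject: str, body_preview: str, cleaned_body: str) -> str:
    """Prefer exact IDs (assumed order); fall back to the heuristic fingerprint."""
    if message_id:
        return f"msg:{message_id}"
    if conversation_id:
        return f"conv:{conversation_id}"
    return "fp:" + fingerprint(subject, body_preview, cleaned_body)


body = "Hi team,\nThe Q3 numbers are attached."
a = fingerprint("Re: Q3 numbers", "Hi team, The Q3 numbers", body)
b = fingerprint("RE:  q3 numbers", "Hi team,  The Q3 numbers", body)
c = fingerprint("Re: Q3 numbers", "Hi team, The Q3 numbers", body + "\nSent from my phone")
```

Here `a` and `b` match because only the prefix casing and whitespace differ. `c` differs because of the added signature line, which is the false-`new` case described above.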