Enhance README with detailed service description, setup instructions, and example .env configuration for the FastAPI service that integrates with Paperless-ngx and llama.cpp for PDF processing.

This commit is contained in:
2026-03-31 14:29:50 -05:00
parent facf6b26f0
commit 9b1705d82b
7 changed files with 699 additions and 0 deletions

View File

@@ -1,2 +1,56 @@
# notebook-tools
FastAPI service that:
- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page**
- patches each uploaded document with:
- `content` = OCR text
- custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
- `document_type` = Paperless document type id (default **3**, configurable)
## Setup
Install deps:
```bash
uv sync
```
Create a `.env` file (example below) and **do not commit it**.
## Run locally
```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```
Then open the docs at:
- `http://127.0.0.1:8080/docs` (same machine)
- `http://<your-lan-ip>:8080/docs` (other machines on your network)
If other machines still cant connect, check your macOS firewall and any router/network rules.
## Example `.env`
```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"
LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"
# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3
# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```