Files
notebook-tools/README.md

73 lines
2.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# notebook-tools
FastAPI service that:
- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page** (all uploads run **in parallel**; OCR stays **one page at a time** for VRAM)
- patches each uploaded document with:
- `content` = OCR text
- custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
- `document_type` = Paperless document type id (default **3**, configurable)
## Setup
Install deps:
```bash
uv sync
```
Create a `.env` file (example below) and **do not commit it**.
## Run locally
```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```
Then open the docs at:
- `http://127.0.0.1:8080/docs` (same machine)
- `http://<your-lan-ip>:8080/docs` (other machines on your network)
If other machines still cant connect, check your macOS firewall and any router/network rules.
## Docker
Build and run (pass env via file or `-e`; the app reads `.env` only if you mount it):
```bash
docker build -t notebook-tools:local .
docker run --rm -p 8080:8080 --env-file .env notebook-tools:local
```
`LLAMA_BASE_URL` / `PAPERLESS_BASE_URL` must be reachable **from inside the container** (use `host.docker.internal` on Docker Desktop, or your LAN IP, not `127.0.0.1` for services on the host).
CI: on push to `main`, [.gitea/workflows/build-docker.yml](.gitea/workflows/build-docker.yml) builds and pushes using the same secrets pattern as your other Gitea repos (`DOCKER_REGISTRY`, `DOCKER_USERNAME`, `DOCKER_PASSWORD`). For Docker Hub, set `DOCKER_REGISTRY` to `docker.io` (or leave per your runner docs).
## Example `.env`
```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"
LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"
# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3
# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4
# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```