Files

Daniel Henry 7fec4bc575 Add Dockerfile, .dockerignore, and Gitea CI for image build/push

Made-with: Cursor

2026-03-31 18:21:08 -05:00

2.2 KiB

Raw Blame History

notebook-tools

FastAPI service that:

downloads PDFs from Paperless-ngx
splits them into pages (JPEG)
OCRs each page via your llama.cpp OpenAI-compatible endpoint
converts each page back into a single-page PDF
uploads one Paperless document per page (all uploads run in parallel; OCR stays one page at a time for VRAM)
patches each uploaded document with:
- content = OCR text
- custom fields notebook_id (field id 1) and notebook_page (field id 2)
- document_type = Paperless document type id (default 3, configurable)

Setup

Install deps:

uv sync

Create a .env file (example below) and do not commit it.

Run locally

uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080

Then open the docs at:

http://127.0.0.1:8080/docs (same machine)
http://<your-lan-ip>:8080/docs (other machines on your network)

If other machines still can’t connect, check your macOS firewall and any router/network rules.

Docker

Build and run (pass env via file or -e; the app reads .env only if you mount it):

docker build -t notebook-tools:local .
docker run --rm -p 8080:8080 --env-file .env notebook-tools:local

LLAMA_BASE_URL / PAPERLESS_BASE_URL must be reachable from inside the container (use host.docker.internal on Docker Desktop, or your LAN IP, not 127.0.0.1 for services on the host).

CI: on push to main, .gitea/workflows/build-docker.yml builds and pushes using the same secrets pattern as your other Gitea repos (DOCKER_REGISTRY, DOCKER_USERNAME, DOCKER_PASSWORD). For Docker Hub, set DOCKER_REGISTRY to docker.io (or leave per your runner docs).

Example `.env`

PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"

LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"

# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3

# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4

# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0

2.2 KiB Raw Blame History Unescape Escape