

notebook-tools

FastAPI service that:

  • downloads PDFs from Paperless-ngx
  • splits them into pages (JPEG)
  • OCRs each page via your llama.cpp OpenAI-compatible endpoint
  • converts each page back into a single-page PDF
  • uploads one Paperless document per page (all uploads run in parallel; OCR stays one page at a time for VRAM)
  • patches each uploaded document with:
    • content = OCR text
    • custom fields notebook_id (field id 1) and notebook_page (field id 2)
    • document_type = Paperless document type id (default 3, configurable)
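The fan-out above can be sketched with asyncio: OCR runs one page at a time to bound VRAM, then all uploads fire concurrently. This is a minimal sketch, not the service's actual code; the function names are hypothetical and the patch payload shape is an assumption based on the Paperless-ngx REST API.

```python
import asyncio

# Hypothetical stand-ins for the real OCR and upload calls.
async def ocr_page(page_num: int) -> str:
    await asyncio.sleep(0)  # placeholder for the llama.cpp chat-completion request
    return f"text of page {page_num}"

async def upload_page(notebook_id: str, page_num: int, text: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the Paperless upload + PATCH
    # Payload shape follows the Paperless-ngx REST API (an assumption here):
    return {
        "content": text,
        "custom_fields": [
            {"field": 1, "value": notebook_id},  # notebook_id
            {"field": 2, "value": page_num},     # notebook_page
        ],
        "document_type": 3,
    }

async def process(notebook_id: str, pages: list[int]) -> list[dict]:
    # OCR sequentially to keep only one page on the GPU at a time...
    texts = [await ocr_page(p) for p in pages]
    # ...then upload every page in parallel.
    return await asyncio.gather(
        *(upload_page(notebook_id, p, t) for p, t in zip(pages, texts))
    )

results = asyncio.run(process("nb-42", [1, 2, 3]))
```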

Setup

Install deps:

uv sync

Create a .env file (example below) and do not commit it.

Run locally

uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080

Then open the docs at:

  • http://127.0.0.1:8080/docs (same machine)
  • http://<your-lan-ip>:8080/docs (other machines on your network)

If other machines still can't connect, check your macOS firewall and any router/network rules.

Docker

Build and run (pass env via file or -e; the app reads .env only if you mount it):

docker build -t notebook-tools:local .
docker run --rm -p 8080:8080 --env-file .env notebook-tools:local

LLAMA_BASE_URL / PAPERLESS_BASE_URL must be reachable from inside the container: for services running on the host, use host.docker.internal (Docker Desktop) or your LAN IP, not 127.0.0.1.
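A loopback base URL is the most common misconfiguration here, because inside the container 127.0.0.1 points at the container itself. A small helper like the following (hypothetical, not part of the app) makes the mistake obvious at startup:

```python
from urllib.parse import urlparse, urlunparse

LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}

def containerize_url(url: str) -> str:
    """Rewrite a loopback host to host.docker.internal for use inside a container."""
    parts = urlparse(url)
    if parts.hostname in LOOPBACK_HOSTS:
        netloc = "host.docker.internal"
        if parts.port:
            netloc += f":{parts.port}"
        return urlunparse(parts._replace(netloc=netloc))
    return url

print(containerize_url("http://127.0.0.1:9292"))
# http://host.docker.internal:9292
```

URLs that already point at a remote host, such as a normal https:// Paperless URL, pass through unchanged.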

Docker Compose

Save the following as compose.yaml in any directory that contains your .env:

services:
  notebook-tools:
    image: git.danhenry.dev/daniel/notebook-tools:latest
    ports:
      - "8080:8080"
    env_file:
      - .env
    # Lets the container reach services bound on the host (e.g. llama on :9292).
    # Linux: requires Docker 20.10+ / Compose v2. Docker Desktop resolves this
    # name on its own, so the entry can be omitted there.
    extra_hosts:
      - "host.docker.internal:host-gateway"

Then run:

docker compose pull && docker compose up

Log in to git.danhenry.dev first if the registry requires auth: docker login git.danhenry.dev.

For llama running on the host, set in .env:

LLAMA_BASE_URL="http://host.docker.internal:9292"

PAPERLESS_BASE_URL can stay a normal https://… URL if the container has network access to it.

CI: on every push to main, .github/workflows/build-docker.yml builds and pushes the image using the same secrets pattern as your other Gitea repos (DOCKER_REGISTRY, DOCKER_USERNAME, DOCKER_PASSWORD). For Docker Hub, set DOCKER_REGISTRY to docker.io (or leave it as configured for your runner).

Example .env

PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"

LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"

# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3

# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4

# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
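How the service actually parses these variables is its own business; a minimal sketch of reading them with the stdlib, where the defaults shown are assumptions taken from this example file:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    paperless_base_url: str
    llama_base_url: str
    notebook_id_field: int
    notebook_page_field: int
    document_type_id: int
    upload_concurrency: int  # 0 means unlimited
    render_dpi: int
    ocr_max_tokens: int
    ocr_temperature: float

def load_settings(env=os.environ) -> Settings:
    # Required values raise KeyError if missing; the rest fall back to the
    # defaults shown in the example .env above.
    return Settings(
        paperless_base_url=env["PAPERLESS_BASE_URL"],
        llama_base_url=env["LLAMA_BASE_URL"],
        notebook_id_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID", 1)),
        notebook_page_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE", 2)),
        document_type_id=int(env.get("PAPERLESS_DOCUMENT_TYPE_ID", 3)),
        upload_concurrency=int(env.get("PAPERLESS_UPLOAD_CONCURRENCY", 0)),
        render_dpi=int(env.get("RENDER_DPI", 200)),
        ocr_max_tokens=int(env.get("OCR_MAX_TOKENS", 1024)),
        ocr_temperature=float(env.get("OCR_TEMPERATURE", 0.0)),
    )
```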