# notebook-tools

FastAPI service that:

- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page** (all uploads run **in parallel**; OCR stays **one page at a time** for VRAM)
- patches each uploaded document with:
  - `content` = OCR text
  - custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
  - `document_type` = Paperless document type id (default **3**, configurable)

## Setup

Install deps:

```bash
uv sync
```

Create a `.env` file (example below) and **do not commit it**.

## Run locally

```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```

Then open the docs at:

- `http://127.0.0.1:8080/docs` (same machine)
- `http://:8080/docs` (other machines on your network)

If other machines still can’t connect, check your macOS firewall and any router/network rules.

## Docker

Build and run (pass env via file or `-e`; the app reads `.env` only if you mount it):

```bash
docker build -t notebook-tools:local .
docker run --rm -p 8080:8080 --env-file .env notebook-tools:local
```

`LLAMA_BASE_URL` / `PAPERLESS_BASE_URL` must be reachable **from inside the container** (use `host.docker.internal` on Docker Desktop, or your LAN IP, not `127.0.0.1` for services on the host).

### Docker Compose

Save as `compose.yaml` (any directory with your `.env`):

```yaml
services:
  notebook-tools:
    image: git.danhenry.dev/daniel/notebook-tools:latest
    ports:
      - "8080:8080"
    env_file:
      - .env
    # Lets the container reach services bound on the host (e.g. llama on :9292).
    # Linux: requires Docker 20.10+ / Compose v2; omit on Docker Desktop if already available.
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

```bash
docker compose pull && docker compose up
```

Log in to `git.danhenry.dev` first if the registry requires auth: `docker login git.danhenry.dev`.

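
Inside a container, `127.0.0.1` refers to the container itself, so a base URL that points at loopback will not reach a service on the host. As a rough illustration (not part of this service; `points_at_loopback` is a hypothetical helper), a check like this could flag such a value before starting the app:

```python
import ipaddress
from urllib.parse import urlparse


def points_at_loopback(base_url: str) -> bool:
    """Return True if the URL's host is `localhost` or a loopback IP literal."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. host.docker.internal), so not loopback by name.
        return False
```

A value such as `http://127.0.0.1:9292` would be flagged, while `http://host.docker.internal:9292` or a LAN address would pass.
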
For llama running **on the host**, set in `.env`:

```bash
LLAMA_BASE_URL="http://host.docker.internal:9292"
```

`PAPERLESS_BASE_URL` can stay a normal `https://…` URL if the container has network access to it.

CI: on push to `main`, [.github/workflows/build-docker.yml](.github/workflows/build-docker.yml) builds and pushes using the same secrets pattern as your other Gitea repos (`DOCKER_REGISTRY`, `DOCKER_USERNAME`, `DOCKER_PASSWORD`). For Docker Hub, set `DOCKER_REGISTRY` to `docker.io` (or leave per your runner docs).

## Example `.env`

```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"

LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"

# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3

# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4

# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```
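
To illustrate how `PAPERLESS_UPLOAD_CONCURRENCY` relates to the "OCR one page at a time, uploads in parallel" behaviour, here is a minimal `asyncio` sketch. It is not the service's actual code: `ocr_page`, `upload_page`, and `process` are hypothetical stand-ins, and treating a missing variable as `0` is an assumption.

```python
import asyncio
import os
from typing import List, Optional


async def ocr_page(page: int) -> str:
    """Hypothetical stand-in for the llama.cpp OCR request."""
    await asyncio.sleep(0)
    return f"text of page {page}"


async def upload_page(page: int, text: str) -> str:
    """Hypothetical stand-in for the Paperless upload and patch."""
    await asyncio.sleep(0)
    return f"uploaded page {page}"


async def process(pages: List[int], upload_concurrency: int) -> List[str]:
    # 0 means unlimited, so no semaphore is created in that case.
    limiter: Optional[asyncio.Semaphore] = (
        asyncio.Semaphore(upload_concurrency) if upload_concurrency > 0 else None
    )

    async def limited_upload(page: int, text: str) -> str:
        if limiter is None:
            return await upload_page(page, text)
        async with limiter:
            return await upload_page(page, text)

    uploads = []
    for page in pages:
        # OCR is awaited inline, so only one page is OCR'd at any moment.
        text = await ocr_page(page)
        # The upload is scheduled as a task and runs while later pages are OCR'd.
        uploads.append(asyncio.create_task(limited_upload(page, text)))
    return list(await asyncio.gather(*uploads))


if __name__ == "__main__":
    # Assumption: a missing variable is treated as 0 (unlimited).
    cap = int(os.environ.get("PAPERLESS_UPLOAD_CONCURRENCY", "0"))
    print(asyncio.run(process([1, 2, 3], cap)))
```

With a cap of 4, at most four uploads are in flight at once, while OCR requests are still issued strictly one after another.
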