notebook-tools/README.md
Daniel Henry 4820888ff9
Update Readme
Signed-off-by: Daniel Henry <iamdanhenry@gmail.com>
2026-03-31 18:40:32 -05:00
# notebook-tools
FastAPI service that:
- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page** (all uploads run **in parallel**; OCR stays **one page at a time** for VRAM)
- patches each uploaded document with:
  - `content` = OCR text
  - custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
  - `document_type` = Paperless document type id (default **3**, configurable)
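
The concurrency rule in the list above (OCR one page at a time, uploads overlapping) can be sketched with `asyncio`. This is a minimal illustration, not the service's actual code: `process_pages`, `ocr_page`, and `upload_page` are hypothetical names, and the two callables stand in for the real llama.cpp and Paperless calls.

```python
import asyncio


async def process_pages(pages, ocr_page, upload_page, upload_concurrency=0):
    """OCR pages strictly one at a time while letting uploads overlap.

    ocr_page(page) and upload_page(index, page, text) are hypothetical
    async callables supplied by the caller.
    """
    # A lock keeps at most one OCR request in flight, which bounds VRAM use.
    ocr_lock = asyncio.Lock()
    # 0 means "no cap" here, mirroring PAPERLESS_UPLOAD_CONCURRENCY.
    upload_gate = (
        asyncio.Semaphore(upload_concurrency) if upload_concurrency > 0 else None
    )

    async def handle(index, page):
        async with ocr_lock:
            text = await ocr_page(page)
        # The lock is released before uploading, so the next page can start
        # its OCR while this page's upload is still running.
        if upload_gate is None:
            return await upload_page(index, page, text)
        async with upload_gate:
            return await upload_page(index, page, text)

    # gather() returns results in page order regardless of completion order.
    return await asyncio.gather(
        *(handle(i, page) for i, page in enumerate(pages, start=1))
    )
```

Releasing the OCR lock before the upload starts is what lets the GPU stay busy with the next page while earlier pages are still being sent to Paperless.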
## Setup
Install deps:
```bash
uv sync
```
Create a `.env` file (example below) and **do not commit it**.
## Run locally
```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```
Then open the docs at:
- `http://127.0.0.1:8080/docs` (same machine)
- `http://<your-lan-ip>:8080/docs` (other machines on your network)
If other machines still can't connect, check your macOS firewall and any router/network rules.
## Docker
Build and run the image. Pass environment variables with `--env-file` or `-e`; the app reads a `.env` file only if you mount it into the container:
```bash
docker build -t notebook-tools:local .
docker run --rm -p 8080:8080 --env-file .env notebook-tools:local
```
`LLAMA_BASE_URL` / `PAPERLESS_BASE_URL` must be reachable **from inside the container** (use `host.docker.internal` on Docker Desktop, or your LAN IP, not `127.0.0.1` for services on the host).
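
A quick way to catch the loopback mistake described above is to check the configured URLs before starting the container. The helper below is a hypothetical sketch (it is not part of notebook-tools) that flags URLs whose host is `localhost` or a loopback IP address, since inside a container those point at the container itself.

```python
import ipaddress
from urllib.parse import urlsplit


def points_at_loopback(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. host.docker.internal or a public DNS name.
        return False
```

For example, `points_at_loopback("http://127.0.0.1:9292")` is true, while `points_at_loopback("http://host.docker.internal:9292")` is false.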
### Docker Compose
Save this as `compose.yaml` in the directory that holds your `.env`:
```yaml
services:
  notebook-tools:
    image: git.danhenry.dev/daniel/notebook-tools:latest
    ports:
      - "8080:8080"
    env_file:
      - .env
    # Lets the container reach services bound on the host (e.g. llama on :9292).
    # Linux: requires Docker 20.10+ / Compose v2; omit on Docker Desktop if already available.
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
```bash
docker compose pull && docker compose up
```
Log in to `git.danhenry.dev` first if the registry requires auth: `docker login git.danhenry.dev`.
If llama.cpp runs **on the host**, set this in `.env`:
```bash
LLAMA_BASE_URL="http://host.docker.internal:9292"
```
`PAPERLESS_BASE_URL` can stay a normal `https://…` URL if the container has network access to it.
CI: on every push to `main`, [.github/workflows/build-docker.yml](.github/workflows/build-docker.yml) builds and pushes the image, using the same secrets pattern as your other Gitea repos (`DOCKER_REGISTRY`, `DOCKER_USERNAME`, `DOCKER_PASSWORD`). For Docker Hub, set `DOCKER_REGISTRY` to `docker.io` (or leave it as your runner's docs specify).
## Example `.env`
```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"
LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"
# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3
# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4
# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```
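
For orientation, here is one way these variables could be read into a typed settings object. It is a sketch under assumptions, not the service's actual loader: the `Settings` class and `load_settings` function are hypothetical, the first four variables are assumed to be required, and the fallback values for the optional ones are simply the example values shown above (the README only states a default for the document type id).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    paperless_base_url: str
    paperless_token: str
    llama_base_url: str
    llama_model: str
    notebook_id_field: int
    notebook_page_field: int
    document_type_id: int
    upload_concurrency: Optional[int]  # None means "no cap"
    render_dpi: int
    ocr_max_tokens: int
    ocr_temperature: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (hypothetical helper)."""
    # 0 is documented as "unlimited"; treating an unset value the same way
    # is an assumption.
    cap = int(env.get("PAPERLESS_UPLOAD_CONCURRENCY", "0"))
    return Settings(
        # Assumed required: a missing key raises KeyError.
        paperless_base_url=env["PAPERLESS_BASE_URL"].rstrip("/"),
        paperless_token=env["PAPERLESS_TOKEN"],
        llama_base_url=env["LLAMA_BASE_URL"].rstrip("/"),
        llama_model=env["LLAMA_MODEL"],
        # Fallbacks below are the example values, used here as assumptions.
        notebook_id_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID", "1")),
        notebook_page_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE", "2")),
        document_type_id=int(env.get("PAPERLESS_DOCUMENT_TYPE_ID", "3")),
        upload_concurrency=cap if cap > 0 else None,
        render_dpi=int(env.get("RENDER_DPI", "200")),
        ocr_max_tokens=int(env.get("OCR_MAX_TOKENS", "1024")),
        ocr_temperature=float(env.get("OCR_TEMPERATURE", "0.0")),
    )
```

Passing the mapping in explicitly, rather than reading `os.environ` inside the function body, keeps the loader easy to exercise with a plain dictionary.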