# notebook-tools

FastAPI service that:
- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page** (all uploads run **in parallel**; OCR stays **one page at a time** to limit VRAM use)
- patches each uploaded document with:
  - `content` = OCR text
  - custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
  - `document_type` = Paperless document type id (default **3**, configurable)
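
The split between serial OCR and parallel uploads described above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not the service's actual code: `ocr_page`, `upload_page`, and `process` are hypothetical names, the two stand-in functions only sleep instead of calling llama.cpp or Paperless, and the semaphore is one possible way to apply an upload cap.

```python
import asyncio


async def ocr_page(page_no: int) -> str:
    """Hypothetical stand-in for the llama.cpp OCR request."""
    await asyncio.sleep(0.01)
    return f"text of page {page_no}"


async def upload_page(page_no: int, text: str) -> int:
    """Hypothetical stand-in for the Paperless upload and patch."""
    await asyncio.sleep(0.05)
    return page_no


async def process(pages, upload_concurrency: int = 4):
    # Assumption: 0 means "no cap" on concurrent uploads.
    limit = asyncio.Semaphore(upload_concurrency) if upload_concurrency > 0 else None

    async def bounded_upload(page_no: int, text: str) -> int:
        if limit is None:
            return await upload_page(page_no, text)
        async with limit:
            return await upload_page(page_no, text)

    uploads = []
    for page_no in pages:
        # OCR is awaited inline, so only one page is being OCRed at a time.
        text = await ocr_page(page_no)
        # The upload is scheduled as a task and overlaps with the next OCR call.
        uploads.append(asyncio.create_task(bounded_upload(page_no, text)))
    # gather() returns results in the order the tasks were passed in.
    return await asyncio.gather(*uploads)
```

Calling `asyncio.run(process([1, 2, 3]))` runs the three OCR calls strictly one after another, while each upload starts as soon as its page's text is ready.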

## Setup

Install deps:

```bash
uv sync
```

Create a `.env` file (example below) and **do not commit it**.

## Run locally

```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```

Then open the docs at:
- `http://127.0.0.1:8080/docs` (same machine)
- `http://<your-lan-ip>:8080/docs` (other machines on your network)

If other machines still can’t connect, check your macOS firewall and any router/network rules.

## Example `.env`

```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"

LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"

# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3

# Optional: cap concurrent Paperless uploads (0 = unlimited)
PAPERLESS_UPLOAD_CONCURRENCY=4

# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```
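
As a rough illustration of how the field and document type ids might be consumed, the sketch below builds the JSON body for patching one page document. Several things here are assumptions rather than this project's code: the helper name `build_patch_payload` is hypothetical, the defaults simply mirror the example values, the `.env` is assumed to be already loaded into the process environment, and the `custom_fields` shape (a list of `field`/`value` pairs) follows my understanding of the Paperless-ngx REST API and should be checked against your Paperless version.

```python
import os


def build_patch_payload(notebook_id, page, ocr_text, env=os.environ):
    """Build a body for patching one Paperless document (hypothetical helper)."""
    # Field ids fall back to the values shown in the example .env.
    field_notebook_id = int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID", "1"))
    field_notebook_page = int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE", "2"))
    document_type = int(env.get("PAPERLESS_DOCUMENT_TYPE_ID", "3"))
    return {
        "content": ocr_text,
        "document_type": document_type,
        # Assumed shape: one {"field": <id>, "value": <value>} entry per custom field.
        "custom_fields": [
            {"field": field_notebook_id, "value": notebook_id},
            {"field": field_notebook_page, "value": page},
        ],
    }
```

Passing `env` explicitly keeps the helper easy to exercise without touching real environment variables, for example `build_patch_payload("nb-7", 3, "hello", env={})`.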