# notebook-tools

FastAPI service that:
- downloads PDFs from Paperless-ngx
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
- uploads **one Paperless document per page**
- patches each uploaded document with:
  - `content` = OCR text
  - custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
  - `document_type` = Paperless document type id (default **3**, configurable)
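
The patch step above can be sketched as a small helper that builds the PATCH body for one uploaded page. This is a minimal sketch: the helper name is hypothetical, and the `custom_fields` list-of-`{field, value}` shape follows the Paperless-ngx REST API, so treat the exact wire format as something to verify against your Paperless version. Field ids 1/2 and document type 3 are the defaults described above.

```python
def build_page_patch(ocr_text: str, notebook_id: int, page: int,
                     field_notebook_id: int = 1, field_notebook_page: int = 2,
                     document_type_id: int = 3) -> dict:
    """Build the PATCH body sent to Paperless for one single-page document.

    The defaults (field ids 1 and 2, document type 3) mirror the README;
    override them via the corresponding .env settings.
    """
    return {
        "content": ocr_text,
        "document_type": document_type_id,
        "custom_fields": [
            {"field": field_notebook_id, "value": notebook_id},
            {"field": field_notebook_page, "value": page},
        ],
    }
```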

## Setup

Install deps:

```bash
uv sync
```

Create a `.env` file (example below) and **do not commit it**.
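
A stdlib-only sketch of loading those settings from the environment; the `Settings` dataclass and `load_settings` helper are illustrative, not the service's actual config code, but the variable names and defaults match the example `.env` below.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    paperless_base_url: str
    paperless_token: str
    llama_base_url: str
    llama_model: str
    notebook_id_field: int = 1
    notebook_page_field: int = 2
    document_type_id: int = 3
    render_dpi: int = 200
    ocr_max_tokens: int = 1024
    ocr_temperature: float = 0.0


def load_settings(env=os.environ) -> Settings:
    # Required values raise KeyError if missing; the knobs fall back to defaults.
    return Settings(
        paperless_base_url=env["PAPERLESS_BASE_URL"],
        paperless_token=env["PAPERLESS_TOKEN"],
        llama_base_url=env["LLAMA_BASE_URL"],
        llama_model=env["LLAMA_MODEL"],
        notebook_id_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID", 1)),
        notebook_page_field=int(env.get("PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE", 2)),
        document_type_id=int(env.get("PAPERLESS_DOCUMENT_TYPE_ID", 3)),
        render_dpi=int(env.get("RENDER_DPI", 200)),
        ocr_max_tokens=int(env.get("OCR_MAX_TOKENS", 1024)),
        ocr_temperature=float(env.get("OCR_TEMPERATURE", 0.0)),
    )
```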

## Run locally

```bash
uv run uvicorn notebook_tools.api:app --reload --host 0.0.0.0 --port 8080
```

Then open the docs at:
- `http://127.0.0.1:8080/docs` (same machine)
- `http://<your-lan-ip>:8080/docs` (other machines on your network)

If other machines still can’t connect, check your macOS firewall and any router/network rules.

## Example `.env`

```bash
PAPERLESS_BASE_URL="https://paperless.example.com"
PAPERLESS_TOKEN="paste-token-here"

LLAMA_BASE_URL="http://127.0.0.1:9292"
LLAMA_MODEL="ggml-model-q4_k_m"

# Custom field ids in Paperless
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3

# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
OCR_TEMPERATURE=0.0
```
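
As a sanity check on `RENDER_DPI`: the rendered JPEG for each page is roughly the page size in inches times the DPI. A hypothetical helper (the exact pixel count depends on your renderer's rounding):

```python
def render_size(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

# US Letter (8.5 x 11 in) at the default 200 DPI
# → (1700, 2200)
```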