Add parallel uploading of documents

Signed-off-by: Daniel Henry <iamdanhenry@gmail.com>
2026-03-31 16:23:10 -05:00
parent 29c790fdfd
commit 612fbe2055
4 changed files with 91 additions and 54 deletions


@@ -5,7 +5,7 @@ FastAPI service that:
- splits them into pages (JPEG)
- OCRs each page via your llama.cpp OpenAI-compatible endpoint
- converts each page back into a single-page PDF
-- uploads **one Paperless document per page**
+- uploads **one Paperless document per page** (all uploads run **in parallel**; OCR stays **one page at a time** for VRAM)
- patches each uploaded document with:
  - `content` = OCR text
  - custom fields `notebook_id` (field id 1) and `notebook_page` (field id 2)
@@ -47,6 +47,9 @@ PAPERLESS_CUSTOM_FIELD_NOTEBOOK_ID=1
PAPERLESS_CUSTOM_FIELD_NOTEBOOK_PAGE=2
PAPERLESS_DOCUMENT_TYPE_ID=3
+# Optional: cap concurrent Paperless uploads (0 = unlimited)
+PAPERLESS_UPLOAD_CONCURRENCY=4
# Rendering / OCR knobs
RENDER_DPI=200
OCR_MAX_TOKENS=1024
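
The README change describes two behaviours: OCR runs one page at a time, while uploads run in parallel under an optional cap where `0` means unlimited. A minimal sketch of that pattern with `asyncio` follows. It is not the code from this commit: `upload_limit`, `process_pages`, `ocr_page` and `upload_page` are hypothetical names, and treating an unset variable as unlimited is an assumption.

```python
import asyncio
import os


def upload_limit(env=os.environ):
    """Read PAPERLESS_UPLOAD_CONCURRENCY; return None when uploads are uncapped.

    Assumption: an unset variable behaves like 0 (unlimited).
    """
    value = int(env.get("PAPERLESS_UPLOAD_CONCURRENCY", "0"))
    if value < 0:
        raise ValueError("PAPERLESS_UPLOAD_CONCURRENCY must be >= 0")
    return value or None


async def process_pages(pages, ocr_page, upload_page, limit=None):
    """OCR pages strictly one at a time; run uploads concurrently.

    ocr_page(page) and upload_page(index, text) are hypothetical coroutines
    supplied by the caller. limit=None means no cap on concurrent uploads.
    """
    sem = asyncio.Semaphore(limit) if limit else None

    async def guarded_upload(index, text):
        if sem is None:
            return await upload_page(index, text)
        async with sem:  # at most `limit` uploads in flight
            return await upload_page(index, text)

    tasks = []
    for index, page in enumerate(pages, start=1):
        # Awaiting here keeps OCR sequential, so only one page uses VRAM.
        text = await ocr_page(page)
        # The upload starts in the background while the next page is OCRed.
        tasks.append(asyncio.create_task(guarded_upload(index, text)))

    # gather returns results in page order, regardless of completion order.
    return await asyncio.gather(*tasks)
```

In this sketch the service would call `await process_pages(pages, ocr_page, upload_page, limit=upload_limit())`. Because each upload is scheduled as soon as its page's OCR text is ready, uploads overlap with the remaining OCR work rather than waiting for every page to finish.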