Commit Graph

30 Commits

Author SHA1 Message Date
perf3ct 564c564613
feat(ocr): use ocrmypdf and pdftotext to get OCR layer if it already exists 2025-07-15 15:59:29 +00:00
perf3ct 549c2f8a16
feat(dev): drop pdf_extract in favor of ocrmypdf 2025-07-15 14:50:17 +00:00
perf3ct a393bd030f
fix(tests): resolve issues in integration tests for the new multiple ocr languages 2025-07-14 21:28:55 +00:00
perf3ct fd152acb91
Merge branch 'feat/multiple-ocr-languages' of https://github.com/readur/readur into feat/multiple-ocr-languages 2025-07-14 19:33:51 +00:00
perf3ct 9d9488954c
feat(lang): update backend to support multiple languages at the same time during OCR 2025-07-14 19:33:43 +00:00
Jon Fuller 8edc4759f1
Merge branch 'main' into feat/multiple-ocr-languages 2025-07-14 11:29:46 -07:00
perf3ct 9c051b6f55
feat(ocr): gracefully handle problematic PDFs in all the ways, create tests so that it doesn't happen again 2025-07-14 16:36:32 +00:00
perf3ct 5671038bd1
fix(dev): merge main into feature 2025-07-13 17:15:59 +00:00
perf3ct f18696d4f8
feat(server): gracefully manage requeue requests for the same document 2025-07-11 21:27:12 +00:00
perf3ct ea94dff8ba
fix(stats): create new get_queue_statistics function to avoid conflicts 2025-07-09 00:27:43 +00:00
perf3ct 36b5330622
fix(stats): try to fix the stats extraction, again 2025-07-08 21:18:21 +00:00
perf3ct 8b1cf027a3
fix(server): resolve incorrect document failure titles 2025-07-08 20:24:52 +00:00
perf3ct d51d1f1c78
fix(stats): try to fix stats export, again 2025-07-08 20:03:55 +00:00
perf3ct 05a0355796
fix(tests): fix the crazy metrics collection issue 2025-07-08 16:52:23 +00:00
perf3ct 7d48480cd6
fix(tests): and resolve missing endpoint 2025-07-08 04:37:33 +00:00
perf3ct bf2162ad89
fix(web_upload): resolve issue that caused files that were uploaded via the web, to not be added to the queue 2025-07-07 19:28:08 +00:00
perf3ct 1b984a12c2
fix(server): resolve type issues and functions for compilation issues 2025-07-04 00:53:32 +00:00
perf3ct bdf4f5f8fe
feat(ocr): add even more about the multiple ocr languages 2025-07-03 19:20:19 +00:00
perf3ct 8ed8701d5b
feat(server): implement DEBUG environment variable 2025-07-02 17:57:57 +00:00
Jon Fuller a88f387aeb
Merge branch 'main' into feat/multiple-ocr-languages 2025-07-01 11:53:42 -07:00
perf3ct f7018575d8
feat(pdf): implement ocrmypdf to extract text from PDFs 2025-07-01 00:56:48 +00:00
perf3ct f26ab1e367
fix(pdf): resolve PDF wordcount error 2025-07-01 00:10:49 +00:00
perf3ct dd90e48fd2
feat(server): mark documents with 0 words as failed, and fix webdav unit tests 2025-06-30 22:43:25 +00:00
perf3ct 5f10a8b82c
feat(server): continue to try to wrangle the failed and ignored documents 2025-06-29 23:27:51 +00:00
perf3ct 8d1a886139
fix(tests): resolve compilation error in the multiple OCR functionality 2025-06-29 23:21:42 +00:00
perf3ct e0b0f49ba2
feat(tests): implement and update tests for multiple OCR languages 2025-06-29 23:03:37 +00:00
perf3ct b4ddf034b0
feat(server/client): support multiple OCR languages 2025-06-29 22:51:06 +00:00
perf3ct 34bc207e39
feat(server/client): add failed_documents table to handle failures, and move logic of failures 2025-06-28 20:52:58 +00:00
perfectra1n 7f69cd2e5f
fix(server/client): fix incorrect OCR measurements 2025-06-27 20:23:59 -07:00
perf3ct cdad6477ed
feat(server): reorganize components into their own modules and fix imports 2025-06-27 18:27:42 +00:00