Commit Graph

35 Commits

Author SHA1 Message Date
perf3ct 67ae68745c fix(dev): remove unneeded docs 2025-08-13 20:51:13 +00:00
perf3ct 862c36aa72 feat(storage): further support the s3 storage backend 2025-08-01 17:57:09 +00:00
perf3ct abd55ef419 feat(storage): abstract storage to also support s3, along with local filesystem still 2025-08-01 04:33:08 +00:00
perf3ct 65f42c2cd7 fix(ocr): use proper failure reasons to avoid constraint violations in failed_documents table 2025-07-21 20:43:37 +00:00
perf3ct 45ec99a031 feat(ocr): get rid of managing TESSDATA_PREFIX 2025-07-20 02:23:06 +00:00
perf3ct ccc3bc2ce4 feat(ocr): use ocrmypdf and pdftotext to get OCR layer if it already exists 2025-07-15 15:59:29 +00:00
perf3ct a3f33140ee feat(dev): drop pdf_extract in favor of ocrmypdf 2025-07-15 14:50:17 +00:00
perf3ct 862eb3217a fix(tests): resolve issues in integration tests for the new multiple ocr languages 2025-07-14 21:28:55 +00:00
perf3ct 7317fd5ebb Merge branch 'feat/multiple-ocr-languages' of https://github.com/readur/readur into feat/multiple-ocr-languages 2025-07-14 19:33:51 +00:00
perf3ct 849c9f91c7 feat(lang): update backend to support multiple languages at the same time during OCR 2025-07-14 19:33:43 +00:00
Jon Fuller f0e39d155e Merge branch 'main' into feat/multiple-ocr-languages 2025-07-14 11:29:46 -07:00
perf3ct 6165148e4d feat(ocr): gracefully handle problematic PDFs in all the ways, create tests so that it doesn't happen again 2025-07-14 16:36:32 +00:00
perf3ct e6fd8424d2 fix(dev): merge main into feature 2025-07-13 17:15:59 +00:00
perf3ct b31e1a672d feat(server): gracefully manage requeue requests for the same document 2025-07-11 21:27:12 +00:00
perf3ct f2a050458b fix(stats): create new get_queue_statistics function to avoid conflicts 2025-07-09 00:27:43 +00:00
perf3ct a6f2b6df09 fix(stats): try to fix the stats extraction, again 2025-07-08 21:18:21 +00:00
perf3ct e628b0d4d5 fix(server): resolve incorrect document failure titles 2025-07-08 20:24:52 +00:00
perf3ct a7e9f75eab fix(stats): try to fix stats export, again 2025-07-08 20:03:55 +00:00
perf3ct 03555ed756 fix(tests): fix the crazy metrics collection issue 2025-07-08 16:52:23 +00:00
perf3ct 58b8a71404 fix(tests): and resolve missing endpoint 2025-07-08 04:37:33 +00:00
perf3ct a4b9626616 fix(web_upload): resolve issue that caused files that were uploaded via the web, to not be added to the queue 2025-07-07 19:28:08 +00:00
perf3ct 497b34ce0a fix(server): resolve type issues and functions for compilation issues 2025-07-04 00:53:32 +00:00
perf3ct 44aaaca5c5 feat(ocr): add even more about the multiple ocr languages 2025-07-03 19:20:19 +00:00
perf3ct 6bdd6f4a56 feat(server): implement DEBUG environment variable 2025-07-02 17:57:57 +00:00
Jon Fuller 2e1a05fc8d Merge branch 'main' into feat/multiple-ocr-languages 2025-07-01 11:53:42 -07:00
perf3ct df281f3b26 feat(pdf): implement ocrmypdf to extract text from PDFs 2025-07-01 00:56:48 +00:00
perf3ct 0052032772 fix(pdf): resolve PDF wordcount error 2025-07-01 00:10:49 +00:00
perf3ct 830f9d0b38 feat(server): mark documents with 0 words as failed, and fix webdav unit tests 2025-06-30 22:43:25 +00:00
perf3ct fef28a33c6 feat(server): continue to try to wrangle the failed and ignored documents 2025-06-29 23:27:51 +00:00
perf3ct 87cfab9ff8 fix(tests): resolve compilation error in the multiple OCR functionality 2025-06-29 23:21:42 +00:00
perf3ct 197afc19f4 feat(tests): implement and update tests for multiple OCR languages 2025-06-29 23:03:37 +00:00
perf3ct 6b6890d529 feat(server/client): support multiple OCR languages 2025-06-29 22:51:06 +00:00
perf3ct 84577806ef feat(server/client): add failed_documents table to handle failures, and move logic of failures 2025-06-28 20:52:58 +00:00
perfectra1n 582617ab88 fix(server/client): fix incorrect OCR measurements 2025-06-27 20:23:59 -07:00
perf3ct 9a8bf72ff7 feat(server): reorganize components into their own modules and fix imports 2025-06-27 18:27:42 +00:00