975 Commits

Author SHA1 Message Date
perf3ct 2a99a32819
feat(docs): also update docs for updated db env vars 2025-09-24 19:56:05 +00:00
perf3ct b435437ad3
feat(server): also support individual DB environment variables instead of DATABASE_URL 2025-09-24 19:53:20 +00:00
perf3ct 32262db3dd
feat(ci): resolve issues in stress tests 2025-09-18 21:07:01 +00:00
perf3ct e43e7af372
feat(ci): also manage the longer filenames 2025-09-17 21:21:47 +00:00
perf3ct c09fea4deb
feat(ci): try again to get dufs working in ci... 2025-09-17 20:58:46 +00:00
perf3ct ff2a3c93df
feat(ci): try to get dufs to work again... 2025-09-17 17:36:21 +00:00
perf3ct ac50fcc4c8
feat(ci): try again to get dufs to work? 2025-09-17 16:09:59 +00:00
perf3ct dad97849c4
feat(ci): let's try this for the webdav stress tests 2025-09-17 14:29:25 +00:00
perf3ct aa5bd77753
feat(webdav): get rid of complex loop detection 2025-09-09 02:11:57 +00:00
perf3ct 88c376f655
feat(webdav): add some stress test utilities 2025-09-09 01:38:36 +00:00
perf3ct 7863b9100f
feat(ocr): no longer add explicit section / page break 2025-09-05 00:06:09 +00:00
perf3ct 07602a0096
feat(ci): try to further decrease disk usage 2025-09-03 20:15:14 +00:00
perf3ct 1df0f07d71
Merge branch 'main' of https://github.com/readur/readur 2025-09-02 23:04:38 +00:00
perf3ct 036941b3dc
feat(dev): trigger re-ocr on doc and docx 2025-09-02 23:04:31 +00:00
Jon Fuller a59fb45ec0
Merge pull request #204 from readur/renovate/tokio-tracing-monorepo
fix(deps): update rust crate tracing-subscriber to v0.3.20
2025-09-02 15:54:48 -07:00
Jon Fuller 2601ef41aa
Merge pull request #205 from readur/renovate/uuid-1.x-lockfile
fix(deps): update rust crate uuid to v1.18.1
2025-09-02 15:54:35 -07:00
Jon Fuller 1e1a63419c
Merge pull request #203 from readur/renovate/clap-4.x-lockfile
fix(deps): update rust crate clap to v4.5.47
2025-09-02 15:54:20 -07:00
perf3ct ef36fc1b6c
feat(tests): add tests for doc and docx 2025-09-02 22:52:47 +00:00
perf3ct 43b679f59b
fix(server): resolve compilation warnings and fix test that expects no pass, to have it actually expect pass 2025-09-02 22:51:17 +00:00
Jon Fuller 1b7fbed90d
Merge pull request #197 from readur/fix/doc-and-docx-utf-issues
feat(office): try to resolve docx/doc not working
2025-09-02 15:05:29 -07:00
perf3ct 7cf1fd623c
feat(ci): try to prepull containers 2025-09-02 22:05:02 +00:00
perf3ct 90be003874
feat(db): add more guardrails for null bytes 2025-09-02 21:26:03 +00:00
renovate[bot] 7e55db6557
fix(deps): update rust crate uuid to v1.18.1 2025-09-02 21:23:36 +00:00
renovate[bot] 2e703146cf
fix(deps): update rust crate tracing-subscriber to v0.3.20 2025-09-02 21:23:28 +00:00
perf3ct 11ffe9d0e5
feat(ci): add dockerhub auth 2025-09-02 21:21:37 +00:00
perf3ct 483d89132f
feat(office): add documentation around using antiword/catdoc for `doc` functionality 2025-09-02 20:29:17 +00:00
renovate[bot] a8a8563adc
fix(deps): update rust crate clap to v4.5.47 2025-09-02 20:09:22 +00:00
Jon Fuller 1859af68e1
Merge pull request #192 from readur/renovate/playwright-monorepo
chore(deps): update dependency @playwright/test to v1.55.0
2025-09-02 13:07:36 -07:00
Jon Fuller 7f9f2ffe15
Merge pull request #199 from readur/renovate/vitejs-plugin-react-5.x-lockfile
chore(deps): update dependency @vitejs/plugin-react to v5.0.2
2025-09-02 13:07:27 -07:00
Jon Fuller 5eb1e2306a
Merge pull request #200 from readur/renovate/vite-7.x-lockfile
chore(deps): update dependency vite to v7.1.4
2025-09-02 13:07:16 -07:00
Jon Fuller d561c12993
Merge pull request #201 from readur/renovate/aws-sdk-rust-monorepo
fix(deps): update aws-sdk-rust monorepo
2025-09-02 13:06:41 -07:00
renovate[bot] cd68921a14
fix(deps): update aws-sdk-rust monorepo 2025-09-02 05:08:37 +00:00
perf3ct 149c3b9a3f
feat(office): yeet unused fallback strategy 2025-09-02 03:47:20 +00:00
perf3ct d5d6d2edb4
feat(office): xml extraction seems to work now 2025-09-02 01:22:19 +00:00
perf3ct 774efd1140
refactor(server): remove XML vs library comparison functionality
Remove all comparison-related code used to evaluate XML vs library-based
Office document extraction. The XML approach has proven superior, so the
comparison functionality is no longer needed.

Changes:
- Remove extraction_comparator.rs (entire comparison engine)
- Remove test_extraction_comparison.rs binary
- Remove comparison mode logic from enhanced.rs
- Simplify fallback_strategy.rs to use XML extraction only
- Update OCR service to use XML extraction as primary method
- Clean up database migration to remove comparison-specific settings
- Remove test_extraction binary from Cargo.toml
- Update integration tests to work with simplified extraction

The Office document extraction now flows directly to XML-based extraction
without any comparison checks, maintaining the superior extraction quality
while removing unnecessary complexity.
2025-09-02 01:22:19 +00:00
renovate[bot] 56c6b3bef9
chore(deps): update dependency vite to v7.1.4 2025-09-02 01:06:32 +00:00
renovate[bot] afd38974b3
chore(deps): update dependency @vitejs/plugin-react to v5.0.2 2025-09-02 01:06:24 +00:00
Jon Fuller 2ce8c143af
Merge pull request #193 from readur/renovate/testing-library-monorepo
chore(deps): update dependency @testing-library/jest-dom to v6.8.0
2025-09-01 18:05:45 -07:00
Jon Fuller 220eafee5d
Merge pull request #194 from readur/renovate/tempfile-3.x-lockfile
chore(deps): update rust crate tempfile to v3.21.0
2025-09-01 18:05:36 -07:00
Jon Fuller a3697a12be
Merge pull request #195 from readur/renovate/react-monorepo
chore(deps): update react monorepo to v19.1.12
2025-09-01 18:05:26 -07:00
perf3ct 73525eca02
feat(office): add library-based and xml-based parsing 2025-09-02 00:25:06 +00:00
perf3ct 57a5d2ab15
feat(office): add xml parsing 2025-09-01 22:32:42 +00:00
perf3ct 325731aa04
feat(office): create legitimate office files for testing 2025-09-01 22:07:59 +00:00
perf3ct b8bf7c9585
feat(office): use catdoc and antiword to convert doc 2025-09-01 21:49:30 +00:00
perf3ct 78af7e7861
feat(office): use actual packages for extraction 2025-09-01 21:21:22 +00:00
perf3ct 546b41b462
feat(office): try to resolve docx/doc not working 2025-09-01 19:58:06 +00:00
perf3ct 4dbd1aa5d6
fix(errors): resolve the sql casting, and introduce unit test to prevent this from happening again 2025-09-01 18:15:52 +00:00
perf3ct 10d461aeac
fix(errors): resolve issues with error handling 2025-09-01 18:01:36 +00:00
renovate[bot] be8178ceb2
chore(deps): update dependency @testing-library/jest-dom to v6.8.0 2025-08-31 10:20:37 +00:00
renovate[bot] cb4a50ed63
chore(deps): update dependency @playwright/test to v1.55.0 2025-08-31 10:20:31 +00:00