Commit Graph

317 Commits

Author SHA1 Message Date
perf3ct 90be003874
feat(db): add more guardrails for null bytes 2025-09-02 21:26:03 +00:00
perf3ct 483d89132f
feat(office): add documentation around using antiword/catdoc for `doc` functionality 2025-09-02 20:29:17 +00:00
perf3ct 149c3b9a3f
feat(office): yeet unused fallback strategy 2025-09-02 03:47:20 +00:00
perf3ct d5d6d2edb4
feat(office): xml extraction seems to work now 2025-09-02 01:22:19 +00:00
perf3ct 774efd1140
refactor(server): remove XML vs library comparison functionality
Remove all comparison-related code used to evaluate XML vs library-based
Office document extraction. The XML approach has proven superior, so the
comparison functionality is no longer needed.

Changes:
- Remove extraction_comparator.rs (entire comparison engine)
- Remove test_extraction_comparison.rs binary
- Remove comparison mode logic from enhanced.rs
- Simplify fallback_strategy.rs to use XML extraction only
- Update OCR service to use XML extraction as primary method
- Clean up database migration to remove comparison-specific settings
- Remove test_extraction binary from Cargo.toml
- Update integration tests to work with simplified extraction

The Office document extraction now flows directly to XML-based extraction
without any comparison checks, maintaining the superior extraction quality
while removing unnecessary complexity.
2025-09-02 01:22:19 +00:00
perf3ct 73525eca02
feat(office): add library-based and xml-based parsing 2025-09-02 00:25:06 +00:00
perf3ct 57a5d2ab15
feat(office): add xml parsing 2025-09-01 22:32:42 +00:00
perf3ct b8bf7c9585
feat(office): use catdoc and antiword to convert doc 2025-09-01 21:49:30 +00:00
perf3ct 78af7e7861
feat(office): use actual packages for extraction 2025-09-01 21:21:22 +00:00
perf3ct 546b41b462
feat(office): try to resolve docx/doc not working 2025-09-01 19:58:06 +00:00
perf3ct 4dbd1aa5d6
fix(errors): resolve the sql casting, and introduce unit test to prevent this from happening again 2025-09-01 18:15:52 +00:00
perf3ct 10d461aeac
fix(errors): resolve issues with error handling 2025-09-01 18:01:36 +00:00
perf3ct f6eb7ba49f
feat(metrics): try to simplify webdav metrics some 2025-08-23 22:17:40 +00:00
perf3ct 4b5ee94724
fix(metrics): casting is the name of the game 2025-08-23 20:31:56 +00:00
perf3ct 1b4573f658
feat(webdav): resolve failing migration tests, and implement better error handling 2025-08-23 18:52:52 +00:00
perf3ct 00795ace02
feat(webdav): fix all the wonderful compilation issues 2025-08-21 05:07:28 +00:00
perf3ct 18832b9c12
feat(webdav): fix all the wonderful compilation issues 2025-08-21 04:29:36 +00:00
perf3ct b7dd64c8f6
feat(webdav): try to do better webdav errors to not slam webdav endpoints 2025-08-20 21:59:14 +00:00
perf3ct d793509af9
feat(source): update names of sourceerror, and update tests 2025-08-17 22:37:41 +00:00
perf3ct 6a64d9e6ed
feat(source): implement generic "SourceError" and then have it be propagated as "WebDAVerror", etc. 2025-08-17 22:05:58 +00:00
perf3ct cddba50799
feat(webdav): webdav error management and tests 2025-08-17 20:16:46 +00:00
perf3ct 93c2863d01
feat(webdav): support capturing individual directory errors in webdav 2025-08-14 16:24:05 +00:00
perf3ct 67ae68745c
fix(dev): remove unneeded docs 2025-08-13 20:51:13 +00:00
perf3ct caf4e7cf7d
feat(docs): update docs for S3 backend implementation 2025-08-13 20:24:59 +00:00
perf3ct 4b6e0820b7
feat(websocket): update websockets and websocket tests so that they actually pass 2025-08-11 20:08:36 +00:00
perf3ct 0fb250e28c
feat(security): this was just pain 2025-08-11 01:13:29 +00:00
perf3ct 080263a9ac
fix(tests): resolve issues with s3 tests 2025-08-11 00:54:09 +00:00
perf3ct cb3b3f05b8
fix(tests): migrate auto resume tests to use new test app state management 2025-08-02 18:29:12 +00:00
perf3ct 4396ce312b
feat(tests): completely redo the test_helpers to actually be helpers and not hinderers 2025-08-01 21:21:15 +00:00
perf3ct aad6036b4c
feat(tests): bring back the test helpers 2025-08-01 20:34:42 +00:00
perf3ct 862c36aa72
feat(storage): further support the s3 storage backend 2025-08-01 17:57:09 +00:00
perf3ct 6624fc57fb
feat(queue): have the queue service use new storage service correctly 2025-08-01 17:33:59 +00:00
perf3ct abd55ef419
feat(storage): abstract storage to also support s3, along with local filesystem still 2025-08-01 04:33:08 +00:00
perf3ct 3ad0dd3600
feat(server): set up new storage service 2025-08-01 02:50:38 +00:00
perf3ct 68ceb1f9cb
feat(storage): implement s3 for storage 2025-08-01 00:27:13 +00:00
perfectra1n 0fbb106668 fix(tests): resolve typescript compilation and test compilation errors 2025-07-30 20:03:13 -07:00
perf3ct 32983c3fba feat(server): implement #106 for per-user watch directories 2025-07-31 00:10:10 +00:00
perf3ct e62e73a249 fix(tests): have the updated integration tests at least compile 2025-07-30 04:21:16 +00:00
perf3ct 7da99cd992 feat(server): implement websockets over sse 2025-07-30 02:04:44 +00:00
perf3ct d7a0a1f294 feat(server): do a *much* better job at determining file types thanks to infer rust package 2025-07-29 21:28:33 +00:00
perf3ct 96ea060450 fix(sync): fix issue where the webdav tasks weren't inserting source metadata into db 2025-07-29 21:27:40 +00:00
perf3ct e938ae3bd1 feat(debug): add some really...really noisy debugging for WebDAV URL paths
fadsf
2025-07-29 04:10:09 +00:00
perf3ct 8f1f502cc4 feat(tests): mom, take a picture, the tests pass 2025-07-29 02:28:39 +00:00
perf3ct b3e6630bd1 feat(tests): tests are mostly working now 2025-07-29 00:47:02 +00:00
perf3ct cfeb6c5c93 feat(tests): wrap the tests so that even if they fail, they still close their db connections 2025-07-28 18:15:08 +00:00
perf3ct c37014f924 feat(tests): work on resolving tests that don't pass given the large rewrite 2025-07-28 04:13:14 +00:00
perf3ct 319c1521c1 fix(labels): allow for nullable user_id on label fetch 2025-07-27 20:42:55 +00:00
perf3ct 24269ea513 feat(tests): resolve duplicated test coverage for webdav functionality 2025-07-27 20:36:54 +00:00
perf3ct 023d424293 feat(server/client): I have no words, hopefully this lesser abstraction and webdav tracking works now 2025-07-27 19:29:45 +00:00
perf3ct 2c0ef814d9 feat(client/server): implement better source sync output 2025-07-27 05:02:13 +00:00