Readur

Author	SHA1	Message	Date
perf3ct	036941b3dc	feat(dev): trigger re-ocr on doc and docx	2025-09-02 23:04:31 +00:00
perf3ct	483d89132f	feat(office): add documentation around using antiword/catdoc for `doc` functionality	2025-09-02 20:29:17 +00:00
perf3ct	774efd1140	refactor(server): remove XML vs library comparison functionality Remove all comparison-related code used to evaluate XML vs library-based Office document extraction. The XML approach has proven superior, so the comparison functionality is no longer needed. Changes: - Remove extraction_comparator.rs (entire comparison engine) - Remove test_extraction_comparison.rs binary - Remove comparison mode logic from enhanced.rs - Simplify fallback_strategy.rs to use XML extraction only - Update OCR service to use XML extraction as primary method - Clean up database migration to remove comparison-specific settings - Remove test_extraction binary from Cargo.toml - Update integration tests to work with simplified extraction The Office document extraction now flows directly to XML-based extraction without any comparison checks, maintaining the superior extraction quality while removing unnecessary complexity.	2025-09-02 01:22:19 +00:00
perf3ct	f6eb7ba49f	feat(metrics): try to simplify webdav metrics some	2025-08-23 22:17:40 +00:00
perf3ct	1b4573f658	feat(webdav): resolve failing migration tests, and implement better error handling	2025-08-23 18:52:52 +00:00
perf3ct	b7dd64c8f6	feat(webdav): try to do better webdav errors to not slam webdav endpoints	2025-08-20 21:59:14 +00:00
perf3ct	d323aa53c3	fix(migrations): also resolve issue in the new generic source scan failure migration	2025-08-18 15:59:22 +00:00
perf3ct	6a64d9e6ed	feat(source): implement generic "SourceError" and then have it be propagated as "WebDAVerror", etc.	2025-08-17 22:05:58 +00:00
perf3ct	93c2863d01	feat(webdav): support capturing individual directory errors in webdav	2025-08-14 16:24:05 +00:00
perf3ct	aff7b907c7	fix(db): backfill data for sources given missing counts	2025-07-29 21:27:54 +00:00
perf3ct	0d65cab4aa	fix(migrations): resolve issue with latest migration and multi-language support	2025-07-18 19:30:51 +00:00
perf3ct	686596481c	fix(migrations): resolve new broken migration for multiple ocr languages	2025-07-14 20:52:42 +00:00
perf3ct	849c9f91c7	feat(lang): update backend to support multiple languages at the same time during OCR	2025-07-14 19:33:43 +00:00
perf3ct	0465777890	feat(client): show more fields for Documents	2025-07-10 21:02:15 +00:00
perf3ct	f2a050458b	fix(stats): create new get_queue_statistics function to avoid conflicts	2025-07-09 00:27:43 +00:00
perf3ct	f0dc0669bd	debug(tests): add some debug lines to see why CI is upset	2025-07-08 22:32:32 +00:00
perf3ct	7fa95e5e17	fix(migrations): resolve PostgreSQL function type mismatch in get_ocr_queue_stats	2025-07-08 22:30:21 +00:00
perf3ct	8153f9a4cb	fix(migrations): resolve PostgreSQL function type mismatch in get_ocr_queue_stats	2025-07-08 22:04:24 +00:00
perf3ct	2a59651fb9	fix(stats): try to fix stats export, again again	2025-07-08 20:16:33 +00:00
perf3ct	03555ed756	fix(tests): fix the crazy metrics collection issue	2025-07-08 16:52:23 +00:00
perf3ct	459b8622bb	feat(webdav): also add some crazy source automatic validation	2025-07-03 05:26:36 +00:00
perf3ct	69c40c10fa	feat(webdav): gracefully recover webdav from stops/crashes	2025-07-03 04:45:25 +00:00
perf3ct	6d40feadb3	fix(server): resolve issues with the retry ocr tests	2025-07-02 22:47:51 +00:00
perf3ct	ab03b8d73d	fix(server): resolve ocr test functionality failing due to db trigger	2025-07-02 22:38:13 +00:00
perf3ct	ffad8c4561	feat(tests): fix ocr_retry issues in tests	2025-07-02 21:30:36 +00:00
perf3ct	d4b57d2ae0	feat(server/client): implement retry functionality for both successful and failed documents	2025-07-02 00:06:47 +00:00
perf3ct	92b21350db	feat(webdav): track directory etags ✅ Core Optimizations Implemented 1. 📊 New Database Schema: Added webdav_directories table to track directory ETags, file counts, and metadata 2. 🔍 Smart Directory Checking: Before deep scans, check directory ETags with lightweight Depth: 0 PROPFIND requests 3. ⚡ Skip Unchanged Directories: If directory ETag matches, skip the entire deep scan 4. 🗂️ N-Depth Subdirectory Tracking: Recursively track all subdirectories found during scans 5. 🎯 Individual Subdirectory Checks: When parent unchanged, check each known subdirectory individually 🚀 Performance Benefits Before: Every sync = Full Depth: infinity scan of entire directory tree After: - First sync: Full scan + directory tracking setup - Subsequent syncs: Quick ETag checks → skip unchanged directories entirely - Changed directories: Only scan the specific changed subdirectories 📁 How It Works 1. Initial Request: PROPFIND Depth: 0 on /Documents → get directory ETag 2. Database Check: Compare with stored ETag for /Documents 3. If Unchanged: Check each known subdirectory (/Documents/2024, /Documents/Archive) individually 4. If Changed: Full recursive scan + update all directory tracking data	2025-07-01 21:22:16 +00:00
perf3ct	9e43df2fbe	feat(server/client): add metadata to file view	2025-06-30 19:13:16 +00:00
perf3ct	97fa50c1b5	feat(server/client): resolve failing tests	2025-06-28 21:21:05 +00:00
perf3ct	84577806ef	feat(server/client): add failed_documents table to handle failures, and move logic of failures	2025-06-28 20:52:58 +00:00
perf3ct	2d04f0094a	fix(ocr_status): populate the ocr queue with pending jobs and add easy 'retry' button	2025-06-28 18:08:00 +00:00
perf3ct	9f3371e4f3	feat(migration): disable OCR consistency trigger for OCR confidence backfill	2025-06-28 17:23:35 +00:00
perf3ct	69425b2201	feat(migration): instead of hardcoded guessing, re-enter those documents into the queue	2025-06-28 14:53:45 +00:00
perf3ct	e995653d69	fix(migrations): resolve issue in migration for ocr confidence	2025-06-28 14:51:06 +00:00
perfectra1n	582617ab88	fix(server/client): fix incorrect OCR measurements	2025-06-27 20:23:59 -07:00
perf3ct	e9496b921e	feat(server): set up oidc system and migrations	2025-06-26 18:52:57 +00:00
perf3ct	a0e75d4619	feat(server/client): implement feature of ignoring already deleted files, and add failed OCR queue tests	2025-06-24 17:20:33 +00:00
perf3ct	a6121c2849	fix(migrations): fix comment referencing old migration name	2025-06-23 21:10:44 +00:00
perf3ct	5510765035	feat(migrations): resolve migrations names and remove legacy migrations code	2025-06-23 21:08:43 +00:00
perf3ct	67d1e0ee2f	feat(webdav): move etag parser to own function, create required migration	2025-06-23 19:39:39 +00:00
perf3ct	5dae03635a	feat(ocr_queue): fix completed_today count	2025-06-22 16:04:17 +00:00
aaldebs99	2058e5db8d	fix(db): more labels migrations	2025-06-19 21:28:13 +00:00
aaldebs99	889f00bc71	fix(migrations): de-dupe migrations and fix labels migrations	2025-06-19 19:47:29 +00:00
aaldebs99	95d52f477e	fix(db): add labels sql table	2025-06-19 18:58:00 +00:00
perf3ct	d055e9f350	feat(server/client): implement labels for documents	2025-06-18 16:12:42 +00:00
perf3ct	58aaedf4a6	feat(server): add hash for documents	2025-06-17 15:41:42 +00:00
perf3ct	fad6756c8c	feat(server): stop image preprocessing in OCR	2025-06-17 00:35:03 +00:00
perf3ct	801038a26e	feat(server): break up large db.rs file into multiple files, and add more PDF guardrails	2025-06-17 00:25:21 +00:00
perf3ct	c656a96d91	feat(server): create folders within 'upload' path to manage thumbnails, processed images, etc.	2025-06-16 21:24:46 +00:00
perf3ct	bf7ec25dc1	feat(server): create more DB guardrails, and lots of missing tests	2025-06-15 22:14:02 +00:00

66 Commits
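Commit 92b21350db describes an ETag-based shortcut for WebDAV syncs. The decision logic from its "How It Works" steps can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Readur's implementation: an in-memory `HashMap` stands in for the `webdav_directories` table, a caller-supplied closure stands in for the actual `Depth: 0` PROPFIND request, and `TrackedDir`, `directories_to_scan` and `fetch_etag` are hypothetical names. Skipping changed subdirectories that are already covered by a changed ancestor is also an added assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one row of the `webdav_directories` table.
#[derive(Debug, Clone)]
struct TrackedDir {
    etag: String,
}

/// Decides which directories need a full (Depth: infinity) rescan.
/// `fetch_etag` is a hypothetical stand-in for a PROPFIND with Depth: 0 on one path.
fn directories_to_scan<F>(
    root: &str,
    tracked: &HashMap<String, TrackedDir>,
    mut fetch_etag: F,
) -> Vec<String>
where
    F: FnMut(&str) -> String,
{
    // Steps 1-2: lightweight ETag check on the root, compared with the stored ETag.
    let root_etag = fetch_etag(root);
    match tracked.get(root) {
        Some(dir) if dir.etag == root_etag => {}
        // Step 4: root is new or changed, so rescan the whole tree.
        _ => return vec![root.to_string()],
    }

    // Step 3: root unchanged, so check every known subdirectory individually.
    let prefix = format!("{}/", root.trim_end_matches('/'));
    let mut changed: Vec<String> = tracked
        .iter()
        .filter(|(path, _)| path.as_str() != root && path.starts_with(&prefix))
        .filter(|(path, dir)| fetch_etag(path.as_str()) != dir.etag)
        .map(|(path, _)| path.clone())
        .collect();

    // Assumption: drop changed paths already covered by a changed ancestor.
    // Sorting puts every ancestor before its descendants.
    changed.sort();
    let mut to_scan: Vec<String> = Vec::new();
    for path in changed {
        if !to_scan.iter().any(|p| path.starts_with(&format!("{}/", p))) {
            to_scan.push(path);
        }
    }
    to_scan
}

fn main() {
    let mut tracked = HashMap::new();
    tracked.insert("/Documents".to_string(), TrackedDir { etag: "a".into() });
    tracked.insert("/Documents/2024".to_string(), TrackedDir { etag: "b".into() });
    tracked.insert("/Documents/Archive".to_string(), TrackedDir { etag: "c".into() });

    // Simulated server: root and Archive unchanged, /Documents/2024 changed.
    let server = |path: &str| match path {
        "/Documents" => "a".to_string(),
        "/Documents/2024" => "b2".to_string(),
        _ => "c".to_string(),
    };
    let plan = directories_to_scan("/Documents", &tracked, server);
    println!("{:?}", plan); // prints ["/Documents/2024"]
}
```

Checking each known subdirectory even when the root ETag is unchanged is presumably needed because a WebDAV collection's ETag is not guaranteed to change when something deeper in the tree changes; RFC 4918 leaves collection ETag behavior largely up to the server.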