Commit Graph

47 Commits

Author SHA1 Message Date
perf3ct 9d9488954c
feat(lang): update backend to support multiple languages at the same time during OCR 2025-07-14 19:33:43 +00:00
perf3ct 5671038bd1
fix(dev): merge main into feature 2025-07-13 17:15:59 +00:00
perf3ct db41b15609
feat(client): show more fields for Documents 2025-07-10 21:02:15 +00:00
perf3ct c90c7201e5
feat(server/client): make sure that the documents endpoint isn't broken 2025-07-10 19:57:25 +00:00
perf3ct ea94dff8ba
fix(stats): create new get_queue_statistics function to avoid conflicts 2025-07-09 00:27:43 +00:00
perf3ct 2a68c0a066
feat(server): implement better error checking for sources 2025-07-07 19:10:45 +00:00
perf3ct 6bbf69842c
fix(server): resolve import issues 2025-07-03 23:58:11 +00:00
perf3ct 6547130fb1
feat(dev): also break up the large webdav_service.rs file into smaller ones 2025-07-03 19:57:31 +00:00
perf3ct 2b7d901b9d
feat(dev): break up the large documents.rs file 2025-07-03 19:47:31 +00:00
perf3ct 720351122a
feat(webdav): add validation statuses to sources 2025-07-03 14:03:26 +00:00
perf3ct c0835f436f
feat(webdav): also add some crazy source automatic validation 2025-07-03 05:26:36 +00:00
perf3ct 42c7edd9df
feat(webdav): gracefully recover webdav from stops/crashes 2025-07-03 04:45:25 +00:00
perf3ct 915fe92993
feat(webdav): also set up deep scanning button and fix unit tests 2025-07-03 04:24:26 +00:00
perf3ct 0ef233c3cc
fix(tests): resolve compilation error in tests and source scheduler 2025-07-02 23:49:46 +00:00
perf3ct 4f0497ba74
feat(tests): fix ocr_retry issues in tests 2025-07-02 21:48:01 +00:00
perf3ct dec4551fbd
feat(tests): fix ocr_retry issues in tests 2025-07-02 21:30:36 +00:00
perf3ct 8ed8701d5b
feat(server): implement DEBUG environment variable 2025-07-02 17:57:57 +00:00
perf3ct e4faf2cfd2
feat(server/client): implement retry functionality for both successful and failed documents 2025-07-02 00:06:47 +00:00
perf3ct fdc240fa5b
feat(webdav): track directory etags
✅ Core Optimizations Implemented

  1. 📊 New Database Schema: Added webdav_directories table to track
directory ETags, file counts, and metadata
  2. 🔍 Smart Directory Checking: Before deep scans, check directory
ETags with lightweight Depth: 0 PROPFIND requests
  3. ⚡ Skip Unchanged Directories: If directory ETag matches, skip the
entire deep scan
  4. 🗂️ N-Depth Subdirectory Tracking: Recursively track all
subdirectories found during scans
  5. 🎯 Individual Subdirectory Checks: When parent unchanged, check
each known subdirectory individually

  🚀 Performance Benefits

  Before: Every sync = Full Depth: infinity scan of entire directory
tree
  After:
  - First sync: Full scan + directory tracking setup
  - Subsequent syncs: Quick ETag checks → skip unchanged directories
entirely
  - Changed directories: Only scan the specific changed subdirectories

  📁 How It Works

  1. Initial Request: PROPFIND Depth: 0 on /Documents → get directory
ETag
  2. Database Check: Compare with stored ETag for /Documents
  3. If Unchanged: Check each known subdirectory (/Documents/2024,
/Documents/Archive) individually
  4. If Changed: Full recursive scan + update all directory tracking
data
2025-07-01 21:22:16 +00:00
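The directory check described in commit fdc240fa5b above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `DirectoryTracker`, `ScanDecision`, `record`, and `decide` are hypothetical names, an in-memory map stands in for the `webdav_directories` table, and the current ETag is assumed to have already been fetched with a Depth: 0 PROPFIND request.

```rust
use std::collections::HashMap;

/// What a sync should do for one directory (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ScanDecision {
    /// No stored ETag, or the ETag changed: run the full recursive scan.
    FullScan,
    /// ETag unchanged: skip the deep scan and check each known
    /// subdirectory individually instead.
    CheckSubdirectories(Vec<String>),
}

/// In-memory stand-in for the webdav_directories table (path -> ETag).
#[derive(Default)]
pub struct DirectoryTracker {
    etags: HashMap<String, String>,
}

impl DirectoryTracker {
    /// Store or update the ETag seen for a directory.
    pub fn record(&mut self, path: &str, etag: &str) {
        self.etags.insert(path.to_string(), etag.to_string());
    }

    /// All tracked directories below `parent`, at any depth, sorted by path.
    fn known_subdirectories(&self, parent: &str) -> Vec<String> {
        let prefix = format!("{}/", parent.trim_end_matches('/'));
        let mut subs: Vec<String> = self
            .etags
            .keys()
            .filter(|p| p.starts_with(prefix.as_str()))
            .cloned()
            .collect();
        subs.sort();
        subs
    }

    /// Compare the freshly fetched ETag with the stored one.
    pub fn decide(&self, path: &str, current_etag: &str) -> ScanDecision {
        match self.etags.get(path) {
            Some(stored) if stored.as_str() == current_etag => {
                ScanDecision::CheckSubdirectories(self.known_subdirectories(path))
            }
            _ => ScanDecision::FullScan,
        }
    }
}
```

In this sketch a first sync always yields `FullScan` because nothing is stored yet; later syncs return `CheckSubdirectories` only while the stored and fetched ETags match, which mirrors the "If Unchanged" and "If Changed" steps in the commit message.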
Jon Fuller a88f387aeb
Merge branch 'main' into feat/multiple-ocr-languages 2025-07-01 11:53:42 -07:00
perf3ct dd90e48fd2
feat(server): mark documents with 0 words as failed, and fix webdav unit tests 2025-06-30 22:43:25 +00:00
perf3ct bf073132a1
fix(tests): fix documents tests 2025-06-30 21:56:21 +00:00
perf3ct d9b695f0bd
feat(server/client): add metadata to file view 2025-06-30 19:13:16 +00:00
perf3ct 5f10a8b82c
feat(server): continue to try to wrangle the failed and ignored documents 2025-06-29 23:27:51 +00:00
perf3ct b4ddf034b0
feat(server/client): support multiple OCR languages 2025-06-29 22:51:06 +00:00
perf3ct 2b29032a42
fix(server): resolve compilation errors in constraint_validation.rs 2025-06-28 22:04:01 +00:00
perf3ct df254d59e3
feat(server/client): resolve failing tests 2025-06-28 21:21:05 +00:00
Jon Fuller a314f64ca9
Merge pull request #69 from readur/fix/ocr-confidence-1
fix(server/client): fix incorrect OCR measurements
2025-06-28 09:53:56 -07:00
perfectra1n 7f69cd2e5f
fix(server/client): fix incorrect OCR measurements 2025-06-27 20:23:59 -07:00
perf3ct 7a623ca8d6
feat(server/client): easily undelete ignored files, if the user wishes to do so 2025-06-28 00:37:49 +00:00
perf3ct 6983469eff
fix(server): fix unclosed delimiter 2025-06-27 22:51:02 +00:00
Jon Fuller a8c6660450
Merge branch 'main' into feat/delete-low-confidence-documents 2025-06-27 15:17:50 -07:00
perf3ct a642eec3ce
feat(server/client): implement button deleting low confidence documents (e.g. documents that have no text) 2025-06-27 22:16:38 +00:00
perf3ct fc0324da80
feat(client/server): add a new badge for each source that shows the number of documents stored from each source 2025-06-27 21:32:50 +00:00
perf3ct 72708a05f3
feat(oidc): fix oidc, tests, and everything in between 2025-06-27 05:03:27 +00:00
perf3ct 10d9a1a661
feat(server): set up oidc system and migrations 2025-06-26 18:52:57 +00:00
perf3ct 920ad96f4d
fix(server): resolve compilation issues due to increased logging 2025-06-25 20:00:09 +00:00
perf3ct b428b40cbe
feat(server): implement better error for configuration issues 2025-06-25 19:37:16 +00:00
perf3ct afa0565634
feat(server/client): implement feature of ignoring already deleted files, and add failed OCR queue tests 2025-06-24 17:20:33 +00:00
aaldebs99 a5ebbd59bd
feat(everything): Add document deletion 2025-06-20 03:49:16 +00:00
aaldebs99 3ecc82dfdf
fix(frontend): label writing and fetching logic 2025-06-20 01:32:32 +00:00
perf3ct 309a61bcd4
feat(client): update failedOcr page for duplicates 2025-06-17 16:52:45 +00:00
perf3ct bdb136d615
feat(server/client): implement updated FailedOcrPage, duplicate management, and file hashing 2025-06-17 16:17:23 +00:00
perf3ct 75747016f0
feat(server): add hash for documents 2025-06-17 15:41:42 +00:00
perf3ct d7607923be
feat(server): create specific endpoint for fetching documents, fix client being served again 2025-06-17 04:05:57 +00:00
perf3ct 76529f83be
feat(client/server): implement a much better search 2025-06-17 02:41:16 +00:00
perf3ct e6ab56daa8
feat(server): break up large db.rs file into multiple files, and add more PDF guardrails 2025-06-17 00:25:21 +00:00