perf3ct
c1dbd06df2
feat(tests): add unit tests for new webdav functionality
2025-07-01 21:39:31 +00:00
perf3ct
92b21350db
feat(webdav): track directory etags
...
✅ Core Optimizations Implemented
1. 📊 New Database Schema: Added webdav_directories table to track
directory ETags, file counts, and metadata
2. 🔍 Smart Directory Checking: Before deep scans, check directory
ETags with lightweight Depth: 0 PROPFIND requests
3. ⚡ Skip Unchanged Directories: If directory ETag matches, skip the
entire deep scan
4. 🗂️ N-Depth Subdirectory Tracking: Recursively track all
subdirectories found during scans
5. 🎯 Individual Subdirectory Checks: When parent unchanged, check
each known subdirectory individually
🚀 Performance Benefits
Before: Every sync = Full Depth: infinity scan of entire directory tree
After:
- First sync: Full scan + directory tracking setup
- Subsequent syncs: Quick ETag checks → skip unchanged directories
entirely
- Changed directories: Only scan the specific changed subdirectories
📁 How It Works
1. Initial Request: PROPFIND Depth: 0 on /Documents → get directory
ETag
2. Database Check: Compare with stored ETag for /Documents
3. If Unchanged: Check each known subdirectory (/Documents/2024,
/Documents/Archive) individually
4. If Changed: Full recursive scan + update all directory tracking
data
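The decision in steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: `ScanPlan`, `StoredEtags`, and `plan_scan` are hypothetical names, an in-memory map stands in for the webdav_directories table, and the ETag returned by the Depth: 0 PROPFIND is assumed to have been fetched and normalized already. It also assumes that "known subdirectories" means every tracked path below the parent.

```rust
use std::collections::HashMap;

/// What a sync should do for one directory (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum ScanPlan {
    /// Stored ETag is missing or differs: run the full recursive scan.
    FullScan,
    /// Stored ETag matches: skip the deep scan of this directory and
    /// check each known subdirectory individually instead.
    CheckSubdirs(Vec<String>),
}

/// Stand-in for the webdav_directories table: directory path -> stored ETag.
type StoredEtags = HashMap<String, String>;

/// Compare the ETag from a Depth: 0 PROPFIND against the stored one
/// and decide how much scanning is needed.
fn plan_scan(path: &str, current_etag: &str, stored: &StoredEtags) -> ScanPlan {
    match stored.get(path) {
        Some(etag) if etag.as_str() == current_etag => {
            // Only paths strictly below `path` count as subdirectories;
            // the trailing slash keeps "/DocumentsOther" out of "/Documents".
            let prefix = format!("{}/", path.trim_end_matches('/'));
            let mut subdirs: Vec<String> = stored
                .keys()
                .filter(|p| p.starts_with(prefix.as_str()))
                .cloned()
                .collect();
            subdirs.sort(); // deterministic order for the follow-up checks
            ScanPlan::CheckSubdirs(subdirs)
        }
        // Unknown directory (first sync) or changed ETag.
        _ => ScanPlan::FullScan,
    }
}
```

With /Documents, /Documents/2024 and /Documents/Archive tracked, a matching ETag for /Documents yields a plan to check the two subdirectories, while a changed or unknown ETag yields a full scan. Each subdirectory check would then repeat the same comparison with its own Depth: 0 request.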
2025-07-01 21:22:16 +00:00
perf3ct
6a23a407bf
feat(client): update swagger ui endpoints
2025-07-01 20:54:45 +00:00
Jon Fuller
2e1a05fc8d
Merge branch 'main' into feat/multiple-ocr-languages
2025-07-01 11:53:42 -07:00
perf3ct
df281f3b26
feat(pdf): implement ocrmypdf to extract text from PDFs
2025-07-01 00:56:48 +00:00
Jon Fuller
706e20f35c
Merge branch 'main' into feat/debug-page
2025-06-30 17:19:31 -07:00
perf3ct
231f88f038
feat(debug): debug page actually works and does something
2025-07-01 00:15:48 +00:00
perf3ct
0052032772
fix(pdf): resolve PDF wordcount error
2025-07-01 00:10:49 +00:00
perf3ct
830f9d0b38
feat(server): mark documents with 0 words as failed, and fix webdav unit tests
2025-06-30 22:43:25 +00:00
perf3ct
69279344cb
fix(tests): fix documents tests
2025-06-30 21:56:21 +00:00
perf3ct
b38c1fca07
feat(server): fix serialization issues
2025-06-30 19:40:05 +00:00
perf3ct
9e43df2fbe
feat(server/client): add metadata to file view
2025-06-30 19:13:16 +00:00
perf3ct
fef28a33c6
feat(server): continue to try to wrangle the failed and ignored documents
2025-06-29 23:27:51 +00:00
perf3ct
87cfab9ff8
fix(tests): resolve compilation error in the multiple OCR functionality
2025-06-29 23:21:42 +00:00
perf3ct
197afc19f4
feat(tests): implement and update tests for multiple OCR languages
2025-06-29 23:03:37 +00:00
perf3ct
6b6890d529
feat(server/client): support multiple OCR languages
2025-06-29 22:51:06 +00:00
perf3ct
fbf89c213d
fix(tests): resolve a whole lot of test issues
2025-06-28 22:50:40 +00:00
perf3ct
edd0c7514f
fix(server): resolve compilation errors in constraint_validation.rs
2025-06-28 22:04:01 +00:00
perf3ct
97fa50c1b5
feat(server/client): resolve failing tests
2025-06-28 21:21:05 +00:00
perf3ct
84577806ef
feat(server/client): add failed_documents table to handle failures, and move logic of failures
2025-06-28 20:52:58 +00:00
Jon Fuller
fce56b660b
Merge pull request #72 from readur/feat/better-db-tests
...
feat(tests): add regression tests and better sql type safety tests
2025-06-28 12:43:52 -07:00
perf3ct
f4adafe2bd
feat(tests): add regression tests and better sql type safety tests
2025-06-28 19:25:15 +00:00
perf3ct
9be70dc245
feat(swagger): add missing oidc endpoints into swagger ui
2025-06-28 19:19:48 +00:00
perf3ct
099f4853a7
fix(server): resolve incorrect db type
2025-06-28 18:41:48 +00:00
perf3ct
fe1deb1e9d
fix(server): resolve compilation issues from queue.rs
2025-06-28 18:15:55 +00:00
perf3ct
2d04f0094a
fix(ocr_status): populate the ocr queue with pending jobs and add easy 'retry' button
2025-06-28 18:08:00 +00:00
Jon Fuller
5aae560d7e
Merge pull request #69 from readur/fix/ocr-confidence-1
...
fix(server/client): fix incorrect OCR measurements
2025-06-28 09:53:56 -07:00
perf3ct
9079529eb5
feat(tests): create generic migration tests
2025-06-28 16:38:12 +00:00
perfectra1n
582617ab88
fix(server/client): fix incorrect OCR measurements
2025-06-27 20:23:59 -07:00
perf3ct
cc0d647590
fix(server): resolve compilation issue in IgnoredFilesQuery
2025-06-28 01:01:51 +00:00
perf3ct
0b8dbfb8d9
feat(server/client): easily undelete ignored files, if the user wishes to do so
2025-06-28 00:37:49 +00:00
perf3ct
2c6bd92bf4
fix(server): fix unclosed delimiter
2025-06-27 22:51:02 +00:00
Jon Fuller
929f27eaa9
Merge branch 'main' into feat/delete-low-confidence-documents
2025-06-27 15:17:50 -07:00
perf3ct
aacfc96825
feat(server/client): implement button deleting low confidence documents (e.g. documents that have no text)
2025-06-27 22:16:38 +00:00
perf3ct
a75fca0c28
feat(client/server): add a new badge for each source that shows the number of documents stored from each source
2025-06-27 21:32:50 +00:00
perf3ct
341a91e1a7
fix(tests): move oidc tests to correct folder
2025-06-27 19:33:58 +00:00
perf3ct
57bb0ccd2c
fix(server): resolve broken imports on tests and test helpers
2025-06-27 18:46:41 +00:00
perf3ct
9a8bf72ff7
feat(server): reorganize components into their own modules and fix imports
2025-06-27 18:27:42 +00:00
Jon Fuller
b095cb951f
Merge pull request #55 from readur/feat/oidc-setup
...
feat(server): set up oidc system and migrations
2025-06-27 10:48:28 -07:00
perf3ct
0b6d96df03
fix(tests): resolve last OIDC test issues
2025-06-27 17:32:33 +00:00
perf3ct
12cdd0ffd6
fix(tests): resolve some difficult race conditions in test
2025-06-27 05:08:12 +00:00
perf3ct
3c5b7c7dfb
feat(oidc): fix oidc, tests, and everything in between
2025-06-27 05:03:27 +00:00
perf3ct
51907f81f2
fix(metrics): fix broken prometheus metrics
2025-06-26 22:14:42 +00:00
Jon Fuller
269ba4d46a
Merge pull request #56 from readur/fix/pdf-thumbnail-generation
...
feat(server): actually render PDF thumbnails
2025-06-26 14:14:25 -07:00
perf3ct
e626f3a131
feat(metrics): add more prometheus metrics, and create grafana dashboard
2025-06-26 21:14:00 +00:00
perf3ct
075657899f
feat(server): use poppler for pdf image generation
2025-06-26 20:39:42 +00:00
perf3ct
a94acd7ffe
feat(server): actually render PDF thumbnails?
2025-06-26 20:25:52 +00:00
perf3ct
e9496b921e
feat(server): set up oidc system and migrations
2025-06-26 18:52:57 +00:00
Jon Fuller
70451d728f
Merge pull request #46 from readur/fix/catch-pdf-extract-errors
...
fix(server): catch pdf-extract spammy logs
2025-06-25 21:35:09 -07:00
perf3ct
715b94ec66
feat(swagger): add a ton of docstrings to functions
2025-06-25 23:58:37 +00:00
perf3ct
20b90e92d3
feat(swagger): add missing endpoints to swagger-ui
2025-06-25 23:47:27 +00:00
perf3ct
40afb5ade5
fix(server): catch pdf-extract spammy logs
2025-06-25 23:26:11 +00:00
perf3ct
a5ca6e33f2
feat(server): decrease logging verbosity for ingestion
2025-06-25 21:41:46 +00:00
perf3ct
00d771c15f
fix(server): resolve compilation issues due to increased logging
2025-06-25 20:00:09 +00:00
perf3ct
bcd03bf0d4
fix(server): don't log postgres passwords
2025-06-25 19:44:58 +00:00
perf3ct
04bf3500fa
feat(server): implement better error for configuration issues
2025-06-25 19:37:16 +00:00
perf3ct
05a1a07494
fix(server): also fix these broken user isolation SQL statements
2025-06-24 17:43:58 +00:00
perf3ct
363bc2b9ef
fix(server): better error responses when creating users
2025-06-24 17:33:59 +00:00
perf3ct
3f3654c3cb
fix(server): resolve lack of user isolation
2025-06-24 17:28:28 +00:00
perf3ct
a0e75d4619
feat(server/client): implement feature of ignoring already deleted files, and add failed OCR queue tests
2025-06-24 17:20:33 +00:00
perf3ct
5510765035
feat(migrations): resolve migrations names and remove legacy migrations code
2025-06-23 21:08:43 +00:00
perf3ct
67d1e0ee2f
feat(webdav): move etag parser to own function, create required migration
2025-06-23 19:39:39 +00:00
perf3ct
113f1d8315
fix(tests): fix broken parser, thanks for finding that, unit tests!
2025-06-23 19:14:31 +00:00
perf3ct
b9847b8b6b
feat(server): normalize etags from webdav to properly check for file changes
2025-06-23 19:03:24 +00:00
perf3ct
33ae814a43
fix(tests): also fix unit tests
2025-06-22 21:31:11 +00:00
perf3ct
1555b8bd4d
feat(tests): resolve admin integration test issues
2025-06-22 17:28:45 +00:00
perf3ct
4ec4ecaa8d
feat(ci): fix other tests, part 9000
2025-06-21 18:08:34 +00:00
perf3ct
679ad04274
fix(deletion): properly handle concurrent deletion requests
2025-06-20 18:40:24 +00:00
perf3ct
8ae976eda8
feat(tests): resolve failing and ignored tests
2025-06-20 18:37:52 +00:00
perf3ct
09b338685d
fix(tests): repair the label tests
2025-06-20 18:10:27 +00:00
perf3ct
2c2d948aa2
fix(documents): remove old code in favor of document ingestion engine
2025-06-20 17:18:00 +00:00
perf3ct
eec1072677
Merge branch 'main' into feat/document-deletion
2025-06-20 17:11:26 +00:00
perf3ct
c4a9c51b98
feat(ingestion): have everything use the document ingestion engine
2025-06-20 16:53:06 +00:00
perf3ct
ac069de5bc
feat(ingestion): create ingestion engine to handle document creation, and centralize deduplication logic
2025-06-20 16:24:26 +00:00
aaldebs99
e3c276226a
feat(tests): add deletion unit tests
2025-06-20 16:09:27 +00:00
aaldebs99
1507532083
feat(everything): Add document deletion
2025-06-20 03:49:16 +00:00
aaldebs99
b24bf2c7d9
Merge branch 'main' into feat/document-labels
2025-06-19 18:40:50 -07:00
aaldebs99
4dd9162415
fix(frontend): label writing and fetching logic
2025-06-20 01:32:32 +00:00
aaldebs99
aeb98acea8
fi(backend): migrate python code to rust lol
2025-06-20 01:32:05 +00:00
perf3ct
7f20e59aa6
feat(tests): resolve issue with 'source' tests
2025-06-19 20:29:35 +00:00
aaldebs99
bfb971adce
fix(backend): lables handling
2025-06-19 19:47:49 +00:00
aaldebs99
2d518b40df
fix(backend): labels
2025-06-19 18:58:00 +00:00
Jon Fuller
7873913759
Merge branch 'main' into feat/document-labels
2025-06-19 11:32:03 -07:00
aaldebs99
215704f881
fix(server): static file routes
2025-06-19 18:29:52 +00:00
Jon Fuller
4d0d9d16b6
Merge branch 'main' into feat/document-labels
2025-06-18 19:07:54 -07:00
aaldebs99
865c91db67
chore(server): remove unused system user
2025-06-19 00:41:01 +00:00
perf3ct
d055e9f350
feat(server/client): implement labels for documents
2025-06-18 16:12:42 +00:00
perf3ct
4f36e40e38
feat(tests): resolve last test issues
2025-06-17 22:14:38 +00:00
perf3ct
14af90c657
feat(tests): fix the vast majority of both server and client tests
2025-06-17 22:06:12 +00:00
perf3ct
f905c220e0
feat(tests): add actual images as part of e2e and testing
2025-06-17 21:26:39 +00:00
perf3ct
24e7dff9a5
feat(client): update failedOcr page for duplicates
2025-06-17 16:52:45 +00:00
perf3ct
80d58b0f28
feat(server/client): implement updated FailedOcrPage, duplicate management, and file hashing
2025-06-17 16:17:23 +00:00
perf3ct
58aaedf4a6
feat(server): add hash for documents
2025-06-17 15:41:42 +00:00
perf3ct
babe5a6e46
fix(ocr_queue): don't slam the DB while we wait
2025-06-17 14:45:44 +00:00
perf3ct
b2a7faaddb
feat(server): create specific endpoint for fetching documents, fix client being served again
2025-06-17 04:05:57 +00:00
perf3ct
7eb036b153
feat(client/server): create endpoint for fetching individual files, and fix client not serving files
2025-06-17 03:38:16 +00:00
perf3ct
479c62a4f1
feat(client/server): advanced search, along with fixing build errors
2025-06-17 02:56:59 +00:00
perf3ct
4dda4d143d
feat(client/server): implement a much better search
2025-06-17 02:41:16 +00:00
perf3ct
bcd756ed20
feat(server/client): remove webdav feature from user's settings as it's in sources now
2025-06-17 01:57:56 +00:00
perf3ct
fad6756c8c
feat(server): stop image preprocessing in OCR
2025-06-17 00:35:03 +00:00
perf3ct
801038a26e
feat(server): break up large db.rs file into multiple files, and add more PDF guardrails
2025-06-17 00:25:21 +00:00
perf3ct
54868cdc57
feat(server): try to resume syncs after server restart
2025-06-16 23:21:43 +00:00
perf3ct
0d3fe26074
feat(server): also generate thumbnails for non-images, and resolve failing unit/integration tests
2025-06-16 22:51:29 +00:00
perf3ct
c43994e63c
feat(server): put more guardrails around PDF OCR size, and image size OCR
2025-06-16 22:39:00 +00:00
perf3ct
13e60fa655
feat(server): if there's no sync even running, allow sync to be cancelled
2025-06-16 21:39:41 +00:00
perf3ct
c656a96d91
feat(server): create folders within 'upload' path to manage thumbnails, processed images, etc.
2025-06-16 21:24:46 +00:00
perf3ct
af7129da0a
feat(server/client): fix thumbnails and quick search
2025-06-16 17:40:53 +00:00
perf3ct
4aa4359064
feat(server/client): update function used to display singular documents
2025-06-16 17:10:55 +00:00
perf3ct
fe56ecdb00
feat(server): implement queue system for ocr process as well, to fight resource exhaustion
2025-06-16 01:20:13 +00:00
perf3ct
bf7ec25dc1
feat(server): create more DB guardrails, and lots of missing tests
2025-06-15 22:14:02 +00:00
perf3ct
5b88c92937
feat(server/client): add pagination in client, resolve race condition in server
2025-06-15 21:48:59 +00:00
perf3ct
b21f2684bc
feat(server/client): add lots of OCR tweaks
2025-06-15 21:24:06 +00:00
perf3ct
a39fc807fa
feat(server): fix the sync scheduler for sources
2025-06-15 18:05:56 +00:00
perf3ct
cebae12363
feat(client): also show settings for s3 and local sources in the client
2025-06-15 18:00:35 +00:00
perf3ct
e5aaf31fdd
feat(server/client): working s3 and local source types
2025-06-15 17:51:04 +00:00
perf3ct
11c68c3d9f
feat(client): also update sources page and the various buttons
2025-06-15 17:06:38 +00:00
perf3ct
5dfc6e29f7
feat(async): create dedicated pools + runtime isolation for OCR
2025-06-15 16:47:55 +00:00
perf3ct
8ba35aae90
feat(async): create dedicated threads for ocr_runtime
2025-06-15 16:38:27 +00:00
perf3ct
af97f05116
feat(server): upgrade WebDAV settings on Sources page
2025-06-15 16:31:58 +00:00
perf3ct
317590f9c3
feat(server): create 'sources' concept and move WebDAV settings page to it
2025-06-15 16:12:18 +00:00
perf3ct
8ebffe4aa3
feat(db): try to improve db queue and number of connections
2025-06-15 05:04:48 +00:00
perf3ct
a8baa671ec
feat(server): I feel like I'm going to have to come back and fix this later
2025-06-15 04:58:22 +00:00
perf3ct
7feec817d0
feat(server): fix recursively scanning the uploads folder, and the quick search bar
2025-06-15 04:37:49 +00:00
perf3ct
0ae562f4c3
feat(server): fix breaking changes in deps, take 2
2025-06-15 04:08:34 +00:00
perf3ct
cfc6c85261
feat(server): upgrade all versions and resolve breaking changes
2025-06-15 02:23:35 +00:00
perf3ct
d21e51436b
feat(server): rewrite nearly everything to be async/follow best practices
2025-06-15 02:06:17 +00:00
perf3ct
9e877e7aa1
feat(server): webdav download and ocr actually works
2025-06-15 01:12:01 +00:00
perf3ct
8fed8c753e
feat(server): fix migration not working
2025-06-14 22:57:43 +00:00
perf3ct
9fa45f8891
feat(server): implement better ocr failure and guardrails
2025-06-14 22:13:04 +00:00
perf3ct
aa45cd06e0
feat(server): webdav integration nearly done
2025-06-14 16:21:28 +00:00
perf3ct
5b67232266
feat(server): webdav integration nearly done
2025-06-14 16:14:41 +00:00
perf3ct
57c118c049
feat(server): implement notifications and webdav
2025-06-14 01:34:56 +00:00
perf3ct
f7874f4541
feat(docs): clean up docs and make dev ex easier with variables
2025-06-13 23:21:45 +00:00
perf3ct
63b322ac7a
feat(server): add role capability, and fix tests
2025-06-13 20:58:36 +00:00
perf3ct
afd01e6075
fix(server): at least the watch folder doesn't blow up now
2025-06-13 20:11:22 +00:00
perf3ct
c7a0c25c23
fix(server): update some integration tests and create 'system' user
2025-06-13 19:56:25 +00:00
perf3ct
e672613d50
feat(client): refreshing the page no longer returns a 404
2025-06-13 19:22:57 +00:00
perf3ct
e3f1855711
feat(client/server): add nextcloud/webdav capability, add integration tests
2025-06-13 17:09:05 +00:00
perf3ct
57c6b370d2
feat(client): convert frontend to ts
2025-06-13 16:16:23 +00:00
perf3ct
725105d62f
feat(migrations): add missing migrations and fix metrics endpoint
2025-06-13 15:55:30 +00:00
perf3ct
00b2bfe22c
feat(server/client): the /documents endpoint works again, and so does the watch folder...kinda
2025-06-13 15:53:19 +00:00
perf3ct
e6e2ba76f5
feat(ocr): fix ocr variables
2025-06-13 15:24:25 +00:00
perf3ct
cd35f877b1
feat(migrations): try to fix the migrations service
2025-06-13 15:14:13 +00:00
perf3ct
e1e949cf65
feat(migrations): try to fix the migrations service
2025-06-13 14:27:31 +00:00
perf3ct
e61db1036e
feat(migrations): try to fix the migrations service
2025-06-13 14:19:45 +00:00
perfectra1n
3dcf753ff3
feat(migrations): improve migrations and split large SQL statements into smaller ones
2025-06-12 22:27:04 -07:00
perfectra1n
16a0a6ce5c
feat(migrations): improve migrations and split large SQL statements into smaller ones
2025-06-12 22:16:54 -07:00
perfectra1n
d61b1c3f4b
feat(server): implement ocr enhanced service throughout
2025-06-12 22:12:50 -07:00
perfectra1n
d5f419ca18
feat(client/server): update search tests, and upgrade OCR
2025-06-12 22:02:26 -07:00
perfectra1n
1a1f886f04
feat(client/server): update search tests, and upgrade OCR
2025-06-12 22:00:14 -07:00