perf3ct
dd963d0ecc
feat(server): allow also completed documents to be retried
2025-07-02 18:15:41 +00:00
perf3ct
8ed8701d5b
feat(server): implement DEBUG environment variable
2025-07-02 17:57:57 +00:00
perf3ct
0f3cb12c7a
fix(server): resolve NUMERIC db type and f64 rust type
2025-07-02 02:26:11 +00:00
perf3ct
e4faf2cfd2
feat(server/client): implement retry functionality for both successful and failed documents
2025-07-02 00:06:47 +00:00
perf3ct
c242a84326
feat(webdav): also fix the parser to include directories, and add tests
2025-07-01 22:03:06 +00:00
perf3ct
590cad3197
feat(tests): add unit tests for new webdav functionality
2025-07-01 21:39:31 +00:00
perf3ct
fdc240fa5b
feat(webdav): track directory etags
✅ Core Optimizations Implemented
1. 📊 New Database Schema: Added webdav_directories table to track
directory ETags, file counts, and metadata
2. 🔍 Smart Directory Checking: Before deep scans, check directory
ETags with lightweight Depth: 0 PROPFIND requests
3. ⚡ Skip Unchanged Directories: If directory ETag matches, skip the
entire deep scan
4. 🗂️ N-Depth Subdirectory Tracking: Recursively track all
subdirectories found during scans
5. 🎯 Individual Subdirectory Checks: When parent unchanged, check
each known subdirectory individually
🚀 Performance Benefits
Before: Every sync = Full Depth: infinity scan of entire directory
tree
After:
- First sync: Full scan + directory tracking setup
- Subsequent syncs: Quick ETag checks → skip unchanged directories
entirely
- Changed directories: Only scan the specific changed subdirectories
📁 How It Works
1. Initial Request: PROPFIND Depth: 0 on /Documents → get directory
ETag
2. Database Check: Compare with stored ETag for /Documents
3. If Unchanged: Check each known subdirectory (/Documents/2024,
/Documents/Archive) individually
4. If Changed: Full recursive scan + update all directory tracking
data
2025-07-01 21:22:16 +00:00
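The decision flow described in the "feat(webdav): track directory etags" commit body above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `DirectoryTracker`, `ScanPlan`, `plan`, and `record` are hypothetical names, the `HashMap` is an in-memory stand-in for the webdav_directories table, and the "current" ETag is assumed to have already been fetched with a Depth: 0 PROPFIND.

```rust
use std::collections::HashMap;

/// What a sync should do for one directory after comparing ETags.
#[derive(Debug, PartialEq)]
enum ScanPlan {
    /// No stored ETag, or the ETag changed: run a full recursive scan.
    DeepScan,
    /// ETag matches: skip the deep scan and check these tracked
    /// subdirectories individually instead.
    CheckSubdirectories(Vec<String>),
}

/// Hypothetical in-memory stand-in for the webdav_directories table,
/// mapping a directory path to its last seen ETag.
struct DirectoryTracker {
    stored: HashMap<String, String>,
}

impl DirectoryTracker {
    fn new() -> Self {
        DirectoryTracker { stored: HashMap::new() }
    }

    /// Record (or update) the ETag seen for a directory during a scan.
    fn record(&mut self, path: &str, etag: &str) {
        self.stored.insert(path.to_string(), etag.to_string());
    }

    /// Compare the freshly fetched ETag with the stored one and decide
    /// whether a deep scan is needed.
    fn plan(&self, path: &str, current_etag: &str) -> ScanPlan {
        match self.stored.get(path) {
            Some(etag) if etag.as_str() == current_etag => {
                // Parent unchanged: collect every tracked descendant path.
                let prefix = format!("{}/", path.trim_end_matches('/'));
                let mut subdirs: Vec<String> = self
                    .stored
                    .keys()
                    .filter(|p| p.starts_with(&prefix))
                    .cloned()
                    .collect();
                subdirs.sort();
                ScanPlan::CheckSubdirectories(subdirs)
            }
            // First sync for this path, or the ETag differs.
            _ => ScanPlan::DeepScan,
        }
    }
}
```

With an empty tracker, `plan("/Documents", ...)` yields `DeepScan` (the first-sync case). Once `/Documents` and its subdirectories have been recorded, a matching ETag yields the list of subdirectories to check one by one, and a differing ETag falls back to `DeepScan`.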
perf3ct
a2ea446e16
feat(client): update swagger ui endpoints
2025-07-01 20:54:45 +00:00
Jon Fuller
a88f387aeb
Merge branch 'main' into feat/multiple-ocr-languages
2025-07-01 11:53:42 -07:00
perf3ct
f7018575d8
feat(pdf): implement ocrmypdf to extract text from PDFs
2025-07-01 00:56:48 +00:00
Jon Fuller
59e80a1b92
Merge branch 'main' into feat/debug-page
2025-06-30 17:19:31 -07:00
perf3ct
2a1eeeda50
feat(debug): debug page actually works and does something
2025-07-01 00:15:48 +00:00
perf3ct
f26ab1e367
fix(pdf): resolve PDF wordcount error
2025-07-01 00:10:49 +00:00
perf3ct
dd90e48fd2
feat(server): mark documents with 0 words as failed, and fix webdav unit tests
2025-06-30 22:43:25 +00:00
perf3ct
bf073132a1
fix(tests): fix documents tests
2025-06-30 21:56:21 +00:00
perf3ct
b344b69da7
feat(server): fix serialization issues
2025-06-30 19:40:05 +00:00
perf3ct
d9b695f0bd
feat(server/client): add metadata to file view
2025-06-30 19:13:16 +00:00
perf3ct
5f10a8b82c
feat(server): continue to try to wrangle the failed and ignored documents
2025-06-29 23:27:51 +00:00
perf3ct
8d1a886139
fix(tests): resolve compilation error in the multiple OCR functionality
2025-06-29 23:21:42 +00:00
perf3ct
e0b0f49ba2
feat(tests): implement and update tests for multiple OCR languages
2025-06-29 23:03:37 +00:00
perf3ct
b4ddf034b0
feat(server/client): support multiple OCR languages
2025-06-29 22:51:06 +00:00
perf3ct
28a7e4eb45
fix(tests): resolve a whole lot of test issues
2025-06-28 22:50:40 +00:00
perf3ct
2b29032a42
fix(server): resolve compilation errors in constraint_validation.rs
2025-06-28 22:04:01 +00:00
perf3ct
df254d59e3
feat(server/client): resolve failing tests
2025-06-28 21:21:05 +00:00
perf3ct
34bc207e39
feat(server/client): add failed_documents table to handle failures, and move logic of failures
2025-06-28 20:52:58 +00:00
Jon Fuller
881e4c5a8e
Merge pull request #72 from readur/feat/better-db-tests
feat(tests): add regression tests and better sql type safety tests
2025-06-28 12:43:52 -07:00
perf3ct
25d6ecce6c
feat(tests): add regression tests and better sql type safety tests
2025-06-28 19:25:15 +00:00
perf3ct
e2633d7162
feat(swagger): add missing oidc endpoints into swagger ui
2025-06-28 19:19:48 +00:00
perf3ct
a792d2e6fd
fix(server): resolve incorrect db type
2025-06-28 18:41:48 +00:00
perf3ct
d18daa2c17
fix(server): resolve compilation issues from queue.rs
2025-06-28 18:15:55 +00:00
perf3ct
6dd580fa2f
fix(ocr_status): populate the ocr queue with pending jobs and add easy 'retry' button
2025-06-28 18:08:00 +00:00
Jon Fuller
a314f64ca9
Merge pull request #69 from readur/fix/ocr-confidence-1
fix(server/client): fix incorrect OCR measurements
2025-06-28 09:53:56 -07:00
perf3ct
7247f74456
feat(tests): create generic migration tests
2025-06-28 16:38:12 +00:00
perfectra1n
7f69cd2e5f
fix(server/client): fix incorrect OCR measurements
2025-06-27 20:23:59 -07:00
perf3ct
09a36bd0fb
fix(server): resolve compilation issue in IgnoredFilesQuery
2025-06-28 01:01:51 +00:00
perf3ct
7a623ca8d6
feat(server/client): easily undelete ignored files, if the user wishes to do so
2025-06-28 00:37:49 +00:00
perf3ct
6983469eff
fix(server): fix unclosed delimiter
2025-06-27 22:51:02 +00:00
Jon Fuller
a8c6660450
Merge branch 'main' into feat/delete-low-confidence-documents
2025-06-27 15:17:50 -07:00
perf3ct
a642eec3ce
feat(server/client): implement button deleting low confidence documents (e.g. documents that have no text)
2025-06-27 22:16:38 +00:00
perf3ct
fc0324da80
feat(client/server): add a new badge for each source that shows the number of documents stored from each source
2025-06-27 21:32:50 +00:00
perf3ct
9073006004
fix(tests): move oidc tests to correct folder
2025-06-27 19:33:58 +00:00
perf3ct
60dee61f16
fix(server): resolve broken imports on tests and test helpers
2025-06-27 18:46:41 +00:00
perf3ct
cdad6477ed
feat(server): reorganize components into their own modules and fix imports
2025-06-27 18:27:42 +00:00
Jon Fuller
c8f4886fc6
Merge pull request #55 from readur/feat/oidc-setup
feat(server): set up oidc system and migrations
2025-06-27 10:48:28 -07:00
perf3ct
af3aab3cda
fix(tests): resolve last OIDC test issues
2025-06-27 17:32:33 +00:00
perf3ct
98479325e3
fix(tests): resolve some difficult race conditions in test
2025-06-27 05:08:12 +00:00
perf3ct
72708a05f3
feat(oidc): fix oidc, tests, and everything in between
2025-06-27 05:03:27 +00:00
perf3ct
2c39f96dcf
fix(metrics): fix broken prometheus metrics
2025-06-26 22:14:42 +00:00
Jon Fuller
1c02dd480e
Merge pull request #56 from readur/fix/pdf-thumbnail-generation
feat(server): actually render PDF thumbnails
2025-06-26 14:14:25 -07:00
perf3ct
daac9599b8
feat(metrics): add more prometheus metrics, and create grafana dashboard
2025-06-26 21:14:00 +00:00
perf3ct
d278d50f3a
feat(server): use poppler for pdf image generation
2025-06-26 20:39:42 +00:00
perf3ct
d7aca60733
feat(server): actually render PDF thumbnails?
2025-06-26 20:25:52 +00:00
perf3ct
10d9a1a661
feat(server): set up oidc system and migrations
2025-06-26 18:52:57 +00:00
Jon Fuller
bae748f3df
Merge pull request #46 from readur/fix/catch-pdf-extract-errors
fix(server): catch pdf-extract spammy logs
2025-06-25 21:35:09 -07:00
perf3ct
6f4c4dae8b
feat(swagger): add a ton of docstrings to functions
2025-06-25 23:58:37 +00:00
perf3ct
fce0490196
feat(swagger): add missing endpoints to swagger-ui
2025-06-25 23:47:27 +00:00
perf3ct
d4d8ea625b
fix(server): catch pdf-extract spammy logs
2025-06-25 23:26:11 +00:00
perf3ct
f148d0827e
feat(server): decrease logging verbosity for ingestion
2025-06-25 21:41:46 +00:00
perf3ct
920ad96f4d
fix(server): resolve compilation issues due to increased logging
2025-06-25 20:00:09 +00:00
perf3ct
8ce911dc88
fix(server): don't log postgres passwords
2025-06-25 19:44:58 +00:00
perf3ct
b428b40cbe
feat(server): implement better error for configuration issues
2025-06-25 19:37:16 +00:00
perf3ct
6c2f16e666
fix(server): also fix these broken user isolation SQL statements
2025-06-24 17:43:58 +00:00
perf3ct
0bb6d4d4df
fix(server): better error responses when creating users
2025-06-24 17:33:59 +00:00
perf3ct
4ce21ef931
fix(server): resolve lack of user isolation
2025-06-24 17:28:28 +00:00
perf3ct
afa0565634
feat(server/client): implement feature of ignoring already deleted files, and add failed OCR queue tests
2025-06-24 17:20:33 +00:00
perf3ct
67191c95b7
feat(migrations): resolve migrations names and remove legacy migrations code
2025-06-23 21:08:43 +00:00
perf3ct
de45300c7a
feat(webdav): move etag parser to own function, create required migration
2025-06-23 19:39:39 +00:00
perf3ct
c747d0abc8
fix(tests): fix broken parser, thanks for finding that, unit tests!
2025-06-23 19:14:31 +00:00
perf3ct
472106a0f6
feat(server): normalize etags from webdav to properly check for file changes
2025-06-23 19:03:24 +00:00
perf3ct
6834e11542
fix(tests): also fix unit tests
2025-06-22 21:31:11 +00:00
perf3ct
e78a353751
feat(tests): resolve admin integration test issues
2025-06-22 17:28:45 +00:00
perf3ct
1cb5341c4e
feat(ci): fix other tests, part 9000
2025-06-21 18:08:34 +00:00
perf3ct
74c9c87906
fix(deletion): properly handle concurrent deletion requests
2025-06-20 18:40:24 +00:00
perf3ct
ecf5a0ea50
feat(tests): resolve failing and ignored tests
2025-06-20 18:37:52 +00:00
perf3ct
c1b3832ad1
fix(tests): repair the label tests
2025-06-20 18:10:27 +00:00
perf3ct
7b5914f972
fix(documents): remove old code in favor of document ingestion engine
2025-06-20 17:18:00 +00:00
perf3ct
5aaf90ba20
Merge branch 'main' into feat/document-deletion
2025-06-20 17:11:26 +00:00
perf3ct
a58c3abefc
feat(ingestion): have everything use the document ingestion engine
2025-06-20 16:53:06 +00:00
perf3ct
df8eeba2c2
feat(ingestion): create ingestion engine to handle document creation, and centralize deduplication logic
2025-06-20 16:24:26 +00:00
aaldebs99
597707f870
feat(tests): add deletion unit tests
2025-06-20 16:09:27 +00:00
aaldebs99
a5ebbd59bd
feat(everything): Add document deletion
2025-06-20 03:49:16 +00:00
aaldebs99
ec497a4a08
Merge branch 'main' into feat/document-labels
2025-06-19 18:40:50 -07:00
aaldebs99
3ecc82dfdf
fix(frontend): label writing and fetching logic
2025-06-20 01:32:32 +00:00
aaldebs99
e7b47b7d61
fix(backend): migrate python code to rust lol
2025-06-20 01:32:05 +00:00
perf3ct
741fcc2826
feat(tests): resolve issue with 'source' tests
2025-06-19 20:29:35 +00:00
aaldebs99
0e19ba3ea9
fix(backend): labels handling
2025-06-19 19:47:49 +00:00
aaldebs99
5de7f03a3e
fix(backend): labels
2025-06-19 18:58:00 +00:00
Jon Fuller
ae5a860065
Merge branch 'main' into feat/document-labels
2025-06-19 11:32:03 -07:00
aaldebs99
3cf1b8cd73
fix(server): static file routes
2025-06-19 18:29:52 +00:00
Jon Fuller
a335af6fff
Merge branch 'main' into feat/document-labels
2025-06-18 19:07:54 -07:00
aaldebs99
6942da2b20
chore(server): remove unused system user
2025-06-19 00:41:01 +00:00
perf3ct
4a54b0a8b7
feat(server/client): implement labels for documents
2025-06-18 16:12:42 +00:00
perf3ct
4c946ca9cc
feat(tests): resolve last test issues
2025-06-17 22:14:38 +00:00
perf3ct
261d71c5ae
feat(tests): fix the vast majority of both server and client tests
2025-06-17 22:06:12 +00:00
perf3ct
21b868a2e4
feat(tests): add actual images as part of e2e and testing
2025-06-17 21:26:39 +00:00
perf3ct
309a61bcd4
feat(client): update failedOcr page for duplicates
2025-06-17 16:52:45 +00:00
perf3ct
bdb136d615
feat(server/client): implement updated FailedOcrPage, duplicate management, and file hashing
2025-06-17 16:17:23 +00:00
perf3ct
75747016f0
feat(server): add hash for documents
2025-06-17 15:41:42 +00:00
perf3ct
d84193f444
fix(ocr_queue): don't slam the DB while we wait
2025-06-17 14:45:44 +00:00
perf3ct
d7607923be
feat(server): create specific endpoint for fetching documents, fix client being served again
2025-06-17 04:05:57 +00:00
perf3ct
f0f90d71de
feat(client/server): create endpoint for fetching individual files, and fix client not serving files
2025-06-17 03:38:16 +00:00
perf3ct
3ae542088b
feat(client/server): advanced search, along with fixing build errors
2025-06-17 02:56:59 +00:00
perf3ct
76529f83be
feat(client/server): implement a much better search
2025-06-17 02:41:16 +00:00
perf3ct
98a4b7479b
feat(server/client): remove webdav feature from user's settings as it's in sources now
2025-06-17 01:57:56 +00:00
perf3ct
8de1e153a1
feat(server): stop image preprocessing in OCR
2025-06-17 00:35:03 +00:00
perf3ct
e6ab56daa8
feat(server): break up large db.rs file into multiple files, and add more PDF guardrails
2025-06-17 00:25:21 +00:00
perf3ct
27c38bf0fe
feat(server): try to resume syncs after server restart
2025-06-16 23:21:43 +00:00
perf3ct
a47960a059
feat(server): also generate thumbnails for non-images, and resolve failing unit/integration tests
2025-06-16 22:51:29 +00:00
perf3ct
abdea3226f
feat(server): put more guardrails around PDF OCR size, and image size OCR
2025-06-16 22:39:00 +00:00
perf3ct
b9f2014509
feat(server): if there's no sync even running, allow sync to be cancelled
2025-06-16 21:39:41 +00:00
perf3ct
0ccceb768a
feat(server): create folders within 'upload' path to manage thumbnails, processed images, etc.
2025-06-16 21:24:46 +00:00
perf3ct
6f3aa771c0
feat(server/client): fix thumbnails and quick search
2025-06-16 17:40:53 +00:00
perf3ct
d51f2793e9
feat(server/client): update function used to display singular documents
2025-06-16 17:10:55 +00:00
perf3ct
e33240a811
feat(server): implement queue system for ocr process as well, to fight resource exhaustion
2025-06-16 01:20:13 +00:00
perf3ct
91b16de082
feat(server): create more DB guardrails, and lots of missing tests
2025-06-15 22:14:02 +00:00
perf3ct
a7cf67f90d
feat(server/client): add pagination in client, resolve race condition in server
2025-06-15 21:48:59 +00:00
perf3ct
42bc72ded4
feat(server/client): add lots of OCR tweaks
2025-06-15 21:24:06 +00:00
perf3ct
0cc77ed8ac
feat(server): fix the sync scheduler for sources
2025-06-15 18:05:56 +00:00
perf3ct
6004f3a001
feat(client): also show settings for s3 and local sources in the client
2025-06-15 18:00:35 +00:00
perf3ct
ea8ad2c262
feat(server/client): working s3 and local source types
2025-06-15 17:51:04 +00:00
perf3ct
df51e61d06
feat(client): also update sources page and the various buttons
2025-06-15 17:06:38 +00:00
perf3ct
41774056c7
feat(async): create dedicated pools + runtime isolation for OCR
2025-06-15 16:47:55 +00:00
perf3ct
853c9b7c2e
feat(async): create dedicated threads for ocr_runtime
2025-06-15 16:38:27 +00:00
perf3ct
0f14b5c8e7
feat(server): upgrade WebDAV settings on Sources page
2025-06-15 16:31:58 +00:00
perf3ct
59e5356a25
feat(server): create 'sources' concept and move WebDAV settings page to it
2025-06-15 16:12:18 +00:00
perf3ct
691c5e6bb8
feat(db): try to improve db queue and number of connections
2025-06-15 05:04:48 +00:00
perf3ct
6898d85981
feat(server): I feel like I'm going to have to come back and fix this later
2025-06-15 04:58:22 +00:00
perf3ct
4aa3d77e40
feat(server): fix recursively scanning the uploads folder, and the quick search bar
2025-06-15 04:37:49 +00:00
perf3ct
99521b4ca0
feat(server): fix breaking changes in deps, take 2
2025-06-15 04:08:34 +00:00
perf3ct
f8853ce6a6
feat(server): upgrade all versions and resolve breaking changes
2025-06-15 02:23:35 +00:00
perf3ct
e8dd7a788e
feat(server): rewrite nearly everything to be async/follow best practices
2025-06-15 02:06:17 +00:00
perf3ct
f2136cbd7b
feat(server): webdav download and ocr actually works
2025-06-15 01:12:01 +00:00
perf3ct
9e1acbf1b5
feat(server): fix migration not working
2025-06-14 22:57:43 +00:00
perf3ct
9fa45f8891
feat(server): implement better ocr failure and guardrails
2025-06-14 22:13:04 +00:00
perf3ct
aa45cd06e0
feat(server): webdav integration nearly done
2025-06-14 16:21:28 +00:00
perf3ct
5b67232266
feat(server): webdav integration nearly done
2025-06-14 16:14:41 +00:00
perf3ct
57c118c049
feat(server): implement notifications and webdav
2025-06-14 01:34:56 +00:00
perf3ct
f7874f4541
feat(docs): clean up docs and make dev ex easier with variables
2025-06-13 23:21:45 +00:00
perf3ct
63b322ac7a
feat(server): add role capability, and fix tests
2025-06-13 20:58:36 +00:00
perf3ct
afd01e6075
fix(server): at least the watch folder doesn't blow up now
2025-06-13 20:11:22 +00:00
perf3ct
c7a0c25c23
fix(server): update some integration tests and create 'system' user
2025-06-13 19:56:25 +00:00
perf3ct
e672613d50
feat(client): refreshing the page no longer returns a 404
2025-06-13 19:22:57 +00:00
perf3ct
e3f1855711
feat(client/server): add nextcloud/webdav capability, add integration tests
2025-06-13 17:09:05 +00:00
perf3ct
57c6b370d2
feat(client): convert frontend to ts
2025-06-13 16:16:23 +00:00
perf3ct
725105d62f
feat(migrations): add missing migrations and fix metrics endpoint
2025-06-13 15:55:30 +00:00
perf3ct
00b2bfe22c
feat(server/client): the /documents endpoint works again, and so does the watch folder...kinda
2025-06-13 15:53:19 +00:00
perf3ct
e6e2ba76f5
feat(ocr): fix ocr variables
2025-06-13 15:24:25 +00:00
perf3ct
cd35f877b1
feat(migrations): try to fix the migrations service
2025-06-13 15:14:13 +00:00
perf3ct
e1e949cf65
feat(migrations): try to fix the migrations service
2025-06-13 14:27:31 +00:00
perf3ct
e61db1036e
feat(migrations): try to fix the migrations service
2025-06-13 14:19:45 +00:00