Readur

Commit Graph

Author	SHA1	Message	Date
perf3ct	036941b3dc	feat(dev): trigger re-ocr on doc and docx	2025-09-02 23:04:31 +00:00
perf3ct	483d89132f	feat(office): add documentation around using antiword/catdoc for `doc` functionality	2025-09-02 20:29:17 +00:00
perf3ct	774efd1140	refactor(server): remove XML vs library comparison functionality Remove all comparison-related code used to evaluate XML vs library-based Office document extraction. The XML approach has proven superior, so the comparison functionality is no longer needed. Changes: - Remove extraction_comparator.rs (entire comparison engine) - Remove test_extraction_comparison.rs binary - Remove comparison mode logic from enhanced.rs - Simplify fallback_strategy.rs to use XML extraction only - Update OCR service to use XML extraction as primary method - Clean up database migration to remove comparison-specific settings - Remove test_extraction binary from Cargo.toml - Update integration tests to work with simplified extraction The Office document extraction now flows directly to XML-based extraction without any comparison checks, maintaining the superior extraction quality while removing unnecessary complexity.	2025-09-02 01:22:19 +00:00

3 Commits