From 2e6c1ef2381ace30af72d235e7e6ba841faad1c3 Mon Sep 17 00:00:00 2001
From: perf3ct <jonfuller2012@gmail.com>
Date: Mon, 21 Jul 2025 23:34:57 +0000
Subject: [PATCH] feat(docs): add docs about multiple ocr languages

---
 README.md                        |   2 +
 docs/multi-language-ocr-guide.md | 247 +++++++++++++++++++++++++++++++
 2 files changed, 249 insertions(+)
 create mode 100644 docs/multi-language-ocr-guide.md

diff --git a/README.md b/README.md
index a9c5e73..d01cb28 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,7 @@ open http://localhost:8000
 - [👥 User Management](docs/user-management-guide.md) - Authentication, roles, and administration
 - [🏷️ Labels & Organization](docs/labels-and-organization.md) - Document tagging and categorization
 - [🔎 Advanced Search](docs/advanced-search.md) - Search modes, syntax, and optimization
+- [🌍 Multi-Language OCR Guide](docs/multi-language-ocr-guide.md) - Process documents in multiple languages simultaneously
 - [🔐 OIDC Setup](docs/oidc-setup.md) - Single Sign-On integration
 
 ### Deployment & Operations
@@ -69,6 +70,7 @@ open http://localhost:8000
 - [🔍 OCR Optimization](docs/dev/OCR_OPTIMIZATION_GUIDE.md) - Improve OCR performance
 - [🗄️ Database Best Practices](docs/dev/DATABASE_GUARDRAILS.md) - Concurrency and safety
 - [📊 Queue Architecture](docs/dev/QUEUE_IMPROVEMENTS.md) - Background job processing
+- [⚠️ Error System Guide](docs/dev/ERROR_SYSTEM.md) - Comprehensive error handling architecture
 
 ## 🏗️ Architecture
 
diff --git a/docs/multi-language-ocr-guide.md b/docs/multi-language-ocr-guide.md
new file mode 100644
index 0000000..b76f4cc
--- /dev/null
+++ b/docs/multi-language-ocr-guide.md
@@ -0,0 +1,247 @@
+# Multi-Language OCR Guide
+
+Readur can run OCR with several languages at once, which improves text extraction for documents that mix languages.
+
+## 🌍 Overview
+
+The multi-language OCR system allows you to:
+- **Process documents in up to 4 languages simultaneously** for best results
+- **Set preferred languages** that apply to all your document uploads
+- **Retry failed OCR** with different language combinations
+- **Improve text extraction** on mixed-language documents by combining several language models
+
+## 🚀 Getting Started
+
+### Setting Your Language Preferences
+
+1. **Navigate to Settings** in your account
+2. **Open the OCR Languages** section
+3. **Choose up to 4 preferred languages** - these will be used for all new uploads
+4. **Set a primary language** - this language gets processing priority
+5. **Save your preferences**
+
+**Example preferred language setup:**
+- Primary: English (`eng`)
+- Additional: Spanish (`spa`), French (`fra`)
+- Result: Documents processed with English priority, plus Spanish and French recognition
+
+### Language Selection During Upload
+
+When uploading documents, you can:
+
+1. **Use your default preferences** - no action needed
+2. **Override for specific documents:**
+   - Click the language selector in the upload area
+   - Choose different languages for this upload session
+   - These languages will be applied to all files in the current upload
+
+## 📋 Available Languages
+
+Readur supports 67+ languages including:
+
+### Major World Languages
+- **English** (`eng`) - Default and most reliable
+- **Spanish** (`spa`) - Excellent accuracy
+- **French** (`fra`) - High quality results
+- **German** (`deu`) - Strong performance
+- **Italian** (`ita`) - Good accuracy
+- **Portuguese** (`por`) - Reliable processing
+- **Russian** (`rus`) - Solid results
+
+### Asian Languages  
+- **Chinese Simplified** (`chi_sim`)
+- **Chinese Traditional** (`chi_tra`)
+- **Japanese** (`jpn`)
+- **Korean** (`kor`)
+- **Hindi** (`hin`)
+- **Thai** (`tha`)
+- **Vietnamese** (`vie`)
+
+### European Languages
+- **Dutch** (`nld`)
+- **Swedish** (`swe`)
+- **Norwegian** (`nor`)
+- **Danish** (`dan`)
+- **Finnish** (`fin`)
+- **Polish** (`pol`)
+- **Czech** (`ces`)
+
+### And Many More
+Including Arabic (`ara`), Hebrew (`heb`), Turkish (`tur`), and dozens of other languages.
+
+> **Tip:** For the complete list of available languages, visit the OCR Languages page in your settings or call the API endpoint: `GET /api/ocr/languages`
+
+## 🛠️ Using the API
+
+### Get Available Languages
+```bash
+curl -H "Authorization: Bearer YOUR_TOKEN" \
+     https://your-readur-instance.com/api/ocr/languages
+```
+
+**Response:**
+```json
+{
+  "available_languages": [
+    {
+      "code": "eng",
+      "name": "English",
+      "installed": true
+    },
+    {
+      "code": "spa", 
+      "name": "Spanish",
+      "installed": true
+    }
+  ],
+  "current_user_language": "eng"
+}
+```
+
+### Update Language Preferences
+```bash
+curl -X PUT \
+     -H "Authorization: Bearer YOUR_TOKEN" \
+     -H "Content-Type: application/json" \
+     -d '{
+       "preferred_languages": ["eng", "spa", "fra"],
+       "primary_language": "eng"
+     }' \
+     https://your-readur-instance.com/api/settings
+```
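+
+The request body can be checked against the limits described in this guide (at most 4 languages, primary language included in the list) before it is sent. The helper below is a minimal sketch; `build_prefs_payload` is a hypothetical name and not part of Readur, and the server may apply its own, different validation.
+
+```bash
+# Hypothetical helper (not shipped with Readur).
+# Usage: build_prefs_payload PRIMARY LANG [LANG...]
+build_prefs_payload() {
+  local primary="$1"
+  shift
+  # This guide allows between 1 and 4 preferred languages.
+  if [ "$#" -lt 1 ] || [ "$#" -gt 4 ]; then
+    echo "expected 1 to 4 languages, got $#" >&2
+    return 1
+  fi
+  local list="" found=0 code
+  for code in "$@"; do
+    [ "$code" = "$primary" ] && found=1
+    # Build a comma-separated list of quoted codes.
+    list="${list:+$list, }\"$code\""
+  done
+  if [ "$found" -ne 1 ]; then
+    echo "primary language '$primary' must be in the list" >&2
+    return 1
+  fi
+  printf '{"preferred_languages": [%s], "primary_language": "%s"}\n' "$list" "$primary"
+}
+
+# Example: pass the generated body to the request shown above.
+#   curl -X PUT ... -d "$(build_prefs_payload eng eng spa fra)" \
+#        https://your-readur-instance.com/api/settings
+```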
+
+### Retry OCR with Different Languages
+```bash
+curl -X POST \
+     -H "Authorization: Bearer YOUR_TOKEN" \
+     -H "Content-Type: application/json" \
+     -d '{
+       "languages": ["eng", "deu"]
+     }' \
+     https://your-readur-instance.com/api/documents/DOCUMENT_ID/ocr/retry
+```
+
+## 🎯 Best Practices
+
+### Language Selection Strategy
+
+**For Mixed-Language Documents:**
+- Choose 2-3 languages that appear in your document
+- Always include English as a fallback (most reliable)
+- Put the dominant language first as your primary language
+
+**Examples:**
+- **Business document with English/Spanish:** `["eng", "spa"]`
+- **European legal document:** `["eng", "fra", "deu"]`
+- **Academic paper with multiple references:** `["eng", "spa", "ita"]`
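+
+The selection strategy above (dominant language first, English as a fallback, no more than 4 languages) can be sketched as a small shell function. `with_english_fallback` is a hypothetical helper written for illustration; it only arranges the codes you pass in and does not talk to Readur.
+
+```bash
+# Hypothetical helper: keeps the given order (dominant language first)
+# and appends "eng" when it is missing and there is room under the
+# 4-language limit described in this guide.
+with_english_fallback() {
+  local out="" count=0 has_eng=0 code
+  for code in "$@"; do
+    [ "$code" = "eng" ] && has_eng=1
+    out="${out:+$out }$code"
+    count=$((count + 1))
+  done
+  if [ "$has_eng" -eq 0 ] && [ "$count" -lt 4 ]; then
+    out="${out:+$out }eng"
+  fi
+  echo "$out"
+}
+
+# Example: a mostly Spanish document.
+#   with_english_fallback spa
+```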
+
+### Performance Optimization
+
+**Do:**
+- ✅ Limit to 2-4 languages for best performance
+- ✅ Include English when processing mixed content
+- ✅ Use specific language combinations for consistent document types
+- ✅ Set realistic expectations for complex multilingual documents
+
+**Don't:**
+- ❌ Select languages not present in your documents (apart from an English fallback)
+- ❌ Use more than 4 languages simultaneously
+- ❌ Expect perfect results with very low-quality scans
+- ❌ Mix completely unrelated language families unnecessarily
+
+## 🔄 Retrying OCR Processing
+
+If OCR results are poor, you can retry with different languages:
+
+### Via Web Interface
+1. **Navigate to the document** with poor OCR results
+2. **Click "Retry OCR"** button
+3. **Select different languages** that better match your document
+4. **Start retry process**
+
+### Common Retry Scenarios
+
+**Scenario 1: Wrong Language Selected**
+- Original: English-only processing of Spanish document
+- Solution: Retry with `["spa", "eng"]`
+
+**Scenario 2: Mixed Language Document**
+- Original: Single language processing
+- Solution: Add 2-3 relevant languages
+
+**Scenario 3: Poor Quality Scan**
+- Original: Fast processing with limited languages
+- Solution: Try with primary language + English fallback
+
+## 📊 Monitoring OCR Results
+
+### Understanding OCR Confidence
+- **90%+** - Excellent results, high accuracy
+- **70-89%** - Good results, minor errors possible  
+- **50-69%** - Moderate results, review recommended
+- **Below 50%** - Poor results, consider retry with different languages
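+
+The bands above translate directly into a lookup. The function below is a sketch that assumes the confidence value is available as a whole-number percentage; how Readur actually exposes the value is not covered in this guide, and `confidence_band` is a hypothetical name.
+
+```bash
+# Hypothetical helper: maps a whole-number confidence percentage (0-100)
+# to the bands listed in this guide.
+confidence_band() {
+  local pct="$1"
+  case "$pct" in
+    ''|*[!0-9]*)
+      echo "confidence must be a whole number from 0 to 100" >&2
+      return 1
+      ;;
+  esac
+  if [ "$pct" -gt 100 ]; then
+    echo "confidence must be a whole number from 0 to 100" >&2
+    return 1
+  fi
+  if [ "$pct" -ge 90 ]; then
+    echo "excellent"
+  elif [ "$pct" -ge 70 ]; then
+    echo "good"
+  elif [ "$pct" -ge 50 ]; then
+    echo "moderate"
+  else
+    echo "poor"   # consider retrying with different languages
+  fi
+}
+```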
+
+### Language-Specific Performance
+Different languages have varying accuracy rates:
+- **English, Spanish, French**: Highest accuracy
+- **German, Dutch**: Very good accuracy
+- **Asian languages** (Chinese, Japanese): Good accuracy with proper font recognition
+- **Arabic/Hebrew scripts**: Moderate accuracy, depends on text quality
+
+## 🐛 Troubleshooting
+
+### Common Issues
+
+**Problem:** "Language not available" error
+**Solution:** 
+- Check language code spelling (e.g., `eng` not `english`)
+- Verify language is installed on the server
+- Contact administrator if language should be available
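+
+A quick format check can catch spelling mistakes such as `english` before a request is sent. The sketch below assumes that codes follow the pattern seen in this guide: three lowercase letters, optionally followed by an underscore suffix as in `chi_sim`. `looks_like_lang_code` is a hypothetical helper; passing this check does not mean the language is installed on the server.
+
+```bash
+# Hypothetical helper: checks only the shape of a language code,
+# based on the codes shown in this guide (e.g. eng, chi_sim).
+looks_like_lang_code() {
+  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]{3}(_[a-z]+)?$'
+}
+
+# Example:
+#   looks_like_lang_code english || echo "check the code spelling"
+```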
+
+**Problem:** Poor OCR results despite correct language
+**Solutions:**
+- Ensure document scan quality is sufficient (300+ DPI recommended)
+- Try adding English as a fallback language
+- Consider document preprocessing (contrast, rotation correction)
+- Retry with fewer languages; unneeded languages can reduce accuracy
+
+**Problem:** Slow processing with multiple languages  
+**Solutions:**
+- Reduce number of selected languages to 2-3
+- Use only languages present in your document
+- Consider processing during off-peak hours
+
+### Getting Help
+
+If you're experiencing issues:
+
+1. **Check OCR health** - call `GET /api/ocr/health`
+2. **Review your language selection** - ensure languages match document content
+3. **Try with English fallback** - adds reliability to processing
+4. **Contact support** with document ID and language combination used
+
+## 🔮 Advanced Features
+
+### Planned Enhancements
+- **Auto-language detection**: Automatic suggestion of optimal language combinations
+- **Custom language models**: Upload your own specialized language data
+- **Batch language updates**: Change languages for multiple documents at once
+- **Language-specific confidence thresholds**: Fine-tune accuracy requirements per language
+
+### Integration Options
+The multi-language OCR system integrates with:
+- **Document management workflows**
+- **Automated processing pipelines**  
+- **Third-party applications via REST API**
+- **Webhook notifications for completion**
+
+## 📚 Additional Resources
+
+- **API Documentation**: Complete endpoint reference
+- **Language Codes Reference**: Full list of supported language codes
+- **Performance Guidelines**: Optimization recommendations
+- **Migration Guide**: Upgrading from single-language setup
+
+---
+
+**Need Help?** Contact support or check the system health dashboard for real-time OCR capability status.
\ No newline at end of file