feat(docs): add docs and update readme

perf3ct 2025-07-22 19:30:40 +00:00
parent 7b54381693
commit dfd0caf6f8
6 changed files with 674 additions and 14 deletions

@ -7,20 +7,24 @@ A powerful, modern document management system built with Rust and React. Readur
## ✨ Features
- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing + OIDC/SSO support
- 👥 **User Management**: Role-based access control with Admin and User roles
- 📤 **Smart File Upload**: Drag-and-drop support for PDF, images, text files, and Office documents
- 🔍 **Advanced OCR**: Automatic text extraction using Tesseract for searchable document content
- 🔎 **Powerful Search**: PostgreSQL full-text search with multiple modes (simple, phrase, fuzzy, boolean)
- 🔗 **Multi-Source Sync**: WebDAV, Local Folders, and S3-compatible storage integration
- 🏷️ **Labels & Organization**: Comprehensive tagging system with color-coding and hierarchical structure
- 👁️ **Folder Monitoring**: Non-destructive file watching with intelligent sync scheduling
- 📊 **Health Monitoring**: Proactive source validation and system health tracking
- 🔔 **Notifications**: Real-time alerts for sync events, OCR completion, and system status
- 🎨 **Modern UI**: Beautiful React frontend with Material-UI components and responsive design
- 🐳 **Docker Ready**: Complete containerization with production-ready multi-stage builds
- ⚡ **High Performance**: Rust backend for speed and reliability
- 📊 **Analytics Dashboard**: Document statistics and processing status overview
| Feature | Description | Documentation |
|---------|-------------|---------------|
| 🔐 **Secure Authentication** | JWT-based user authentication with bcrypt password hashing + OIDC/SSO support | [User Management](docs/user-management-guide.md), [OIDC Setup](docs/oidc-setup.md) |
| 👥 **User Management** | Role-based access control with Admin and User roles | [User Management Guide](docs/user-management-guide.md) |
| 📤 **Smart File Upload** | Drag-and-drop support for PDF, images, text files, and Office documents | [File Upload Guide](docs/file-upload-guide.md) |
| 🔍 **Advanced OCR** | Automatic text extraction using Tesseract for searchable document content | [OCR Optimization](docs/dev/OCR_OPTIMIZATION_GUIDE.md) |
| 🌍 **Multi-Language OCR** | Process documents in multiple languages simultaneously with automatic language detection | [Multi-Language OCR Guide](docs/multi-language-ocr-guide.md) |
| 🔎 **Powerful Search** | PostgreSQL full-text search with multiple modes (simple, phrase, fuzzy, boolean) | [Advanced Search Guide](docs/advanced-search.md) |
| 🔗 **Multi-Source Sync** | WebDAV, Local Folders, and S3-compatible storage integration | [Sources Guide](docs/sources-guide.md) |
| 🏷️ **Labels & Organization** | Comprehensive tagging system with color-coding and hierarchical structure | [Labels & Organization](docs/labels-and-organization.md) |
| 👁️ **Folder Monitoring** | Non-destructive file watching with intelligent sync scheduling | [Watch Folder Guide](docs/WATCH_FOLDER.md) |
| 📊 **Health Monitoring** | Proactive source validation and system health tracking | [Health Monitoring Guide](docs/health-monitoring-guide.md) |
| 🔔 **Notifications** | Real-time alerts for sync events, OCR completion, and system status | [Notifications Guide](docs/notifications-guide.md) |
| 🔌 **Swagger UI** | Built-in interactive API documentation accessible from your profile menu | [Swagger UI Guide](docs/swagger-ui-guide.md) |
| 🎨 **Modern UI** | Beautiful React frontend with Material-UI components and responsive design | [User Guide](docs/user-guide.md) |
| 🐳 **Docker Ready** | Complete containerization with production-ready multi-stage builds | [Installation Guide](docs/installation.md), [Deployment Guide](docs/deployment.md) |
| ⚡ **High Performance** | Rust backend for speed and reliability | [Architecture Documentation](docs/dev/architecture.md) |
| 📊 **Analytics Dashboard** | Document statistics and processing status overview | [Analytics Dashboard Guide](docs/analytics-dashboard-guide.md) |
## 🚀 Quick Start

@ -0,0 +1,169 @@
# 📊 Analytics Dashboard Guide
The Analytics Dashboard provides comprehensive insights into your document management system, showing statistics, processing status, and usage patterns.
## Dashboard Overview
Access the Analytics Dashboard through:
- **Main Navigation** → Analytics
- **Admin Panel** → System Analytics (admin users)
- **API Endpoints** for programmatic access
## Document Statistics
### Processing Metrics
- **Total Documents** - Complete count of all documents in the system
- **OCR Success Rate** - Percentage of successful text extractions
- **Processing Speed** - Average documents processed per hour/day
- **Storage Usage** - Total disk space used by documents and metadata
### Document Types
- **File Format Breakdown** - Distribution of PDF, images, Office docs
- **Source Distribution** - Documents by upload method (manual, WebDAV, S3, local)
- **Size Distribution** - Document size ranges and storage impact
- **Language Detection** - OCR language distribution statistics
## Processing Status Overview
### Real-time Status
- **Queue Length** - Current documents awaiting processing
- **Active Jobs** - Documents currently being processed
- **Recent Completions** - Recently finished processing jobs
- **Error Count** - Failed processing attempts requiring attention
### Processing History
- **Hourly Trends** - Processing volume over time
- **Daily Patterns** - Peak usage times and quiet periods
- **Success Rates** - Historical OCR and processing reliability
- **Performance Metrics** - Processing speed improvements over time
## User Activity Analytics
### Usage Patterns
- **Active Users** - Daily/weekly/monthly active user counts
- **Upload Activity** - Document upload frequency by user
- **Search Activity** - Most common search terms and patterns
- **Feature Usage** - Which features are used most frequently
### Access Patterns
- **Login Statistics** - User authentication frequency
- **Session Duration** - Average time spent in the application
- **Popular Documents** - Most accessed and searched documents
- **Peak Hours** - Busiest times for system usage
## Source Performance
### Sync Statistics
- **Source Health** - Status of all configured data sources
- **Sync Frequency** - How often sources are synchronized
- **Discovery Rate** - New documents found per sync cycle
- **Error Rates** - Failed sync attempts by source type
### Source Comparison
- **Volume by Source** - Document counts from each source
- **Performance Metrics** - Sync speed and reliability comparison
- **Storage Usage** - Disk usage by source type
- **Processing Success** - OCR success rates by source
## System Performance
### Resource Utilization
- **CPU Usage** - System load over time
- **Memory Usage** - RAM consumption patterns
- **Disk I/O** - Storage read/write activity
- **Network Usage** - Bandwidth utilization for remote sources
### Health Indicators
- **Uptime Statistics** - System availability metrics
- **Response Times** - API and web interface performance
- **Error Rates** - System error frequency and types
- **Queue Health** - Background job processing efficiency
## Custom Reports
### Report Builder
Create custom analytics reports with:
- **Date Range Selection** - Custom time periods for analysis
- **Metric Selection** - Choose specific statistics to include
- **Filtering Options** - Filter by user, source, document type
- **Export Formats** - Download as PDF, Excel, or CSV
### Scheduled Reports
- **Daily Summaries** - Automated daily statistics via email
- **Weekly Reports** - Comprehensive weekly performance reports
- **Monthly Analytics** - Detailed monthly usage and health reports
- **Custom Schedules** - Configure custom report frequencies
## Data Export
### Export Options
- **CSV Format** - Raw data for spreadsheet analysis
- **JSON Format** - Structured data for programmatic use
- **PDF Reports** - Formatted reports for sharing
- **Excel Workbooks** - Multi-sheet reports with charts
### API Access
Programmatic access to analytics data:
```bash
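# Assumes READUR_URL (base URL) and TOKEN (a JWT) are set; both names are illustrative
# Get document statistics
curl -H "Authorization: Bearer $TOKEN" "$READUR_URL/api/analytics/documents"
# Get processing metrics
curl -H "Authorization: Bearer $TOKEN" "$READUR_URL/api/analytics/processing"
# Get user activity data
curl -H "Authorization: Bearer $TOKEN" "$READUR_URL/api/analytics/users"
# Get system performance
curl -H "Authorization: Bearer $TOKEN" "$READUR_URL/api/analytics/system"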
```
## Dashboard Customization
### Widget Configuration
- **Add/Remove Widgets** - Customize which metrics are displayed
- **Widget Positioning** - Drag and drop to reorganize layout
- **Refresh Intervals** - Set automatic data refresh rates
- **Display Options** - Choose chart types and visualization styles
### User Preferences
- **Default Views** - Set your preferred dashboard configuration
- **Notification Thresholds** - Configure alerts for specific metrics
- **Color Schemes** - Customize dashboard appearance
- **Timezone Settings** - Display data in your local timezone
## Monitoring and Alerts
### Threshold Monitoring
Set alerts for key metrics:
- **Storage Usage** - Alert when disk usage exceeds thresholds
- **Processing Delays** - Notify when queue length grows too large
- **Error Rates** - Alert when failure rates exceed normal levels
- **Performance Degradation** - Monitor response time increases
### Integration Options
- **Email Alerts** - Receive notifications via email
- **Webhook Integration** - Send alerts to external monitoring systems
- **Slack/Teams** - Push notifications to team chat channels
- **Custom Scripts** - Trigger automated responses to alerts
## Troubleshooting
### Data Not Updating
- Check system time synchronization
- Verify analytics service is running
- Review database connectivity
- Clear browser cache and refresh
### Performance Issues
- Monitor database query performance
- Check for large datasets requiring pagination
- Review concurrent user limits
- Consider increasing system resources
### Missing Data Points
- Verify log collection is enabled
- Check data retention policies
- Review source configuration
- Ensure proper permissions for analytics access

docs/file-upload-guide.md Normal file

@ -0,0 +1,65 @@
# 📤 Smart File Upload Guide
Readur provides an intuitive drag-and-drop file upload system that supports multiple document formats and batch processing.
## Supported File Types
- **PDF Files** (.pdf) - Direct text extraction and OCR for scanned PDFs
- **Images** (.png, .jpg, .jpeg, .tiff, .bmp, .webp) - Full OCR text extraction
- **Text Files** (.txt, .rtf) - Direct text import
- **Office Documents** (.docx, .doc, .xlsx, .xls, .pptx, .ppt) - Text extraction and OCR
## Upload Methods
### Drag & Drop
1. Navigate to the main dashboard
2. Drag files from your computer directly onto the upload area
3. Multiple files can be selected and dropped simultaneously
4. Progress indicators show upload and processing status
### Browse & Select
1. Click the "Upload Documents" button
2. Use the file browser to select one or multiple files
3. Click "Open" to begin the upload process
## Batch Processing
- Upload multiple files at once for efficient processing
- Each file is processed independently for OCR and text extraction
- Real-time status updates show processing progress
- Failed uploads can be retried individually
## Processing Pipeline
1. **File Validation** - Verify file type and size limits
2. **Storage** - Secure file storage with backup
3. **OCR Processing** - Automatic text extraction using Tesseract
4. **Indexing** - Full-text search indexing in PostgreSQL
5. **Metadata Extraction** - File properties and document information
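
The first pipeline step, file validation, could look roughly like the sketch below. The extension list is taken from the Supported File Types section and the 50 MB figure from the Best Practices section. The function name `validate_upload` and the choice to treat 50 MB as a hard limit are assumptions made for illustration, not Readur's actual implementation.

```python
from pathlib import Path

# Extensions listed under "Supported File Types".
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp",
    ".txt", ".rtf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
}

# Assumption: the 50 MB recommendation is treated as a hard limit here.
MAX_SIZE_BYTES = 50 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of validation problems; an empty list means the file passes."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file exceeds {MAX_SIZE_BYTES} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one lets a batch upload report every issue with a file in a single pass.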
## Best Practices
- **File Size**: Keep individual files under 50 MB for optimal performance
- **File Names**: Use descriptive names for better organization
- **Batch Size**: Upload 10-20 files at once for best performance
- **Network**: Stable internet connection recommended for large uploads
## Troubleshooting
### Upload Fails
- Check file size limits
- Verify file format is supported
- Ensure stable internet connection
- Try uploading fewer files at once
### OCR Issues
- Ensure images have good contrast and resolution
- PDF files may need higher quality scans
- Check the [OCR Optimization Guide](dev/OCR_OPTIMIZATION_GUIDE.md) for advanced tips
## Security
- All uploads are scanned for malicious content
- Files are stored securely with proper access controls
- User permissions apply to all uploaded documents
- Automatic backup ensures data safety

@ -0,0 +1,128 @@
# 📊 Health Monitoring Guide
Readur includes comprehensive health monitoring to ensure system reliability and proactive issue detection.
## Overview
The health monitoring system continuously validates:
- Data source connectivity and status
- System resource utilization
- Processing queue health
- Database performance
- OCR engine availability
## Monitoring Dashboard
Access health information through:
- **Admin Panel** → Health Status
- **API Endpoints** for programmatic monitoring
- **Real-time Alerts** for immediate issue notification
## Source Health Validation
### WebDAV Sources
- Connection testing every 5 minutes
- Authentication validation
- Network latency monitoring
- Error rate tracking
### Local Folder Sources
- Directory accessibility checks
- Permission validation
- Disk space monitoring
- File system health
### S3-Compatible Sources
- Bucket accessibility
- Credential validation
- Region connectivity
- API rate limit monitoring
## System Health Metrics
### Performance Indicators
- **CPU Usage** - System load monitoring
- **Memory Usage** - RAM utilization tracking
- **Disk Space** - Storage capacity alerts
- **Queue Length** - Processing backlog size
### Processing Health
- **OCR Success Rate** - Text extraction reliability
- **Processing Speed** - Documents per minute
- **Error Rates** - Failed operation tracking
- **Retry Attempts** - Automatic recovery metrics
## Alert Configuration
### Alert Types
- **Critical** - System failures requiring immediate attention
- **Warning** - Performance degradation or resource limits
- **Info** - Status updates and maintenance notifications
### Notification Methods
- **In-App Notifications** - Real-time dashboard alerts
- **Email Alerts** - Configurable email notifications
- **Webhook Integration** - External system notifications
## Health Check Endpoints
### API Health Checks
```bash
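# Assumes READUR_URL is the base URL of your instance (illustrative name);
# add an Authorization header if your deployment requires one
# System health overview
curl "$READUR_URL/api/health"
# Detailed component status
curl "$READUR_URL/api/health/detailed"
# Source-specific health (SOURCE_ID is the ID of a configured source)
curl "$READUR_URL/api/health/sources/$SOURCE_ID"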
```
### Response Format
```json
{
  "status": "healthy",
  "timestamp": "2024-01-01T00:00:00Z",
  "components": {
    "database": "healthy",
    "ocr_engine": "healthy",
    "file_storage": "healthy",
    "sources": {
      "webdav_1": "healthy",
      "local_1": "warning"
    }
  }
}
```
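
A monitoring script would typically reduce this response to the components that need attention. The sketch below walks the `components` object from the example response and collects every entry that is not `"healthy"`. The helper name `unhealthy_components` is hypothetical, and the response shape is assumed to match the example above.

```python
import json


def unhealthy_components(payload: dict) -> list[str]:
    """Return dotted names of components whose status is not "healthy"."""
    found = []

    def walk(prefix: str, node: dict) -> None:
        for name, value in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if isinstance(value, dict):
                walk(path, value)  # nested group such as "sources"
            elif value != "healthy":
                found.append(f"{path}={value}")

    walk("", payload.get("components", {}))
    return found


sample = json.loads("""
{
  "status": "healthy",
  "timestamp": "2024-01-01T00:00:00Z",
  "components": {
    "database": "healthy",
    "ocr_engine": "healthy",
    "file_storage": "healthy",
    "sources": {"webdav_1": "healthy", "local_1": "warning"}
  }
}
""")
print(unhealthy_components(sample))  # ['sources.local_1=warning']
```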
## Troubleshooting
### Common Issues
- **Source Disconnected** - Check network connectivity and credentials
- **High Queue Length** - Scale processing resources or optimize OCR
- **Memory Warnings** - Review document processing batch sizes
- **Disk Space Low** - Clean up temporary files or expand storage
### Recovery Actions
- **Automatic Retry** - Failed operations retry with exponential backoff
- **Graceful Degradation** - System continues operating with reduced functionality
- **Manual Intervention** - Admin tools for resolving complex issues
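
For readers unfamiliar with exponential backoff, the sketch below shows the general technique: each failed attempt doubles the wait before the next one. The attempt count, the base delay, and the function name are illustrative assumptions. Readur's own retry policy is not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the waits can be replaced in tests. Production code often adds random jitter to the delay so that many clients do not retry at the same moment.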
## Configuration
Health monitoring can be configured in your environment:
```env
# Health check intervals (seconds)
HEALTH_CHECK_INTERVAL=300
SOURCE_CHECK_INTERVAL=600
# Alert thresholds
CPU_WARNING_THRESHOLD=80
MEMORY_WARNING_THRESHOLD=85
DISK_WARNING_THRESHOLD=90
# Notification settings
HEALTH_EMAIL_ALERTS=true
WEBHOOK_URL=https://your-monitoring-system.com/webhook
```
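
As a rough illustration of how these thresholds might be consumed, the sketch below reads the three `*_WARNING_THRESHOLD` variables and reports which metrics have crossed them. The helper names, the fallback values (copied from the example above), and the inclusive comparison are assumptions, not Readur's actual behavior.

```python
import os

# Variable names and fallback values mirror the example configuration above.
DEFAULT_THRESHOLDS = {
    "CPU_WARNING_THRESHOLD": 80.0,
    "MEMORY_WARNING_THRESHOLD": 85.0,
    "DISK_WARNING_THRESHOLD": 90.0,
}


def load_thresholds(env=os.environ) -> dict[str, float]:
    """Read warning thresholds (percent) from the environment."""
    return {
        name: float(env.get(name, default))
        for name, default in DEFAULT_THRESHOLDS.items()
    }


def warnings_for(usage: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the metrics whose usage is at or above their warning threshold."""
    checks = {
        "cpu": "CPU_WARNING_THRESHOLD",
        "memory": "MEMORY_WARNING_THRESHOLD",
        "disk": "DISK_WARNING_THRESHOLD",
    }
    return [
        metric
        for metric, key in checks.items()
        if usage.get(metric, 0.0) >= thresholds[key]
    ]
```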

docs/notifications-guide.md Normal file

@ -0,0 +1,150 @@
# 🔔 Notifications Guide
Readur provides comprehensive real-time notifications to keep you informed about document processing, sync events, and system status.
## Notification Types
### Document Processing
- **OCR Completion** - Text extraction finished for uploaded documents
- **Processing Errors** - Failed OCR or document processing alerts
- **Batch Processing** - Status updates for multiple document uploads
- **Quality Warnings** - Low-quality OCR results requiring attention
### Sync Events
- **Source Sync Complete** - WebDAV, S3, or local folder synchronization finished
- **New Documents Found** - Fresh documents discovered during sync
- **Sync Errors** - Connection issues or permission problems
- **Conflict Resolution** - File conflicts requiring user intervention
### System Status
- **Health Alerts** - System performance warnings or failures
- **Maintenance Windows** - Scheduled maintenance notifications
- **Security Events** - Login attempts, permission changes
- **Storage Warnings** - Disk space or quota limitations
## Notification Delivery
### In-App Notifications
- **Real-time Badge** - Notification counter in the top navigation
- **Notification Panel** - Expandable list of recent alerts
- **Toast Messages** - Immediate pop-up notifications for urgent items
- **Dashboard Widgets** - Status cards showing notification summaries
### Email Notifications
- **Immediate Alerts** - Critical system or processing failures
- **Daily Digest** - Summary of processing activity and status
- **Weekly Reports** - System health and usage statistics
- **Custom Triggers** - User-configured alert conditions
## Notification Settings
### User Preferences
Access notification settings through:
1. Click your profile in the top-right corner
2. Select "Notification Settings"
3. Configure your preferences for each notification type
### Notification Categories
- **Critical** - System failures, security alerts (always enabled)
- **Important** - Processing errors, sync failures
- **Informational** - Completion notifications, status updates
- **Promotional** - Feature updates, tips (can be disabled)
### Delivery Preferences
- **In-App Only** - Notifications appear only within Readur
- **Email + In-App** - Notifications sent to both locations
- **Email Only** - Notifications sent only via email
- **Disabled** - No notifications for this category
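
The interaction between categories and delivery preferences can be sketched as a small lookup. The enum, the channel names, and the fallback to in-app delivery for critical notifications are assumptions for illustration. The only rule taken from the text is that the Critical category cannot be disabled.

```python
from enum import Enum


class Delivery(Enum):
    IN_APP_ONLY = "in_app_only"
    EMAIL_AND_IN_APP = "email_and_in_app"
    EMAIL_ONLY = "email_only"
    DISABLED = "disabled"


# Channels used by each delivery preference.
CHANNELS = {
    Delivery.IN_APP_ONLY: {"in_app"},
    Delivery.EMAIL_AND_IN_APP: {"in_app", "email"},
    Delivery.EMAIL_ONLY: {"email"},
    Delivery.DISABLED: set(),
}


def channels_for(category: str, preference: Delivery) -> set[str]:
    """Return the channels a notification in this category is sent to."""
    # Critical notifications are always enabled, so a "disabled" preference
    # is ignored for them. Falling back to in-app delivery is an assumption.
    if category == "critical" and preference is Delivery.DISABLED:
        return {"in_app"}
    return set(CHANNELS[preference])
```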
## Advanced Configuration
### Admin Settings
Administrators can configure system-wide notification policies:
```env
# Email notification settings
SMTP_HOST=smtp.your-domain.com
SMTP_PORT=587
SMTP_USERNAME=notifications@your-domain.com
SMTP_PASSWORD=your-password
# Notification thresholds
CRITICAL_ERROR_THRESHOLD=5
WARNING_BATCH_SIZE=100
DIGEST_FREQUENCY=daily
# Webhook integrations
SLACK_WEBHOOK_URL=https://hooks.slack.com/...
TEAMS_WEBHOOK_URL=https://your-org.webhook.office.com/...
```
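
To check the SMTP settings independently of Readur, a short script using Python's standard library can send a test message with the same variables. This is a sketch under the assumption that the server accepts STARTTLS on the configured port. The function names are illustrative and unrelated to Readur's internals.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Create a plain-text alert email."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_alert(message: EmailMessage, env=os.environ) -> None:
    """Send the message through the SMTP server named in the environment."""
    host = env["SMTP_HOST"]
    port = int(env.get("SMTP_PORT", "587"))
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()  # port 587 is conventionally used with STARTTLS
        smtp.login(env["SMTP_USERNAME"], env["SMTP_PASSWORD"])
        smtp.send_message(message)
```

Building the message separately from sending it makes it possible to inspect the headers and body before any network connection is opened.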
### Webhook Integration
Send notifications to external systems:
#### Slack Integration
```json
{
  "channel": "#readur-alerts",
  "username": "Readur Bot",
  "text": "OCR processing completed for 15 documents",
  "attachments": [
    {
      "color": "good",
      "fields": [
        {"title": "Success Rate", "value": "93%", "short": true},
        {"title": "Processing Time", "value": "2m 34s", "short": true}
      ]
    }
  ]
}
```
#### Teams Integration
```json
{
  "@type": "MessageCard",
  "themeColor": "0076D7",
  "summary": "Readur Notification",
  "sections": [{
    "activityTitle": "Document Processing Complete",
    "activitySubtitle": "15 documents processed successfully",
    "facts": [
      {"name": "Success Rate", "value": "93%"},
      {"name": "Processing Time", "value": "2m 34s"}
    ]
  }]
}
```
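
The two payloads above can also be generated programmatically, for example when testing a webhook endpoint by hand. The sketch below builds both shapes from a dictionary of facts and prepares a JSON POST without sending it. The function names are hypothetical, and the fixed channel, username, and color values are simply copied from the examples.

```python
import json
import urllib.request


def slack_payload(text: str, facts: dict[str, str]) -> dict:
    """Build a Slack message shaped like the Slack example above."""
    return {
        "channel": "#readur-alerts",
        "username": "Readur Bot",
        "text": text,
        "attachments": [{
            "color": "good",
            "fields": [{"title": k, "value": v, "short": True} for k, v in facts.items()],
        }],
    }


def teams_payload(title: str, subtitle: str, facts: dict[str, str]) -> dict:
    """Build a MessageCard shaped like the Teams example above."""
    return {
        "@type": "MessageCard",
        "themeColor": "0076D7",
        "summary": "Readur Notification",
        "sections": [{
            "activityTitle": title,
            "activitySubtitle": subtitle,
            "facts": [{"name": k, "value": v} for k, v in facts.items()],
        }],
    }


def webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST for an incoming webhook."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would deliver it, with the URL taken from `SLACK_WEBHOOK_URL` or `TEAMS_WEBHOOK_URL`.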
## Managing Notifications
### Notification History
- **View All** - Complete history of notifications
- **Filter by Type** - Show only specific notification categories
- **Search** - Find notifications by content or date
- **Archive** - Mark notifications as read or hide them
### Bulk Actions
- **Mark All Read** - Clear all unread notification badges
- **Delete Old** - Remove notifications older than a specified date
- **Export** - Download notification history as CSV or JSON
## Troubleshooting
### Missing Notifications
- Check notification settings in your profile
- Verify email address is correct and confirmed
- Check spam/junk folder for email notifications
- Ensure browser notifications are enabled
### Too Many Notifications
- Adjust notification thresholds in settings
- Disable informational categories
- Switch to daily digest mode for non-critical items
- Use filters to focus on important notifications
### Email Delivery Issues
- Verify SMTP configuration (admin only)
- Check email server reputation and SPF records
- Test email delivery with the notification test feature
- Review email bounce logs in admin panel

docs/swagger-ui-guide.md Normal file

@ -0,0 +1,144 @@
# 🔌 Swagger UI Guide
Readur includes built-in Swagger UI for interactive API documentation and testing. Access it easily through your user profile menu.
## Accessing Swagger UI
1. **Log in to Readur** - Authenticate with your user credentials
2. **Click Your Profile** - Click on your profile avatar in the top-right corner
3. **Select "API Documentation"** - Choose the Swagger UI option from the dropdown menu
4. **Interactive Documentation** - Explore and test all available API endpoints
## API Documentation Features
### Endpoint Explorer
- **Complete API Reference** - All REST endpoints with detailed descriptions
- **Request/Response Examples** - Sample data for every endpoint
- **Parameter Details** - Required and optional parameters with types
- **Authentication Info** - JWT token requirements and usage
### Interactive Testing
- **Try It Out** - Execute API calls directly from the documentation
- **Real Data** - Test with your actual Readur data and configuration
- **Response Validation** - See actual responses and status codes
- **Error Handling** - View error responses and troubleshooting info
## API Categories
### Authentication
- **Login/Logout** - User authentication endpoints
- **Token Management** - JWT token refresh and validation
- **User Registration** - New user account creation
- **Password Reset** - Password recovery workflows
### Document Management
- **Upload Documents** - Single and batch file upload endpoints
- **Document Retrieval** - Get document metadata and content
- **Document Search** - Full-text search with various modes
- **Document Operations** - Update, delete, and organize documents
### User Management
- **User CRUD** - Create, read, update, delete user accounts
- **Role Management** - Assign and modify user roles
- **Permission Control** - Manage access rights and restrictions
- **User Preferences** - Personal settings and configurations
### Source Management
- **Source Configuration** - WebDAV, S3, and local folder setup
- **Sync Operations** - Manual and automated synchronization
- **Source Health** - Status monitoring and health checks
- **Source Statistics** - Usage metrics and performance data
### System Administration
- **Health Monitoring** - System status and performance metrics
- **Analytics Data** - Usage statistics and reporting endpoints
- **Configuration** - System settings and environment variables
- **Maintenance** - Backup, cleanup, and administrative tasks
## Authentication in Swagger UI
### Using JWT Tokens
1. **Log in via API** - Use the `/api/auth/login` endpoint to get a JWT token
2. **Copy Token** - Copy the returned JWT token
3. **Authorize** - Click the "Authorize" button in Swagger UI
4. **Enter Token** - Paste your JWT token in the format: `Bearer your_token_here`
5. **Test Endpoints** - All authenticated endpoints now work with your credentials
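
The same flow works outside Swagger UI. The sketch below prepares a login call and an authenticated follow-up request using only the standard library. The base URL, the JSON body fields (`username`, `password`), and the helper names are assumptions. Check the request and response schemas in Swagger UI for the actual shapes, including the name of the response field that carries the token.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your Readur instance


def login_request(username: str, password: str) -> urllib.request.Request:
    """Prepare a login call. The JSON body shape is an assumption."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authorized_request(path: str, token: str) -> urllib.request.Request:
    """Prepare a GET that carries the JWT as a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Either request can be sent with `urllib.request.urlopen`. The `Authorization` header uses the same `Bearer your_token_here` format that the Authorize dialog expects.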
### Token Management
- **Token Expiry** - Tokens expire after a configured time period
- **Refresh Tokens** - Use the refresh token endpoint to get new access tokens
- **Logout** - Invalidate tokens using the logout endpoint
- **Multiple Sessions** - Each browser session needs its own token
## Best Practices
### Development Usage
- **Test First** - Use Swagger UI to test API endpoints before implementing
- **Validate Responses** - Check response formats match your expectations
- **Error Scenarios** - Test error conditions and edge cases
- **Performance Testing** - Monitor response times for optimization
### Production Considerations
- **Access Control** - Swagger UI respects the same authentication as the main app
- **Rate Limiting** - API rate limits apply to Swagger UI requests
- **Logging** - All API calls from Swagger UI are logged normally
- **Security** - Use HTTPS in production for secure token transmission
## Common Use Cases
### Frontend Development
- **API Integration** - Test endpoints before implementing in your frontend
- **Data Formats** - Understand expected request/response formats
- **Error Handling** - Learn about error codes and messages
- **Feature Testing** - Validate new features work as expected
### System Integration
- **Third-party Tools** - Test integration with external systems
- **Automation Scripts** - Develop scripts using API documentation
- **Monitoring Systems** - Integrate health check endpoints
- **Data Migration** - Use bulk operations for data import/export
### Troubleshooting
- **Debug Issues** - Test API calls to isolate problems
- **Validate Permissions** - Check if user roles have correct access
- **Network Testing** - Verify connectivity and response times
- **Data Verification** - Confirm data integrity and processing status
## Advanced Features
### Custom Headers
- **Request Customization** - Add custom headers to API requests
- **Content-Type** - Specify different content types for uploads
- **User-Agent** - Set custom user agent strings
- **Cache Control** - Control caching behavior for responses
### Bulk Operations
- **Batch Uploads** - Test multiple file uploads simultaneously
- **Bulk Updates** - Update multiple documents or users at once
- **Mass Operations** - Perform administrative tasks in bulk
- **Data Export** - Export large datasets via API
## Configuration Options
Administrators can configure Swagger UI access:
```env
# Enable/disable Swagger UI
SWAGGER_UI_ENABLED=true
# Customize Swagger UI path
SWAGGER_UI_PATH=/docs
# Authentication requirements
SWAGGER_REQUIRE_AUTH=true
# Rate limiting for API documentation
SWAGGER_RATE_LIMIT=1000
```
## Security Considerations
- **Authentication Required** - Swagger UI requires the same login as the main application
- **Role-Based Access** - API endpoints respect user role permissions
- **Audit Logging** - All API calls are logged for security monitoring
- **Token Security** - JWT tokens should be kept secure and not shared