feat(docs): add more user facing docs, update README, and move dev docs to correct folder

parent 102e7d8b3f
commit a7883c1b63

787 README.md
@@ -16,10 +16,6 @@ A powerful, modern document management system built with Rust and React. Readur

## 🚀 Quick Start

### Using Docker Compose (Recommended)

The fastest way to get Readur running:

```bash
# Clone the repository
git clone https://github.com/perfectra1n/readur

@@ -38,278 +34,26 @@ open http://localhost:8000

> ⚠️ **Important**: Change the default admin password immediately after first login!

### What You Get
## 📚 Documentation

After deployment, you'll have:
- **Web Interface**: Modern document management UI at `http://localhost:8000`
- **PostgreSQL Database**: Document metadata and full-text search indexes
- **File Storage**: Persistent document storage with OCR processing
- **Watch Folder**: Automatic file ingestion from mounted directories
- **REST API**: Full API access for integrations
### Getting Started
- [📦 Installation Guide](docs/installation.md) - Docker & manual installation instructions
- [🔧 Configuration](docs/configuration.md) - Environment variables and settings
- [📖 User Guide](docs/user-guide.md) - How to use Readur effectively

## 🐳 Docker Deployment Guide
### Deployment & Operations
- [🚀 Deployment Guide](docs/deployment.md) - Production deployment, SSL, monitoring
- [🔄 Reverse Proxy Setup](docs/REVERSE_PROXY.md) - Nginx, Traefik, and more
- [📁 Watch Folder Guide](docs/WATCH_FOLDER.md) - Automatic document ingestion

### Production Docker Compose
### Development
- [🏗️ Developer Documentation](docs/dev/) - Architecture, development setup, testing
- [🔌 API Reference](docs/api-reference.md) - REST API documentation

For production deployments, create a custom `docker-compose.prod.yml`:

```yaml
services:
  readur:
    image: readur:latest
    ports:
      - "8000:8000"
    environment:
      # Core Configuration
      - DATABASE_URL=postgresql://readur:${DB_PASSWORD}@postgres:5432/readur
      - JWT_SECRET=${JWT_SECRET}
      - SERVER_ADDRESS=0.0.0.0:8000

      # File Storage
      - UPLOAD_PATH=/app/uploads
      - WATCH_FOLDER=/app/watch
      - ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,doc,docx

      # Watch Folder Settings
      - WATCH_INTERVAL_SECONDS=30
      - FILE_STABILITY_CHECK_MS=500
      - MAX_FILE_AGE_HOURS=168

      # OCR Configuration
      - OCR_LANGUAGE=eng
      - CONCURRENT_OCR_JOBS=4
      - OCR_TIMEOUT_SECONDS=300
      - MAX_FILE_SIZE_MB=100

      # Performance Tuning
      - MEMORY_LIMIT_MB=1024
      - CPU_PRIORITY=normal
      - ENABLE_COMPRESSION=true

    volumes:
      # Document storage
      - ./data/uploads:/app/uploads

      # Watch folder - mount your network drives here
      - /mnt/nfs/documents:/app/watch
      # or SMB: - /mnt/smb/shared:/app/watch
      # or S3: - /mnt/s3/bucket:/app/watch

    depends_on:
      - postgres
    restart: unless-stopped

    # Resource limits for production
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
        reservations:
          memory: 512M
          cpus: '0.5'

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=readur
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=readur
      - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8

    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres-config:/etc/postgresql/conf.d:ro

    # PostgreSQL optimization for document search
    command: >
      postgres
      -c shared_buffers=256MB
      -c effective_cache_size=1GB
      -c max_connections=100
      -c default_text_search_config=pg_catalog.english

    restart: unless-stopped

    # Don't expose port in production
    # ports:
    #   - "5433:5432"

volumes:
  postgres_data:
    driver: local
```

### Environment Variables

#### Port Configuration

Readur supports flexible port configuration:

```bash
# Method 1: Specify full server address
SERVER_ADDRESS=0.0.0.0:8000

# Method 2: Use separate host and port (recommended)
SERVER_HOST=0.0.0.0
SERVER_PORT=8000

# For development: Configure frontend port
CLIENT_PORT=5173
BACKEND_PORT=8000
```

#### Security Configuration

Create a `.env` file for your secrets:

```bash
# Generate secure secrets
# (hex output stays on one line and contains no characters that break DATABASE_URL)
JWT_SECRET=$(openssl rand -hex 64)
DB_PASSWORD=$(openssl rand -hex 32)

# Save to .env file
cat > .env << EOF
JWT_SECRET=${JWT_SECRET}
DB_PASSWORD=${DB_PASSWORD}
EOF
```

Deploy with:
```bash
docker compose -f docker-compose.prod.yml --env-file .env up -d
```

### Network Filesystem Mounts

#### NFS Mounts
```bash
# Mount NFS share
sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs/documents

# Add to docker-compose.yml
volumes:
  - /mnt/nfs/documents:/app/watch
environment:
  - WATCH_INTERVAL_SECONDS=60
  - FILE_STABILITY_CHECK_MS=1000
  - FORCE_POLLING_WATCH=1
```

#### SMB/CIFS Mounts
```bash
# Mount SMB share
sudo mount -t cifs //server/share /mnt/smb/shared -o username=user,password=pass

# Docker volume configuration
volumes:
  - /mnt/smb/shared:/app/watch
environment:
  - WATCH_INTERVAL_SECONDS=30
  - FILE_STABILITY_CHECK_MS=2000
```

#### S3 Mounts (using s3fs)
```bash
# Mount S3 bucket
s3fs mybucket /mnt/s3/bucket -o passwd_file=~/.passwd-s3fs

# Docker configuration for S3
volumes:
  - /mnt/s3/bucket:/app/watch
environment:
  - WATCH_INTERVAL_SECONDS=120
  - FILE_STABILITY_CHECK_MS=5000
  - FORCE_POLLING_WATCH=1
```

### SSL/HTTPS Setup

Use a reverse proxy like Nginx or Traefik:

#### Nginx Configuration
```nginx
server {
    listen 443 ssl http2;
    server_name readur.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # For file uploads
        client_max_body_size 100M;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
```

#### Traefik Configuration
```yaml
services:
  readur:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.readur.rule=Host(`readur.yourdomain.com`)"
      - "traefik.http.routers.readur.tls=true"
      - "traefik.http.routers.readur.tls.certresolver=letsencrypt"
```

> 📘 **For detailed reverse proxy configurations** including Apache, Caddy, custom ports, load balancing, and advanced scenarios, see [REVERSE_PROXY.md](./REVERSE_PROXY.md).

### Health Checks

Add health checks to your Docker configuration:

```yaml
services:
  readur:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```
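
The `healthcheck` above runs inside the container, so it depends on `curl` being available in the image. For host-side deployment scripts, a polling loop against the same `/api/health` endpoint can wait until Readur answers before moving on. This is a sketch: `wait_for_health` is a hypothetical helper, and its default attempt count and delay are arbitrary choices, not Readur recommendations.

```bash
# Hypothetical helper: poll a health endpoint until it returns a 2xx status.
# Arguments: URL, number of attempts, seconds to wait between attempts.
wait_for_health() {
  local url="${1:-http://localhost:8000/api/health}"
  local attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP error codes; --max-time bounds each request.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    # Only sleep if another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  echo "service at $url not healthy after $attempts attempts" >&2
  return 1
}

# Usage after starting the stack:
# docker compose -f docker-compose.prod.yml --env-file .env up -d && wait_for_health
```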

### Backup Strategy

```bash
#!/bin/bash
# backup.sh - Automated backup script
set -euo pipefail  # abort on errors, including a failed pg_dump inside the gzip pipeline

# Backup database
docker exec readur-postgres-1 pg_dump -U readur readur | gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz

# Backup uploaded files
tar -czf uploads_backup_$(date +%Y%m%d_%H%M%S).tar.gz -C ./data uploads/

# Clean old backups (keep 30 days)
find . -name "backup_*.sql.gz" -mtime +30 -delete
find . -name "uploads_backup_*.tar.gz" -mtime +30 -delete
```
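
A backup is only useful if it can be read back. The minimal sketch below assumes the file names produced by the script above. It uses `gzip -t` and `tar -tzf` to confirm that each archive is intact. `verify_backups` is a hypothetical helper, and it does not check that the SQL dump restores cleanly.

```bash
# Hypothetical helper: check archive integrity for files created by backup.sh.
verify_backups() {
  local dir="${1:-.}" status=0 f
  for f in "$dir"/backup_*.sql.gz; do
    [ -e "$f" ] || continue                      # no match: the glob stays literal
    gzip -t "$f" || { echo "corrupt: $f" >&2; status=1; }
  done
  for f in "$dir"/uploads_backup_*.tar.gz; do
    [ -e "$f" ] || continue
    tar -tzf "$f" > /dev/null || { echo "corrupt: $f" >&2; status=1; }
  done
  return "$status"
}

# Restoring into an empty database is the reverse of the pg_dump pipeline:
# gunzip -c backup_YYYYMMDD_HHMMSS.sql.gz | docker exec -i readur-postgres-1 psql -U readur readur
```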

### Monitoring

Monitor your deployment with `docker stats` and the container logs:

```bash
# Real-time resource usage
docker stats

# Container logs
docker compose logs -f readur

# Watch folder activity
docker compose logs -f readur | grep watcher
```
### Advanced Topics
- [🔍 OCR Optimization](docs/dev/OCR_OPTIMIZATION_GUIDE.md) - Improve OCR performance
- [🗄️ Database Best Practices](docs/dev/DATABASE_GUARDRAILS.md) - Concurrency and safety
- [📊 Queue Architecture](docs/dev/QUEUE_IMPROVEMENTS.md) - Background job processing

## 🏗️ Architecture

@@ -327,495 +71,24 @@ docker compose logs -f readur | grep watcher

## 📋 System Requirements

### Minimum Requirements
- **CPU**: 2 cores
- **RAM**: 2GB
- **Storage**: 10GB free space
- **OS**: Linux, macOS, or Windows with Docker
### Minimum
- 2 CPU cores, 2GB RAM, 10GB storage
- Docker or manual installation prerequisites

### Recommended for Production
- **CPU**: 4+ cores
- **RAM**: 4GB+
- **Storage**: 50GB+ SSD
- **Network**: Stable internet connection for OCR processing

## 🛠️ Manual Installation

For development or custom deployments without Docker:

### Prerequisites

Install these dependencies on your system:

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
  tesseract-ocr tesseract-ocr-eng \
  libtesseract-dev libleptonica-dev \
  postgresql postgresql-contrib \
  pkg-config libclang-dev

# macOS (requires Homebrew)
brew install tesseract leptonica postgresql rust nodejs npm

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

### Backend Setup

1. **Configure Database**:
```bash
# Create the user first, then a database owned by it
# (since PostgreSQL 15, only the database owner can create objects in the public schema)
sudo -u postgres psql <<'SQL'
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
CREATE DATABASE readur OWNER readur_user;
SQL
```

2. **Environment Configuration**:
```bash
# Copy environment template
cp .env.example .env

# Edit configuration
nano .env
```

Required environment variables:
```env
DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
JWT_SECRET=your-super-secret-jwt-key-change-this
SERVER_ADDRESS=0.0.0.0:8000
UPLOAD_PATH=./uploads
WATCH_FOLDER=./watch
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
```

3. **Build and Run Backend**:
```bash
# Fetch dependencies, compile an optimized build, and run it
cargo build --release
cargo run --release
```

### Frontend Setup

1. **Install Dependencies**:
```bash
cd frontend
npm install
```

2. **Development Mode**:
```bash
npm run dev
# Frontend available at http://localhost:5173
```

3. **Production Build**:
```bash
npm run build
# Built files in frontend/dist/
```

## 📖 User Guide

### Getting Started

1. **First Login**: Use the default admin credentials to access the system
2. **Upload Documents**: Drag and drop files or use the upload button
3. **Wait for Processing**: OCR processing happens automatically in the background
4. **Search and Organize**: Use the powerful search features to find your documents

### Supported File Types

| Type | Extensions | OCR Support | Notes |
|------|-----------|-------------|-------|
| **PDF** | `.pdf` | ✅ | Text extraction + OCR for scanned pages |
| **Images** | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | ✅ | Full OCR text extraction |
| **Text** | `.txt`, `.rtf` | ❌ | Direct text indexing |
| **Office** | `.doc`, `.docx` | ⚠️ | Limited support |

### Using the Interface

#### Dashboard
- **Document Statistics**: Total documents, storage usage, OCR status
- **Recent Activity**: Latest uploads and processing status
- **Quick Actions**: Fast access to upload and search

#### Document Management
- **List/Grid View**: Toggle between different viewing modes
- **Sorting**: Sort by date, name, size, or file type
- **Filtering**: Filter by tags, file types, and OCR status
- **Bulk Actions**: Select multiple documents for batch operations

#### Advanced Search
- **Full-text Search**: Search within document content
- **Metadata Filters**: Filter by upload date, file size, type
- **Tag System**: Organize documents with custom tags
- **OCR Status**: Find processed vs. pending documents

#### Folder Watching
- **Non-destructive**: Unlike paperless-ngx, source files remain untouched
- **Automatic Processing**: New files are detected and processed automatically
- **Configurable**: Set custom watch directories

### Tips for Best Results

1. **OCR Quality**: Higher resolution images (300+ DPI) produce better OCR results
2. **File Organization**: Use consistent naming conventions for easier searching
3. **Regular Backups**: Back up both the database and file storage regularly
4. **Performance**: For large document collections, consider increasing server resources

## 🔧 Configuration

### Environment Variables

All application settings can be configured via environment variables:

#### Core Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql://readur:readur@localhost/readur` | PostgreSQL connection string |
| `JWT_SECRET` | `your-secret-key` | Secret key for JWT tokens ⚠️ **Change in production!** |
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address and port |

#### File Storage & Upload
| Variable | Default | Description |
|----------|---------|-------------|
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
| `ALLOWED_FILE_TYPES` | `pdf,txt,doc,docx,png,jpg,jpeg` | Comma-separated allowed file extensions |

#### Watch Folder Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `WATCH_FOLDER` | `./watch` | Directory to monitor for new files |
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems (seconds) |
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file write completion (milliseconds) |
| `MAX_FILE_AGE_HOURS` | _(none)_ | Skip files older than this many hours |
| `FORCE_POLLING_WATCH` | _(none)_ | Force polling mode even for local filesystems |

#### OCR & Processing Settings
*Note: These settings can also be configured per-user via the web interface*

| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_LANGUAGE` | `eng` | OCR language code (eng, fra, deu, spa, etc.) |
| `CONCURRENT_OCR_JOBS` | `4` | Maximum parallel OCR processes |
| `OCR_TIMEOUT_SECONDS` | `300` | OCR processing timeout per file |
| `MAX_FILE_SIZE_MB` | `50` | Maximum file size for processing |
| `AUTO_ROTATE_IMAGES` | `true` | Automatically rotate images for better OCR |
| `ENABLE_IMAGE_PREPROCESSING` | `true` | Apply image enhancement before OCR |

#### Search & Performance
| Variable | Default | Description |
|----------|---------|-------------|
| `SEARCH_RESULTS_PER_PAGE` | `25` | Default number of search results per page |
| `SEARCH_SNIPPET_LENGTH` | `200` | Length of text snippets in search results |
| `FUZZY_SEARCH_THRESHOLD` | `0.8` | Similarity threshold for fuzzy search (0.0-1.0) |
| `MEMORY_LIMIT_MB` | `512` | Memory limit for OCR processes |
| `CPU_PRIORITY` | `normal` | CPU priority: `low`, `normal`, `high` |

#### Data Management
| Variable | Default | Description |
|----------|---------|-------------|
| `RETENTION_DAYS` | _(none)_ | Auto-delete documents after N days |
| `ENABLE_AUTO_CLEANUP` | `false` | Enable automatic cleanup of old documents |
| `ENABLE_COMPRESSION` | `false` | Compress stored documents to save space |
| `ENABLE_BACKGROUND_OCR` | `true` | Process OCR in background queue |

### Example Production Configuration

```env
# Core settings
DATABASE_URL=postgresql://readur:secure_password@postgres:5432/readur
JWT_SECRET=your-very-long-random-secret-key-generated-with-openssl
SERVER_ADDRESS=0.0.0.0:8000

# File handling
UPLOAD_PATH=/app/uploads
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,rtf,doc,docx

# Watch folder for NFS mount
WATCH_FOLDER=/mnt/nfs/documents
WATCH_INTERVAL_SECONDS=60
FILE_STABILITY_CHECK_MS=1000
MAX_FILE_AGE_HOURS=168
FORCE_POLLING_WATCH=1

# OCR optimization
OCR_LANGUAGE=eng
CONCURRENT_OCR_JOBS=8
OCR_TIMEOUT_SECONDS=600
MAX_FILE_SIZE_MB=200
AUTO_ROTATE_IMAGES=true
ENABLE_IMAGE_PREPROCESSING=true

# Performance tuning
MEMORY_LIMIT_MB=2048
CPU_PRIORITY=high
ENABLE_COMPRESSION=true
ENABLE_BACKGROUND_OCR=true

# Search optimization
SEARCH_RESULTS_PER_PAGE=50
SEARCH_SNIPPET_LENGTH=300
FUZZY_SEARCH_THRESHOLD=0.7

# Data management
RETENTION_DAYS=2555 # 7 years
ENABLE_AUTO_CLEANUP=true
```

### Runtime Settings vs Environment Variables

Some settings can be configured in two ways:

1. **Environment Variables**: Set at container startup; apply to the entire application
2. **User Settings**: Configured per user via the web interface and stored in the database

Environment variables provide the system-wide defaults. User settings override these defaults for individual users where applicable.

Settings configurable via web interface:
- OCR language preferences
- Search result limits
- File type restrictions
- OCR processing options
- Data retention policies

### Configuration Priority

Settings are applied in this order (later values override earlier ones):

1. **Application defaults** (built into the code)
2. **Environment variables** (system-wide configuration)
3. **User settings** (per-user database settings via web interface)

This allows for flexible deployment where system administrators can set defaults while users can customize their experience.
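
The precedence order above amounts to "the last non-empty layer wins". The sketch below illustrates that rule for a single setting. `resolve_setting` is hypothetical. In Readur the user-level value comes from the database, not from a shell argument.

```bash
# Hypothetical illustration: resolve one setting from lowest to highest priority.
# Arguments: application default, environment value, per-user value.
resolve_setting() {
  local value="$1"                     # 1. application default
  [ -n "$2" ] && value="$2"            # 2. environment variable overrides the default
  [ -n "$3" ] && value="$3"            # 3. user setting overrides both
  printf '%s\n' "$value"
}

# Example: built-in default "eng", then OCR_LANGUAGE, then a user's own choice.
# resolve_setting "eng" "${OCR_LANGUAGE:-}" "deu"
```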

### Quick Reference - Essential Variables

For a minimal production deployment, configure these essential variables:

```bash
# Security (REQUIRED)
JWT_SECRET=your-secure-random-key-here
DATABASE_URL=postgresql://user:password@host:port/database

# File Storage
UPLOAD_PATH=/app/uploads
WATCH_FOLDER=/path/to/mounted/folder

# Watch Folder (for network mounts)
WATCH_INTERVAL_SECONDS=60
FORCE_POLLING_WATCH=1

# Performance
CONCURRENT_OCR_JOBS=4
MAX_FILE_SIZE_MB=100
```
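
Before starting a production container, it can help to fail fast when these values are missing. The script below is a hypothetical pre-flight check and is not part of Readur. It assumes bash, because it uses indirect expansion. It only checks that the two required variables are set and that `JWT_SECRET` is not one of the placeholder values used in this README.

```bash
# Hypothetical pre-flight check for the essential variables listed above.
check_required_env() {
  local missing=0 name
  for name in JWT_SECRET DATABASE_URL; do
    if [ -z "${!name:-}" ]; then          # ${!name} reads the variable named by $name
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done

  # Reject the placeholder secrets that appear in the examples in this README.
  case "${JWT_SECRET:-}" in
    your-secret-key|your-secure-random-key-here|your-super-secret-jwt-key-change-this)
      echo "JWT_SECRET still uses a placeholder value" >&2
      missing=1 ;;
  esac

  return "$missing"
}

# Usage: load .env into the shell, then check before starting the stack.
# set -a; . ./.env; set +a
# check_required_env && docker compose -f docker-compose.prod.yml --env-file .env up -d
```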

### Database Tuning

For better search performance with large document collections:

```sql
-- Increase shared_buffers for better caching
ALTER SYSTEM SET shared_buffers = '256MB';

-- Optimize for full-text search
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';

-- shared_buffers only takes effect after a PostgreSQL restart;
-- default_text_search_config can be applied with: SELECT pg_reload_conf();
```

## 🔌 API Reference

### Authentication Endpoints

```bash
# Register new user
POST /api/auth/register
Content-Type: application/json
{
  "username": "john_doe",
  "email": "john@example.com",
  "password": "secure_password"
}

# Login
POST /api/auth/login
Content-Type: application/json
{
  "username": "john_doe",
  "password": "secure_password"
}

# Get current user
GET /api/auth/me
Authorization: Bearer <jwt_token>
```

### Document Management

```bash
# Upload document
POST /api/documents
Authorization: Bearer <jwt_token>
Content-Type: multipart/form-data
file: <binary_file_data>

# List documents
GET /api/documents?limit=50&offset=0
Authorization: Bearer <jwt_token>

# Download document
GET /api/documents/{id}/download
Authorization: Bearer <jwt_token>
```

### Search

```bash
# Search documents
GET /api/search?query=contract&limit=20
Authorization: Bearer <jwt_token>

# Advanced search with filters
GET /api/search?query=invoice&mime_types=application/pdf&tags=important
Authorization: Bearer <jwt_token>
```

## 🧪 Testing

### Run All Tests

```bash
# Backend tests
cargo test

# Frontend tests
cd frontend && npm test

# Integration tests with Docker
docker compose -f docker-compose.test.yml up --build
```

### Test Coverage

```bash
# Install cargo-tarpaulin for coverage
cargo install cargo-tarpaulin

# Generate coverage report
cargo tarpaulin --out Html
```

## 🔒 Security Considerations

### Production Deployment

1. **Change Default Credentials**: Update admin password immediately
2. **Use Strong JWT Secret**: Generate a secure random key
3. **Enable HTTPS**: Use a reverse proxy with SSL/TLS
4. **Database Security**: Use strong passwords and restrict network access
5. **File Permissions**: Ensure proper file system permissions
6. **Regular Updates**: Keep dependencies and base images updated

### Recommended Production Setup

```bash
# Use environment-specific secrets
# (hex output stays on one line; `openssl rand -base64 64` wraps after 64 characters)
JWT_SECRET=$(openssl rand -hex 64)

# Restrict database access
# Only allow connections from application container

# Use read-only file system where possible
# Mount uploads and watch folders as separate volumes
```

## 🚀 Deployment Options

### Docker Swarm

```yaml
version: '3.8'
services:
  readur:
    image: readur:latest
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
    networks:
      - readur-network
    secrets:
      - jwt_secret
      - db_password
```

### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: readur
          image: readur:latest
          env:
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: readur-secrets
                  key: jwt-secret
```

### Cloud Platforms

- **AWS**: Use ECS with RDS PostgreSQL
- **Google Cloud**: Deploy to Cloud Run with Cloud SQL
- **Azure**: Use Container Instances with Azure Database
- **DigitalOcean**: App Platform with Managed Database
- 4+ CPU cores, 4GB+ RAM, 50GB+ SSD
- See [deployment guide](docs/deployment.md) for details

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) and [Development Setup](docs/dev/development.md) for details.

### Development Setup
## 🔒 Security

```bash
# Fork and clone the repository
git clone https://github.com/yourusername/readur.git
cd readur

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes and test
cargo test
cd frontend && npm test

# Submit a pull request
```

### Code Style

- **Rust**: Follow `rustfmt` and `clippy` recommendations
- **Frontend**: Use Prettier and ESLint configurations
- **Commits**: Use conventional commit format
- Change default credentials immediately
- Use HTTPS in production
- Regular security updates
- See [deployment guide](docs/deployment.md#security-considerations) for security best practices

## 📝 License

@@ -830,9 +103,9 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file

## 📞 Support

- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs and request features on GitHub Issues
- **Discussions**: Join community discussions on GitHub Discussions
- **Documentation**: Start with the [User Guide](docs/user-guide.md)
- **Issues**: Report bugs on [GitHub Issues](https://github.com/perfectra1n/readur/issues)
- **Discussions**: Join our [GitHub Discussions](https://github.com/perfectra1n/readur/discussions)

---

@@ -1,31 +1,68 @@
# Watch Folder Documentation
# Watch Folder Guide

The watch folder feature automatically monitors a directory for new OCR-able files and processes them without deleting the original files. This is perfect for scenarios where files are mounted from various filesystem types including NFS, SMB, S3, and local storage.
The watch folder feature automatically monitors a directory for new files and processes them with OCR, making them searchable in Readur. Your original files are never modified or deleted - Readur simply copies and processes them while leaving the originals untouched.

## Features
## What Is the Watch Folder?

### 🔄 Cross-Filesystem Compatibility
- **Automatic Detection**: Detects filesystem type and chooses optimal watching strategy
- **Local Filesystems**: Uses efficient inotify-based watching for ext4, NTFS, APFS, etc.
- **Network Filesystems**: Uses polling-based watching for NFS, SMB/CIFS, S3 mounts
- **Hybrid Fallback**: Gracefully falls back to polling if inotify fails
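
The strategy choice described above can be pictured as a small decision function. This is an illustration only, not Readur's implementation. `choose_watch_strategy` is hypothetical. The filesystem type names are examples of what GNU `stat -f -c %T` reports on Linux, not an exhaustive list. The sketch also assumes that any non-empty `FORCE_POLLING_WATCH` forces polling, and it does not model the fallback when inotify setup fails.

```bash
# Hypothetical helper: choose a watching strategy from a filesystem type name.
choose_watch_strategy() {
  local fs_type="$1"

  # FORCE_POLLING_WATCH overrides detection (assumption: any non-empty value counts).
  if [ -n "${FORCE_POLLING_WATCH:-}" ]; then
    echo "polling"
    return 0
  fi

  case "$fs_type" in
    nfs|nfs4|cifs|smb2|smbfs|fuse*)
      # Network and FUSE mounts (e.g. s3fs) may not deliver inotify events.
      echo "polling" ;;
    *)
      echo "inotify" ;;
  esac
}

# On Linux, inspect the filesystem that backs the watch folder:
# choose_watch_strategy "$(stat -f -c %T "${WATCH_FOLDER:-./watch}")"
```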
The watch folder lets you:
- **Drop files anywhere** - Point Readur to any folder (local, network drive, cloud mount)
- **Automatic processing** - New files are automatically detected and processed
- **Non-destructive** - Original files remain exactly where you put them
- **Background operation** - Processing happens in the background while you continue working

### 📁 Smart File Processing
- **OCR-able File Detection**: Only processes supported file types (PDF, images, text, Word docs)
- **Duplicate Prevention**: Checks for existing files with same name and size
- **File Stability**: Waits for files to finish being written before processing
- **System File Exclusion**: Skips hidden files, temporary files, and system directories
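
The filtering rules above can be approximated in shell. This sketch is not Readur's implementation, and `is_watch_candidate` and `duplicate_key` are hypothetical helpers. It makes three assumptions:
- Extensions are matched case-insensitively against `ALLOWED_FILE_TYPES`.
- When that variable is unset, the default from the configuration table applies.
- "Temporary files" means names ending in `.tmp`, `.part`, or `~`.

```bash
# Hypothetical helper: decide whether a path should be picked up by the watcher.
is_watch_candidate() {
  local path="$1"
  local name="${path##*/}"                                   # basename without basename(1)
  local allowed="${ALLOWED_FILE_TYPES:-pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx}"

  case "$name" in .*) return 1 ;; esac                       # hidden files (e.g. .DS_Store)
  case "$name" in *.tmp|*.part|*~) return 1 ;; esac          # assumed temporary-file suffixes
  case "$name" in *.*) ;; *) return 1 ;; esac                # no extension at all

  # Compare the lower-cased extension against the comma-separated allow-list.
  local ext
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case ",$allowed," in *",$ext,"*) return 0 ;; esac
  return 1
}

# Hypothetical duplicate key: file name plus size in bytes ("same name and size").
duplicate_key() {
  local path="$1"
  printf '%s:%s\n' "${path##*/}" "$(wc -c < "$path" | tr -d ' ')"
}
```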
Perfect for scenarios where you want to automatically process files from:
- Network drives (NFS, SMB shares)
- Cloud storage mounts (Google Drive, Dropbox, OneDrive)
- Local folders where you save scanned documents
- Shared team folders

### ⚙️ Configuration Options
## How It Works

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `WATCH_FOLDER` | `./watch` | Path to the folder to monitor |
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems |
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file stability |
| `MAX_FILE_AGE_HOURS` | `none` | Skip files older than specified hours |
| `ALLOWED_FILE_TYPES` | `pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx` | Allowed file extensions |
| `FORCE_POLLING_WATCH` | `unset` | Force polling mode even for local filesystems |
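
The stability and age settings can be pictured with the sketch below. It assumes "stable" means the file size is unchanged across one `FILE_STABILITY_CHECK_MS` interval. It also assumes age is measured from the modification time. `is_stable` and `is_too_old` are hypothetical helpers, and Readur's watcher may implement these checks differently.

```bash
# Hypothetical helper: a file is "stable" if its size is unchanged after the wait.
is_stable() {
  local path="$1"
  local wait_ms="${FILE_STABILITY_CHECK_MS:-500}"
  local before after
  before=$(wc -c < "$path" | tr -d ' ')
  # Convert milliseconds to fractional seconds (GNU and BSD sleep accept these).
  sleep "$(awk -v ms="$wait_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
  after=$(wc -c < "$path" | tr -d ' ')
  [ "$before" = "$after" ]
}

# Hypothetical helper: true only when MAX_FILE_AGE_HOURS is set and the file is older.
is_too_old() {
  local path="$1"
  local max_hours="${MAX_FILE_AGE_HOURS:-}"
  [ -n "$max_hours" ] || return 1          # unset means no age limit
  # find prints the path only if it was last modified more than N minutes ago.
  [ -n "$(find "$path" -mmin +"$((max_hours * 60))")" ]
}
```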
1. **Point Readur to your folder** - Set the `WATCH_FOLDER` path to any directory you want monitored
2. **Drop files** - Add documents to that folder (PDFs, images, text files, Word docs)
3. **Automatic detection** - Readur notices new files within seconds on local drives, or within one polling interval (`WATCH_INTERVAL_SECONDS`, default 30) on network drives
4. **OCR processing** - Files are automatically processed to extract searchable text
5. **Search and find** - Your documents become searchable in the Readur web interface

## Key Features

✅ **Works with any storage type** - Local drives, network shares, cloud mounts
✅ **Smart processing** - Only processes supported file types
✅ **Duplicate prevention** - Won't process the same file twice
✅ **Safe operation** - Never modifies or deletes your original files
✅ **Background processing** - Doesn't interrupt your workflow

## Quick Setup

### Basic Setup (Docker Compose)

1. **Edit your docker-compose.yml**:
```yaml
services:
  readur:
    image: readur:latest
    volumes:
      # Mount your folder to the watch directory
      - /path/to/your/documents:/app/watch
    environment:
      - WATCH_FOLDER=/app/watch
```

2. **Start Readur**:
```bash
docker compose up -d
```

3. **Start dropping files** into `/path/to/your/documents` - they'll be automatically processed!
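
If you copy large files into the watched folder over a slow connection, you can make sure the watcher never picks up a half-written file. Copy the file under a hidden temporary name, then rename it once the copy completes. This sketch assumes the watcher skips hidden files, as listed under System File Exclusion. It also relies on `mv` performing an atomic rename when source and destination are on the same filesystem. `drop_into_watch` is a hypothetical helper, and the stability check remains a second line of defense.

```bash
# Hypothetical helper: copy into the watch folder under a hidden name, then rename.
# Usage: drop_into_watch SOURCE_FILE WATCH_DIR
drop_into_watch() {
  local src="$1" dest_dir="$2"
  local name="${src##*/}"
  local tmp="$dest_dir/.${name}.partial"   # leading dot: assumed to be ignored by the watcher
  cp -- "$src" "$tmp" || return 1
  # Same-directory rename, so the watcher only ever sees the complete file.
  mv -f -- "$tmp" "$dest_dir/$name"
}

# Example:
# drop_into_watch ~/Scans/contract.pdf /path/to/your/documents
```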

### Configuration Options

| Setting | Default | What it does |
|---------|---------|-------------|
| `WATCH_FOLDER` | `./watch` | Which folder to monitor |
| `WATCH_INTERVAL_SECONDS` | `30` | How often to check for new files (network drives) |
| `MAX_FILE_AGE_HOURS` | _(none)_ | Ignore files older than this many hours |
| `ALLOWED_FILE_TYPES` | `pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx` | Which file types to process |

## Usage
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,618 @@
|
|||
# API Reference
|
||||
|
||||
Readur provides a comprehensive REST API for integrating with external systems and building custom workflows.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Base URL](#base-url)
|
||||
- [Authentication](#authentication)
|
||||
- [Error Handling](#error-handling)
|
||||
- [Rate Limiting](#rate-limiting)
|
||||
- [Endpoints](#endpoints)
  - [Authentication](#authentication-endpoints)
  - [Documents](#document-endpoints)
  - [Search](#search-endpoints)
  - [OCR Queue](#ocr-queue-endpoints)
  - [Settings](#settings-endpoints)
  - [Sources](#sources-endpoints)
  - [Labels](#labels-endpoints)
  - [Users](#user-endpoints)
- [WebSocket API](#websocket-api)
- [Examples](#examples)
- [OpenAPI Specification](#openapi-specification)
- [SDK Support](#sdk-support)
|
||||
|
||||
## Base URL
|
||||
|
||||
```
|
||||
http://localhost:8000/api
|
||||
```
|
||||
|
||||
For production deployments, replace with your configured domain and ensure HTTPS is used.
|
||||
|
||||
## Authentication
|
||||
|
||||
Readur uses JWT (JSON Web Token) authentication. Include the token in the Authorization header:
|
||||
|
||||
```
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
### Obtaining a Token
|
||||
|
||||
```bash
|
||||
POST /api/auth/login
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"username": "admin",
|
||||
"password": "your_password"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": 1,
    "username": "admin",
    "email": "admin@example.com",
    "role": "admin"
  }
}
|
||||
```
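A JWT is three base64url segments separated by dots, and the middle one is a JSON payload. If Readur's tokens carry the standard `exp` claim (an assumption: the response above does not show the payload), a client can read it without the signing secret to decide when to log in again. The helpers `jwt_expiry` and `token_needs_refresh` are hypothetical, and they do not verify the signature, so use them only for client-side caching decisions.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    The signature is NOT verified; only the server can do that.
    """
    payload_b64 = token.split(".")[1]
    # base64url segments drop '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    return int(exp) if exp is not None else None


def token_needs_refresh(token: str, leeway_seconds: int = 60,
                        now: Optional[float] = None) -> bool:
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim: rely on a 401 response instead
    now = time.time() if now is None else now
    return now + leeway_seconds >= exp
```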
|
||||
|
||||
## Error Handling
|
||||
|
||||
All API errors follow a consistent format:
|
||||
|
||||
```json
|
||||
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request parameters",
    "details": {
      "field": "email",
      "reason": "Invalid email format"
    }
  }
}
|
||||
```
|
||||
|
||||
Common HTTP status codes:
|
||||
- `200` - Success
|
||||
- `201` - Created
|
||||
- `400` - Bad Request
|
||||
- `401` - Unauthorized
|
||||
- `403` - Forbidden
|
||||
- `404` - Not Found
|
||||
- `422` - Validation Error
|
||||
- `500` - Internal Server Error
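Because every error shares the envelope above, a client can turn non-2xx responses into one exception type. This is a minimal sketch; `ReadurAPIError` and `raise_for_api_error` are hypothetical client-side names, not part of Readur. It falls back to a generic code when the body is not the documented envelope, for example an HTML error page from a reverse proxy.

```python
import json
from typing import Any, Optional


class ReadurAPIError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, status: int, code: str, message: str,
                 details: Optional[dict[str, Any]] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details or {}


def raise_for_api_error(status: int, body: str) -> None:
    """Raise ReadurAPIError for non-2xx responses; do nothing on success."""
    if 200 <= status < 300:
        return
    try:
        err = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        # Not the documented envelope (e.g. a proxy's HTML error page).
        raise ReadurAPIError(status, "UNKNOWN", body[:200]) from None
    raise ReadurAPIError(status, err.get("code", "UNKNOWN"),
                         err.get("message", ""), err.get("details"))
```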
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
API requests are rate-limited to prevent abuse:
|
||||
- Authenticated users: 1000 requests per hour
|
||||
- Unauthenticated users: 100 requests per hour
|
||||
|
||||
Rate limit headers:
|
||||
```
|
||||
X-RateLimit-Limit: 1000
|
||||
X-RateLimit-Remaining: 999
|
||||
X-RateLimit-Reset: 1640995200
|
||||
```
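A client that respects these headers can pause once `X-RateLimit-Remaining` reaches zero. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value (1640995200, i.e. 2022-01-01T00:00:00Z) suggests but the text does not state; `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """How long to pause before the next request, from X-RateLimit-* headers.

    Returns 0 while requests remain or when the headers are absent.
    """
    # HTTP header names are case-insensitive; normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")
    if remaining is None or reset is None or int(remaining) > 0:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, float(reset) - now)
```

With `requests`, this would be called as `time.sleep(seconds_until_reset(response.headers))` before the next call.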
|
||||
|
||||
## Endpoints
|
||||
|
||||
### Authentication Endpoints
|
||||
|
||||
#### Register New User
|
||||
|
||||
```bash
|
||||
POST /api/auth/register
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"username": "john_doe",
|
||||
"email": "john@example.com",
|
||||
"password": "secure_password"
|
||||
}
|
||||
```
|
||||
|
||||
#### Login
|
||||
|
||||
```bash
|
||||
POST /api/auth/login
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"username": "john_doe",
|
||||
"password": "secure_password"
|
||||
}
|
||||
```
|
||||
|
||||
#### Get Current User
|
||||
|
||||
```bash
|
||||
GET /api/auth/me
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Logout
|
||||
|
||||
```bash
|
||||
POST /api/auth/logout
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
### Document Endpoints
|
||||
|
||||
#### Upload Document
|
||||
|
||||
```bash
|
||||
POST /api/documents
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: multipart/form-data
|
||||
|
||||
file: <binary_file_data>
|
||||
tags: ["invoice", "2024"] # Optional
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"id": "550e8400-e29b-41d4-a716-446655440000",
|
||||
"filename": "invoice_2024.pdf",
|
||||
"mime_type": "application/pdf",
|
||||
"size": 1048576,
|
||||
"uploaded_at": "2024-01-01T00:00:00Z",
|
||||
"ocr_status": "pending"
|
||||
}
|
||||
```
|
||||
|
||||
#### List Documents
|
||||
|
||||
```bash
|
||||
GET /api/documents?limit=50&offset=0&sort=-uploaded_at
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
Query parameters:
|
||||
- `limit` - Number of results (default: 50, max: 100)
|
||||
- `offset` - Pagination offset
|
||||
- `sort` - Sort field (prefix with `-` for descending)
|
||||
- `mime_type` - Filter by MIME type
|
||||
- `ocr_status` - Filter by OCR status
|
||||
- `tag` - Filter by tag
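Since `limit` is capped at 100, fetching a large library means walking `offset` pages. A minimal sketch follows; `list_params` and `iter_documents` are hypothetical names, and because the list response shape is not documented here, the caller-supplied `fetch_page` is assumed to return the list of documents for one page.

```python
from typing import Any, Callable, Iterator, Optional

MAX_LIMIT = 100  # documented maximum for `limit`


def list_params(limit: int = 50, offset: int = 0, sort: str = "-uploaded_at",
                **filters: Optional[str]) -> dict[str, Any]:
    """Build query parameters for GET /api/documents, dropping unset filters."""
    params: dict[str, Any] = {"limit": min(limit, MAX_LIMIT), "offset": offset, "sort": sort}
    params.update({k: v for k, v in filters.items() if v is not None})
    return params


def iter_documents(fetch_page: Callable[[dict[str, Any]], list[dict]],
                   page_size: int = 100, **filters: Optional[str]) -> Iterator[dict]:
    """Yield every document, requesting pages until one comes back short."""
    offset = 0
    while True:
        page = fetch_page(list_params(limit=page_size, offset=offset, **filters))
        yield from page
        if len(page) < min(page_size, MAX_LIMIT):
            return
        offset += len(page)
```

Stopping on a short page avoids needing a total count from the server; a stable `sort` keeps pages from overlapping while documents are being added.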
|
||||
|
||||
#### Get Document Details
|
||||
|
||||
```bash
|
||||
GET /api/documents/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Download Document
|
||||
|
||||
```bash
|
||||
GET /api/documents/{id}/download
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Delete Document
|
||||
|
||||
```bash
|
||||
DELETE /api/documents/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Update Document
|
||||
|
||||
```bash
|
||||
PATCH /api/documents/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"tags": ["invoice", "paid", "2024"]
|
||||
}
|
||||
```
|
||||
|
||||
### Search Endpoints
|
||||
|
||||
#### Search Documents
|
||||
|
||||
```bash
|
||||
GET /api/search?query=invoice&limit=20
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
Query parameters:
|
||||
- `query` - Search query (required)
|
||||
- `limit` - Number of results
|
||||
- `offset` - Pagination offset
|
||||
- `mime_types` - Comma-separated MIME types
|
||||
- `tags` - Comma-separated tags
|
||||
- `date_from` - Start date (ISO 8601)
|
||||
- `date_to` - End date (ISO 8601)
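List-valued parameters are comma-separated and dates are ISO 8601, so they need encoding before they go into the URL. A minimal sketch using only the standard library; `search_query_string` is a hypothetical helper, and the defaults for `limit` and `offset` mirror the example request and response above.

```python
from datetime import date
from typing import Optional, Sequence
from urllib.parse import urlencode


def search_query_string(query: str, limit: int = 20, offset: int = 0,
                        mime_types: Sequence[str] = (), tags: Sequence[str] = (),
                        date_from: Optional[date] = None,
                        date_to: Optional[date] = None) -> str:
    """Encode parameters for GET /api/search as documented above."""
    if not query:
        raise ValueError("query is required")
    params: dict[str, object] = {"query": query, "limit": limit, "offset": offset}
    if mime_types:
        params["mime_types"] = ",".join(mime_types)  # comma-separated list
    if tags:
        params["tags"] = ",".join(tags)
    if date_from:
        params["date_from"] = date_from.isoformat()  # e.g. 2024-01-01
    if date_to:
        params["date_to"] = date_to.isoformat()
    return urlencode(params)
```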
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_2024.pdf",
      "snippet": "...invoice for services rendered in Q1 2024...",
      "score": 0.95,
      "highlights": ["invoice", "2024"]
    }
  ],
  "total": 42,
  "limit": 20,
  "offset": 0
}
|
||||
```
|
||||
|
||||
#### Advanced Search
|
||||
|
||||
```bash
|
||||
POST /api/search/advanced
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
  "query": "invoice",
  "filters": {
    "mime_types": ["application/pdf"],
    "tags": ["unpaid"],
    "date_range": {
      "from": "2024-01-01",
      "to": "2024-12-31"
    },
    "file_size": {
      "min": 1024,
      "max": 10485760
    }
  },
  "options": {
    "fuzzy": true,
    "snippet_length": 200
  }
}
|
||||
```
|
||||
|
||||
### OCR Queue Endpoints
|
||||
|
||||
#### Get Queue Status
|
||||
|
||||
```bash
|
||||
GET /api/queue/status
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"pending": 15,
|
||||
"processing": 3,
|
||||
"completed_today": 127,
|
||||
"failed_today": 2,
|
||||
"average_processing_time": 4.5
|
||||
}
|
||||
```
|
||||
|
||||
#### Reprocess Document
|
||||
|
||||
```bash
|
||||
POST /api/documents/{id}/reprocess
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
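After an upload or a reprocess request, a script often needs to wait for OCR to finish. This is a minimal polling sketch with exponential backoff, so it stays well inside the rate limit. `wait_for_ocr` is hypothetical, and so are the terminal status names: only `pending` appears in this reference, so `completed` and `failed` are guesses based on the `ocr.completed` and `ocr.failed` WebSocket events.

```python
import time
from typing import Callable

# Assumption: "completed"/"failed" are guessed from the WebSocket event names.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_ocr(get_status: Callable[[], str], timeout: float = 300.0,
                 initial_delay: float = 1.0, max_delay: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll `get_status` with exponential backoff until a terminal OCR status."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"OCR still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off to spare the API
```

`get_status` could, for example, read `ocr_status` from `GET /api/documents/{id}`, assuming the details response carries the same field as the upload response. For many documents at once, the WebSocket events avoid polling altogether.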
|
||||
|
||||
#### Get Failed OCR Jobs
|
||||
|
||||
```bash
|
||||
GET /api/queue/failed
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
### Settings Endpoints
|
||||
|
||||
#### Get User Settings
|
||||
|
||||
```bash
|
||||
GET /api/settings
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Update User Settings
|
||||
|
||||
```bash
|
||||
PUT /api/settings
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"ocr_language": "eng",
|
||||
"search_results_per_page": 50,
|
||||
"enable_notifications": true
|
||||
}
|
||||
```
|
||||
|
||||
### Sources Endpoints
|
||||
|
||||
#### List Sources
|
||||
|
||||
```bash
|
||||
GET /api/sources
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Create Source
|
||||
|
||||
```bash
|
||||
POST /api/sources
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
  "name": "Network Drive",
  "type": "local_folder",
  "config": {
    "path": "/mnt/network/documents",
    "scan_interval": 3600
  },
  "enabled": true
}
|
||||
```
|
||||
|
||||
#### Update Source
|
||||
|
||||
```bash
|
||||
PUT /api/sources/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"enabled": false
|
||||
}
|
||||
```
|
||||
|
||||
#### Delete Source
|
||||
|
||||
```bash
|
||||
DELETE /api/sources/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Sync Source
|
||||
|
||||
```bash
|
||||
POST /api/sources/{id}/sync
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
### Labels Endpoints
|
||||
|
||||
#### List Labels
|
||||
|
||||
```bash
|
||||
GET /api/labels
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Create Label
|
||||
|
||||
```bash
|
||||
POST /api/labels
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"name": "Important",
|
||||
"color": "#FF0000"
|
||||
}
|
||||
```
|
||||
|
||||
#### Update Label
|
||||
|
||||
```bash
|
||||
PUT /api/labels/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"name": "Very Important",
|
||||
"color": "#FF00FF"
|
||||
}
|
||||
```
|
||||
|
||||
#### Delete Label
|
||||
|
||||
```bash
|
||||
DELETE /api/labels/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
### User Endpoints
|
||||
|
||||
#### List Users (Admin Only)
|
||||
|
||||
```bash
|
||||
GET /api/users
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Get User
|
||||
|
||||
```bash
|
||||
GET /api/users/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
#### Update User
|
||||
|
||||
```bash
|
||||
PUT /api/users/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"email": "newemail@example.com",
|
||||
"role": "user"
|
||||
}
|
||||
```
|
||||
|
||||
#### Delete User (Admin Only)
|
||||
|
||||
```bash
|
||||
DELETE /api/users/{id}
|
||||
Authorization: Bearer <jwt_token>
|
||||
```
|
||||
|
||||
## WebSocket API
|
||||
|
||||
Connect to receive real-time updates:
|
||||
|
||||
```javascript
|
||||
// Use wss:// when Readur is served over HTTPS.
const ws = new WebSocket('ws://localhost:8000/ws');

// Authenticate once the connection is open: calling send() while the
// socket is still connecting throws an InvalidStateError.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'your_jwt_token'
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Event:', data);
};
|
||||
```
|
||||
|
||||
Event types:
|
||||
- `document.uploaded` - New document uploaded
|
||||
- `ocr.completed` - OCR processing completed
|
||||
- `ocr.failed` - OCR processing failed
|
||||
- `source.sync.completed` - Source sync finished
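Clients usually route each incoming message to a handler for its event type. The sketch below is a minimal, library-agnostic dispatcher; `EventRouter` is hypothetical, and it assumes each message is a JSON object naming its event in a `type` field, which this reference does not spell out.

```python
import json
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventRouter:
    """Hypothetical dispatcher for WebSocket messages, keyed by event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, raw_message: str) -> int:
        """Parse one message and call its handlers; returns how many ran."""
        event = json.loads(raw_message)
        # Assumption: the event name is carried in a "type" field.
        handlers = self._handlers.get(event.get("type"), [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Whatever WebSocket client you use, call `router.dispatch(message)` for each text frame received; unknown event types are ignored rather than raising.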
|
||||
|
||||
## Examples
|
||||
|
||||
### Python Example
|
||||
|
||||
```python
|
||||
import requests

# Configuration
BASE_URL = "http://localhost:8000/api"
USERNAME = "admin"
PASSWORD = "your_password"

# Login
response = requests.post(f"{BASE_URL}/auth/login", json={
    "username": USERNAME,
    "password": PASSWORD,
})
response.raise_for_status()  # fail fast on 401 or other errors
token = response.json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Upload document (the file must stay open while the request is sent)
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post(
        f"{BASE_URL}/documents",
        headers=headers,
        files=files,
    )
response.raise_for_status()
document_id = response.json()["id"]

# Search documents
response = requests.get(
    f"{BASE_URL}/search",
    headers=headers,
    params={"query": "invoice 2024"},
)
response.raise_for_status()
results = response.json()["results"]
|
||||
```
|
||||
|
||||
### JavaScript Example
|
||||
|
||||
```javascript
|
||||
// Configuration
const BASE_URL = 'http://localhost:8000/api';

// Login
async function login(username, password) {
  const response = await fetch(`${BASE_URL}/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });
  if (!response.ok) {
    throw new Error(`Login failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.token;
}

// Upload document
async function uploadDocument(token, file) {
  const formData = new FormData();
  formData.append('file', file);

  // Don't set Content-Type manually: fetch adds the multipart boundary itself.
  const response = await fetch(`${BASE_URL}/documents`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  return response.json();
}

// Search documents
async function searchDocuments(token, query) {
  const response = await fetch(
    `${BASE_URL}/search?query=${encodeURIComponent(query)}`,
    {
      headers: { 'Authorization': `Bearer ${token}` }
    }
  );
  return response.json();
}
|
||||
```
|
||||
|
||||
### cURL Examples
|
||||
|
||||
```bash
|
||||
# Login
|
||||
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/login \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"username":"admin","password":"your_password"}' \
|
||||
| jq -r .token)
|
||||
|
||||
# Upload document
|
||||
curl -X POST http://localhost:8000/api/documents \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-F "file=@document.pdf"
|
||||
|
||||
# Search documents
|
||||
curl -X GET "http://localhost:8000/api/search?query=invoice" \
|
||||
-H "Authorization: Bearer $TOKEN"
|
||||
|
||||
# Get document
|
||||
curl -X GET http://localhost:8000/api/documents/550e8400-e29b-41d4-a716-446655440000 \
|
||||
-H "Authorization: Bearer $TOKEN"
|
||||
```
|
||||
|
||||
## OpenAPI Specification
|
||||
|
||||
The complete OpenAPI specification is available at:
|
||||
```
|
||||
GET /api/openapi.json
|
||||
```
|
||||
|
||||
You can use this with tools like Swagger UI or to generate client libraries.
|
||||
|
||||
## SDK Support
|
||||
|
||||
Official SDKs are planned for:
|
||||
- Python
|
||||
- JavaScript/TypeScript
|
||||
- Go
|
||||
- Ruby
|
||||
|
||||
Check the [GitHub repository](https://github.com/perfectra1n/readur) for the latest SDK availability.
|
||||
|
|
@ -0,0 +1,261 @@
|
|||
# Configuration Guide
|
||||
|
||||
This guide covers all configuration options available in Readur through environment variables and runtime settings.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Environment Variables](#environment-variables)
  - [Core Configuration](#core-configuration)
  - [File Storage & Upload](#file-storage--upload)
  - [Watch Folder Configuration](#watch-folder-configuration)
  - [OCR & Processing Settings](#ocr--processing-settings)
  - [Search & Performance](#search--performance)
  - [Data Management](#data-management)
- [Port Configuration](#port-configuration)
- [Example Configurations](#example-configurations)
- [Configuration Priority](#configuration-priority)
- [Runtime Settings vs Environment Variables](#runtime-settings-vs-environment-variables)
- [Database Tuning](#database-tuning)
- [Security Configuration](#security-configuration)
|
||||
|
||||
## Environment Variables
|
||||
|
||||
All application settings can be configured via environment variables:
|
||||
|
||||
### Core Configuration
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `DATABASE_URL` | `postgresql://readur:readur@localhost/readur` | PostgreSQL connection string |
|
||||
| `JWT_SECRET` | `your-secret-key` | Secret key for JWT tokens ⚠️ **Change in production!** |
|
||||
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address and port |
|
||||
|
||||
### File Storage & Upload
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
|
||||
| `ALLOWED_FILE_TYPES` | `pdf,txt,doc,docx,png,jpg,jpeg` | Comma-separated allowed file extensions |
|
||||
|
||||
### Watch Folder Configuration
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `WATCH_FOLDER` | `./watch` | Directory to monitor for new files |
|
||||
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems (seconds) |
|
||||
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file write completion (milliseconds) |
|
||||
| `MAX_FILE_AGE_HOURS` | _(none)_ | Skip files older than this many hours |
|
||||
| `FORCE_POLLING_WATCH` | _(none)_ | Force polling mode even for local filesystems |
|
||||
|
||||
### OCR & Processing Settings
|
||||
|
||||
*Note: These settings can also be configured per-user via the web interface*
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `OCR_LANGUAGE` | `eng` | OCR language code (eng, fra, deu, spa, etc.) |
|
||||
| `CONCURRENT_OCR_JOBS` | `4` | Maximum parallel OCR processes |
|
||||
| `OCR_TIMEOUT_SECONDS` | `300` | OCR processing timeout per file |
|
||||
| `MAX_FILE_SIZE_MB` | `50` | Maximum file size for processing |
|
||||
| `AUTO_ROTATE_IMAGES` | `true` | Automatically rotate images for better OCR |
|
||||
| `ENABLE_IMAGE_PREPROCESSING` | `true` | Apply image enhancement before OCR |
|
||||
|
||||
### Search & Performance
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `SEARCH_RESULTS_PER_PAGE` | `25` | Default number of search results per page |
|
||||
| `SEARCH_SNIPPET_LENGTH` | `200` | Length of text snippets in search results |
|
||||
| `FUZZY_SEARCH_THRESHOLD` | `0.8` | Similarity threshold for fuzzy search (0.0-1.0) |
|
||||
| `MEMORY_LIMIT_MB` | `512` | Memory limit for OCR processes |
|
||||
| `CPU_PRIORITY` | `normal` | CPU priority: `low`, `normal`, `high` |
|
||||
|
||||
### Data Management
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `RETENTION_DAYS` | _(none)_ | Auto-delete documents after N days |
|
||||
| `ENABLE_AUTO_CLEANUP` | `false` | Enable automatic cleanup of old documents |
|
||||
| `ENABLE_COMPRESSION` | `false` | Compress stored documents to save space |
|
||||
| `ENABLE_BACKGROUND_OCR` | `true` | Process OCR in background queue |
|
||||
|
||||
## Port Configuration
|
||||
|
||||
Readur supports flexible port configuration:
|
||||
|
||||
```bash
|
||||
# Method 1: Specify full server address
|
||||
SERVER_ADDRESS=0.0.0.0:8000
|
||||
|
||||
# Method 2: Use separate host and port (recommended)
|
||||
SERVER_HOST=0.0.0.0
|
||||
SERVER_PORT=8000
|
||||
|
||||
# For development: Configure frontend port
|
||||
CLIENT_PORT=5173
|
||||
BACKEND_PORT=8000
|
||||
```
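The text does not say which wins when both `SERVER_ADDRESS` and `SERVER_HOST`/`SERVER_PORT` are set. The sketch below assumes the separate variables take priority, falling back to `SERVER_ADDRESS` and then to the `0.0.0.0:8000` default; treat that ordering as an assumption to verify against your version. `resolve_bind_address` is a hypothetical helper.

```python
from typing import Mapping

DEFAULT_ADDRESS = "0.0.0.0:8000"


def resolve_bind_address(env: Mapping[str, str]) -> tuple[str, int]:
    """Return (host, port) from SERVER_HOST/SERVER_PORT or SERVER_ADDRESS."""
    # rpartition splits on the last ':' so "[::]:8000" keeps its IPv6 host.
    base_host, _, base_port = env.get("SERVER_ADDRESS", DEFAULT_ADDRESS).rpartition(":")
    host = env.get("SERVER_HOST", base_host)
    port_text = env.get("SERVER_PORT", base_port)
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port_text}")
    return host, port
```

Called as `resolve_bind_address(os.environ)`, this is handy in wrapper scripts that need to know where to point a health check.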
|
||||
|
||||
## Example Configurations
|
||||
|
||||
### Development Configuration
|
||||
|
||||
```env
|
||||
# Basic development setup
|
||||
DATABASE_URL=postgresql://readur:readur@localhost/readur
|
||||
JWT_SECRET=dev-secret-key-not-for-production
|
||||
SERVER_ADDRESS=0.0.0.0:8000
|
||||
UPLOAD_PATH=./uploads
|
||||
WATCH_FOLDER=./watch
|
||||
OCR_LANGUAGE=eng
|
||||
CONCURRENT_OCR_JOBS=2
|
||||
```
|
||||
|
||||
### Production Configuration
|
||||
|
||||
```env
|
||||
# Core settings
|
||||
DATABASE_URL=postgresql://readur:secure_password@postgres:5432/readur
|
||||
JWT_SECRET=your-very-long-random-secret-key-generated-with-openssl
|
||||
SERVER_ADDRESS=0.0.0.0:8000
|
||||
|
||||
# File handling
|
||||
UPLOAD_PATH=/app/uploads
|
||||
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,rtf,doc,docx
|
||||
|
||||
# Watch folder for NFS mount
|
||||
WATCH_FOLDER=/mnt/nfs/documents
|
||||
WATCH_INTERVAL_SECONDS=60
|
||||
FILE_STABILITY_CHECK_MS=1000
|
||||
MAX_FILE_AGE_HOURS=168
|
||||
FORCE_POLLING_WATCH=1
|
||||
|
||||
# OCR optimization
|
||||
OCR_LANGUAGE=eng
|
||||
CONCURRENT_OCR_JOBS=8
|
||||
OCR_TIMEOUT_SECONDS=600
|
||||
MAX_FILE_SIZE_MB=200
|
||||
AUTO_ROTATE_IMAGES=true
|
||||
ENABLE_IMAGE_PREPROCESSING=true
|
||||
|
||||
# Performance tuning
|
||||
MEMORY_LIMIT_MB=2048
|
||||
CPU_PRIORITY=high
|
||||
ENABLE_COMPRESSION=true
|
||||
ENABLE_BACKGROUND_OCR=true
|
||||
|
||||
# Search optimization
|
||||
SEARCH_RESULTS_PER_PAGE=50
|
||||
SEARCH_SNIPPET_LENGTH=300
|
||||
FUZZY_SEARCH_THRESHOLD=0.7
|
||||
|
||||
# Data management
|
||||
# Keep documents for 7 years (7 x 365 = 2555 days)
RETENTION_DAYS=2555
|
||||
ENABLE_AUTO_CLEANUP=true
|
||||
```
|
||||
|
||||
### Network Filesystem Configuration
|
||||
|
||||
```env
|
||||
# For NFS mounts
|
||||
WATCH_FOLDER=/mnt/nfs/documents
|
||||
WATCH_INTERVAL_SECONDS=60
|
||||
FILE_STABILITY_CHECK_MS=1000
|
||||
FORCE_POLLING_WATCH=1
|
||||
|
||||
# For SMB/CIFS mounts
|
||||
WATCH_FOLDER=/mnt/smb/shared
|
||||
WATCH_INTERVAL_SECONDS=30
|
||||
FILE_STABILITY_CHECK_MS=2000
|
||||
|
||||
# For S3 mounts (using s3fs)
|
||||
WATCH_FOLDER=/mnt/s3/bucket
|
||||
WATCH_INTERVAL_SECONDS=120
|
||||
FILE_STABILITY_CHECK_MS=5000
|
||||
FORCE_POLLING_WATCH=1
|
||||
```
|
||||
|
||||
## Configuration Priority
|
||||
|
||||
Settings are applied in this order (later values override earlier ones):
|
||||
|
||||
1. **Application defaults** (built into the code)
|
||||
2. **Environment variables** (system-wide configuration)
|
||||
3. **User settings** (per-user database settings via web interface)
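The layering can be pictured as successive dictionary merges, where each later layer overwrites keys from the one before. This minimal sketch covers two settings only; `DEFAULTS`, `ENV_KEYS` and `effective_settings` are hypothetical names, and the default values are taken from the tables above.

```python
from typing import Any, Mapping, Optional

# Defaults from the tables above; the env-to-setting mapping is hypothetical.
DEFAULTS: dict[str, Any] = {"ocr_language": "eng", "search_results_per_page": 25}
ENV_KEYS = {"OCR_LANGUAGE": "ocr_language",
            "SEARCH_RESULTS_PER_PAGE": "search_results_per_page"}


def effective_settings(env: Mapping[str, str],
                       user: Optional[Mapping[str, Any]] = None) -> dict[str, Any]:
    """Merge defaults <- environment <- per-user settings (later layers win)."""
    settings = dict(DEFAULTS)
    for var, key in ENV_KEYS.items():
        if var in env:
            # Env values are strings; coerce to the default's type (str or int).
            settings[key] = type(DEFAULTS[key])(env[var])
    for key, value in (user or {}).items():
        if value is not None:  # an unset user preference falls through
            settings[key] = value
    return settings
```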
|
||||
|
||||
This allows for flexible deployment where system administrators can set defaults while users can customize their experience.
|
||||
|
||||
## Runtime Settings vs Environment Variables
|
||||
|
||||
Some settings can be configured in two ways:
|
||||
|
||||
1. **Environment Variables**: Set at container startup; they apply to the entire application
|
||||
2. **User Settings**: Configured per-user via the web interface, stored in database
|
||||
|
||||
**Environment variables override the built-in defaults** and act as system-wide defaults. Where a setting is user-configurable, each user's own setting then overrides the environment value for that user.
|
||||
|
||||
Settings configurable via web interface:
|
||||
- OCR language preferences
|
||||
- Search result limits
|
||||
- File type restrictions
|
||||
- OCR processing options
|
||||
- Data retention policies
|
||||
|
||||
## Database Tuning
|
||||
|
||||
For better search performance with large document collections:
|
||||
|
||||
```sql
|
||||
-- Increase shared_buffers for better caching
|
||||
ALTER SYSTEM SET shared_buffers = '256MB';
|
||||
|
||||
-- Optimize for full-text search
|
||||
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
|
||||
|
||||
-- Restart PostgreSQL after changes
|
||||
```
|
||||
|
||||
## Security Configuration
|
||||
|
||||
### Generating Secure Secrets
|
||||
|
||||
```bash
|
||||
# Generate secure JWT secret (tr removes the line break openssl inserts
# into base64 output longer than 64 characters)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
|
||||
|
||||
# Generate secure database password (hex avoids '/', '+' and '=', which
# would otherwise need escaping inside DATABASE_URL)
DB_PASSWORD=$(openssl rand -hex 32)
|
||||
|
||||
# Save to .env file
|
||||
cat > .env << EOF
|
||||
JWT_SECRET=${JWT_SECRET}
|
||||
DB_PASSWORD=${DB_PASSWORD}
|
||||
EOF
|
||||
```
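If OpenSSL is not available, Python's standard `secrets` module produces equivalent values. In this minimal sketch, `generate_env` is a hypothetical helper; note that it overwrites any existing file at the given path and restricts it to owner-only permissions (POSIX systems).

```python
import secrets
from pathlib import Path


def generate_env(path: Path = Path(".env")) -> dict[str, str]:
    """Write JWT_SECRET and DB_PASSWORD to an .env file and return them."""
    values = {
        # 64 random bytes -> 86 URL-safe base64 characters on a single line.
        "JWT_SECRET": secrets.token_urlsafe(64),
        # Hex only (0-9a-f), so it can go into DATABASE_URL unescaped.
        "DB_PASSWORD": secrets.token_hex(32),
    }
    path.write_text("".join(f"{k}={v}\n" for k, v in values.items()))
    path.chmod(0o600)  # readable and writable by the owner only
    return values
```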
|
||||
|
||||
### Quick Reference - Essential Variables
|
||||
|
||||
For a minimal production deployment, configure these essential variables:
|
||||
|
||||
```bash
|
||||
# Security (REQUIRED)
|
||||
JWT_SECRET=your-secure-random-key-here
|
||||
DATABASE_URL=postgresql://user:password@host:port/database
|
||||
|
||||
# File Storage
|
||||
UPLOAD_PATH=/app/uploads
|
||||
WATCH_FOLDER=/path/to/mounted/folder
|
||||
|
||||
# Watch Folder (for network mounts)
|
||||
WATCH_INTERVAL_SECONDS=60
|
||||
FORCE_POLLING_WATCH=1
|
||||
|
||||
# Performance
|
||||
CONCURRENT_OCR_JOBS=4
|
||||
MAX_FILE_SIZE_MB=100
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Review [deployment options](deployment.md) for production setup
|
||||
- Learn about [folder watching](WATCH_FOLDER.md) for automatic document ingestion
|
||||
- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md) for your use case
|
||||
|
|
@ -0,0 +1,403 @@
|
|||
# Deployment Guide
|
||||
|
||||
This guide covers production deployment strategies, SSL setup, monitoring, backups, and best practices for running Readur in production.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Production Docker Compose](#production-docker-compose)
|
||||
- [Network Filesystem Mounts](#network-filesystem-mounts)
  - [NFS Mounts](#nfs-mounts)
  - [SMB/CIFS Mounts](#smbcifs-mounts)
  - [S3 Mounts](#s3-mounts)
- [SSL/HTTPS Setup](#sslhttps-setup)
  - [Nginx Configuration](#nginx-configuration)
  - [Traefik Configuration](#traefik-configuration)
- [Health Checks](#health-checks)
- [Backup Strategy](#backup-strategy)
- [Monitoring](#monitoring)
- [Deployment Platforms](#deployment-platforms)
  - [Docker Swarm](#docker-swarm)
  - [Kubernetes](#kubernetes)
  - [Cloud Platforms](#cloud-platforms)
|
||||
- [Security Considerations](#security-considerations)
|
||||
|
||||
## Production Docker Compose
|
||||
|
||||
For production deployments, create a custom `docker-compose.prod.yml`:
|
||||
|
||||
```yaml
|
||||
services:
  readur:
    image: readur:latest
    ports:
      - "8000:8000"
    environment:
      # Core Configuration
      - DATABASE_URL=postgresql://readur:${DB_PASSWORD}@postgres:5432/readur
      - JWT_SECRET=${JWT_SECRET}
      - SERVER_ADDRESS=0.0.0.0:8000

      # File Storage
      - UPLOAD_PATH=/app/uploads
      - WATCH_FOLDER=/app/watch
      - ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,doc,docx

      # Watch Folder Settings
      - WATCH_INTERVAL_SECONDS=30
      - FILE_STABILITY_CHECK_MS=500
      - MAX_FILE_AGE_HOURS=168

      # OCR Configuration
      - OCR_LANGUAGE=eng
      - CONCURRENT_OCR_JOBS=4
      - OCR_TIMEOUT_SECONDS=300
      - MAX_FILE_SIZE_MB=100

      # Performance Tuning
      - MEMORY_LIMIT_MB=1024
      - CPU_PRIORITY=normal
      - ENABLE_COMPRESSION=true

    volumes:
      # Document storage
      - ./data/uploads:/app/uploads

      # Watch folder - mount your network drives here
      - /mnt/nfs/documents:/app/watch
      # or SMB: - /mnt/smb/shared:/app/watch
      # or S3: - /mnt/s3/bucket:/app/watch

    depends_on:
      - postgres
    restart: unless-stopped

    # Resource limits for production
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
        reservations:
          memory: 512M
          cpus: '0.5'

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=readur
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=readur
      - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8

    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres-config:/etc/postgresql/conf.d:ro

    # PostgreSQL optimization for document search
    command: >
      postgres
      -c shared_buffers=256MB
      -c effective_cache_size=1GB
      -c max_connections=100
      -c default_text_search_config=pg_catalog.english

    restart: unless-stopped

    # Don't expose port in production
    # ports:
    #   - "5433:5432"

volumes:
  postgres_data:
    driver: local
|
||||
```
|
||||
|
||||
Deploy with environment file:
|
||||
```bash
|
||||
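# Create .env file with secrets (tr removes the line break openssl inserts into
# long base64 output; hex keeps DB_PASSWORD safe inside DATABASE_URL)
cat > .env << EOF
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
DB_PASSWORD=$(openssl rand -hex 32)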
|
||||
EOF
|
||||
|
||||
# Deploy
|
||||
docker compose -f docker-compose.prod.yml --env-file .env up -d
|
||||
```
|
||||
|
||||
## Network Filesystem Mounts
|
||||
|
||||
### NFS Mounts
|
||||
|
||||
```bash
# Mount NFS share
sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs/documents
```

Then add the mount and polling settings to the `readur` service in `docker-compose.yml`:

```yaml
services:
  readur:
    volumes:
      - /mnt/nfs/documents:/app/watch
    environment:
      - WATCH_INTERVAL_SECONDS=60
      - FILE_STABILITY_CHECK_MS=1000
      - FORCE_POLLING_WATCH=1
```
|
||||
|
||||
### SMB/CIFS Mounts
|
||||
|
||||
```bash
# Mount SMB share
sudo mount -t cifs //server/share /mnt/smb/shared -o username=user,password=pass
```

Docker volume configuration for the `readur` service:

```yaml
services:
  readur:
    volumes:
      - /mnt/smb/shared:/app/watch
    environment:
      - WATCH_INTERVAL_SECONDS=30
      - FILE_STABILITY_CHECK_MS=2000
```
|
||||
|
||||
### S3 Mounts
|
||||
|
||||
```bash
# Mount S3 bucket using s3fs
s3fs mybucket /mnt/s3/bucket -o passwd_file=~/.passwd-s3fs
```

Docker configuration for S3 on the `readur` service:

```yaml
services:
  readur:
    volumes:
      - /mnt/s3/bucket:/app/watch
    environment:
      - WATCH_INTERVAL_SECONDS=120
      - FILE_STABILITY_CHECK_MS=5000
      - FORCE_POLLING_WATCH=1
```
|
||||
|
||||
## SSL/HTTPS Setup
|
||||
|
||||
### Nginx Configuration
|
||||
|
||||
```nginx
|
||||
server {
    listen 443 ssl http2;
    server_name readur.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # For file uploads
        client_max_body_size 100M;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
|
||||
```
|
||||
|
||||
### Traefik Configuration
|
||||
|
||||
```yaml
|
||||
services:
  readur:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.readur.rule=Host(`readur.yourdomain.com`)"
      - "traefik.http.routers.readur.tls=true"
      - "traefik.http.routers.readur.tls.certresolver=letsencrypt"
|
||||
```
|
||||
|
||||
> 📘 **For more reverse proxy configurations** including Apache, Caddy, custom ports, load balancing, and advanced scenarios, see [REVERSE_PROXY.md](./REVERSE_PROXY.md).
|
||||
|
||||
## Health Checks
|
||||
|
||||
Add health checks to your Docker configuration:
|
||||
|
||||
```yaml
|
||||
services:
  readur:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
|
||||
```
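Scripts that must wait until Readur is ready, for example smoke tests run right after `docker compose up -d`, can poll the same endpoint from the host. A minimal standard-library sketch; `wait_until_healthy` is a hypothetical helper, and it assumes `/api/health` answers HTTP 200 when healthy, as the `curl -f` check above implies.

```python
import time
import urllib.request


def wait_until_healthy(url: str = "http://localhost:8000/api/health",
                       timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Return True once `url` answers HTTP 200, or False when `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: the service is not ready yet.
            pass
        time.sleep(interval)
    return False
```

In a shell pipeline, `python3 -c "import sys, healthwait; sys.exit(0 if healthwait.wait_until_healthy() else 1)"` (with the function saved as a hypothetical `healthwait.py`) turns this into an exit code.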
|
||||
|
||||
## Backup Strategy
|
||||
|
||||
Create an automated backup script:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# backup.sh - Automated backup script
|
||||
|
||||
BACKUP_DIR="/path/to/backups"
|
||||
DATE=$(date +%Y%m%d_%H%M%S)
|
||||
|
||||
# Create backup directory
|
||||
mkdir -p "$BACKUP_DIR"
|
||||
|
||||
# Backup database
|
||||
docker exec readur-postgres-1 pg_dump -U readur readur | gzip > "$BACKUP_DIR/db_backup_$DATE.sql.gz"
|
||||
|
||||
# Backup uploaded files
|
||||
tar -czf "$BACKUP_DIR/uploads_backup_$DATE.tar.gz" -C ./data uploads/
|
||||
|
||||
# Clean old backups (keep 30 days)
|
||||
find "$BACKUP_DIR" -name "db_backup_*.sql.gz" -mtime +30 -delete
|
||||
find "$BACKUP_DIR" -name "uploads_backup_*.tar.gz" -mtime +30 -delete
|
||||
|
||||
echo "Backup completed: $DATE"
|
||||
```
|
||||
|
||||
Add to crontab for daily backups:
|
||||
```bash
|
||||
0 2 * * * /path/to/backup.sh >> /var/log/readur-backup.log 2>&1
|
||||
```
|
||||
|
||||
### Restore from Backup
|
||||
|
||||
```bash
|
||||
# Restore database
|
||||
gunzip -c db_backup_20240101_020000.sql.gz | docker exec -i readur-postgres-1 psql -U readur readur
|
||||
|
||||
# Restore files
|
||||
tar -xzf uploads_backup_20240101_020000.tar.gz -C ./data
|
||||
```
|
||||
|
||||
## Monitoring
|
||||
|
||||
Monitor your deployment with Docker stats:
|
||||
|
||||
```bash
# Real-time resource usage
docker stats

# Container logs
docker compose logs -f readur

# Watch folder activity
docker compose logs -f readur | grep watcher

# PostgreSQL query performance (requires the pg_stat_statements extension;
# PostgreSQL 13+ names the column total_exec_time)
docker exec readur-postgres-1 psql -U readur -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"
```
|
||||
|
||||
### Prometheus Metrics
|
||||
|
||||
Readur exposes metrics at the `/metrics` endpoint. Add a scrape job for it:
|
||||
|
||||
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'readur'
    static_configs:
      - targets: ['readur:8000']
```
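Prometheus scrapes `/metrics` as plain text in its exposition format: `# HELP` and `# TYPE` lines followed by one sample per label set. The sketch below renders a single counter family in that format to show what a scrape returns. The metric name `readur_documents_processed_total` and its `status` label are hypothetical and do not describe Readur's actual metric set.

```rust
use std::fmt::Write;

/// Renders one counter family in the Prometheus text exposition format.
/// Label values are escaped as the format requires (backslash, quote, newline).
fn render_counter(name: &str, help: &str, samples: &[(&str, &str, u64)]) -> String {
    let mut out = String::new();
    writeln!(out, "# HELP {name} {help}").unwrap();
    writeln!(out, "# TYPE {name} counter").unwrap();
    for (label, value, count) in samples {
        let escaped = value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n");
        writeln!(out, "{name}{{{label}=\"{escaped}\"}} {count}").unwrap();
    }
    out
}

fn main() {
    let text = render_counter(
        "readur_documents_processed_total",
        "Documents that finished OCR processing.",
        &[("status", "completed", 42), ("status", "failed", 3)],
    );
    print!("{text}");
}
```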
|
||||
|
||||
## Deployment Platforms
|
||||
|
||||
### Docker Swarm
|
||||
|
||||
```yaml
version: '3.8'
services:
  readur:
    image: readur:latest
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == worker]
    networks:
      - readur-network
    secrets:
      - jwt_secret
      - db_password

# Networks referenced by services must be declared; overlay spans Swarm nodes
networks:
  readur-network:
    driver: overlay

secrets:
  jwt_secret:
    external: true
  db_password:
    external: true
```
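Swarm does not expose `jwt_secret` and `db_password` as environment variables; it mounts each secret as a file under `/run/secrets/<name>` inside the container. Whether Readur reads secrets from files is not covered here. A common convention is a `<NAME>_FILE` variable that points at the file. The sketch below uses `JWT_SECRET_FILE` as a hypothetical variable name: an explicit `JWT_SECRET` wins, otherwise the file is read and its trailing newline trimmed.

```rust
use std::env;
use std::fs;
use std::io;

/// Picks the direct value if present, otherwise reads the secret from `file_path`.
fn resolve_secret(name: &str, direct: Option<String>, file_path: Option<String>) -> io::Result<String> {
    if let Some(value) = direct {
        return Ok(value);
    }
    let path = file_path.ok_or_else(|| {
        io::Error::new(io::ErrorKind::NotFound, format!("neither {name} nor {name}_FILE is set"))
    })?;
    let contents = fs::read_to_string(path)?;
    // Secret files usually end with a newline that must not become part of the key.
    Ok(contents.trim_end_matches(|c: char| c == '\r' || c == '\n').to_string())
}

/// Looks up NAME, then NAME_FILE (e.g. JWT_SECRET_FILE=/run/secrets/jwt_secret).
fn read_secret(name: &str) -> io::Result<String> {
    resolve_secret(name, env::var(name).ok(), env::var(format!("{name}_FILE")).ok())
}

fn main() {
    match read_secret("JWT_SECRET") {
        Ok(secret) => println!("loaded JWT secret ({} bytes)", secret.len()),
        Err(err) => eprintln!("could not load JWT secret: {err}"),
    }
}
```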
|
||||
|
||||
### Kubernetes
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: readur
          image: readur:latest
          env:
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: readur-secrets
                  key: jwt-secret
          resources:
            limits:
              memory: "2Gi"
              cpu: "2"
            requests:
              memory: "512Mi"
              cpu: "500m"
```
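Kubernetes resource quantities mix two unit systems: CPU `500m` means 500 millicores (half a core), while memory `512Mi` uses binary multiples (512 × 2^20 bytes) and `512M` would be decimal. The sketch parses the suffixes used in this manifest plus a few common ones to make the requests and limits concrete. It is illustrative only and does not cover every form the Kubernetes quantity grammar allows (exponents such as `1e3`, for example).

```rust
/// Parses a CPU quantity into millicores: "500m" -> 500, "2" -> 2000, "0.5" -> 500.
fn cpu_millicores(q: &str) -> Option<u64> {
    if let Some(milli) = q.strip_suffix('m') {
        return milli.parse().ok();
    }
    let cores: f64 = q.parse().ok()?;
    (cores >= 0.0).then(|| (cores * 1000.0).round() as u64)
}

/// Parses a memory quantity into bytes. Binary suffixes (Ki, Mi, Gi, Ti) use powers
/// of 1024; decimal suffixes (k, M, G, T) use powers of 1000.
fn memory_bytes(q: &str) -> Option<u64> {
    const UNITS: [(&str, u64); 8] = [
        ("Ki", 1 << 10), ("Mi", 1 << 20), ("Gi", 1 << 30), ("Ti", 1 << 40),
        ("k", 1_000), ("M", 1_000_000), ("G", 1_000_000_000), ("T", 1_000_000_000_000),
    ];
    for (suffix, factor) in UNITS {
        if let Some(number) = q.strip_suffix(suffix) {
            return number.parse::<u64>().ok()?.checked_mul(factor);
        }
    }
    q.parse().ok()
}

fn main() {
    // Values from the Deployment above.
    println!("request: {:?} millicores, {:?} bytes", cpu_millicores("500m"), memory_bytes("512Mi"));
    println!("limit:   {:?} millicores, {:?} bytes", cpu_millicores("2"), memory_bytes("2Gi"));
}
```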
|
||||
|
||||
### Cloud Platforms
|
||||
|
||||
- **AWS**: Use ECS with RDS PostgreSQL
|
||||
- **Google Cloud**: Deploy to Cloud Run with Cloud SQL
|
||||
- **Azure**: Use Container Instances with Azure Database
|
||||
- **DigitalOcean**: App Platform with Managed Database
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Production Checklist
|
||||
|
||||
- [ ] Change default admin password
|
||||
- [ ] Generate strong JWT secret
|
||||
- [ ] Use HTTPS/SSL in production
|
||||
- [ ] Restrict database network access
|
||||
- [ ] Set proper file permissions
|
||||
- [ ] Enable firewall rules
|
||||
- [ ] Apply security updates regularly
|
||||
- [ ] Monitor access logs
|
||||
- [ ] Implement rate limiting
|
||||
- [ ] Enable audit logging
|
||||
|
||||
### Recommended Production Setup
|
||||
|
||||
```bash
# Generate secure secrets (tr strips the line breaks openssl inserts every 64 characters)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
# Hex output avoids '/', '+' and '=', which would need escaping inside DATABASE_URL
DB_PASSWORD=$(openssl rand -hex 32)

# Restrict file permissions
chmod 600 .env
chmod 700 ./data/uploads

# Use read-only root filesystem
docker run --read-only --tmpfs /tmp ...
```
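If `openssl` is not available, a comparable secret can be drawn from the operating system's random source. This sketch assumes a Unix-like host that provides `/dev/urandom`; it reads 32 bytes and hex-encodes them into 64 characters, which, like the output of `openssl rand -hex 32`, contain nothing that needs escaping in `.env` files or connection URLs.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hex-encodes bytes using lowercase digits.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Reads `len` random bytes from the kernel CSPRNG (Unix only) and hex-encodes them.
fn random_hex_secret(len: usize) -> io::Result<String> {
    let mut buf = vec![0u8; len];
    File::open("/dev/urandom")?.read_exact(&mut buf)?;
    Ok(to_hex(&buf))
}

fn main() -> io::Result<()> {
    // 32 random bytes -> 64 hex characters.
    println!("JWT_SECRET={}", random_hex_secret(32)?);
    println!("DB_PASSWORD={}", random_hex_secret(24)?);
    Ok(())
}
```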
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Configure [monitoring and alerting](monitoring-usage)
|
||||
- Review [security best practices](security)
|
||||
- Set up [automated backups](#backup-strategy)
|
||||
- Explore [database guardrails](dev/DATABASE_GUARDRAILS.md)
|
||||
|
|
@ -0,0 +1,47 @@
|
|||
# Developer Documentation
|
||||
|
||||
This directory contains technical documentation for developers working on Readur.
|
||||
|
||||
## 📋 Table of Contents
|
||||
|
||||
### 🏗️ Architecture & Design
|
||||
- [**Architecture Overview**](architecture.md) - System design, components, and data flow
|
||||
- [**Database Guardrails**](DATABASE_GUARDRAILS.md) - Concurrency safety and database best practices
|
||||
|
||||
### 🛠️ Development
|
||||
- [**Development Guide**](development.md) - Setup, contributing, code style guidelines
|
||||
- [**Testing Guide**](TESTING.md) - Comprehensive testing strategy and instructions
|
||||
|
||||
### ⚙️ Technical Guides
|
||||
- [**OCR Optimization**](OCR_OPTIMIZATION_GUIDE.md) - Performance tuning and best practices
|
||||
- [**Queue Improvements**](QUEUE_IMPROVEMENTS.md) - Background job processing architecture
|
||||
- [**Deployment Summary**](DEPLOYMENT_SUMMARY.md) - Technical deployment overview
|
||||
|
||||
## 🚀 Quick Start for Developers
|
||||
|
||||
1. **Read the [Architecture Overview](architecture.md)** to understand the system design
|
||||
2. **Follow the [Development Guide](development.md)** to set up your local environment
|
||||
3. **Review the [Testing Guide](TESTING.md)** to understand our testing approach
|
||||
4. **Check [Database Guardrails](DATABASE_GUARDRAILS.md)** for data safety patterns
|
||||
|
||||
## 📖 Related User Documentation
|
||||
|
||||
- [Installation Guide](../installation.md) - How to install and run Readur
|
||||
- [Configuration Guide](../configuration.md) - Environment variables and settings
|
||||
- [User Guide](../user-guide.md) - How to use Readur features
|
||||
- [API Reference](../api-reference.md) - REST API documentation
|
||||
|
||||
## 🤝 Contributing
|
||||
|
||||
Please read our [Development Guide](development.md) for:
|
||||
- Setting up your development environment
|
||||
- Code style guidelines
|
||||
- Testing requirements
|
||||
- Pull request process
|
||||
|
||||
## 🏷️ Document Categories
|
||||
|
||||
- **📘 User Docs**: Installation, configuration, user guide
|
||||
- **🔧 Operations**: Deployment, monitoring, troubleshooting
|
||||
- **💻 Developer**: Architecture, development setup, testing
|
||||
- **🔌 Integration**: API reference, webhooks, extensions
|
||||
|
|
@ -0,0 +1,350 @@
|
|||
# Architecture Overview
|
||||
|
||||
This document provides a comprehensive overview of Readur's architecture, design decisions, and technical implementation details.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [System Architecture](#system-architecture)
|
||||
- [Technology Stack](#technology-stack)
|
||||
- [Component Overview](#component-overview)
|
||||
- [Backend (Rust/Axum)](#backend-rustaxum)
|
||||
- [Frontend (React)](#frontend-react)
|
||||
- [Database (PostgreSQL)](#database-postgresql)
|
||||
- [OCR Engine](#ocr-engine)
|
||||
- [Data Flow](#data-flow)
|
||||
- [Security Architecture](#security-architecture)
|
||||
- [Performance Considerations](#performance-considerations)
|
||||
- [Scalability](#scalability)
|
||||
- [Design Patterns](#design-patterns)
|
||||
|
||||
## System Architecture
|
||||
|
||||
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ React Frontend  │────│  Rust Backend   │────│  PostgreSQL DB  │
│   (Port 8000)   │    │   (Axum API)    │    │   (Port 5433)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         └─────────────│  File Storage   │─────────────┘
                       │  + OCR Engine   │
                       └─────────────────┘
```
|
||||
|
||||
### High-Level Components
|
||||
|
||||
1. **Web Interface**: Modern React SPA with Material-UI
|
||||
2. **API Server**: High-performance Rust backend using Axum
|
||||
3. **Database**: PostgreSQL with full-text search capabilities
|
||||
4. **File Storage**: Local or network-mounted filesystem
|
||||
5. **OCR Processing**: Tesseract integration for text extraction
|
||||
6. **Background Jobs**: Async task processing for OCR and file watching
|
||||
|
||||
## Technology Stack
|
||||
|
||||
### Backend
|
||||
- **Language**: Rust (for performance and memory safety)
|
||||
- **Web Framework**: Axum (async, fast, type-safe)
|
||||
- **Database Access**: SQLx (async SQL toolkit with compile-time checked queries; not an ORM)
|
||||
- **Authentication**: JWT tokens with bcrypt password hashing
|
||||
- **Async Runtime**: Tokio
|
||||
- **Serialization**: Serde
|
||||
|
||||
### Frontend
|
||||
- **Framework**: React 18 with TypeScript
|
||||
- **UI Library**: Material-UI (MUI)
|
||||
- **State Management**: React Context + Hooks
|
||||
- **Build Tool**: Vite
|
||||
- **HTTP Client**: Axios
|
||||
- **Routing**: React Router
|
||||
|
||||
### Infrastructure
|
||||
- **Database**: PostgreSQL 14+ with pgvector extension
|
||||
- **OCR**: Tesseract 4.0+
|
||||
- **Container**: Docker with multi-stage builds
|
||||
- **Reverse Proxy**: Nginx/Traefik compatible
|
||||
|
||||
## Component Overview
|
||||
|
||||
### Backend (Rust/Axum)
|
||||
|
||||
The backend is structured following clean architecture principles:
|
||||
|
||||
```
|
||||
src/
|
||||
├── main.rs # Application entry and server setup
|
||||
├── config.rs # Configuration management
|
||||
├── models.rs # Domain models and DTOs
|
||||
├── error.rs # Error handling
|
||||
├── auth.rs # Authentication middleware
|
||||
├── routes/ # HTTP route handlers
|
||||
│ ├── auth.rs # Authentication endpoints
|
||||
│ ├── documents.rs # Document CRUD operations
|
||||
│ ├── search.rs # Search functionality
|
||||
│ └── ...
|
||||
├── db/ # Database operations
|
||||
│ ├── documents.rs # Document queries
|
||||
│ ├── users.rs # User queries
|
||||
│ └── ...
|
||||
├── services/ # Business logic
|
||||
│ ├── ocr.rs # OCR processing
|
||||
│ ├── file_service.rs # File management
|
||||
│ └── watcher.rs # Folder watching
|
||||
└── tests/ # Integration tests
|
||||
```
|
||||
|
||||
Key design decisions:
|
||||
- **Async-first**: All I/O operations are async
|
||||
- **Type safety**: Leverages Rust's type system
|
||||
- **Error handling**: Comprehensive error types
|
||||
- **Dependency injection**: Clean separation of concerns
|
||||
|
||||
### Frontend (React)
|
||||
|
||||
The frontend follows a component-based architecture:
|
||||
|
||||
```
|
||||
frontend/src/
|
||||
├── components/ # Reusable UI components
|
||||
│ ├── DocumentList/
|
||||
│ ├── SearchBar/
|
||||
│ └── ...
|
||||
├── pages/ # Page-level components
|
||||
│ ├── Dashboard/
|
||||
│ ├── Documents/
|
||||
│ └── ...
|
||||
├── services/ # API integration
|
||||
│ ├── api.ts # Base API client
|
||||
│ ├── auth.ts # Auth service
|
||||
│ └── documents.ts # Document service
|
||||
├── hooks/ # Custom React hooks
|
||||
├── contexts/ # React contexts
|
||||
└── utils/ # Utility functions
|
||||
```
|
||||
|
||||
### Database (PostgreSQL)
|
||||
|
||||
Schema design optimized for document management:
|
||||
|
||||
```text
-- Core tables
users                  # User accounts
documents              # Document metadata
document_content       # Extracted text content
document_tags          # Many-to-many tags
sources                # File sources (folders, S3, etc.)
ocr_queue              # OCR processing queue

-- Search optimization
document_search_index  # Full-text search index
```
|
||||
|
||||
Key features:
|
||||
- **Full-text search**: PostgreSQL's powerful search capabilities
|
||||
- **JSONB fields**: Flexible metadata storage
|
||||
- **Triggers**: Automatic search index updates
|
||||
- **Views**: Optimized query patterns
|
||||
|
||||
### OCR Engine
|
||||
|
||||
OCR processing pipeline:
|
||||
|
||||
1. **File Detection**: New files detected via upload or folder watch
|
||||
2. **Queue Management**: Files added to processing queue
|
||||
3. **Pre-processing**: Image enhancement and optimization
|
||||
4. **Text Extraction**: Tesseract OCR with language detection
|
||||
5. **Post-processing**: Text cleaning and formatting
|
||||
6. **Database Storage**: Indexed for search
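The pipeline implies that each queued file moves through a small set of states, with failed extractions retried. The sketch below models that lifecycle as a state machine. The status names, the attempt counter, and the retry limit are illustrative assumptions and are not taken from Readur's `ocr_queue` schema.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OcrStatus {
    Pending,
    Processing,
    Completed,
    Failed,
}

#[derive(Debug)]
struct OcrJob {
    status: OcrStatus,
    attempts: u32,
}

impl OcrJob {
    fn new() -> Self {
        OcrJob { status: OcrStatus::Pending, attempts: 0 }
    }

    /// A worker claims a pending job; returns false if the job is not claimable.
    fn claim(&mut self) -> bool {
        if self.status != OcrStatus::Pending {
            return false;
        }
        self.status = OcrStatus::Processing;
        self.attempts += 1;
        true
    }

    /// Records one extraction attempt. Failures return to Pending until
    /// `max_attempts` is reached, after which the job is marked Failed.
    fn finish(&mut self, succeeded: bool, max_attempts: u32) {
        assert_eq!(self.status, OcrStatus::Processing, "finish() called without claim()");
        self.status = if succeeded {
            OcrStatus::Completed
        } else if self.attempts < max_attempts {
            OcrStatus::Pending
        } else {
            OcrStatus::Failed
        };
    }
}

fn main() {
    let mut job = OcrJob::new();
    // Two failed attempts followed by a success, with a limit of three attempts.
    for outcome in [false, false, true] {
        if job.claim() {
            job.finish(outcome, 3);
        }
        println!("{:?} after attempt {}", job.status, job.attempts);
    }
}
```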
|
||||
|
||||
## Data Flow
|
||||
|
||||
### Document Upload Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
User->>Frontend: Upload Document
|
||||
Frontend->>API: POST /api/documents
|
||||
API->>FileStorage: Save File
|
||||
API->>Database: Create Document Record
|
||||
API->>OCRQueue: Add to Queue
|
||||
API-->>Frontend: Document Created
|
||||
OCRWorker->>OCRQueue: Poll for Jobs
|
||||
OCRWorker->>FileStorage: Read File
|
||||
OCRWorker->>Tesseract: Extract Text
|
||||
OCRWorker->>Database: Update with Content
|
||||
OCRWorker->>Frontend: WebSocket Update
|
||||
```
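The key property of the upload flow is that the API responds as soon as the job is queued, while a separate worker does the slow OCR work. This sketch reproduces that decoupling in-process: a standard-library channel stands in for the `ocr_queue` table and a `HashMap` stands in for the documents table. `extract_text` is a placeholder for the Tesseract call, and the WebSocket notification is omitted.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the documents table: id -> extracted text (None until OCR finishes).
type Documents = Arc<Mutex<HashMap<u32, Option<String>>>>;

/// API side: create the document record, enqueue it, and return immediately.
fn upload(docs: &Documents, queue: &Sender<u32>, id: u32) {
    docs.lock().unwrap().insert(id, None);
    queue.send(id).expect("OCR worker has stopped");
}

/// Placeholder for reading the stored file and running Tesseract on it.
fn extract_text(id: u32) -> String {
    format!("text of document {id}")
}

/// Uploads `count` documents, lets one worker drain the queue, returns the final table.
fn run_pipeline(count: u32) -> HashMap<u32, Option<String>> {
    let docs: Documents = Arc::new(Mutex::new(HashMap::new()));
    let (tx, rx) = mpsc::channel::<u32>();

    let worker_docs = Arc::clone(&docs);
    let worker = thread::spawn(move || {
        // The loop ends once every Sender has been dropped.
        for id in rx {
            let text = extract_text(id);
            worker_docs.lock().unwrap().insert(id, Some(text));
        }
    });

    for id in 1..=count {
        upload(&docs, &tx, id);
    }
    drop(tx); // close the queue so the worker can finish
    worker.join().unwrap();

    let finished = docs.lock().unwrap().clone();
    finished
}

fn main() {
    let mut rows: Vec<_> = run_pipeline(3).into_iter().collect();
    rows.sort();
    for (id, text) in rows {
        println!("{id}: {text:?}");
    }
}
```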
|
||||
|
||||
### Search Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
User->>Frontend: Enter Search Query
|
||||
Frontend->>API: GET /api/search
|
||||
API->>Database: Full-text Search
|
||||
Database->>API: Ranked Results
|
||||
API->>Frontend: Search Results
|
||||
Frontend->>User: Display Results
|
||||
```
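The ranking itself happens inside PostgreSQL's full-text search, which also stems words and weights matches. As a much simpler illustration of "match, drop non-matches, return best-first", the sketch below scores each document by how many of its words match a query term. It is a toy stand-in and not how PostgreSQL computes rank.

```rust
/// Splits text into lowercase alphanumeric terms.
fn terms(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Scores each (id, body) by the number of body terms that appear in the query,
/// drops documents with no matches, and sorts best-first (ties broken by id).
fn rank<'a>(query: &str, docs: &[(&'a str, &str)]) -> Vec<(&'a str, usize)> {
    let wanted = terms(query);
    let mut hits: Vec<(&'a str, usize)> = docs
        .iter()
        .map(|(id, body)| {
            let score = terms(body).iter().filter(|t| wanted.contains(t)).count();
            (*id, score)
        })
        .filter(|(_, score)| *score > 0)
        .collect();
    hits.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    hits
}

fn main() {
    let docs = [
        ("invoice-2024.pdf", "Invoice total due. Invoice number 17."),
        ("contract.pdf", "Service contract and payment terms."),
        ("photo.png", "Holiday photo."),
    ];
    for (id, score) in rank("invoice payment", &docs) {
        println!("{id}: {score}");
    }
}
```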
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Authentication & Authorization
|
||||
|
||||
- **JWT Tokens**: Stateless authentication
|
||||
- **Role-Based Access**: Admin, User roles
|
||||
- **Token Refresh**: Automatic token renewal
|
||||
- **Password Security**: Bcrypt hashing (salted, with a tunable cost factor)
|
||||
|
||||
### API Security
|
||||
|
||||
- **CORS**: Configurable allowed origins
|
||||
- **Rate Limiting**: Prevent abuse
|
||||
- **Input Validation**: Comprehensive validation
|
||||
- **SQL Injection Prevention**: Parameterized queries via SQLx
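"Rate limiting" does not name an algorithm. A token bucket is a common choice: each client gets a bucket of `capacity` tokens that refills at a fixed rate, and each request spends one token. The sketch below is a generic version, not Readur's implementation; time is passed in as seconds so the behaviour is deterministic, and the limits in `main` are made-up values.

```rust
/// Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: f64, // seconds, supplied by the caller
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Refills for the elapsed time, then spends one token if available.
    fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Hypothetical limit: bursts of 5 requests, refilled at 1 request per second.
    let mut bucket = TokenBucket::new(5.0, 1.0, 0.0);
    let allowed = (0..8).filter(|_| bucket.try_acquire(0.0)).count();
    println!("allowed {allowed} of 8 requests at t=0");
    println!("request at t=2.0 allowed: {}", bucket.try_acquire(2.0));
}
```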
|
||||
|
||||
### File Security
|
||||
|
||||
- **Upload Validation**: File type and size checks
|
||||
- **Virus Scanning**: Optional ClamAV integration
|
||||
- **Access Control**: Document-level permissions
|
||||
- **Secure Storage**: Filesystem permissions
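Upload validation can combine an extension allow-list with a size cap and a quick check of the file's leading bytes, since an extension alone is easy to fake. The sketch reuses the comma-separated `ALLOWED_FILE_TYPES` format from the installation guide. The 50 MiB limit and the choice to sniff only PDF, PNG, and JPEG signatures are assumptions for illustration, not Readur's actual rules.

```rust
#[derive(Debug, PartialEq)]
enum UploadError {
    MissingExtension,
    TypeNotAllowed(String),
    TooLarge { size: u64, max: u64 },
    ContentMismatch,
}

/// Validates name, size and leading bytes against an allow-list such as
/// "pdf,png,jpg" (case-insensitive). Returns the normalized extension.
fn validate_upload(name: &str, size: u64, head: &[u8], allowed: &str, max_size: u64) -> Result<String, UploadError> {
    let ext = name
        .rsplit_once('.')
        .map(|(_, e)| e.to_ascii_lowercase())
        .filter(|e| !e.is_empty())
        .ok_or(UploadError::MissingExtension)?;
    if !allowed.split(',').any(|a| a.trim().eq_ignore_ascii_case(&ext)) {
        return Err(UploadError::TypeNotAllowed(ext));
    }
    if size > max_size {
        return Err(UploadError::TooLarge { size, max: max_size });
    }
    // Spot-check magic bytes for formats with a well-known signature.
    let signature_ok = match ext.as_str() {
        "pdf" => head.starts_with(b"%PDF-"),
        "png" => head.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]),
        "jpg" | "jpeg" => head.starts_with(&[0xFF, 0xD8, 0xFF]),
        _ => true,
    };
    if signature_ok { Ok(ext) } else { Err(UploadError::ContentMismatch) }
}

fn main() {
    let allowed = "pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx";
    let max = 50 * 1024 * 1024; // hypothetical 50 MiB limit
    println!("{:?}", validate_upload("scan.PDF", 1024, b"%PDF-1.7", allowed, max));
    println!("{:?}", validate_upload("tool.exe", 1024, b"MZ", allowed, max));
}
```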
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Backend Optimization
|
||||
|
||||
- **Connection Pooling**: Database connection reuse
|
||||
- **Async I/O**: Non-blocking operations
|
||||
- **Caching**: In-memory caching for hot data
|
||||
- **Query Optimization**: Indexed searches
|
||||
|
||||
### Frontend Optimization
|
||||
|
||||
- **Code Splitting**: Lazy loading of routes
|
||||
- **Virtual Scrolling**: Large document lists
|
||||
- **Memoization**: Prevent unnecessary re-renders
|
||||
- **Service Workers**: Offline capability
|
||||
|
||||
### OCR Optimization
|
||||
|
||||
- **Parallel Processing**: Multiple concurrent jobs
|
||||
- **Image Pre-processing**: Enhance OCR accuracy
|
||||
- **Resource Limits**: Memory and CPU constraints
|
||||
- **Queue Priority**: Smart job scheduling
|
||||
|
||||
## Scalability
|
||||
|
||||
### Horizontal Scaling
|
||||
|
||||
```yaml
# Multiple backend instances behind a load balancer (docker-compose services)
services:
  backend-1:
    image: readur:latest
    environment:
      - INSTANCE_ID=1

  backend-2:
    image: readur:latest
    environment:
      - INSTANCE_ID=2

  # Load balancer. nginx.conf defines the upstream pool and proxies to it:
  #   upstream backend {
  #     server backend-1:8000;
  #     server backend-2:8000;
  #   }
  #   server { listen 80; location / { proxy_pass http://backend; } }
  nginx:
    image: nginx:alpine
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
```
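Without a balancing directive, an nginx `upstream` block distributes requests round-robin across its servers. The sketch shows that selection rule over the two backends above and also skips servers marked unhealthy. The health flag is a simplified stand-in for nginx's failure handling, not a model of it.

```rust
/// Round-robin selection over upstream servers, skipping ones marked unhealthy.
struct RoundRobin {
    servers: Vec<(String, bool)>, // (address, healthy)
    next: usize,
}

impl RoundRobin {
    fn new(addrs: &[&str]) -> Self {
        RoundRobin { servers: addrs.iter().map(|a| (a.to_string(), true)).collect(), next: 0 }
    }

    fn set_healthy(&mut self, addr: &str, healthy: bool) {
        for (a, h) in &mut self.servers {
            if a.as_str() == addr {
                *h = healthy;
            }
        }
    }

    /// Returns the next healthy server, or None if every server is down.
    fn pick(&mut self) -> Option<String> {
        for _ in 0..self.servers.len() {
            let i = self.next % self.servers.len();
            self.next = self.next.wrapping_add(1);
            if self.servers[i].1 {
                return Some(self.servers[i].0.clone());
            }
        }
        None
    }
}

fn main() {
    let mut lb = RoundRobin::new(&["backend-1:8000", "backend-2:8000"]);
    for _ in 0..4 {
        println!("-> {:?}", lb.pick());
    }
}
```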
|
||||
|
||||
### Database Scaling
|
||||
|
||||
- **Read Replicas**: Distribute read load
|
||||
- **Connection Pooling**: PgBouncer
|
||||
- **Partitioning**: Time-based partitions
|
||||
- **Archival**: Move old documents
|
||||
|
||||
### Storage Scaling
|
||||
|
||||
- **S3 Compatible**: Object storage support
|
||||
- **CDN Integration**: Static file delivery
|
||||
- **Distributed Storage**: GlusterFS/Ceph
|
||||
- **Archive Tiering**: Hot/cold storage
|
||||
|
||||
## Design Patterns
|
||||
|
||||
### Backend Patterns
|
||||
|
||||
1. **Repository Pattern**: Database abstraction
|
||||
2. **Service Layer**: Business logic separation
|
||||
3. **Middleware Chain**: Request processing
|
||||
4. **Error Boundaries**: Graceful error handling
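The repository and service-layer patterns mean handlers depend on a trait rather than on SQLx directly, so business logic can run against an in-memory implementation in tests. The sketch below is synchronous for brevity (the real backend is async), also shows the soft-delete pattern listed under database patterns, and uses hypothetical names; it is not Readur's actual `db` module.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Document {
    id: u64,
    filename: String,
    deleted: bool, // soft-delete flag: the row is kept but hidden
}

/// What the service layer needs from storage; a SQLx-backed type would implement it too.
trait DocumentRepository {
    fn insert(&mut self, filename: &str) -> u64;
    fn find(&self, id: u64) -> Option<Document>;
    fn soft_delete(&mut self, id: u64) -> bool;
}

#[derive(Default)]
struct InMemoryRepo {
    docs: HashMap<u64, Document>,
    next_id: u64,
}

impl DocumentRepository for InMemoryRepo {
    fn insert(&mut self, filename: &str) -> u64 {
        self.next_id += 1;
        let doc = Document { id: self.next_id, filename: filename.to_string(), deleted: false };
        self.docs.insert(doc.id, doc);
        self.next_id
    }

    fn find(&self, id: u64) -> Option<Document> {
        self.docs.get(&id).filter(|d| !d.deleted).cloned()
    }

    fn soft_delete(&mut self, id: u64) -> bool {
        match self.docs.get_mut(&id) {
            Some(doc) if !doc.deleted => {
                doc.deleted = true;
                true
            }
            _ => false,
        }
    }
}

/// Service-layer function, generic over any repository implementation.
fn filename_of<R: DocumentRepository>(repo: &R, id: u64) -> Option<String> {
    repo.find(id).map(|d| d.filename)
}

fn main() {
    let mut repo = InMemoryRepo::default();
    let id = repo.insert("test.pdf");
    println!("{:?}", filename_of(&repo, id));
    repo.soft_delete(id);
    println!("{:?}", filename_of(&repo, id)); // hidden, but still stored
}
```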
|
||||
|
||||
### Frontend Patterns
|
||||
|
||||
1. **Container/Presenter**: Component separation
|
||||
2. **Custom Hooks**: Logic reuse
|
||||
3. **Context Provider**: State management
|
||||
4. **HOCs**: Cross-cutting concerns
|
||||
|
||||
### Database Patterns
|
||||
|
||||
1. **Soft Deletes**: Data preservation
|
||||
2. **Audit Trails**: Change tracking
|
||||
3. **Materialized Views**: Performance
|
||||
4. **Event Sourcing**: Optional audit log
|
||||
|
||||
## Future Architecture Considerations
|
||||
|
||||
### Microservices Migration
|
||||
|
||||
Potential service boundaries:
|
||||
- Authentication Service
|
||||
- Document Service
|
||||
- OCR Service
|
||||
- Search Service
|
||||
- Notification Service
|
||||
|
||||
### Event-Driven Architecture
|
||||
|
||||
- Message Queue (RabbitMQ/Kafka)
|
||||
- Event Sourcing
|
||||
- CQRS Pattern
|
||||
- Async communication
|
||||
|
||||
### Cloud-Native Features
|
||||
|
||||
- Kubernetes deployment
|
||||
- Service mesh (Istio)
|
||||
- Distributed tracing
|
||||
- Cloud storage integration
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Metrics
|
||||
|
||||
- Prometheus metrics endpoint
|
||||
- Custom business metrics
|
||||
- Performance counters
|
||||
- Resource utilization
|
||||
|
||||
### Logging
|
||||
|
||||
- Structured logging (JSON)
|
||||
- Log aggregation ready
|
||||
- Correlation IDs
|
||||
- Debug levels
|
||||
|
||||
### Tracing
|
||||
|
||||
- OpenTelemetry support
|
||||
- Distributed tracing
|
||||
- Performance profiling
|
||||
- Request tracking
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Review [deployment options](../deployment.md)
|
||||
- Explore [performance tuning](OCR_OPTIMIZATION_GUIDE.md)
|
||||
- Understand [database design](DATABASE_GUARDRAILS.md)
|
||||
- Learn about [testing strategy](TESTING.md)
|
||||
|
|
@ -0,0 +1,434 @@
|
|||
# Development Guide
|
||||
|
||||
This guide covers contributing to Readur, setting up a development environment, testing, and code style guidelines.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Development Setup](#development-setup)
|
||||
- [Prerequisites](#prerequisites)
|
||||
- [Local Development](#local-development)
|
||||
- [Development with Docker](#development-with-docker)
|
||||
- [Project Structure](#project-structure)
|
||||
- [Testing](#testing)
|
||||
- [Backend Tests](#backend-tests)
|
||||
- [Frontend Tests](#frontend-tests)
|
||||
- [Integration Tests](#integration-tests)
|
||||
- [E2E Tests](#e2e-tests)
|
||||
- [Code Style](#code-style)
|
||||
- [Rust Guidelines](#rust-guidelines)
|
||||
- [Frontend Guidelines](#frontend-guidelines)
|
||||
- [Contributing](#contributing)
|
||||
- [Getting Started](#getting-started)
|
||||
- [Pull Request Process](#pull-request-process)
|
||||
- [Commit Guidelines](#commit-guidelines)
|
||||
- [Debugging](#debugging)
|
||||
- [Performance Profiling](#performance-profiling)
|
||||
|
||||
## Development Setup
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Rust 1.70+ and Cargo
|
||||
- Node.js 18+ and npm
|
||||
- PostgreSQL 14+
|
||||
- Tesseract OCR 4.0+
|
||||
- Git
|
||||
|
||||
### Local Development
|
||||
|
||||
1. **Clone the repository**:
|
||||
```bash
|
||||
git clone https://github.com/perfectra1n/readur.git
|
||||
cd readur
|
||||
```
|
||||
|
||||
2. **Set up the database**:
|
||||
```bash
# Create the development role and a database it owns. Owning the database
# lets migrations create tables on PostgreSQL 15+, where ordinary roles can
# no longer create objects in the public schema by default.
sudo -u postgres psql <<'SQL'
CREATE USER readur_dev WITH ENCRYPTED PASSWORD 'dev_password';
CREATE DATABASE readur_dev OWNER readur_dev;
SQL
```
|
||||
|
||||
3. **Configure environment**:
|
||||
```bash
# Copy the example environment to .env (sqlx-cli reads DATABASE_URL from it)
cp .env.example .env

# Then edit .env with your settings, for example:
# DATABASE_URL=postgresql://readur_dev:dev_password@localhost/readur_dev
# JWT_SECRET=dev-secret-key
```
|
||||
|
||||
4. **Run database migrations**:
|
||||
```bash
|
||||
# Install sqlx-cli if needed
|
||||
cargo install sqlx-cli
|
||||
|
||||
# Run migrations
|
||||
sqlx migrate run
|
||||
```
|
||||
|
||||
5. **Start the backend**:
|
||||
```bash
# Development mode with auto-reload (install first: cargo install cargo-watch)
cargo watch -x run

# Or without auto-reload
cargo run
```
|
||||
|
||||
6. **Start the frontend**:
|
||||
```bash
|
||||
cd frontend
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
### Development with Docker
|
||||
|
||||
For a consistent development environment:
|
||||
|
||||
```bash
|
||||
# Start all services
|
||||
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
|
||||
|
||||
# Backend available at: http://localhost:8000
|
||||
# Frontend dev server at: http://localhost:5173
|
||||
# PostgreSQL at: localhost:5433
|
||||
```
|
||||
|
||||
The development compose file includes:
|
||||
- Volume mounts for hot reloading
|
||||
- Exposed database port
|
||||
- Debug logging enabled
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
readur/
|
||||
├── src/ # Rust backend source
|
||||
│ ├── main.rs # Application entry point
|
||||
│ ├── config.rs # Configuration management
|
||||
│ ├── models.rs # Database models
|
||||
│ ├── routes/ # API route handlers
|
||||
│ ├── db/ # Database operations
|
||||
│ ├── ocr.rs # OCR processing
|
||||
│ └── tests/ # Integration tests
|
||||
├── frontend/ # React frontend
|
||||
│ ├── src/
|
||||
│ │ ├── components/ # React components
|
||||
│ │ ├── pages/ # Page components
|
||||
│ │ ├── services/ # API services
|
||||
│ │ └── App.tsx # Main app component
|
||||
│ └── tests/ # Frontend tests
|
||||
├── migrations/ # Database migrations
|
||||
├── docs/ # Documentation
|
||||
└── tests/ # E2E and integration tests
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
Readur's test suite covers unit, integration, and end-to-end tests.
|
||||
|
||||
### Backend Tests
|
||||
|
||||
```bash
|
||||
# Run all tests
|
||||
cargo test
|
||||
|
||||
# Run with output
|
||||
cargo test -- --nocapture
|
||||
|
||||
# Run specific test
|
||||
cargo test test_document_upload
|
||||
|
||||
# Run tests with coverage
|
||||
cargo install cargo-tarpaulin
|
||||
cargo tarpaulin --out Html
|
||||
```
|
||||
|
||||
Test categories:
|
||||
- **Unit tests**: In `src/tests/`
|
||||
- **Integration tests**: In `tests/`
|
||||
- **Database tests**: Require `TEST_DATABASE_URL`
|
||||
|
||||
Example test:
|
||||
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_document_creation() {
        let doc = Document::new("test.pdf", "application/pdf");
        assert_eq!(doc.filename, "test.pdf");
    }
}
```
|
||||
|
||||
### Frontend Tests
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
|
||||
# Run unit tests
|
||||
npm test
|
||||
|
||||
# Run with coverage
|
||||
npm run test:coverage
|
||||
|
||||
# Run in watch mode
|
||||
npm run test:watch
|
||||
```
|
||||
|
||||
Example test:
|
||||
```typescript
import { render, screen } from '@testing-library/react';
import DocumentList from './DocumentList';

test('renders document list', () => {
  render(<DocumentList documents={[]} />);
  expect(screen.getByText(/No documents/i)).toBeInTheDocument();
});
```
|
||||
|
||||
### Integration Tests
|
||||
|
||||
```bash
|
||||
# Run integration tests
|
||||
docker compose -f docker-compose.test.yml up --abort-on-container-exit
|
||||
|
||||
# Or manually
|
||||
cargo test --test '*' -- --test-threads=1
|
||||
```
|
||||
|
||||
### E2E Tests
|
||||
|
||||
Using Playwright for end-to-end testing:
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
|
||||
# Install Playwright
|
||||
npm run e2e:install
|
||||
|
||||
# Run E2E tests
|
||||
npm run e2e
|
||||
|
||||
# Run in UI mode
|
||||
npm run e2e:ui
|
||||
```
|
||||
|
||||
## Code Style
|
||||
|
||||
### Rust Guidelines
|
||||
|
||||
We follow the official Rust style guide with some additions:
|
||||
|
||||
```bash
# Format code
cargo fmt

# Check linting
cargo clippy -- -D warnings

# Check before committing (same lint level as above)
cargo fmt --check && cargo clippy -- -D warnings
```
|
||||
|
||||
Style preferences:
|
||||
- Use descriptive variable names
|
||||
- Add documentation comments for public APIs
|
||||
- Keep functions small and focused
|
||||
- Use `Result` for error handling
|
||||
- Prefer `&str` over `String` for function parameters
|
||||
|
||||
### Frontend Guidelines
|
||||
|
||||
```bash
|
||||
# Format code
|
||||
npm run format
|
||||
|
||||
# Lint check
|
||||
npm run lint
|
||||
|
||||
# Type check
|
||||
npm run type-check
|
||||
```
|
||||
|
||||
Style preferences:
|
||||
- Use functional components with hooks
|
||||
- TypeScript for all new code
|
||||
- Descriptive component and variable names
|
||||
- Extract reusable logic into custom hooks
|
||||
- Keep components focused and small
|
||||
|
||||
## Contributing
|
||||
|
||||
We welcome contributions! Please see our [Contributing Guide](../../CONTRIBUTING.md) for details.
|
||||
|
||||
### Getting Started
|
||||
|
||||
1. **Fork the repository**
|
||||
2. **Create a feature branch**:
|
||||
```bash
|
||||
git checkout -b feature/amazing-feature
|
||||
```
|
||||
|
||||
3. **Make your changes**
|
||||
4. **Add tests** for new functionality
|
||||
5. **Ensure all tests pass**:
|
||||
```bash
|
||||
cargo test
|
||||
cd frontend && npm test
|
||||
```
|
||||
|
||||
6. **Commit your changes** (see commit guidelines below)
|
||||
7. **Push to your fork**:
|
||||
```bash
|
||||
git push origin feature/amazing-feature
|
||||
```
|
||||
|
||||
8. **Open a Pull Request**
|
||||
|
||||
### Pull Request Process
|
||||
|
||||
1. **Update documentation** for any changed functionality
|
||||
2. **Add tests** covering new code
|
||||
3. **Ensure CI passes** (automated checks)
|
||||
4. **Request review** from maintainers
|
||||
5. **Address feedback** promptly
|
||||
6. **Squash commits** if requested
|
||||
|
||||
### Commit Guidelines
|
||||
|
||||
We use [Conventional Commits](https://www.conventionalcommits.org/) to keep the history readable:
|
||||
|
||||
```
|
||||
feat: add bulk document export
|
||||
fix: resolve OCR timeout on large files
|
||||
docs: update API authentication section
|
||||
test: add coverage for search filters
|
||||
refactor: simplify document processing pipeline
|
||||
perf: optimize database queries for search
|
||||
chore: update dependencies
|
||||
```
|
||||
|
||||
Format:
|
||||
```
|
||||
<type>(<scope>): <subject>
|
||||
|
||||
<body>
|
||||
|
||||
<footer>
|
||||
```
|
||||
|
||||
Types:
|
||||
- `feat`: New feature
|
||||
- `fix`: Bug fix
|
||||
- `docs`: Documentation only
|
||||
- `style`: Code style changes
|
||||
- `refactor`: Code refactoring
|
||||
- `perf`: Performance improvements
|
||||
- `test`: Test additions/changes
|
||||
- `chore`: Build process/auxiliary tool changes
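Because the header format is fixed, it can be checked mechanically, for example from a `commit-msg` hook. The sketch parses `<type>(<scope>): <subject>` with an optional scope and accepts only the types listed above. It deliberately ignores the body, the footer, and the `!` breaking-change marker from the wider Conventional Commits specification; the function and type names are hypothetical.

```rust
const TYPES: [&str; 8] = ["feat", "fix", "docs", "style", "refactor", "perf", "test", "chore"];

#[derive(Debug, PartialEq)]
struct Header<'a> {
    kind: &'a str,
    scope: Option<&'a str>,
    subject: &'a str,
}

/// Parses the first line of a commit message as `<type>(<scope>): <subject>`.
fn parse_header(line: &str) -> Result<Header<'_>, String> {
    let (prefix, subject) = line.split_once(": ").ok_or("missing \": \" separator")?;
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')').ok_or("unclosed scope")?;
            if scope.is_empty() {
                return Err("empty scope".to_string());
            }
            (kind, Some(scope))
        }
        None => (prefix, None),
    };
    if !TYPES.contains(&kind) {
        return Err(format!("unknown type {kind:?}"));
    }
    if subject.trim().is_empty() {
        return Err("empty subject".to_string());
    }
    Ok(Header { kind, scope, subject })
}

fn main() {
    for msg in ["feat: add bulk document export", "perf(search): optimize queries", "update stuff"] {
        match parse_header(msg) {
            Ok(h) => println!("ok: {h:?}"),
            Err(e) => println!("rejected {msg:?}: {e}"),
        }
    }
}
```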
|
||||
|
||||
## Debugging
|
||||
|
||||
### Backend Debugging
|
||||
|
||||
1. **Enable debug logging**:
|
||||
```bash
|
||||
RUST_LOG=debug cargo run
|
||||
```
|
||||
|
||||
2. **Use the VS Code debugger** (the `lldb` launch type below requires the CodeLLDB extension):
|
||||
```json
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "lldb",
      "request": "launch",
      "name": "Debug Readur",
      "cargo": {
        "args": ["build", "--bin=readur"],
        "filter": {
          "name": "readur",
          "kind": "bin"
        }
      },
      "args": [],
      "cwd": "${workspaceFolder}"
    }
  ]
}
```
|
||||
|
||||
3. **Database query logging**:
|
||||
```bash
|
||||
RUST_LOG=sqlx=debug cargo run
|
||||
```
|
||||
|
||||
### Frontend Debugging
|
||||
|
||||
1. **React DevTools**: Install browser extension
|
||||
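2. **Redux DevTools**: Only useful if a Redux store is introduced; application state currently lives in React Context, which React DevTools can inspect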
|
||||
3. **Network tab**: Monitor API calls
|
||||
4. **Console debugging**: Strategic `console.log`
|
||||
|
||||
## Performance Profiling
|
||||
|
||||
### Backend Profiling
|
||||
|
||||
```bash
# CPU profiling with flamegraph
cargo install flamegraph
cargo flamegraph --bin readur

# Memory profiling (build the release binary first: cargo build --release)
valgrind --tool=massif target/release/readur
```
|
||||
|
||||
### Frontend Profiling
|
||||
|
||||
1. Use Chrome DevTools Performance tab
|
||||
2. React Profiler for component performance
|
||||
3. Lighthouse for overall performance audit
|
||||
|
||||
### Database Profiling
|
||||
|
||||
```sql
-- Enable query timing (psql meta-command)
\timing on

-- Analyze query plan
EXPLAIN ANALYZE SELECT * FROM documents WHERE ...;

-- Check slow queries (requires the pg_stat_statements extension;
-- PostgreSQL 13+ names the column total_exec_time)
SELECT * FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- [Rust Book](https://doc.rust-lang.org/book/)
|
||||
- [React Documentation](https://react.dev/)
|
||||
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
|
||||
- [Tesseract Documentation](https://tesseract-ocr.github.io/)
|
||||
- [Testing Guide](TESTING.md)
|
||||
|
||||
## Getting Help
|
||||
|
||||
- **GitHub Issues**: For bug reports and feature requests
|
||||
- **GitHub Discussions**: For questions and community support
|
||||
- **Discord**: Join our community server (link in README)
|
||||
|
||||
## License
|
||||
|
||||
By contributing to Readur, you agree that your contributions will be licensed under the MIT License.
|
||||
|
|
@ -0,0 +1,175 @@
|
|||
# Installation Guide
|
||||
|
||||
This guide covers various methods to install and run Readur, from quick Docker deployment to manual installation.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Quick Start with Docker Compose](#quick-start-with-docker-compose)
|
||||
- [System Requirements](#system-requirements)
|
||||
- [Manual Installation](#manual-installation)
|
||||
- [Prerequisites](#prerequisites)
|
||||
- [Backend Setup](#backend-setup)
|
||||
- [Frontend Setup](#frontend-setup)
|
||||
- [Verifying Installation](#verifying-installation)
|
||||
|
||||
## Quick Start with Docker Compose
|
||||
|
||||
The fastest way to get Readur running:
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://github.com/perfectra1n/readur
|
||||
cd readur
|
||||
|
||||
# Start all services
|
||||
docker compose up --build -d
|
||||
|
||||
# Access the application
|
||||
open http://localhost:8000
|
||||
```
|
||||
|
||||
**Default login credentials:**
|
||||
- Username: `admin`
|
||||
- Password: `readur2024`
|
||||
|
||||
> ⚠️ **Important**: Change the default admin password immediately after first login!
|
||||
|
||||
### What You Get
|
||||
|
||||
After deployment, you'll have:
|
||||
- **Web Interface**: Modern document management UI at `http://localhost:8000`
|
||||
- **PostgreSQL Database**: Document metadata and full-text search indexes
|
||||
- **File Storage**: Persistent document storage with OCR processing
|
||||
- **Watch Folder**: Automatic file ingestion from mounted directories
|
||||
- **REST API**: Full API access for integrations
|
||||
|
||||
## System Requirements
|
||||
|
||||
### Minimum Requirements
|
||||
- **CPU**: 2 cores
|
||||
- **RAM**: 2GB
|
||||
- **Storage**: 10GB free space
|
||||
- **OS**: Linux, macOS, or Windows with Docker
|
||||
|
||||
### Recommended for Production
|
||||
- **CPU**: 4+ cores
|
||||
- **RAM**: 4GB+
|
||||
- **Storage**: 50GB+ SSD
|
||||
- **Network**: Stable connection for pulling images and serving clients (OCR runs locally with Tesseract and does not need internet access)
|
||||
|
||||
## Manual Installation
|
||||
|
||||
For development or custom deployments without Docker:
|
||||
|
||||
### Prerequisites
|
||||
|
||||
Install these dependencies on your system:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y \
|
||||
tesseract-ocr tesseract-ocr-eng \
|
||||
libtesseract-dev libleptonica-dev \
|
||||
postgresql postgresql-contrib \
|
||||
pkg-config libclang-dev
|
||||
|
||||
# macOS (requires Homebrew; the node formula includes npm, and Rust is installed via rustup below)
brew install tesseract leptonica postgresql@16 node
|
||||
|
||||
# Install Rust (if not already installed)
|
||||
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
|
||||
```
|
||||
|
||||
### Backend Setup
|
||||
|
||||
1. **Configure Database**:
|
||||
```bash
# Create the database user and a database it owns (owning the database
# lets migrations create tables on PostgreSQL 15+)
sudo -u postgres psql <<'SQL'
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
CREATE DATABASE readur OWNER readur_user;
SQL
```
|
||||
|
||||
2. **Environment Configuration**:
|
||||
```bash
|
||||
# Copy environment template
|
||||
cp .env.example .env
|
||||
|
||||
# Edit configuration
|
||||
nano .env
|
||||
```
|
||||
|
||||
Required environment variables:
|
||||
```env
|
||||
DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
|
||||
JWT_SECRET=your-super-secret-jwt-key-change-this
|
||||
SERVER_ADDRESS=0.0.0.0:8000
|
||||
UPLOAD_PATH=./uploads
|
||||
WATCH_FOLDER=./watch
|
||||
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
|
||||
```
|
||||
|
||||
3. **Build and Run Backend**:
|
||||
```bash
|
||||
# Install dependencies and run
|
||||
cargo build --release
|
||||
cargo run
|
||||
```
|
||||
|
||||
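
Replace the `JWT_SECRET` placeholder from step 2 with a long random value. The following sketch does this in place. It uses `openssl rand -hex 32` when OpenSSL is available and falls back to `/dev/urandom` otherwise. `set_jwt_secret` is a hypothetical helper name.

```bash
# set_jwt_secret ENV_FILE
# Hypothetical helper: replaces (or appends) the JWT_SECRET line in ENV_FILE
# with 64 random hex characters (32 random bytes).
set_jwt_secret() {
  local env_file=$1 secret
  if command -v openssl >/dev/null 2>&1; then
    secret=$(openssl rand -hex 32)
  else
    # Fallback: read 32 bytes from the kernel RNG and hex-encode them
    secret=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
  fi
  if grep -q '^JWT_SECRET=' "$env_file"; then
    # Write to a temp file and move it back; avoids GNU/BSD `sed -i` differences
    sed "s|^JWT_SECRET=.*|JWT_SECRET=${secret}|" "$env_file" > "${env_file}.tmp" &&
      mv "${env_file}.tmp" "$env_file"
  else
    echo "JWT_SECRET=${secret}" >> "$env_file"
  fi
  # The file now holds secrets, so keep it readable by the owner only
  chmod 600 "$env_file"
}
```

Example: `cp .env.example .env && set_jwt_secret .env`.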

### Frontend Setup

1. **Install Dependencies**:

   ```bash
   cd frontend
   npm install
   ```

2. **Development Mode**:

   ```bash
   npm run dev
   # Frontend available at http://localhost:5173
   ```

3. **Production Build**:

   ```bash
   npm run build
   # Built files in frontend/dist/
   ```

## Verifying Installation

After installation, verify everything is working:

1. **Check Backend Health**:

   ```bash
   curl http://localhost:8000/api/health
   ```

2. **Access Web Interface**:
   - Navigate to `http://localhost:8000`
   - Log in with the default credentials (see the [User Guide](user-guide.md#getting-started))
   - Upload a test document

3. **Verify Database Connection**:

   ```bash
   # For Docker installation (the container name depends on your Compose
   # project name; check it with `docker ps`)
   docker exec -it readur-postgres-1 psql -U readur -c "\dt"

   # For manual installation (-h localhost connects over TCP, avoiding
   # peer-authentication failures with default Debian/Ubuntu settings)
   psql -h localhost -U readur_user -d readur -c "\dt"
   ```

4. **Check OCR Functionality**:
   - Upload a PDF or image file
   - Wait for processing to complete
   - Search for text content from the uploaded file
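
The backend may not answer the health check as soon as it starts. The following sketch wraps the check from step 1 in a retry loop. `wait_for_health` is a hypothetical helper. It treats any HTTP status below 400 as healthy, because that is what `curl -f` checks.

```bash
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets an HTTP status below 400, or gives up after ATTEMPTS.
wait_for_health() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: fail on HTTP >= 400; -sS: quiet but still report errors;
    # --max-time: never hang on a single request
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  echo "no healthy response from ${url} after ${attempts} attempt(s)" >&2
  return 1
}
```

Example: `wait_for_health http://localhost:8000/api/health && open http://localhost:8000`.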

## Next Steps

- [Configure Readur](configuration.md) for your specific needs
- Set up [production deployment](deployment.md) with SSL and proper security
- Read the [User Guide](user-guide.md) to learn about all features
- Explore the [API Reference](api-reference.md) for integrations

# User Guide

A comprehensive guide to using Readur's features for document management, OCR processing, and search.

## Table of Contents

- [Getting Started](#getting-started)
- [Supported File Types](#supported-file-types)
- [Using the Interface](#using-the-interface)
  - [Dashboard](#dashboard)
  - [Document Management](#document-management)
  - [Advanced Search](#advanced-search)
  - [Folder Watching](#folder-watching)
- [Document Upload](#document-upload)
- [OCR Processing](#ocr-processing)
- [Search Features](#search-features)
- [Tags and Organization](#tags-and-organization)
- [User Settings](#user-settings)
- [Tips for Best Results](#tips-for-best-results)
- [Troubleshooting](#troubleshooting)

## Getting Started

1. **First Login**:
   - Navigate to `http://localhost:8000` (or your configured URL)
   - Use the default admin credentials (username: `admin`, password: `readur2024`)
   - **Important**: Change the default password immediately

2. **Initial Setup**:
   - Configure your user preferences
   - Set the OCR language if your documents are not in English
   - Adjust search and display settings

3. **Quick Start**:
   - Upload your first document using drag-and-drop or the upload button
   - Wait for OCR processing to complete
   - Search for content within your documents

## Supported File Types

| Type | Extensions | OCR Support | Notes |
|------|-----------|-------------|-------|
| **PDF** | `.pdf` | ✅ | Text extraction + OCR for scanned pages |
| **Images** | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | ✅ | Full OCR text extraction |
| **Text** | `.txt`, `.rtf` | ❌ | Direct text indexing |
| **Office** | `.doc`, `.docx` | ⚠️ | Limited support |
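
Before you point a watch folder at an existing directory, it can help to see which files fall outside the supported types. The following is a minimal Bash sketch. It assumes the `ALLOWED_FILE_TYPES` list from the example `.env` in the installation guide. `list_unsupported` is a hypothetical helper, and the case-insensitive comparison is an assumption about how Readur matches extensions.

```bash
# Extension list copied from the ALLOWED_FILE_TYPES example in the installation
# guide; pass your own comma-separated list if your .env differs.
DEFAULT_ALLOWED_TYPES="pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx"

# list_unsupported DIR [TYPES]
# Prints every regular file under DIR whose extension is not in TYPES.
# Extensions are compared case-insensitively (an assumption, see above).
list_unsupported() {
  local dir=$1 types=${2:-$DEFAULT_ALLOWED_TYPES} file name ext
  types=$(printf '%s' "$types" | tr '[:upper:]' '[:lower:]')
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' file; do
      name=${file##*/}                              # file name without directories
      case $name in
        *.*) ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]') ;;
        *)   printf '%s\n' "$file"; continue ;;     # no extension: unsupported
      esac
      case ",${types}," in
        *",${ext},"*) ;;                            # allowed: print nothing
        *) printf '%s\n' "$file" ;;
      esac
    done
}
```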

## Using the Interface

### Dashboard

The dashboard provides an overview of your document system:

- **Document Statistics**:
  - Total documents in the system
  - Storage usage breakdown
  - OCR processing status
  - Recent activity timeline

- **Quick Actions**:
  - Upload new documents
  - Quick search bar
  - Access to recent documents
  - System notifications

### Document Management

#### List/Grid View

- **List View**: Detailed document information in a table format
- **Grid View**: Visual thumbnails for quick browsing
- Toggle between views using the view selector in the top toolbar

#### Sorting Options

- Upload date (newest/oldest first)
- File name (A-Z/Z-A)
- File size (largest/smallest)
- Document type
- OCR status

#### Filtering

- By file type (PDF, images, text)
- By OCR status (completed, pending, failed)
- By date range
- By tags
- By source (uploaded, watched folder)

#### Bulk Actions

1. Select multiple documents using checkboxes
2. Available bulk actions:
   - Delete selected documents
   - Add/remove tags
   - Export document list
   - Reprocess OCR

### Advanced Search

Readur offers powerful search capabilities:

#### Full-Text Search

- Search within document content
- Automatic stemming and fuzzy matching
- Phrase search with quotes: `"exact phrase"`
- Exclude terms with minus: `-excluded`

#### Search Filters

- **Date Range**: Find documents from specific time periods
- **File Type**: Limit search to specific formats
- **File Size**: Filter by document size
- **OCR Status**: Only search processed documents
- **Tags**: Search within tagged documents

#### Search Syntax

```
invoice 2024            # Find documents with both terms
"quarterly report"      # Exact phrase search
invoice -draft          # Exclude drafts
tag:important invoice   # Search within tagged documents
type:pdf contract       # Search only PDFs
```

### Folder Watching

The folder watching feature automatically imports documents:

- **Non-destructive**: Source files remain untouched
- **Automatic Processing**: New files are detected and processed
- **Configurable Intervals**: Adjust scan frequency
- **Multiple Sources**: Watch local folders, network drives, cloud storage

#### Setting Up Watch Folders

1. Go to Settings → Sources
2. Add a new source with type "Local Folder"
3. Configure the path and scan interval
4. Enable/disable the source as needed
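
If you feed a watch folder from scripts, copy files rather than move them so your originals stay intact. Writing under a temporary name and then renaming also keeps a half-copied file from appearing under its final name. This guide does not confirm one assumption the following sketch makes: that the watcher ignores the temporary `.part` name because `part` is not an allowed file type. `stage_into_watch` is a hypothetical helper.

```bash
# stage_into_watch SRC_FILE WATCH_DIR
# Hypothetical helper: copies SRC_FILE into WATCH_DIR without touching the
# original. Data is written to a hidden ".part" file first and then renamed
# inside WATCH_DIR, so the final name only appears once the copy is complete.
stage_into_watch() {
  local src=$1 watch=$2 name tmp
  name=$(basename "$src")
  if [ -e "$watch/$name" ]; then
    echo "already present, skipping: $name" >&2
    return 1
  fi
  tmp="$watch/.${name}.part"
  # cp -p keeps the original timestamps; clean up if the copy fails
  if ! cp -p "$src" "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Renaming within one directory does not rewrite the data
  mv "$tmp" "$watch/$name"
}
```

Example, where `./watch` is the `WATCH_FOLDER` from your `.env`: `for f in ~/scans/*.pdf; do stage_into_watch "$f" ./watch; done`.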

## Document Upload

### Manual Upload

1. Click the upload button or drag files to the upload area
2. Select one or multiple files
3. Add tags during upload (optional)
4. Click "Upload" to start processing

### Drag and Drop

- Drag files directly from your file manager
- Drop anywhere on the document list page
- Multiple files can be dropped at once

### Upload Limits

- Maximum file size: Configurable (default 50MB)
- Supported formats: See [Supported File Types](#supported-file-types)
- Batch upload: Up to 100 files at once
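
Before a batch upload, `find`'s `-size` test can catch files that would exceed the default 50MB limit. `oversized_files` and `count_files` are hypothetical helpers. Note that `find` counts `M` as mebibytes (1,048,576 bytes), which may differ slightly from how Readur defines 50MB.

```bash
# oversized_files DIR [LIMIT_MB]
# Hypothetical helper: prints regular files under DIR larger than LIMIT_MB
# (default 50, the documented default limit). find's "M" unit is 1 MiB and
# sizes are rounded up to whole units, so +50M means "more than 50 MiB".
oversized_files() {
  local dir=$1 limit_mb=${2:-50}
  find "$dir" -type f -size +"${limit_mb}"M
}

# count_files DIR -> number of regular files under DIR, to compare with the
# 100-file batch limit (tr strips the padding some wc versions add)
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}
```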

## OCR Processing

### Automatic OCR

- Starts automatically after upload
- Processes documents in the background
- Prioritizes smaller files in the processing queue

### OCR Settings

- **Language**: Select from 100+ languages
- **Preprocessing**: Enable image enhancement
- **Auto-rotation**: Correct document orientation
- **Quality**: Balance between speed and accuracy

### OCR Status Indicators

- 🟢 **Completed**: Full text extracted
- 🟡 **Processing**: OCR in progress
- 🔴 **Failed**: Error during processing
- ⚪ **Pending**: Waiting in queue
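
The OCR languages available in practice depend on which Tesseract language data is installed. The following hypothetical helper checks for a language code. It assumes Readur uses the system Tesseract installed in the installation guide's Prerequisites; a Docker image may ship its own set of languages.

```bash
# has_ocr_lang CODE -> succeeds if Tesseract reports traineddata for CODE
# (for example eng, deu, fra, chi_sim). Tesseract 3.x printed this list on
# stderr, hence 2>&1; the header line never equals a bare language code.
has_ocr_lang() {
  tesseract --list-langs 2>&1 | grep -qxF "$1"
}
```

Example on Debian/Ubuntu, where language packs are named `tesseract-ocr-<code>`: `has_ocr_lang deu || sudo apt-get install -y tesseract-ocr-deu`.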

## Search Features

### Quick Search

- Available in the header on all pages
- Instant results as you type
- Shows top 5 matches with snippets

### Advanced Search Page

- Full search interface with all filters
- Export search results
- Save frequently used searches
- Search history

### Search Tips

1. Use quotes for exact phrases
2. Combine filters for precise results
3. Use wildcards: `inv*` matches `invoice` and `inventory`
4. Search in specific fields: `filename:report`

## Tags and Organization

### Creating Tags

1. Select document(s)
2. Click "Add Tag"
3. Enter a tag name or select an existing tag

Tags are color-coded for easy identification.

### Tag Management

- Rename tags globally
- Merge similar tags
- Delete unused tags
- Set tag colors

### Smart Collections

Create saved searches based on:

- Tag combinations
- Date ranges
- File types
- Custom criteria

## User Settings

### Personal Preferences

- **Display**: Default view (list or grid)
- **Language**: Interface language
- **Time Zone**: For accurate timestamps
- **Notifications**: Email/in-app alerts

### OCR Preferences

- Default OCR language
- Processing priority
- Image preprocessing options
- Batch size limits

### Search Settings

- Results per page
- Default sort order
- Snippet length
- Fuzzy search threshold

## Tips for Best Results

### OCR Quality

1. **Higher Resolution**: 300+ DPI produces better OCR results
2. **Clean Scans**: Avoid skewed or dirty documents
3. **Good Lighting**: For photo captures, ensure even lighting
4. **Text Contrast**: Black text on white background works best
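
To turn the 300 DPI guideline into pixel dimensions, multiply the page size in inches by the DPI. A US Letter page (8.5 × 11 in) at 300 DPI therefore needs 2550 × 3300 pixels. The hypothetical helper below does the same arithmetic for any page size, rounding to the nearest pixel.

```bash
# min_pixels WIDTH_IN HEIGHT_IN [DPI]
# Prints the pixel dimensions (WIDTHxHEIGHT) of a page scanned at DPI
# (default 300): pixels = inches x DPI, rounded to the nearest pixel.
min_pixels() {
  awk -v w="$1" -v h="$2" -v dpi="${3:-300}" \
    'BEGIN { printf "%dx%d\n", w * dpi + 0.5, h * dpi + 0.5 }'
}
```

For example, `min_pixels 8.27 11.69` (A4) prints `2481x3507`. You can compare the result with a scan's actual size, for instance using ImageMagick's `identify -format '%wx%h' scan.png`.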

### File Organization

1. **Consistent Naming**: Use descriptive, consistent file names
2. **Regular Uploads**: Don't let documents pile up
3. **Use Tags**: Tag documents immediately after upload
4. **Folder Structure**: Organize watch folders logically

### Search Optimization

1. **Use Filters**: Combine text search with filters
2. **Save Searches**: Save frequently used search queries
3. **Learn Syntax**: Master search operators for better results
4. **Index Regularly**: Ensure all documents are processed

### Performance Tips

1. **Batch Processing**: Upload similar documents together
2. **Off-Peak Hours**: Schedule large uploads during low-usage times
3. **Monitor Queue**: Check OCR queue status regularly
4. **Clean Up**: Remove outdated documents periodically

## Troubleshooting

### Common Issues

**OCR Not Starting**

- Check file size limits
- Verify that the file format is supported
- Ensure the OCR service is running

**Search Not Finding Documents**

- Confirm OCR completed successfully
- Check search syntax
- Try broader search terms

**Slow Performance**

- Review concurrent OCR job settings
- Check system resources
- Consider increasing memory limits

## Next Steps

- Explore the [API Reference](api-reference.md) for automation
- Learn about [advanced configuration](configuration.md)
- Set up [automated workflows](WATCH_FOLDER.md)
- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md)