feat(docs): add more user facing docs, update README, and move dev docs to correct folder
parent
102e7d8b3f
commit
a7883c1b63
787
README.md
|
|
@ -16,10 +16,6 @@ A powerful, modern document management system built with Rust and React. Readur
|
||||||
|
|
||||||
## 🚀 Quick Start
|
## 🚀 Quick Start
|
||||||
|
|
||||||
### Using Docker Compose (Recommended)
|
|
||||||
|
|
||||||
The fastest way to get Readur running:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Clone the repository
|
# Clone the repository
|
||||||
git clone https://github.com/perfectra1n/readur
|
git clone https://github.com/perfectra1n/readur
|
||||||
|
|
@ -38,278 +34,26 @@ open http://localhost:8000
|
||||||
|
|
||||||
> ⚠️ **Important**: Change the default admin password immediately after first login!
|
> ⚠️ **Important**: Change the default admin password immediately after first login!
|
||||||
|
|
||||||
### What You Get
|
## 📚 Documentation
|
||||||
|
|
||||||
After deployment, you'll have:
|
### Getting Started
|
||||||
- **Web Interface**: Modern document management UI at `http://localhost:8000`
|
- [📦 Installation Guide](docs/installation.md) - Docker & manual installation instructions
|
||||||
- **PostgreSQL Database**: Document metadata and full-text search indexes
|
- [🔧 Configuration](docs/configuration.md) - Environment variables and settings
|
||||||
- **File Storage**: Persistent document storage with OCR processing
|
- [📖 User Guide](docs/user-guide.md) - How to use Readur effectively
|
||||||
- **Watch Folder**: Automatic file ingestion from mounted directories
|
|
||||||
- **REST API**: Full API access for integrations
|
|
||||||
|
|
||||||
## 🐳 Docker Deployment Guide
|
### Deployment & Operations
|
||||||
|
- [🚀 Deployment Guide](docs/deployment.md) - Production deployment, SSL, monitoring
|
||||||
|
- [🔄 Reverse Proxy Setup](docs/REVERSE_PROXY.md) - Nginx, Traefik, and more
|
||||||
|
- [📁 Watch Folder Guide](docs/WATCH_FOLDER.md) - Automatic document ingestion
|
||||||
|
|
||||||
### Production Docker Compose
|
### Development
|
||||||
|
- [🏗️ Developer Documentation](docs/dev/) - Architecture, development setup, testing
|
||||||
|
- [🔌 API Reference](docs/api-reference.md) - REST API documentation
|
||||||
|
|
||||||
For production deployments, create a custom `docker-compose.prod.yml`:
|
### Advanced Topics
|
||||||
|
- [🔍 OCR Optimization](docs/dev/OCR_OPTIMIZATION_GUIDE.md) - Improve OCR performance
|
||||||
```yaml
|
- [🗄️ Database Best Practices](docs/dev/DATABASE_GUARDRAILS.md) - Concurrency and safety
|
||||||
services:
|
- [📊 Queue Architecture](docs/dev/QUEUE_IMPROVEMENTS.md) - Background job processing
|
||||||
readur:
|
|
||||||
image: readur:latest
|
|
||||||
ports:
|
|
||||||
- "8000:8000"
|
|
||||||
environment:
|
|
||||||
# Core Configuration
|
|
||||||
- DATABASE_URL=postgresql://readur:${DB_PASSWORD}@postgres:5432/readur
|
|
||||||
- JWT_SECRET=${JWT_SECRET}
|
|
||||||
- SERVER_ADDRESS=0.0.0.0:8000
|
|
||||||
|
|
||||||
# File Storage
|
|
||||||
- UPLOAD_PATH=/app/uploads
|
|
||||||
- WATCH_FOLDER=/app/watch
|
|
||||||
- ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,doc,docx
|
|
||||||
|
|
||||||
# Watch Folder Settings
|
|
||||||
- WATCH_INTERVAL_SECONDS=30
|
|
||||||
- FILE_STABILITY_CHECK_MS=500
|
|
||||||
- MAX_FILE_AGE_HOURS=168
|
|
||||||
|
|
||||||
# OCR Configuration
|
|
||||||
- OCR_LANGUAGE=eng
|
|
||||||
- CONCURRENT_OCR_JOBS=4
|
|
||||||
- OCR_TIMEOUT_SECONDS=300
|
|
||||||
- MAX_FILE_SIZE_MB=100
|
|
||||||
|
|
||||||
# Performance Tuning
|
|
||||||
- MEMORY_LIMIT_MB=1024
|
|
||||||
- CPU_PRIORITY=normal
|
|
||||||
- ENABLE_COMPRESSION=true
|
|
||||||
|
|
||||||
volumes:
|
|
||||||
# Document storage
|
|
||||||
- ./data/uploads:/app/uploads
|
|
||||||
|
|
||||||
# Watch folder - mount your network drives here
|
|
||||||
- /mnt/nfs/documents:/app/watch
|
|
||||||
# or SMB: - /mnt/smb/shared:/app/watch
|
|
||||||
# or S3: - /mnt/s3/bucket:/app/watch
|
|
||||||
|
|
||||||
depends_on:
|
|
||||||
- postgres
|
|
||||||
restart: unless-stopped
|
|
||||||
|
|
||||||
# Resource limits for production
|
|
||||||
deploy:
|
|
||||||
resources:
|
|
||||||
limits:
|
|
||||||
memory: 2G
|
|
||||||
cpus: '2.0'
|
|
||||||
reservations:
|
|
||||||
memory: 512M
|
|
||||||
cpus: '0.5'
|
|
||||||
|
|
||||||
postgres:
|
|
||||||
image: postgres:15
|
|
||||||
environment:
|
|
||||||
- POSTGRES_USER=readur
|
|
||||||
- POSTGRES_PASSWORD=${DB_PASSWORD}
|
|
||||||
- POSTGRES_DB=readur
|
|
||||||
- POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8
|
|
||||||
|
|
||||||
volumes:
|
|
||||||
- postgres_data:/var/lib/postgresql/data
|
|
||||||
- ./postgres-config:/etc/postgresql/conf.d:ro
|
|
||||||
|
|
||||||
# PostgreSQL optimization for document search
|
|
||||||
command: >
|
|
||||||
postgres
|
|
||||||
-c shared_buffers=256MB
|
|
||||||
-c effective_cache_size=1GB
|
|
||||||
-c max_connections=100
|
|
||||||
-c default_text_search_config=pg_catalog.english
|
|
||||||
|
|
||||||
restart: unless-stopped
|
|
||||||
|
|
||||||
# Don't expose port in production
|
|
||||||
# ports:
|
|
||||||
# - "5433:5432"
|
|
||||||
|
|
||||||
volumes:
|
|
||||||
postgres_data:
|
|
||||||
driver: local
|
|
||||||
```
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
#### Port Configuration
|
|
||||||
|
|
||||||
Readur supports flexible port configuration:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Method 1: Specify full server address
|
|
||||||
SERVER_ADDRESS=0.0.0.0:8000
|
|
||||||
|
|
||||||
# Method 2: Use separate host and port (recommended)
|
|
||||||
SERVER_HOST=0.0.0.0
|
|
||||||
SERVER_PORT=8000
|
|
||||||
|
|
||||||
# For development: Configure frontend port
|
|
||||||
CLIENT_PORT=5173
|
|
||||||
BACKEND_PORT=8000
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Security Configuration
|
|
||||||
|
|
||||||
Create a `.env` file for your secrets:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Generate secure secrets
|
|
||||||
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')  # strip the line wrap openssl adds to long base64 output
|
|
||||||
DB_PASSWORD=$(openssl rand -hex 32)  # hex avoids '/' and '+', which would break DATABASE_URL
|
|
||||||
|
|
||||||
# Save to .env file
|
|
||||||
cat > .env << EOF
|
|
||||||
JWT_SECRET=${JWT_SECRET}
|
|
||||||
DB_PASSWORD=${DB_PASSWORD}
|
|
||||||
EOF
|
|
||||||
```
|
|
||||||
|
|
||||||
Deploy with:
|
|
||||||
```bash
|
|
||||||
docker compose -f docker-compose.prod.yml --env-file .env up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
### Network Filesystem Mounts
|
|
||||||
|
|
||||||
#### NFS Mounts
|
|
||||||
```bash
|
|
||||||
# Mount NFS share
|
|
||||||
sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs/documents
|
|
||||||
|
|
||||||
# Add to docker-compose.yml
|
|
||||||
volumes:
|
|
||||||
- /mnt/nfs/documents:/app/watch
|
|
||||||
environment:
|
|
||||||
- WATCH_INTERVAL_SECONDS=60
|
|
||||||
- FILE_STABILITY_CHECK_MS=1000
|
|
||||||
- FORCE_POLLING_WATCH=1
|
|
||||||
```
|
|
||||||
|
|
||||||
#### SMB/CIFS Mounts
|
|
||||||
```bash
|
|
||||||
# Mount SMB share
|
|
||||||
sudo mount -t cifs //server/share /mnt/smb/shared -o username=user,password=pass
|
|
||||||
|
|
||||||
# Docker volume configuration
|
|
||||||
volumes:
|
|
||||||
- /mnt/smb/shared:/app/watch
|
|
||||||
environment:
|
|
||||||
- WATCH_INTERVAL_SECONDS=30
|
|
||||||
- FILE_STABILITY_CHECK_MS=2000
|
|
||||||
```
|
|
||||||
|
|
||||||
#### S3 Mounts (using s3fs)
|
|
||||||
```bash
|
|
||||||
# Mount S3 bucket
|
|
||||||
s3fs mybucket /mnt/s3/bucket -o passwd_file=~/.passwd-s3fs
|
|
||||||
|
|
||||||
# Docker configuration for S3
|
|
||||||
volumes:
|
|
||||||
- /mnt/s3/bucket:/app/watch
|
|
||||||
environment:
|
|
||||||
- WATCH_INTERVAL_SECONDS=120
|
|
||||||
- FILE_STABILITY_CHECK_MS=5000
|
|
||||||
- FORCE_POLLING_WATCH=1
|
|
||||||
```
|
|
||||||
|
|
||||||
### SSL/HTTPS Setup
|
|
||||||
|
|
||||||
Use a reverse proxy like Nginx or Traefik:
|
|
||||||
|
|
||||||
#### Nginx Configuration
|
|
||||||
```nginx
|
|
||||||
server {
|
|
||||||
listen 443 ssl http2;
|
|
||||||
server_name readur.yourdomain.com;
|
|
||||||
|
|
||||||
ssl_certificate /path/to/cert.pem;
|
|
||||||
ssl_certificate_key /path/to/key.pem;
|
|
||||||
|
|
||||||
location / {
|
|
||||||
proxy_pass http://localhost:8000;
|
|
||||||
proxy_set_header Host $host;
|
|
||||||
proxy_set_header X-Real-IP $remote_addr;
|
|
||||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
|
||||||
proxy_set_header X-Forwarded-Proto $scheme;
|
|
||||||
|
|
||||||
# For file uploads
|
|
||||||
client_max_body_size 100M;
|
|
||||||
proxy_read_timeout 300s;
|
|
||||||
proxy_send_timeout 300s;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Traefik Configuration
|
|
||||||
```yaml
|
|
||||||
services:
|
|
||||||
readur:
|
|
||||||
labels:
|
|
||||||
- "traefik.enable=true"
|
|
||||||
- "traefik.http.routers.readur.rule=Host(`readur.yourdomain.com`)"
|
|
||||||
- "traefik.http.routers.readur.tls=true"
|
|
||||||
- "traefik.http.routers.readur.tls.certresolver=letsencrypt"
|
|
||||||
```
|
|
||||||
|
|
||||||
> 📘 **For detailed reverse proxy configurations** including Apache, Caddy, custom ports, load balancing, and advanced scenarios, see [REVERSE_PROXY.md](./REVERSE_PROXY.md).
|
|
||||||
|
|
||||||
### Health Checks
|
|
||||||
|
|
||||||
Add health checks to your Docker configuration:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
services:
|
|
||||||
readur:
|
|
||||||
healthcheck:
|
|
||||||
test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
|
|
||||||
interval: 30s
|
|
||||||
timeout: 10s
|
|
||||||
retries: 3
|
|
||||||
start_period: 40s
|
|
||||||
```
|
|
||||||
|
|
||||||
### Backup Strategy
|
|
||||||
|
|
||||||
```bash
|
|
||||||
#!/bin/bash
|
|
||||||
# backup.sh - Automated backup script
|
|
||||||
|
|
||||||
# Backup database
|
|
||||||
docker exec readur-postgres-1 pg_dump -U readur readur | gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
|
|
||||||
|
|
||||||
# Backup uploaded files
|
|
||||||
tar -czf uploads_backup_$(date +%Y%m%d_%H%M%S).tar.gz -C ./data uploads/
|
|
||||||
|
|
||||||
# Clean old backups (keep 30 days)
|
|
||||||
find . -name "backup_*.sql.gz" -mtime +30 -delete
|
|
||||||
find . -name "uploads_backup_*.tar.gz" -mtime +30 -delete
|
|
||||||
```
|
|
||||||
|
|
||||||
### Monitoring
|
|
||||||
|
|
||||||
Monitor your deployment with Docker stats:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Real-time resource usage
|
|
||||||
docker stats
|
|
||||||
|
|
||||||
# Container logs
|
|
||||||
docker compose logs -f readur
|
|
||||||
|
|
||||||
# Watch folder activity
|
|
||||||
docker compose logs -f readur | grep watcher
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🏗️ Architecture
|
## 🏗️ Architecture
|
||||||
|
|
||||||
|
|
@ -327,495 +71,24 @@ docker compose logs -f readur | grep watcher
|
||||||
|
|
||||||
## 📋 System Requirements
|
## 📋 System Requirements
|
||||||
|
|
||||||
### Minimum Requirements
|
### Minimum
|
||||||
- **CPU**: 2 cores
|
- 2 CPU cores, 2GB RAM, 10GB storage
|
||||||
- **RAM**: 2GB
|
- Docker or manual installation prerequisites
|
||||||
- **Storage**: 10GB free space
|
|
||||||
- **OS**: Linux, macOS, or Windows with Docker
|
|
||||||
|
|
||||||
### Recommended for Production
|
### Recommended for Production
|
||||||
- **CPU**: 4+ cores
|
- 4+ CPU cores, 4GB+ RAM, 50GB+ SSD
|
||||||
- **RAM**: 4GB+
|
- See [deployment guide](docs/deployment.md) for details
|
||||||
- **Storage**: 50GB+ SSD
|
|
||||||
- **Network**: Stable internet connection for OCR processing
|
|
||||||
|
|
||||||
## 🛠️ Manual Installation
|
|
||||||
|
|
||||||
For development or custom deployments without Docker:
|
|
||||||
|
|
||||||
### Prerequisites
|
|
||||||
|
|
||||||
Install these dependencies on your system:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Ubuntu/Debian
|
|
||||||
sudo apt-get update
|
|
||||||
sudo apt-get install -y \
|
|
||||||
tesseract-ocr tesseract-ocr-eng \
|
|
||||||
libtesseract-dev libleptonica-dev \
|
|
||||||
postgresql postgresql-contrib \
|
|
||||||
pkg-config libclang-dev
|
|
||||||
|
|
||||||
# macOS (requires Homebrew)
|
|
||||||
brew install tesseract leptonica postgresql rust nodejs npm
|
|
||||||
|
|
||||||
# Install Rust (if not already installed)
|
|
||||||
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Backend Setup
|
|
||||||
|
|
||||||
1. **Configure Database**:
|
|
||||||
```bash
|
|
||||||
# Create database and user
|
|
||||||
sudo -u postgres psql
|
|
||||||
CREATE DATABASE readur;
|
|
||||||
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
|
|
||||||
GRANT ALL PRIVILEGES ON DATABASE readur TO readur_user;
|
|
||||||
\q
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Environment Configuration**:
|
|
||||||
```bash
|
|
||||||
# Copy environment template
|
|
||||||
cp .env.example .env
|
|
||||||
|
|
||||||
# Edit configuration
|
|
||||||
nano .env
|
|
||||||
```
|
|
||||||
|
|
||||||
Required environment variables:
|
|
||||||
```env
|
|
||||||
DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
|
|
||||||
JWT_SECRET=your-super-secret-jwt-key-change-this
|
|
||||||
SERVER_ADDRESS=0.0.0.0:8000
|
|
||||||
UPLOAD_PATH=./uploads
|
|
||||||
WATCH_FOLDER=./watch
|
|
||||||
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Build and Run Backend**:
|
|
||||||
```bash
|
|
||||||
# Install dependencies and run
|
|
||||||
cargo build --release
|
|
||||||
cargo run --release
|
|
||||||
```
|
|
||||||
|
|
||||||
### Frontend Setup
|
|
||||||
|
|
||||||
1. **Install Dependencies**:
|
|
||||||
```bash
|
|
||||||
cd frontend
|
|
||||||
npm install
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Development Mode**:
|
|
||||||
```bash
|
|
||||||
npm run dev
|
|
||||||
# Frontend available at http://localhost:5173
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Production Build**:
|
|
||||||
```bash
|
|
||||||
npm run build
|
|
||||||
# Built files in frontend/dist/
|
|
||||||
```
|
|
||||||
|
|
||||||
## 📖 User Guide
|
|
||||||
|
|
||||||
### Getting Started
|
|
||||||
|
|
||||||
1. **First Login**: Use the default admin credentials to access the system
|
|
||||||
2. **Upload Documents**: Drag and drop files or use the upload button
|
|
||||||
3. **Wait for Processing**: OCR processing happens automatically in the background
|
|
||||||
4. **Search and Organize**: Use the powerful search features to find your documents
|
|
||||||
|
|
||||||
### Supported File Types
|
|
||||||
|
|
||||||
| Type | Extensions | OCR Support | Notes |
|
|
||||||
|------|-----------|-------------|-------|
|
|
||||||
| **PDF** | `.pdf` | ✅ | Text extraction + OCR for scanned pages |
|
|
||||||
| **Images** | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | ✅ | Full OCR text extraction |
|
|
||||||
| **Text** | `.txt`, `.rtf` | ❌ | Direct text indexing |
|
|
||||||
| **Office** | `.doc`, `.docx` | ⚠️ | Limited support |
|
|
||||||
|
|
||||||
### Using the Interface
|
|
||||||
|
|
||||||
#### Dashboard
|
|
||||||
- **Document Statistics**: Total documents, storage usage, OCR status
|
|
||||||
- **Recent Activity**: Latest uploads and processing status
|
|
||||||
- **Quick Actions**: Fast access to upload and search
|
|
||||||
|
|
||||||
#### Document Management
|
|
||||||
- **List/Grid View**: Toggle between different viewing modes
|
|
||||||
- **Sorting**: Sort by date, name, size, or file type
|
|
||||||
- **Filtering**: Filter by tags, file types, and OCR status
|
|
||||||
- **Bulk Actions**: Select multiple documents for batch operations
|
|
||||||
|
|
||||||
#### Advanced Search
|
|
||||||
- **Full-text Search**: Search within document content
|
|
||||||
- **Metadata Filters**: Filter by upload date, file size, type
|
|
||||||
- **Tag System**: Organize documents with custom tags
|
|
||||||
- **OCR Status**: Find processed vs. pending documents
|
|
||||||
|
|
||||||
#### Folder Watching
|
|
||||||
- **Non-destructive**: Unlike paperless-ngx, source files remain untouched
|
|
||||||
- **Automatic Processing**: New files are detected and processed automatically
|
|
||||||
- **Configurable**: Set custom watch directories
|
|
||||||
|
|
||||||
### Tips for Best Results
|
|
||||||
|
|
||||||
1. **OCR Quality**: Higher resolution images (300+ DPI) produce better OCR results
|
|
||||||
2. **File Organization**: Use consistent naming conventions for easier searching
|
|
||||||
3. **Regular Backups**: Backup both database and file storage regularly
|
|
||||||
4. **Performance**: For large document collections, consider increasing server resources
|
|
||||||
|
|
||||||
## 🔧 Configuration
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
All application settings can be configured via environment variables:
|
|
||||||
|
|
||||||
#### Core Configuration
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `DATABASE_URL` | `postgresql://readur:readur@localhost/readur` | PostgreSQL connection string |
|
|
||||||
| `JWT_SECRET` | `your-secret-key` | Secret key for JWT tokens ⚠️ **Change in production!** |
|
|
||||||
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address and port |
|
|
||||||
|
|
||||||
#### File Storage & Upload
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
|
|
||||||
| `ALLOWED_FILE_TYPES` | `pdf,txt,doc,docx,png,jpg,jpeg` | Comma-separated allowed file extensions |
|
|
||||||
|
|
||||||
#### Watch Folder Configuration
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `WATCH_FOLDER` | `./watch` | Directory to monitor for new files |
|
|
||||||
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems (seconds) |
|
|
||||||
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file write completion (milliseconds) |
|
|
||||||
| `MAX_FILE_AGE_HOURS` | _(none)_ | Skip files older than this many hours |
|
|
||||||
| `FORCE_POLLING_WATCH` | _(none)_ | Force polling mode even for local filesystems |
|
|
||||||
|
|
||||||
#### OCR & Processing Settings
|
|
||||||
*Note: These settings can also be configured per-user via the web interface*
|
|
||||||
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `OCR_LANGUAGE` | `eng` | OCR language code (eng, fra, deu, spa, etc.) |
|
|
||||||
| `CONCURRENT_OCR_JOBS` | `4` | Maximum parallel OCR processes |
|
|
||||||
| `OCR_TIMEOUT_SECONDS` | `300` | OCR processing timeout per file |
|
|
||||||
| `MAX_FILE_SIZE_MB` | `50` | Maximum file size for processing |
|
|
||||||
| `AUTO_ROTATE_IMAGES` | `true` | Automatically rotate images for better OCR |
|
|
||||||
| `ENABLE_IMAGE_PREPROCESSING` | `true` | Apply image enhancement before OCR |
|
|
||||||
|
|
||||||
#### Search & Performance
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `SEARCH_RESULTS_PER_PAGE` | `25` | Default number of search results per page |
|
|
||||||
| `SEARCH_SNIPPET_LENGTH` | `200` | Length of text snippets in search results |
|
|
||||||
| `FUZZY_SEARCH_THRESHOLD` | `0.8` | Similarity threshold for fuzzy search (0.0-1.0) |
|
|
||||||
| `MEMORY_LIMIT_MB` | `512` | Memory limit for OCR processes |
|
|
||||||
| `CPU_PRIORITY` | `normal` | CPU priority: `low`, `normal`, `high` |
|
|
||||||
|
|
||||||
#### Data Management
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `RETENTION_DAYS` | _(none)_ | Auto-delete documents after N days |
|
|
||||||
| `ENABLE_AUTO_CLEANUP` | `false` | Enable automatic cleanup of old documents |
|
|
||||||
| `ENABLE_COMPRESSION` | `false` | Compress stored documents to save space |
|
|
||||||
| `ENABLE_BACKGROUND_OCR` | `true` | Process OCR in background queue |
|
|
||||||
|
|
||||||
### Example Production Configuration
|
|
||||||
|
|
||||||
```env
|
|
||||||
# Core settings
|
|
||||||
DATABASE_URL=postgresql://readur:secure_password@postgres:5432/readur
|
|
||||||
JWT_SECRET=your-very-long-random-secret-key-generated-with-openssl
|
|
||||||
SERVER_ADDRESS=0.0.0.0:8000
|
|
||||||
|
|
||||||
# File handling
|
|
||||||
UPLOAD_PATH=/app/uploads
|
|
||||||
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,rtf,doc,docx
|
|
||||||
|
|
||||||
# Watch folder for NFS mount
|
|
||||||
WATCH_FOLDER=/mnt/nfs/documents
|
|
||||||
WATCH_INTERVAL_SECONDS=60
|
|
||||||
FILE_STABILITY_CHECK_MS=1000
|
|
||||||
MAX_FILE_AGE_HOURS=168
|
|
||||||
FORCE_POLLING_WATCH=1
|
|
||||||
|
|
||||||
# OCR optimization
|
|
||||||
OCR_LANGUAGE=eng
|
|
||||||
CONCURRENT_OCR_JOBS=8
|
|
||||||
OCR_TIMEOUT_SECONDS=600
|
|
||||||
MAX_FILE_SIZE_MB=200
|
|
||||||
AUTO_ROTATE_IMAGES=true
|
|
||||||
ENABLE_IMAGE_PREPROCESSING=true
|
|
||||||
|
|
||||||
# Performance tuning
|
|
||||||
MEMORY_LIMIT_MB=2048
|
|
||||||
CPU_PRIORITY=high
|
|
||||||
ENABLE_COMPRESSION=true
|
|
||||||
ENABLE_BACKGROUND_OCR=true
|
|
||||||
|
|
||||||
# Search optimization
|
|
||||||
SEARCH_RESULTS_PER_PAGE=50
|
|
||||||
SEARCH_SNIPPET_LENGTH=300
|
|
||||||
FUZZY_SEARCH_THRESHOLD=0.7
|
|
||||||
|
|
||||||
# Data management
|
|
||||||
RETENTION_DAYS=2555 # 7 years
|
|
||||||
ENABLE_AUTO_CLEANUP=true
|
|
||||||
```
|
|
||||||
|
|
||||||
### Runtime Settings vs Environment Variables
|
|
||||||
|
|
||||||
Some settings can be configured in two ways:
|
|
||||||
|
|
||||||
1. **Environment Variables**: Set at container startup, affects the entire application
|
|
||||||
2. **User Settings**: Configured per-user via the web interface, stored in database
|
|
||||||
|
|
||||||
**Environment variables provide system-wide defaults.** User settings override these defaults for individual users where applicable.
|
|
||||||
|
|
||||||
Settings configurable via web interface:
|
|
||||||
- OCR language preferences
|
|
||||||
- Search result limits
|
|
||||||
- File type restrictions
|
|
||||||
- OCR processing options
|
|
||||||
- Data retention policies
|
|
||||||
|
|
||||||
### Configuration Priority
|
|
||||||
|
|
||||||
Settings are applied in this order (later values override earlier ones):
|
|
||||||
|
|
||||||
1. **Application defaults** (built into the code)
|
|
||||||
2. **Environment variables** (system-wide configuration)
|
|
||||||
3. **User settings** (per-user database settings via web interface)
|
|
||||||
|
|
||||||
This allows for flexible deployment where system administrators can set defaults while users can customize their experience.
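
The priority order above can be sketched as a layered lookup. This is a minimal illustration in Python, not Readur's actual implementation (the project itself is written in Rust). The function name `resolve_setting`, the `APP_DEFAULTS` dictionary, and the shape of `user_settings` are hypothetical; the two default values are taken from the tables above.

```python
import os

# Built-in defaults (illustrative subset of the documented values).
APP_DEFAULTS = {
    "OCR_LANGUAGE": "eng",
    "SEARCH_RESULTS_PER_PAGE": "25",
}


def resolve_setting(name, env=None, user_settings=None):
    """Return the effective value for `name`.

    Layers are applied in order, and later layers override earlier ones:
    application defaults -> environment variables -> per-user settings.
    """
    env = os.environ if env is None else env
    user_settings = user_settings or {}

    value = APP_DEFAULTS.get(name)      # 1. application default
    if name in env:
        value = env[name]               # 2. system-wide override
    if name in user_settings:
        value = user_settings[name]     # 3. per-user override
    return value
```

With `OCR_LANGUAGE=deu` in the environment and a user who selected `fra` in the web interface, this lookup returns `fra` for that user and `deu` for everyone else.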
|
|
||||||
|
|
||||||
### Quick Reference - Essential Variables
|
|
||||||
|
|
||||||
For a minimal production deployment, configure these essential variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Security (REQUIRED)
|
|
||||||
JWT_SECRET=your-secure-random-key-here
|
|
||||||
DATABASE_URL=postgresql://user:password@host:port/database
|
|
||||||
|
|
||||||
# File Storage
|
|
||||||
UPLOAD_PATH=/app/uploads
|
|
||||||
WATCH_FOLDER=/path/to/mounted/folder
|
|
||||||
|
|
||||||
# Watch Folder (for network mounts)
|
|
||||||
WATCH_INTERVAL_SECONDS=60
|
|
||||||
FORCE_POLLING_WATCH=1
|
|
||||||
|
|
||||||
# Performance
|
|
||||||
CONCURRENT_OCR_JOBS=4
|
|
||||||
MAX_FILE_SIZE_MB=100
|
|
||||||
```
|
|
||||||
|
|
||||||
### Database Tuning
|
|
||||||
|
|
||||||
For better search performance with large document collections:
|
|
||||||
|
|
||||||
```sql
|
|
||||||
-- Increase shared_buffers for better caching
|
|
||||||
ALTER SYSTEM SET shared_buffers = '256MB';
|
|
||||||
|
|
||||||
-- Optimize for full-text search
|
|
||||||
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
|
|
||||||
|
|
||||||
-- Restart PostgreSQL after changes
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🔌 API Reference
|
|
||||||
|
|
||||||
### Authentication Endpoints
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Register new user
|
|
||||||
POST /api/auth/register
|
|
||||||
Content-Type: application/json
|
|
||||||
{
|
|
||||||
"username": "john_doe",
|
|
||||||
"email": "john@example.com",
|
|
||||||
"password": "secure_password"
|
|
||||||
}
|
|
||||||
|
|
||||||
# Login
|
|
||||||
POST /api/auth/login
|
|
||||||
Content-Type: application/json
|
|
||||||
{
|
|
||||||
"username": "john_doe",
|
|
||||||
"password": "secure_password"
|
|
||||||
}
|
|
||||||
|
|
||||||
# Get current user
|
|
||||||
GET /api/auth/me
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
```
|
|
||||||
|
|
||||||
### Document Management
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Upload document
|
|
||||||
POST /api/documents
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
Content-Type: multipart/form-data
|
|
||||||
file: <binary_file_data>
|
|
||||||
|
|
||||||
# List documents
|
|
||||||
GET /api/documents?limit=50&offset=0
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
|
|
||||||
# Download document
|
|
||||||
GET /api/documents/{id}/download
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
```
|
|
||||||
|
|
||||||
### Search
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Search documents
|
|
||||||
GET /api/search?query=contract&limit=20
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
|
|
||||||
# Advanced search with filters
|
|
||||||
GET /api/search?query=invoice&mime_types=application/pdf&tags=important
|
|
||||||
Authorization: Bearer <jwt_token>
|
|
||||||
```
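
For scripted access, the search calls above could be built along these lines. This is a sketch using only the Python standard library; the helper name `build_search_request` is hypothetical, and the endpoint path, query parameters, and bearer-token header are taken from the examples above rather than from a verified API specification.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, token, query, **filters):
    """Build (but do not send) a GET request for the search endpoint.

    Extra keyword arguments such as `limit` or `tags` are appended
    as query-string parameters.
    """
    params = {"query": query, **filters}
    url = f"{base_url}/api/search?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

To send the request, pass the result to `urllib.request.urlopen`, using a token obtained from the login endpoint.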
|
|
||||||
|
|
||||||
## 🧪 Testing
|
|
||||||
|
|
||||||
### Run All Tests
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Backend tests
|
|
||||||
cargo test
|
|
||||||
|
|
||||||
# Frontend tests
|
|
||||||
cd frontend && npm test
|
|
||||||
|
|
||||||
# Integration tests with Docker
|
|
||||||
docker compose -f docker-compose.test.yml up --build
|
|
||||||
```
|
|
||||||
|
|
||||||
### Test Coverage
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Install cargo-tarpaulin for coverage
|
|
||||||
cargo install cargo-tarpaulin
|
|
||||||
|
|
||||||
# Generate coverage report
|
|
||||||
cargo tarpaulin --out Html
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🔒 Security Considerations
|
|
||||||
|
|
||||||
### Production Deployment
|
|
||||||
|
|
||||||
1. **Change Default Credentials**: Update admin password immediately
|
|
||||||
2. **Use Strong JWT Secret**: Generate a secure random key
|
|
||||||
3. **Enable HTTPS**: Use a reverse proxy with SSL/TLS
|
|
||||||
4. **Database Security**: Use strong passwords and restrict network access
|
|
||||||
5. **File Permissions**: Ensure proper file system permissions
|
|
||||||
6. **Regular Updates**: Keep dependencies and base images updated
|
|
||||||
|
|
||||||
### Recommended Production Setup
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Use environment-specific secrets
|
|
||||||
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')  # strip the line wrap openssl adds to long base64 output
|
|
||||||
|
|
||||||
# Restrict database access
|
|
||||||
# Only allow connections from application container
|
|
||||||
|
|
||||||
# Use read-only file system where possible
|
|
||||||
# Mount uploads and watch folders as separate volumes
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🚀 Deployment Options
|
|
||||||
|
|
||||||
### Docker Swarm
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
version: '3.8'
|
|
||||||
services:
|
|
||||||
readur:
|
|
||||||
image: readur:latest
|
|
||||||
deploy:
|
|
||||||
replicas: 2
|
|
||||||
restart_policy:
|
|
||||||
condition: on-failure
|
|
||||||
networks:
|
|
||||||
- readur-network
|
|
||||||
secrets:
|
|
||||||
- jwt_secret
|
|
||||||
- db_password
|
|
||||||
```
|
|
||||||
|
|
||||||
### Kubernetes
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
apiVersion: apps/v1
|
|
||||||
kind: Deployment
|
|
||||||
metadata:
|
|
||||||
name: readur
|
|
||||||
spec:
|
|
||||||
replicas: 3
|
|
||||||
selector:
|
|
||||||
matchLabels:
|
|
||||||
app: readur
|
|
||||||
template:
|
|
||||||
spec:
|
|
||||||
containers:
|
|
||||||
- name: readur
|
|
||||||
image: readur:latest
|
|
||||||
env:
|
|
||||||
- name: JWT_SECRET
|
|
||||||
valueFrom:
|
|
||||||
secretKeyRef:
|
|
||||||
name: readur-secrets
|
|
||||||
key: jwt-secret
|
|
||||||
```
|
|
||||||
|
|
||||||
### Cloud Platforms
|
|
||||||
|
|
||||||
- **AWS**: Use ECS with RDS PostgreSQL
|
|
||||||
- **Google Cloud**: Deploy to Cloud Run with Cloud SQL
|
|
||||||
- **Azure**: Use Container Instances with Azure Database
|
|
||||||
- **DigitalOcean**: App Platform with Managed Database
|
|
||||||
|
|
||||||
## 🤝 Contributing
|
## 🤝 Contributing
|
||||||
|
|
||||||
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
|
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) and [Development Setup](docs/dev/development.md) for details.
|
||||||
|
|
||||||
### Development Setup
|
## 🔒 Security
|
||||||
|
|
||||||
```bash
|
- Change default credentials immediately
|
||||||
# Fork and clone the repository
|
- Use HTTPS in production
|
||||||
git clone https://github.com/yourusername/readur.git
|
- Regular security updates
|
||||||
cd readur
|
- See [deployment guide](docs/deployment.md#security-considerations) for security best practices
|
||||||
|
|
||||||
# Create a feature branch
|
|
||||||
git checkout -b feature/amazing-feature
|
|
||||||
|
|
||||||
# Make your changes and test
|
|
||||||
cargo test
|
|
||||||
cd frontend && npm test
|
|
||||||
|
|
||||||
# Submit a pull request
|
|
||||||
```
|
|
||||||
|
|
||||||
### Code Style
|
|
||||||
|
|
||||||
- **Rust**: Follow `rustfmt` and `clippy` recommendations
|
|
||||||
- **Frontend**: Use Prettier and ESLint configurations
|
|
||||||
- **Commits**: Use conventional commit format
|
|
||||||
|
|
||||||
## 📝 License
|
## 📝 License
|
||||||
|
|
||||||
|
|
@ -830,9 +103,9 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
|
||||||
|
|
||||||
## 📞 Support
|
## 📞 Support
|
||||||
|
|
||||||
- **Documentation**: Check this README and inline code comments
|
- **Documentation**: Start with the [User Guide](docs/user-guide.md)
|
||||||
- **Issues**: Report bugs and request features on GitHub Issues
|
- **Issues**: Report bugs on [GitHub Issues](https://github.com/perfectra1n/readur/issues)
|
||||||
- **Discussions**: Join community discussions on GitHub Discussions
|
- **Discussions**: Join our [GitHub Discussions](https://github.com/perfectra1n/readur/discussions)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,31 +1,68 @@
|
||||||
# Watch Folder Documentation

The watch folder feature automatically monitors a directory for new OCR-able files and processes them without deleting the original files. This is perfect for scenarios where files are mounted from various filesystem types including NFS, SMB, S3, and local storage.

## Features

### 🔄 Cross-Filesystem Compatibility

- **Automatic Detection**: Detects filesystem type and chooses optimal watching strategy
- **Local Filesystems**: Uses efficient inotify-based watching for ext4, NTFS, APFS, etc.
- **Network Filesystems**: Uses polling-based watching for NFS, SMB/CIFS, S3 mounts
- **Hybrid Fallback**: Gracefully falls back to polling if inotify fails

### 📁 Smart File Processing

- **OCR-able File Detection**: Only processes supported file types (PDF, images, text, Word docs)
- **Duplicate Prevention**: Checks for existing files with same name and size
- **File Stability**: Waits for files to finish being written before processing
- **System File Exclusion**: Skips hidden files, temporary files, and system directories

### ⚙️ Configuration Options

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `WATCH_FOLDER` | `./watch` | Path to the folder to monitor |
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems |
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file stability |
| `MAX_FILE_AGE_HOURS` | `none` | Skip files older than specified hours |
| `ALLOWED_FILE_TYPES` | `pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx` | Allowed file extensions |
| `FORCE_POLLING_WATCH` | `unset` | Force polling mode even for local filesystems |

# Watch Folder Guide

The watch folder feature automatically monitors a directory for new files and processes them with OCR, making them searchable in Readur. Your original files are never modified or deleted - Readur simply copies and processes them while leaving the originals untouched.

## What is Watch Folder?

Watch folder allows you to:

- **Drop files anywhere** - Point Readur to any folder (local, network drive, cloud mount)
- **Automatic processing** - New files are automatically detected and processed
- **Non-destructive** - Original files remain exactly where you put them
- **Background operation** - Processing happens in the background while you continue working

Perfect for scenarios where you want to automatically process files from:

- Network drives (NFS, SMB shares)
- Cloud storage mounts (Google Drive, Dropbox, OneDrive)
- Local folders where you save scanned documents
- Shared team folders

## How It Works

1. **Point Readur to your folder** - Set the `WATCH_FOLDER` path to any directory you want monitored
2. **Drop files** - Add documents to that folder (PDFs, images, text files, Word docs)
3. **Automatic detection** - Readur notices new files within seconds (local) or minutes (network)
4. **OCR processing** - Files are automatically processed to extract searchable text
5. **Search and find** - Your documents become searchable in the Readur web interface

## Key Features
|
|
||||||
|
✅ **Works with any storage type** - Local drives, network shares, cloud mounts
|
||||||
|
✅ **Smart processing** - Only processes supported file types
|
||||||
|
✅ **Duplicate prevention** - Won't process the same file twice
|
||||||
|
✅ **Safe operation** - Never modifies or deletes your original files
|
||||||
|
✅ **Background processing** - Doesn't interrupt your workflow
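
To make the "smart processing" and "duplicate prevention" ideas concrete, here is a minimal Python sketch of how such a filter could look. It is not Readur's implementation: `should_process` and the `seen` set are hypothetical names, and the name-and-size duplicate rule is only one plausible way to recognise a file that was already ingested.

```python
from pathlib import PurePath

# Extensions taken from the documented ALLOWED_FILE_TYPES default.
ALLOWED = {"pdf", "png", "jpg", "jpeg", "tiff", "bmp", "txt", "doc", "docx"}


def should_process(name: str, size: int, seen: set) -> bool:
    """Return True if a file has a supported extension and was not seen before.

    Hypothetical helper: `seen` holds (name, size) pairs of files that were
    already accepted, so the same file is never queued twice.
    """
    path = PurePath(name)
    if path.name.startswith("."):  # skip hidden files
        return False
    if path.suffix.lower().lstrip(".") not in ALLOWED:  # unsupported type
        return False
    key = (path.name, size)
    if key in seen:  # same name and size: treat as a duplicate
        return False
    seen.add(key)
    return True
```

The originals are never touched by this check; it only decides whether a file should be handed to the processing queue.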
|
||||||
|
|
||||||
|
## Quick Setup
|
||||||
|
|
||||||
|
### Basic Setup (Docker Compose)
|
||||||
|
|
||||||
|
1. **Edit your docker-compose.yml**:
|
||||||
|
```yaml
|
||||||
|
services:
  readur:
    image: readur:latest
    volumes:
      # Mount your folder to the watch directory
      - /path/to/your/documents:/app/watch
    environment:
      - WATCH_FOLDER=/app/watch
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Start Readur**:
|
||||||
|
```bash
|
||||||
|
docker compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Start dropping files** into `/path/to/your/documents` - they'll be automatically processed!
|
||||||
|
|
||||||
|
### Configuration Options
|
||||||
|
|
||||||
|
| Setting | Default | What it does |
|
||||||
|
|---------|---------|-------------|
|
||||||
|
| `WATCH_FOLDER` | `./watch` | Which folder to monitor |
|
||||||
|
| `WATCH_INTERVAL_SECONDS` | `30` | How often to check for new files (network drives) |
|
||||||
|
| `MAX_FILE_AGE_HOURS` | _(none)_ | Ignore files older than this |
|
||||||
|
| `ALLOWED_FILE_TYPES` | `pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx` | Which file types to process |
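
As an illustration of how these settings fit together, the following Python sketch reads them from an environment-like mapping and applies the defaults from the table. `WatchConfig` and `load_watch_config` are hypothetical names used only for this example; Readur itself parses these variables in its own code.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional

DEFAULT_TYPES = "pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx"


@dataclass
class WatchConfig:
    folder: str
    interval_seconds: int
    max_file_age_hours: Optional[int]
    allowed_file_types: List[str]


def load_watch_config(env: Mapping[str, str]) -> WatchConfig:
    """Build a WatchConfig from a mapping such as os.environ."""
    max_age = env.get("MAX_FILE_AGE_HOURS")  # unset means "no age limit"
    types = env.get("ALLOWED_FILE_TYPES", DEFAULT_TYPES)
    return WatchConfig(
        folder=env.get("WATCH_FOLDER", "./watch"),
        interval_seconds=int(env.get("WATCH_INTERVAL_SECONDS", "30")),
        max_file_age_hours=int(max_age) if max_age else None,
        allowed_file_types=[t.strip().lower() for t in types.split(",") if t.strip()],
    )
```

Calling `load_watch_config(os.environ)` would show the values a process started in the same environment is expected to see.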
|
||||||
|
|
||||||
## Usage
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,618 @@
|
||||||
|
# API Reference
|
||||||
|
|
||||||
|
Readur provides a comprehensive REST API for integrating with external systems and building custom workflows.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Base URL](#base-url)
|
||||||
|
- [Authentication](#authentication)
|
||||||
|
- [Error Handling](#error-handling)
|
||||||
|
- [Rate Limiting](#rate-limiting)
|
||||||
|
- [Endpoints](#endpoints)
|
||||||
|
- [Authentication](#authentication-endpoints)
|
||||||
|
- [Documents](#document-endpoints)
|
||||||
|
- [Search](#search-endpoints)
|
||||||
|
- [OCR Queue](#ocr-queue-endpoints)
|
||||||
|
- [Settings](#settings-endpoints)
|
||||||
|
- [Sources](#sources-endpoints)
|
||||||
|
- [Labels](#labels-endpoints)
|
||||||
|
- [Users](#user-endpoints)
|
||||||
|
- [WebSocket API](#websocket-api)
|
||||||
|
- [Examples](#examples)
|
||||||
|
|
||||||
|
## Base URL
|
||||||
|
|
||||||
|
```
|
||||||
|
http://localhost:8000/api
|
||||||
|
```
|
||||||
|
|
||||||
|
For production deployments, replace with your configured domain and ensure HTTPS is used.
|
||||||
|
|
||||||
|
## Authentication
|
||||||
|
|
||||||
|
Readur uses JWT (JSON Web Token) authentication. Include the token in the Authorization header:
|
||||||
|
|
||||||
|
```
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Obtaining a Token
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/auth/login
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"username": "admin",
|
||||||
|
"password": "your_password"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
|
||||||
|
"user": {
|
||||||
|
"id": 1,
|
||||||
|
"username": "admin",
|
||||||
|
"email": "admin@example.com",
|
||||||
|
"role": "admin"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
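
When debugging a client it can help to look inside the returned token. A JWT is three base64url segments separated by dots, so the payload can be decoded without any secret. The sketch below only inspects the token and does not verify its signature; which claims Readur actually places in the payload (for example an expiry) is not documented here, so treat any field you find as an assumption.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Never use the decoded payload for authorization decisions on the client side; only the server can check the signature.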
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
All API errors follow a consistent format:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"error": {
|
||||||
|
"code": "VALIDATION_ERROR",
|
||||||
|
"message": "Invalid request parameters",
|
||||||
|
"details": {
|
||||||
|
"field": "email",
|
||||||
|
"reason": "Invalid email format"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Common HTTP status codes:
|
||||||
|
- `200` - Success
|
||||||
|
- `201` - Created
|
||||||
|
- `400` - Bad Request
|
||||||
|
- `401` - Unauthorized
|
||||||
|
- `403` - Forbidden
|
||||||
|
- `404` - Not Found
|
||||||
|
- `422` - Validation Error
|
||||||
|
- `500` - Internal Server Error
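
A client can turn the error envelope shown above into a typed exception. This is a sketch rather than an official client: `ApiError` and `parse_error` are hypothetical names, and the fallback to `UNKNOWN` for bodies that do not match the envelope is a choice made for this example.

```python
import json


class ApiError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, status: int, code: str, message: str, details: dict):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from a non-2xx response body, tolerating odd bodies."""
    try:
        err = json.loads(body)["error"]
        code, message, details = err["code"], err["message"], err.get("details", {})
    except (ValueError, KeyError, TypeError, AttributeError):
        # Body was not JSON or did not follow the documented shape.
        code, message, details = "UNKNOWN", body, {}
    return ApiError(status, code, message, details)
```

A caller would typically `raise parse_error(response.status_code, response.text)` whenever the status is 400 or above.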
|
||||||
|
|
||||||
|
## Rate Limiting
|
||||||
|
|
||||||
|
API requests are rate-limited to prevent abuse:
|
||||||
|
- Authenticated users: 1000 requests per hour
|
||||||
|
- Unauthenticated users: 100 requests per hour
|
||||||
|
|
||||||
|
Rate limit headers:
|
||||||
|
```
|
||||||
|
X-RateLimit-Limit: 1000
|
||||||
|
X-RateLimit-Remaining: 999
|
||||||
|
X-RateLimit-Reset: 1640995200
|
||||||
|
```
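
A well-behaved client can use these headers to back off instead of hammering the API. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which matches the sample value but is not stated explicitly; `seconds_until_reset` is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))  # assumed: epoch seconds
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Passing the result to `time.sleep()` before retrying keeps a batch job within the documented limits.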
|
||||||
|
|
||||||
|
## Endpoints
|
||||||
|
|
||||||
|
### Authentication Endpoints
|
||||||
|
|
||||||
|
#### Register New User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/auth/register
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"username": "john_doe",
|
||||||
|
"email": "john@example.com",
|
||||||
|
"password": "secure_password"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Login
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/auth/login
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"username": "john_doe",
|
||||||
|
"password": "secure_password"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Get Current User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/auth/me
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Logout
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/auth/logout
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Document Endpoints
|
||||||
|
|
||||||
|
#### Upload Document
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/documents
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: multipart/form-data
|
||||||
|
|
||||||
|
file: <binary_file_data>
|
||||||
|
tags: ["invoice", "2024"] # Optional
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"id": "550e8400-e29b-41d4-a716-446655440000",
|
||||||
|
"filename": "invoice_2024.pdf",
|
||||||
|
"mime_type": "application/pdf",
|
||||||
|
"size": 1048576,
|
||||||
|
"uploaded_at": "2024-01-01T00:00:00Z",
|
||||||
|
"ocr_status": "pending"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### List Documents
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/documents?limit=50&offset=0&sort=-uploaded_at
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
Query parameters:
|
||||||
|
- `limit` - Number of results (default: 50, max: 100)
|
||||||
|
- `offset` - Pagination offset
|
||||||
|
- `sort` - Sort field (prefix with `-` for descending)
|
||||||
|
- `mime_type` - Filter by MIME type
|
||||||
|
- `ocr_status` - Filter by OCR status
|
||||||
|
- `tag` - Filter by tag
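
The query parameters above can be assembled safely with the standard library, which also takes care of URL encoding. This is a small sketch: `list_documents_url` is a hypothetical helper, and `BASE_URL` is the local default from the Base URL section.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api"  # local default; replace in production


def list_documents_url(limit: int = 50, offset: int = 0,
                       sort: str = "-uploaded_at", **filters: str) -> str:
    """Build a /documents URL; filters may be mime_type, ocr_status, or tag."""
    if not 1 <= limit <= 100:  # documented maximum is 100
        raise ValueError("limit must be between 1 and 100")
    params = {"limit": limit, "offset": offset, "sort": sort}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/documents?{urlencode(params)}"
```

For example, `list_documents_url(limit=10, mime_type="application/pdf")` yields a URL with the MIME type percent-encoded.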
|
||||||
|
|
||||||
|
#### Get Document Details
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/documents/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Download Document
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/documents/{id}/download
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Delete Document
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DELETE /api/documents/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Update Document
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PATCH /api/documents/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"tags": ["invoice", "paid", "2024"]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Search Endpoints
|
||||||
|
|
||||||
|
#### Search Documents
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/search?query=invoice&limit=20
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
Query parameters:
|
||||||
|
- `query` - Search query (required)
|
||||||
|
- `limit` - Number of results
|
||||||
|
- `offset` - Pagination offset
|
||||||
|
- `mime_types` - Comma-separated MIME types
|
||||||
|
- `tags` - Comma-separated tags
|
||||||
|
- `date_from` - Start date (ISO 8601)
|
||||||
|
- `date_to` - End date (ISO 8601)
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"results": [
|
||||||
|
{
|
||||||
|
"id": "550e8400-e29b-41d4-a716-446655440000",
|
||||||
|
"filename": "invoice_2024.pdf",
|
||||||
|
"snippet": "...invoice for services rendered in Q1 2024...",
|
||||||
|
"score": 0.95,
|
||||||
|
"highlights": ["invoice", "2024"]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"total": 42,
|
||||||
|
"limit": 20,
|
||||||
|
"offset": 0
|
||||||
|
}
|
||||||
|
```
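
Because the response reports `total`, `limit`, and `offset`, a client can walk through all matches page by page. The generator below is a sketch under that assumption; `fetch_page` is a caller-supplied function (hypothetical here) that performs the actual HTTP request and returns the parsed JSON.

```python
from typing import Callable, Iterator


def iter_results(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every search result by advancing `offset` until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page.get("results", [])
        yield from results
        offset += len(results)
        # Stop on an empty page or once all reported results were consumed.
        if not results or offset >= page.get("total", 0):
            break
```

Keeping the HTTP call outside the generator makes the paging logic easy to test with a fake `fetch_page`.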
|
||||||
|
|
||||||
|
#### Advanced Search
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/search/advanced
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"query": "invoice",
|
||||||
|
"filters": {
|
||||||
|
"mime_types": ["application/pdf"],
|
||||||
|
"tags": ["unpaid"],
|
||||||
|
"date_range": {
|
||||||
|
"from": "2024-01-01",
|
||||||
|
"to": "2024-12-31"
|
||||||
|
},
|
||||||
|
"file_size": {
|
||||||
|
"min": 1024,
|
||||||
|
"max": 10485760
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"fuzzy": true,
|
||||||
|
"snippet_length": 200
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### OCR Queue Endpoints
|
||||||
|
|
||||||
|
#### Get Queue Status
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/queue/status
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"pending": 15,
|
||||||
|
"processing": 3,
|
||||||
|
"completed_today": 127,
|
||||||
|
"failed_today": 2,
|
||||||
|
"average_processing_time": 4.5
|
||||||
|
}
|
||||||
|
```
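
The status fields can be combined into a few quick health indicators. The sketch below is illustrative only: it assumes `average_processing_time` is expressed in seconds per document and that `workers` matches your `CONCURRENT_OCR_JOBS` value, neither of which is confirmed by the response itself.

```python
def summarize_queue(status: dict, workers: int = 4) -> dict:
    """Derive failure rate, backlog size, and a rough drain estimate."""
    finished = status["completed_today"] + status["failed_today"]
    failure_rate = status["failed_today"] / finished if finished else 0.0
    backlog = status["pending"] + status["processing"]
    # Rough estimate: backlog spread evenly across the available workers.
    eta_seconds = backlog * status["average_processing_time"] / max(workers, 1)
    return {"failure_rate": failure_rate, "backlog": backlog, "eta_seconds": eta_seconds}
```

With the sample response above this reports a backlog of 18 documents and a failure rate of roughly 1.6%.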
|
||||||
|
|
||||||
|
#### Reprocess Document
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/documents/{id}/reprocess
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Get Failed OCR Jobs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/queue/failed
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Settings Endpoints
|
||||||
|
|
||||||
|
#### Get User Settings
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/settings
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Update User Settings
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PUT /api/settings
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"ocr_language": "eng",
|
||||||
|
"search_results_per_page": 50,
|
||||||
|
"enable_notifications": true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Sources Endpoints
|
||||||
|
|
||||||
|
#### List Sources
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/sources
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Create Source
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/sources
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"name": "Network Drive",
|
||||||
|
"type": "local_folder",
|
||||||
|
"config": {
|
||||||
|
"path": "/mnt/network/documents",
|
||||||
|
"scan_interval": 3600
|
||||||
|
},
|
||||||
|
"enabled": true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Update Source
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PUT /api/sources/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"enabled": false
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Delete Source
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DELETE /api/sources/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Sync Source
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/sources/{id}/sync
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Labels Endpoints
|
||||||
|
|
||||||
|
#### List Labels
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/labels
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Create Label
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /api/labels
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"name": "Important",
|
||||||
|
"color": "#FF0000"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Update Label
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PUT /api/labels/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"name": "Very Important",
|
||||||
|
"color": "#FF00FF"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Delete Label
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DELETE /api/labels/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
### User Endpoints
|
||||||
|
|
||||||
|
#### List Users (Admin Only)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/users
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Get User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GET /api/users/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Update User
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PUT /api/users/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
Content-Type: application/json
|
||||||
|
|
||||||
|
{
|
||||||
|
"email": "newemail@example.com",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Delete User (Admin Only)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DELETE /api/users/{id}
|
||||||
|
Authorization: Bearer <jwt_token>
|
||||||
|
```
|
||||||
|
|
||||||
|
## WebSocket API
|
||||||
|
|
||||||
|
Connect to receive real-time updates:
|
||||||
|
|
||||||
|
```javascript
|
||||||
|
const ws = new WebSocket('ws://localhost:8000/ws');
|
||||||
|
|
||||||
|
ws.onmessage = (event) => {
|
||||||
|
const data = JSON.parse(event.data);
|
||||||
|
console.log('Event:', data);
|
||||||
|
};
|
||||||
|
|
||||||
|
// Authenticate once the connection is open; calling send() earlier throws
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'your_jwt_token'
  }));
};
|
||||||
|
```
|
||||||
|
|
||||||
|
Event types:
|
||||||
|
- `document.uploaded` - New document uploaded
|
||||||
|
- `ocr.completed` - OCR processing completed
|
||||||
|
- `ocr.failed` - OCR processing failed
|
||||||
|
- `source.sync.completed` - Source sync finished
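
On the receiving side, a client usually routes each message to a handler based on its event type. The sketch below assumes every message is a JSON object carrying the event name in a `type` field; the payload shape is not specified in this reference, so `dispatch` and the field name are assumptions.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


def dispatch(raw_message: str, handlers: Dict[str, Handler]) -> bool:
    """Route one WebSocket message to the handler registered for its type.

    Returns True if a handler ran, False if the event type is unknown.
    """
    event = json.loads(raw_message)
    handler = handlers.get(event.get("type", ""))
    if handler is None:
        return False
    handler(event)
    return True
```

Unknown event types are ignored rather than raised, so a client keeps working when the server adds new events.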
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Python Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
import requests
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
BASE_URL = "http://localhost:8000/api"
|
||||||
|
USERNAME = "admin"
|
||||||
|
PASSWORD = "your_password"
|
||||||
|
|
||||||
|
# Login
response = requests.post(f"{BASE_URL}/auth/login", json={
    "username": USERNAME,
    "password": PASSWORD
})
response.raise_for_status()  # stop here on a failed login
token = response.json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Upload document
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post(
        f"{BASE_URL}/documents",
        headers=headers,
        files=files
    )
response.raise_for_status()
document_id = response.json()["id"]

# Search documents
response = requests.get(
    f"{BASE_URL}/search",
    headers=headers,
    params={"query": "invoice 2024"}
)
response.raise_for_status()
results = response.json()["results"]
|
||||||
|
```
|
||||||
|
|
||||||
|
### JavaScript Example
|
||||||
|
|
||||||
|
```javascript
|
||||||
|
// Configuration
|
||||||
|
const BASE_URL = 'http://localhost:8000/api';
|
||||||
|
|
||||||
|
// Login
|
||||||
|
async function login(username, password) {
|
||||||
|
const response = await fetch(`${BASE_URL}/auth/login`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({ username, password })
|
||||||
|
});
|
||||||
|
const data = await response.json();
|
||||||
|
return data.token;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Upload document
|
||||||
|
async function uploadDocument(token, file) {
|
||||||
|
const formData = new FormData();
|
||||||
|
formData.append('file', file);
|
||||||
|
|
||||||
|
const response = await fetch(`${BASE_URL}/documents`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Authorization': `Bearer ${token}` },
|
||||||
|
body: formData
|
||||||
|
});
|
||||||
|
return response.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
// Search documents
|
||||||
|
async function searchDocuments(token, query) {
|
||||||
|
const response = await fetch(
|
||||||
|
`${BASE_URL}/search?query=${encodeURIComponent(query)}`,
|
||||||
|
{
|
||||||
|
headers: { 'Authorization': `Bearer ${token}` }
|
||||||
|
}
|
||||||
|
);
|
||||||
|
return response.json();
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### cURL Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Login
|
||||||
|
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/login \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"username":"admin","password":"your_password"}' \
|
||||||
|
| jq -r .token)
|
||||||
|
|
||||||
|
# Upload document
|
||||||
|
curl -X POST http://localhost:8000/api/documents \
|
||||||
|
-H "Authorization: Bearer $TOKEN" \
|
||||||
|
-F "file=@document.pdf"
|
||||||
|
|
||||||
|
# Search documents
|
||||||
|
curl -X GET "http://localhost:8000/api/search?query=invoice" \
|
||||||
|
-H "Authorization: Bearer $TOKEN"
|
||||||
|
|
||||||
|
# Get document
|
||||||
|
curl -X GET http://localhost:8000/api/documents/550e8400-e29b-41d4-a716-446655440000 \
|
||||||
|
-H "Authorization: Bearer $TOKEN"
|
||||||
|
```
|
||||||
|
|
||||||
|
## OpenAPI Specification
|
||||||
|
|
||||||
|
The complete OpenAPI specification is available at:
|
||||||
|
```
|
||||||
|
GET /api/openapi.json
|
||||||
|
```
|
||||||
|
|
||||||
|
You can use this with tools like Swagger UI or to generate client libraries.
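
Once the specification has been downloaded and parsed, listing the available operations takes only a few lines. The sketch relies on the standard OpenAPI layout, where `paths` maps each path to an object keyed by HTTP method; `list_operations` is a hypothetical helper name.

```python
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from a parsed OpenAPI document, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method.lower() in HTTP_METHODS:  # skip keys such as "parameters"
                ops.append((method.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))
```

Comparing this list against the endpoints described in this reference is a quick way to spot routes that a given server version does or does not expose.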
|
||||||
|
|
||||||
|
## SDK Support
|
||||||
|
|
||||||
|
Official SDKs are planned for:
|
||||||
|
- Python
|
||||||
|
- JavaScript/TypeScript
|
||||||
|
- Go
|
||||||
|
- Ruby
|
||||||
|
|
||||||
|
Check the [GitHub repository](https://github.com/perfectra1n/readur) for the latest SDK availability.
|
||||||
|
|
@@ -0,0 +1,261 @@
|
||||||
|
# Configuration Guide
|
||||||
|
|
||||||
|
This guide covers all configuration options available in Readur through environment variables and runtime settings.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Environment Variables](#environment-variables)
|
||||||
|
- [Core Configuration](#core-configuration)
|
||||||
|
- [File Storage & Upload](#file-storage--upload)
|
||||||
|
- [Watch Folder Configuration](#watch-folder-configuration)
|
||||||
|
- [OCR & Processing Settings](#ocr--processing-settings)
|
||||||
|
- [Search & Performance](#search--performance)
|
||||||
|
- [Data Management](#data-management)
|
||||||
|
- [Port Configuration](#port-configuration)
|
||||||
|
- [Example Configurations](#example-configurations)
|
||||||
|
- [Configuration Priority](#configuration-priority)
|
||||||
|
- [Runtime Settings vs Environment Variables](#runtime-settings-vs-environment-variables)
|
||||||
|
- [Database Tuning](#database-tuning)
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
All application settings can be configured via environment variables:
|
||||||
|
|
||||||
|
### Core Configuration
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `DATABASE_URL` | `postgresql://readur:readur@localhost/readur` | PostgreSQL connection string |
|
||||||
|
| `JWT_SECRET` | `your-secret-key` | Secret key for JWT tokens ⚠️ **Change in production!** |
|
||||||
|
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address and port |
|
||||||
|
|
||||||
|
### File Storage & Upload
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
|
||||||
|
| `ALLOWED_FILE_TYPES` | `pdf,txt,doc,docx,png,jpg,jpeg` | Comma-separated allowed file extensions |
|
||||||
|
|
||||||
|
### Watch Folder Configuration
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `WATCH_FOLDER` | `./watch` | Directory to monitor for new files |
|
||||||
|
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems (seconds) |
|
||||||
|
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file write completion (milliseconds) |
|
||||||
|
| `MAX_FILE_AGE_HOURS` | _(none)_ | Skip files older than this many hours |
|
||||||
|
| `FORCE_POLLING_WATCH` | _(none)_ | Force polling mode even for local filesystems |
|
||||||
|
|
||||||
|
### OCR & Processing Settings
|
||||||
|
|
||||||
|
*Note: These settings can also be configured per-user via the web interface*
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `OCR_LANGUAGE` | `eng` | OCR language code (eng, fra, deu, spa, etc.) |
|
||||||
|
| `CONCURRENT_OCR_JOBS` | `4` | Maximum parallel OCR processes |
|
||||||
|
| `OCR_TIMEOUT_SECONDS` | `300` | OCR processing timeout per file |
|
||||||
|
| `MAX_FILE_SIZE_MB` | `50` | Maximum file size for processing |
|
||||||
|
| `AUTO_ROTATE_IMAGES` | `true` | Automatically rotate images for better OCR |
|
||||||
|
| `ENABLE_IMAGE_PREPROCESSING` | `true` | Apply image enhancement before OCR |
|
||||||
|
|
||||||
|
### Search & Performance
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `SEARCH_RESULTS_PER_PAGE` | `25` | Default number of search results per page |
|
||||||
|
| `SEARCH_SNIPPET_LENGTH` | `200` | Length of text snippets in search results |
|
||||||
|
| `FUZZY_SEARCH_THRESHOLD` | `0.8` | Similarity threshold for fuzzy search (0.0-1.0) |
|
||||||
|
| `MEMORY_LIMIT_MB` | `512` | Memory limit for OCR processes |
|
||||||
|
| `CPU_PRIORITY` | `normal` | CPU priority: `low`, `normal`, `high` |
|
||||||
|
|
||||||
|
### Data Management
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `RETENTION_DAYS` | _(none)_ | Auto-delete documents after N days |
|
||||||
|
| `ENABLE_AUTO_CLEANUP` | `false` | Enable automatic cleanup of old documents |
|
||||||
|
| `ENABLE_COMPRESSION` | `false` | Compress stored documents to save space |
|
||||||
|
| `ENABLE_BACKGROUND_OCR` | `true` | Process OCR in background queue |
|
||||||
|
|
||||||
|
## Port Configuration
|
||||||
|
|
||||||
|
Readur supports flexible port configuration:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Method 1: Specify full server address
|
||||||
|
SERVER_ADDRESS=0.0.0.0:8000
|
||||||
|
|
||||||
|
# Method 2: Use separate host and port (recommended)
|
||||||
|
SERVER_HOST=0.0.0.0
|
||||||
|
SERVER_PORT=8000
|
||||||
|
|
||||||
|
# For development: Configure frontend port
|
||||||
|
CLIENT_PORT=5173
|
||||||
|
BACKEND_PORT=8000
|
||||||
|
```
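
The two methods can be reconciled in a small helper when writing deployment tooling. The precedence used below (separate `SERVER_HOST`/`SERVER_PORT` values win over `SERVER_ADDRESS`) is an assumption made for this sketch, not documented behaviour, and `resolve_bind` is a hypothetical name.

```python
from typing import Mapping, Tuple


def resolve_bind(env: Mapping[str, str]) -> Tuple[str, int]:
    """Return (host, port) from SERVER_HOST/SERVER_PORT or SERVER_ADDRESS."""
    # Start from the combined address, falling back to the documented default.
    host, _, port = env.get("SERVER_ADDRESS", "0.0.0.0:8000").rpartition(":")
    host = env.get("SERVER_HOST", host)  # assumed to override the address
    port_number = int(env.get("SERVER_PORT", port))
    if not 1 <= port_number <= 65535:
        raise ValueError(f"invalid port: {port_number}")
    return host, port_number
```

Validating the port early turns a typo in an `.env` file into a clear error before the container starts.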
|
||||||
|
|
||||||
|
## Example Configurations
|
||||||
|
|
||||||
|
### Development Configuration
|
||||||
|
|
||||||
|
```env
|
||||||
|
# Basic development setup
|
||||||
|
DATABASE_URL=postgresql://readur:readur@localhost/readur
|
||||||
|
JWT_SECRET=dev-secret-key-not-for-production
|
||||||
|
SERVER_ADDRESS=0.0.0.0:8000
|
||||||
|
UPLOAD_PATH=./uploads
|
||||||
|
WATCH_FOLDER=./watch
|
||||||
|
OCR_LANGUAGE=eng
|
||||||
|
CONCURRENT_OCR_JOBS=2
|
||||||
|
```
|
||||||
|
|
||||||
|
### Production Configuration
|
||||||
|
|
||||||
|
```env
|
||||||
|
# Core settings
|
||||||
|
DATABASE_URL=postgresql://readur:secure_password@postgres:5432/readur
|
||||||
|
JWT_SECRET=your-very-long-random-secret-key-generated-with-openssl
|
||||||
|
SERVER_ADDRESS=0.0.0.0:8000
|
||||||
|
|
||||||
|
# File handling
|
||||||
|
UPLOAD_PATH=/app/uploads
|
||||||
|
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,rtf,doc,docx
|
||||||
|
|
||||||
|
# Watch folder for NFS mount
|
||||||
|
WATCH_FOLDER=/mnt/nfs/documents
|
||||||
|
WATCH_INTERVAL_SECONDS=60
|
||||||
|
FILE_STABILITY_CHECK_MS=1000
|
||||||
|
MAX_FILE_AGE_HOURS=168
|
||||||
|
FORCE_POLLING_WATCH=1
|
||||||
|
|
||||||
|
# OCR optimization
|
||||||
|
OCR_LANGUAGE=eng
|
||||||
|
CONCURRENT_OCR_JOBS=8
|
||||||
|
OCR_TIMEOUT_SECONDS=600
|
||||||
|
MAX_FILE_SIZE_MB=200
|
||||||
|
AUTO_ROTATE_IMAGES=true
|
||||||
|
ENABLE_IMAGE_PREPROCESSING=true
|
||||||
|
|
||||||
|
# Performance tuning
|
||||||
|
MEMORY_LIMIT_MB=2048
|
||||||
|
CPU_PRIORITY=high
|
||||||
|
ENABLE_COMPRESSION=true
|
||||||
|
ENABLE_BACKGROUND_OCR=true
|
||||||
|
|
||||||
|
# Search optimization
|
||||||
|
SEARCH_RESULTS_PER_PAGE=50
|
||||||
|
SEARCH_SNIPPET_LENGTH=300
|
||||||
|
FUZZY_SEARCH_THRESHOLD=0.7
|
||||||
|
|
||||||
|
# Data management
|
||||||
|
# Retain documents for 7 years (2555 days)
RETENTION_DAYS=2555
|
||||||
|
ENABLE_AUTO_CLEANUP=true
|
||||||
|
```
|
||||||
|
|
||||||
|
### Network Filesystem Configuration
|
||||||
|
|
||||||
|
```env
|
||||||
|
# For NFS mounts
|
||||||
|
WATCH_FOLDER=/mnt/nfs/documents
|
||||||
|
WATCH_INTERVAL_SECONDS=60
|
||||||
|
FILE_STABILITY_CHECK_MS=1000
|
||||||
|
FORCE_POLLING_WATCH=1
|
||||||
|
|
||||||
|
# For SMB/CIFS mounts
|
||||||
|
WATCH_FOLDER=/mnt/smb/shared
|
||||||
|
WATCH_INTERVAL_SECONDS=30
|
||||||
|
FILE_STABILITY_CHECK_MS=2000
|
||||||
|
|
||||||
|
# For S3 mounts (using s3fs)
|
||||||
|
WATCH_FOLDER=/mnt/s3/bucket
|
||||||
|
WATCH_INTERVAL_SECONDS=120
|
||||||
|
FILE_STABILITY_CHECK_MS=5000
|
||||||
|
FORCE_POLLING_WATCH=1
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration Priority
|
||||||
|
|
||||||
|
Settings are applied in this order (later values override earlier ones):
|
||||||
|
|
||||||
|
1. **Application defaults** (built into the code)
|
||||||
|
2. **Environment variables** (system-wide configuration)
|
||||||
|
3. **User settings** (per-user database settings via web interface)
|
||||||
|
|
||||||
|
This allows for flexible deployment where system administrators can set defaults while users can customize their experience.
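
The layering described above maps naturally onto Python's `collections.ChainMap`, which looks a key up in several mappings in order. This is only a model of the priority rules, not Readur's code; the key names reuse the environment variable names for simplicity, and `effective_settings` is a hypothetical helper.

```python
from collections import ChainMap

# Two of the documented application defaults, used here as the lowest layer.
DEFAULTS = {"OCR_LANGUAGE": "eng", "SEARCH_RESULTS_PER_PAGE": "25"}


def effective_settings(env: dict, user: dict) -> ChainMap:
    """Layer settings: user values override env values, which override defaults."""
    # ChainMap searches left to right, so the highest-priority layer goes first.
    return ChainMap(user, env, DEFAULTS)
```

For instance, with `OCR_LANGUAGE=deu` in the environment and a user preference of `fra`, the lookup returns `fra` for that user and `deu` for everyone else.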
|
||||||
|
|
||||||
|
## Runtime Settings vs Environment Variables
|
||||||
|
|
||||||
|
Some settings can be configured in two ways:
|
||||||
|
|
||||||
|
1. **Environment Variables**: Set at container startup, affects the entire application
|
||||||
|
2. **User Settings**: Configured per-user via the web interface, stored in database
|
||||||
|
|
||||||
|
**Environment variables provide system-wide defaults.** User settings override these defaults for individual users where applicable, matching the order given under Configuration Priority.
|
||||||
|
|
||||||
|
Settings configurable via web interface:
|
||||||
|
- OCR language preferences
|
||||||
|
- Search result limits
|
||||||
|
- File type restrictions
|
||||||
|
- OCR processing options
|
||||||
|
- Data retention policies
|
||||||
|
|
||||||
|
## Database Tuning
|
||||||
|
|
||||||
|
For better search performance with large document collections:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Increase shared_buffers for better caching
|
||||||
|
ALTER SYSTEM SET shared_buffers = '256MB';
|
||||||
|
|
||||||
|
-- Optimize for full-text search
|
||||||
|
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
|
||||||
|
|
||||||
|
-- Restart PostgreSQL after changes
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security Configuration
|
||||||
|
|
||||||
|
### Generating Secure Secrets
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Generate secure JWT secret
|
||||||
|
JWT_SECRET=$(openssl rand -base64 64)
|
||||||
|
|
||||||
|
# Generate secure database password (hex output avoids characters such as / and + that would break DATABASE_URL)
DB_PASSWORD=$(openssl rand -hex 32)
|
||||||
|
|
||||||
|
# Save to .env file
|
||||||
|
cat > .env << EOF
|
||||||
|
JWT_SECRET=${JWT_SECRET}
|
||||||
|
DB_PASSWORD=${DB_PASSWORD}
|
||||||
|
EOF
|
||||||
|
```
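
If `openssl` is not available, the same secrets can be produced with Python's standard `secrets` module. This is an alternative sketch rather than part of Readur; `generate_secrets` and `to_env_lines` are hypothetical helper names.

```python
import secrets


def generate_secrets() -> dict:
    """Generate a JWT secret and a database password from the OS CSPRNG."""
    return {
        "JWT_SECRET": secrets.token_urlsafe(64),  # 64 random bytes, URL-safe base64
        "DB_PASSWORD": secrets.token_hex(32),     # hex is safe inside a connection URL
    }


def to_env_lines(values: dict) -> str:
    """Render KEY=value lines suitable for a .env file."""
    return "".join(f"{key}={value}\n" for key, value in values.items())
```

Write the result of `to_env_lines(generate_secrets())` to a file with restrictive permissions and keep it out of version control.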
|
||||||
|
|
||||||
|
### Quick Reference - Essential Variables
|
||||||
|
|
||||||
|
For a minimal production deployment, configure these essential variables:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Security (REQUIRED)
|
||||||
|
JWT_SECRET=your-secure-random-key-here
|
||||||
|
DATABASE_URL=postgresql://user:password@host:port/database
|
||||||
|
|
||||||
|
# File Storage
|
||||||
|
UPLOAD_PATH=/app/uploads
|
||||||
|
WATCH_FOLDER=/path/to/mounted/folder
|
||||||
|
|
||||||
|
# Watch Folder (for network mounts)
|
||||||
|
WATCH_INTERVAL_SECONDS=60
|
||||||
|
FORCE_POLLING_WATCH=1
|
||||||
|
|
||||||
|
# Performance
|
||||||
|
CONCURRENT_OCR_JOBS=4
|
||||||
|
MAX_FILE_SIZE_MB=100
|
||||||
|
```
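
A short pre-flight check can catch a missing or placeholder secret before the service starts. The sketch below treats `JWT_SECRET` and `DATABASE_URL` as required, as marked above, and flags the placeholder values that appear in this guide; `config_problems` is a hypothetical helper, not a Readur command.

```python
from typing import List, Mapping

REQUIRED = ("JWT_SECRET", "DATABASE_URL")
# Placeholder values shown in this guide that must never reach production.
PLACEHOLDER_SECRETS = {"your-secret-key", "your-secure-random-key-here"}


def config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the essential variables."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("JWT_SECRET") in PLACEHOLDER_SECRETS:
        problems.append("JWT_SECRET is still a placeholder value")
    return problems
```

Running `config_problems(os.environ)` in a container entrypoint or CI step and aborting on a non-empty list keeps insecure defaults out of production.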
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- Review [deployment options](deployment.md) for production setup
|
||||||
|
- Learn about [folder watching](WATCH_FOLDER.md) for automatic document ingestion
|
||||||
|
- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md) for your use case
|
||||||
|
|
@@ -0,0 +1,403 @@
|
||||||
|
# Deployment Guide
|
||||||
|
|
||||||
|
This guide covers production deployment strategies, SSL setup, monitoring, backups, and best practices for running Readur in production.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Production Docker Compose](#production-docker-compose)
|
||||||
|
- [Network Filesystem Mounts](#network-filesystem-mounts)
|
||||||
|
- [NFS Mounts](#nfs-mounts)
|
||||||
|
- [SMB/CIFS Mounts](#smbcifs-mounts)
|
||||||
|
- [S3 Mounts](#s3-mounts)
|
||||||
|
- [SSL/HTTPS Setup](#sslhttps-setup)
|
||||||
|
- [Nginx Configuration](#nginx-configuration)
|
||||||
|
- [Traefik Configuration](#traefik-configuration)
|
||||||
|
- [Health Checks](#health-checks)
|
||||||
|
- [Backup Strategy](#backup-strategy)
|
||||||
|
- [Monitoring](#monitoring)
|
||||||
|
- [Deployment Platforms](#deployment-platforms)
|
||||||
|
- [Docker Swarm](#docker-swarm)
|
||||||
|
- [Kubernetes](#kubernetes)
|
||||||
|
- [Cloud Platforms](#cloud-platforms)
|
||||||
|
- [Security Considerations](#security-considerations)
|
||||||
|
|
||||||
|
## Production Docker Compose
|
||||||
|
|
||||||
|
For production deployments, create a custom `docker-compose.prod.yml`:
|
||||||
|
|
||||||
|
```yaml
services:
  readur:
    image: readur:latest
    ports:
      - "8000:8000"
    environment:
      # Core Configuration
      - DATABASE_URL=postgresql://readur:${DB_PASSWORD}@postgres:5432/readur
      - JWT_SECRET=${JWT_SECRET}
      - SERVER_ADDRESS=0.0.0.0:8000

      # File Storage
      - UPLOAD_PATH=/app/uploads
      - WATCH_FOLDER=/app/watch
      - ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,doc,docx

      # Watch Folder Settings
      - WATCH_INTERVAL_SECONDS=30
      - FILE_STABILITY_CHECK_MS=500
      - MAX_FILE_AGE_HOURS=168

      # OCR Configuration
      - OCR_LANGUAGE=eng
      - CONCURRENT_OCR_JOBS=4
      - OCR_TIMEOUT_SECONDS=300
      - MAX_FILE_SIZE_MB=100

      # Performance Tuning
      - MEMORY_LIMIT_MB=1024
      - CPU_PRIORITY=normal
      - ENABLE_COMPRESSION=true

    volumes:
      # Document storage
      - ./data/uploads:/app/uploads

      # Watch folder - mount your network drives here
      - /mnt/nfs/documents:/app/watch
      # or SMB: - /mnt/smb/shared:/app/watch
      # or S3: - /mnt/s3/bucket:/app/watch

    depends_on:
      - postgres
    restart: unless-stopped

    # Resource limits for production
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
        reservations:
          memory: 512M
          cpus: '0.5'

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=readur
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=readur
      - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8

    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres-config:/etc/postgresql/conf.d:ro

    # PostgreSQL optimization for document search
    command: >
      postgres
      -c shared_buffers=256MB
      -c effective_cache_size=1GB
      -c max_connections=100
      -c default_text_search_config=pg_catalog.english

    restart: unless-stopped

    # Don't expose port in production
    # ports:
    #   - "5433:5432"

volumes:
  postgres_data:
    driver: local
```
|
||||||
|
|
||||||
|
Deploy with environment file:
|
||||||
|
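```bash
# Create .env file with secrets
# - tr strips the line wrap that openssl inserts into long base64 output,
#   which would otherwise split JWT_SECRET across two lines
# - hex output keeps DB_PASSWORD free of '/', '+', and '=', which are not
#   safe inside the DATABASE_URL connection string
cat > .env << EOF
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
DB_PASSWORD=$(openssl rand -hex 32)
EOF

# Deploy
docker compose -f docker-compose.prod.yml --env-file .env up -d
```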
|
||||||
|
|
||||||
|
## Network Filesystem Mounts
|
||||||
|
|
||||||
|
### NFS Mounts
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Mount NFS share
|
||||||
|
sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs/documents
|
||||||
|
|
||||||
|
# Add to docker-compose.yml
|
||||||
|
volumes:
|
||||||
|
- /mnt/nfs/documents:/app/watch
|
||||||
|
environment:
|
||||||
|
- WATCH_INTERVAL_SECONDS=60
|
||||||
|
- FILE_STABILITY_CHECK_MS=1000
|
||||||
|
- FORCE_POLLING_WATCH=1
|
||||||
|
```
|
||||||
|
|
||||||
|
### SMB/CIFS Mounts
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Mount SMB share
|
||||||
|
sudo mount -t cifs //server/share /mnt/smb/shared -o username=user,password=pass
|
||||||
|
|
||||||
|
# Docker volume configuration
|
||||||
|
volumes:
|
||||||
|
- /mnt/smb/shared:/app/watch
|
||||||
|
environment:
|
||||||
|
- WATCH_INTERVAL_SECONDS=30
|
||||||
|
- FILE_STABILITY_CHECK_MS=2000
|
||||||
|
```
|
||||||
|
|
||||||
|
### S3 Mounts
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Mount S3 bucket using s3fs
|
||||||
|
s3fs mybucket /mnt/s3/bucket -o passwd_file=~/.passwd-s3fs
|
||||||
|
|
||||||
|
# Docker configuration for S3
|
||||||
|
volumes:
|
||||||
|
- /mnt/s3/bucket:/app/watch
|
||||||
|
environment:
|
||||||
|
- WATCH_INTERVAL_SECONDS=120
|
||||||
|
- FILE_STABILITY_CHECK_MS=5000
|
||||||
|
- FORCE_POLLING_WATCH=1
|
||||||
|
```
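
The `FILE_STABILITY_CHECK_MS` values above grow with the latency of the mount, which suggests a check that waits before trusting a file. One plausible reading is sketched below: compare a file's size and modification time before and after the interval and only ingest it when nothing changed. This is an assumption about the behavior, not Readur's implementation, and `is_file_stable` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Returns Ok(true) when the file's size and modification time are unchanged
/// across one `check_interval` pause, i.e. no writer touched it meanwhile.
/// Returns an error if the file cannot be read (for example, it vanished).
pub fn is_file_stable(path: &Path, check_interval: Duration) -> io::Result<bool> {
    let before = fs::metadata(path)?;
    thread::sleep(check_interval);
    let after = fs::metadata(path)?;
    Ok(before.len() == after.len() && before.modified()? == after.modified()?)
}
```

A longer interval reduces the chance of picking up a half-copied file on slow mounts such as s3fs, at the cost of slower ingestion, which matches the larger values suggested for S3 above.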
|
||||||
|
|
||||||
|
## SSL/HTTPS Setup
|
||||||
|
|
||||||
|
### Nginx Configuration
|
||||||
|
|
||||||
|
```nginx
server {
    listen 443 ssl http2;
    server_name readur.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # For file uploads
        client_max_body_size 100M;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
```
|
||||||
|
|
||||||
|
### Traefik Configuration
|
||||||
|
|
||||||
|
```yaml
services:
  readur:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.readur.rule=Host(`readur.yourdomain.com`)"
      - "traefik.http.routers.readur.tls=true"
      - "traefik.http.routers.readur.tls.certresolver=letsencrypt"
```
|
||||||
|
|
||||||
|
> 📘 **For more reverse proxy configurations** including Apache, Caddy, custom ports, load balancing, and advanced scenarios, see [REVERSE_PROXY.md](./REVERSE_PROXY.md).
|
||||||
|
|
||||||
|
## Health Checks
|
||||||
|
|
||||||
|
Add health checks to your Docker configuration:
|
||||||
|
|
||||||
|
```yaml
services:
  readur:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```
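
If `curl` is not available in the image, the same probe can be done with a few lines of standard-library code. The sketch below is illustrative only: `probe` and `is_success_status` are hypothetical helpers, and the only detail taken from this guide is the `/api/health` path on port 8000. It treats only 2xx responses as healthy, which is slightly stricter than `curl -f` (that fails on 4xx and 5xx).

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// True when the first line of an HTTP response carries a 2xx status code.
pub fn is_success_status(response: &str) -> bool {
    response
        .lines()
        .next()
        .and_then(|status_line| status_line.split_whitespace().nth(1))
        .map_or(false, |code| code.len() == 3 && code.starts_with('2'))
}

/// Sends `GET <path>` to `addr` (for example "localhost:8000") and reports
/// whether the server answered with a 2xx status. `timeout` must be non-zero.
pub fn probe(addr: &str, path: &str, timeout: Duration) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    // "Connection: close" makes the server end the stream after one response.
    write!(stream, "GET {path} HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n")?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    Ok(is_success_status(&String::from_utf8_lossy(&raw)))
}
```

A caller would use it as `probe("localhost:8000", "/api/health", Duration::from_secs(10))` and map `Ok(true)` to exit code 0 for the container health check.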
|
||||||
|
|
||||||
|
## Backup Strategy
|
||||||
|
|
||||||
|
Create an automated backup script:
|
||||||
|
|
||||||
|
```bash
#!/bin/bash
# backup.sh - Automated backup script

# Stop on the first error, including a failed pg_dump inside the pipeline,
# so a broken dump is not silently reported as a completed backup
set -euo pipefail

BACKUP_DIR="/path/to/backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup database
docker exec readur-postgres-1 pg_dump -U readur readur | gzip > "$BACKUP_DIR/db_backup_$DATE.sql.gz"

# Backup uploaded files
tar -czf "$BACKUP_DIR/uploads_backup_$DATE.tar.gz" -C ./data uploads/

# Clean old backups (keep 30 days)
find "$BACKUP_DIR" -name "db_backup_*.sql.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "uploads_backup_*.tar.gz" -mtime +30 -delete

echo "Backup completed: $DATE"
```
|
||||||
|
|
||||||
|
Add to crontab for daily backups:
|
||||||
|
```bash
|
||||||
|
0 2 * * * /path/to/backup.sh >> /var/log/readur-backup.log 2>&1
|
||||||
|
```
|
||||||
|
|
||||||
|
### Restore from Backup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Restore database
|
||||||
|
gunzip -c db_backup_20240101_020000.sql.gz | docker exec -i readur-postgres-1 psql -U readur readur
|
||||||
|
|
||||||
|
# Restore files
|
||||||
|
tar -xzf uploads_backup_20240101_020000.tar.gz -C ./data
|
||||||
|
```
|
||||||
|
|
||||||
|
## Monitoring
|
||||||
|
|
||||||
|
Monitor your deployment with Docker stats:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Real-time resource usage
|
||||||
|
docker stats
|
||||||
|
|
||||||
|
# Container logs
|
||||||
|
docker compose logs -f readur
|
||||||
|
|
||||||
|
# Watch folder activity
|
||||||
|
docker compose logs -f readur | grep watcher
|
||||||
|
|
||||||
|
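# PostgreSQL query performance (requires the pg_stat_statements extension;
# the column is total_exec_time on PostgreSQL 13+, total_time on older versions)
docker exec readur-postgres-1 psql -U readur -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"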
|
||||||
|
```
|
||||||
|
|
||||||
|
### Prometheus Metrics
|
||||||
|
|
||||||
|
Readur exposes metrics at the `/metrics` endpoint:
|
||||||
|
|
||||||
|
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'readur'
    static_configs:
      - targets: ['readur:8000']
```
|
||||||
|
|
||||||
|
## Deployment Platforms
|
||||||
|
|
||||||
|
### Docker Swarm
|
||||||
|
|
||||||
|
```yaml
version: '3.8'
services:
  readur:
    image: readur:latest
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == worker]
    networks:
      - readur-network
    secrets:
      - jwt_secret
      - db_password

# The network referenced by the service must be declared at the top level
networks:
  readur-network:
    driver: overlay

secrets:
  jwt_secret:
    external: true
  db_password:
    external: true
```
|
||||||
|
|
||||||
|
### Kubernetes
|
||||||
|
|
||||||
|
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: readur
          image: readur:latest
          env:
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: readur-secrets
                  key: jwt-secret
          resources:
            limits:
              memory: "2Gi"
              cpu: "2"
            requests:
              memory: "512Mi"
              cpu: "500m"
```
|
||||||
|
|
||||||
|
### Cloud Platforms
|
||||||
|
|
||||||
|
- **AWS**: Use ECS with RDS PostgreSQL
|
||||||
|
- **Google Cloud**: Deploy to Cloud Run with Cloud SQL
|
||||||
|
- **Azure**: Use Container Instances with Azure Database
|
||||||
|
- **DigitalOcean**: App Platform with Managed Database
|
||||||
|
|
||||||
|
## Security Considerations
|
||||||
|
|
||||||
|
### Production Checklist
|
||||||
|
|
||||||
|
- [ ] Change default admin password
|
||||||
|
- [ ] Generate strong JWT secret
|
||||||
|
- [ ] Use HTTPS/SSL in production
|
||||||
|
- [ ] Restrict database network access
|
||||||
|
- [ ] Set proper file permissions
|
||||||
|
- [ ] Enable firewall rules
|
||||||
|
- [ ] Regular security updates
|
||||||
|
- [ ] Monitor access logs
|
||||||
|
- [ ] Implement rate limiting
|
||||||
|
- [ ] Enable audit logging
|
||||||
|
|
||||||
|
### Recommended Production Setup
|
||||||
|
|
||||||
|
```bash
# Generate secure secrets
# (tr strips the line wrap openssl adds to long base64 output; hex keeps the
# database password free of characters that are unsafe in DATABASE_URL)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
DB_PASSWORD=$(openssl rand -hex 32)

# Restrict file permissions
chmod 600 .env
chmod 700 ./data/uploads

# Use read-only root filesystem
docker run --read-only --tmpfs /tmp ...
```
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- Configure [monitoring and alerting](monitoring-usage)
|
||||||
|
- Review [security best practices](security)
|
||||||
|
- Set up [automated backups](#backup-strategy)
|
||||||
|
- Explore [database guardrails](dev/DATABASE_GUARDRAILS.md)
|
||||||
|
|
@ -0,0 +1,47 @@
|
||||||
|
# Developer Documentation
|
||||||
|
|
||||||
|
This directory contains technical documentation for developers working on Readur.
|
||||||
|
|
||||||
|
## 📋 Table of Contents
|
||||||
|
|
||||||
|
### 🏗️ Architecture & Design
|
||||||
|
- [**Architecture Overview**](architecture.md) - System design, components, and data flow
|
||||||
|
- [**Database Guardrails**](DATABASE_GUARDRAILS.md) - Concurrency safety and database best practices
|
||||||
|
|
||||||
|
### 🛠️ Development
|
||||||
|
- [**Development Guide**](development.md) - Setup, contributing, code style guidelines
|
||||||
|
- [**Testing Guide**](TESTING.md) - Comprehensive testing strategy and instructions
|
||||||
|
|
||||||
|
### ⚙️ Technical Guides
|
||||||
|
- [**OCR Optimization**](OCR_OPTIMIZATION_GUIDE.md) - Performance tuning and best practices
|
||||||
|
- [**Queue Improvements**](QUEUE_IMPROVEMENTS.md) - Background job processing architecture
|
||||||
|
- [**Deployment Summary**](DEPLOYMENT_SUMMARY.md) - Technical deployment overview
|
||||||
|
|
||||||
|
## 🚀 Quick Start for Developers
|
||||||
|
|
||||||
|
1. **Read the [Architecture Overview](architecture.md)** to understand the system design
|
||||||
|
2. **Follow the [Development Guide](development.md)** to set up your local environment
|
||||||
|
3. **Review the [Testing Guide](TESTING.md)** to understand our testing approach
|
||||||
|
4. **Check [Database Guardrails](DATABASE_GUARDRAILS.md)** for data safety patterns
|
||||||
|
|
||||||
|
## 📖 Related User Documentation
|
||||||
|
|
||||||
|
- [Installation Guide](../installation.md) - How to install and run Readur
|
||||||
|
- [Configuration Guide](../configuration.md) - Environment variables and settings
|
||||||
|
- [User Guide](../user-guide.md) - How to use Readur features
|
||||||
|
- [API Reference](../api-reference.md) - REST API documentation
|
||||||
|
|
||||||
|
## 🤝 Contributing
|
||||||
|
|
||||||
|
Please read our [Development Guide](development.md) for:
|
||||||
|
- Setting up your development environment
|
||||||
|
- Code style guidelines
|
||||||
|
- Testing requirements
|
||||||
|
- Pull request process
|
||||||
|
|
||||||
|
## 🏷️ Document Categories
|
||||||
|
|
||||||
|
- **📘 User Docs**: Installation, configuration, user guide
|
||||||
|
- **🔧 Operations**: Deployment, monitoring, troubleshooting
|
||||||
|
- **💻 Developer**: Architecture, development setup, testing
|
||||||
|
- **🔌 Integration**: API reference, webhooks, extensions
|
||||||
|
|
@ -0,0 +1,350 @@
|
||||||
|
# Architecture Overview
|
||||||
|
|
||||||
|
This document provides a comprehensive overview of Readur's architecture, design decisions, and technical implementation details.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [System Architecture](#system-architecture)
|
||||||
|
- [Technology Stack](#technology-stack)
|
||||||
|
- [Component Overview](#component-overview)
|
||||||
|
- [Backend (Rust/Axum)](#backend-rustaxum)
|
||||||
|
- [Frontend (React)](#frontend-react)
|
||||||
|
- [Database (PostgreSQL)](#database-postgresql)
|
||||||
|
- [OCR Engine](#ocr-engine)
|
||||||
|
- [Data Flow](#data-flow)
|
||||||
|
- [Security Architecture](#security-architecture)
|
||||||
|
- [Performance Considerations](#performance-considerations)
|
||||||
|
- [Scalability](#scalability)
|
||||||
|
- [Design Patterns](#design-patterns)
|
||||||
|
|
||||||
|
## System Architecture
|
||||||
|
|
||||||
|
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  React Frontend │────│   Rust Backend  │────│  PostgreSQL DB  │
│   (Port 8000)   │    │   (Axum API)    │    │   (Port 5433)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         └─────────────│  File Storage   │─────────────┘
                       │  + OCR Engine   │
                       └─────────────────┘
```
|
||||||
|
|
||||||
|
### High-Level Components
|
||||||
|
|
||||||
|
1. **Web Interface**: Modern React SPA with Material-UI
|
||||||
|
2. **API Server**: High-performance Rust backend using Axum
|
||||||
|
3. **Database**: PostgreSQL with full-text search capabilities
|
||||||
|
4. **File Storage**: Local or network-mounted filesystem
|
||||||
|
5. **OCR Processing**: Tesseract integration for text extraction
|
||||||
|
6. **Background Jobs**: Async task processing for OCR and file watching
|
||||||
|
|
||||||
|
## Technology Stack
|
||||||
|
|
||||||
|
### Backend
|
||||||
|
- **Language**: Rust (for performance and memory safety)
|
||||||
|
- **Web Framework**: Axum (async, fast, type-safe)
|
||||||
|
- **Database ORM**: SQLx (compile-time checked queries)
|
||||||
|
- **Authentication**: JWT tokens with bcrypt password hashing
|
||||||
|
- **Async Runtime**: Tokio
|
||||||
|
- **Serialization**: Serde
|
||||||
|
|
||||||
|
### Frontend
|
||||||
|
- **Framework**: React 18 with TypeScript
|
||||||
|
- **UI Library**: Material-UI (MUI)
|
||||||
|
- **State Management**: React Context + Hooks
|
||||||
|
- **Build Tool**: Vite
|
||||||
|
- **HTTP Client**: Axios
|
||||||
|
- **Routing**: React Router
|
||||||
|
|
||||||
|
### Infrastructure
|
||||||
|
- **Database**: PostgreSQL 14+ with pgvector extension
|
||||||
|
- **OCR**: Tesseract 4.0+
|
||||||
|
- **Container**: Docker with multi-stage builds
|
||||||
|
- **Reverse Proxy**: Nginx/Traefik compatible
|
||||||
|
|
||||||
|
## Component Overview
|
||||||
|
|
||||||
|
### Backend (Rust/Axum)
|
||||||
|
|
||||||
|
The backend is structured following clean architecture principles:
|
||||||
|
|
||||||
|
```
src/
├── main.rs              # Application entry and server setup
├── config.rs            # Configuration management
├── models.rs            # Domain models and DTOs
├── error.rs             # Error handling
├── auth.rs              # Authentication middleware
├── routes/              # HTTP route handlers
│   ├── auth.rs          # Authentication endpoints
│   ├── documents.rs     # Document CRUD operations
│   ├── search.rs        # Search functionality
│   └── ...
├── db/                  # Database operations
│   ├── documents.rs     # Document queries
│   ├── users.rs         # User queries
│   └── ...
├── services/            # Business logic
│   ├── ocr.rs           # OCR processing
│   ├── file_service.rs  # File management
│   └── watcher.rs       # Folder watching
└── tests/               # Integration tests
```
|
||||||
|
|
||||||
|
Key design decisions:
|
||||||
|
- **Async-first**: All I/O operations are async
|
||||||
|
- **Type safety**: Leverages Rust's type system
|
||||||
|
- **Error handling**: Comprehensive error types
|
||||||
|
- **Dependency injection**: Clean separation of concerns
|
||||||
|
|
||||||
|
### Frontend (React)
|
||||||
|
|
||||||
|
The frontend follows a component-based architecture:
|
||||||
|
|
||||||
|
```
frontend/src/
├── components/          # Reusable UI components
│   ├── DocumentList/
│   ├── SearchBar/
│   └── ...
├── pages/               # Page-level components
│   ├── Dashboard/
│   ├── Documents/
│   └── ...
├── services/            # API integration
│   ├── api.ts           # Base API client
│   ├── auth.ts          # Auth service
│   └── documents.ts     # Document service
├── hooks/               # Custom React hooks
├── contexts/            # React contexts
└── utils/               # Utility functions
```
|
||||||
|
|
||||||
|
### Database (PostgreSQL)
|
||||||
|
|
||||||
|
Schema design optimized for document management:
|
||||||
|
|
||||||
|
```sql
-- Core tables
users                  -- User accounts
documents              -- Document metadata
document_content       -- Extracted text content
document_tags          -- Many-to-many tags
sources                -- File sources (folders, S3, etc.)
ocr_queue              -- OCR processing queue

-- Search optimization
document_search_index  -- Full-text search index
```
|
||||||
|
|
||||||
|
Key features:
|
||||||
|
- **Full-text search**: PostgreSQL's powerful search capabilities
|
||||||
|
- **JSONB fields**: Flexible metadata storage
|
||||||
|
- **Triggers**: Automatic search index updates
|
||||||
|
- **Views**: Optimized query patterns
|
||||||
|
|
||||||
|
### OCR Engine
|
||||||
|
|
||||||
|
OCR processing pipeline:
|
||||||
|
|
||||||
|
1. **File Detection**: New files detected via upload or folder watch
|
||||||
|
2. **Queue Management**: Files added to processing queue
|
||||||
|
3. **Pre-processing**: Image enhancement and optimization
|
||||||
|
4. **Text Extraction**: Tesseract OCR with language detection
|
||||||
|
5. **Post-processing**: Text cleaning and formatting
|
||||||
|
6. **Database Storage**: Indexed for search
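
The queue and extraction steps above can be pictured as a fixed pool of workers draining a shared job queue, with the pool size playing the role of `CONCURRENT_OCR_JOBS`. The sketch below uses plain threads and channels from the standard library to keep it self-contained; Readur itself is described as async (Tokio), so this illustrates the idea rather than its implementation. `run_queue` is a hypothetical name and `process` stands in for the OCR call.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `process` over every queued job using `workers` threads and returns
/// the results in completion order (not submission order).
pub fn run_queue<J, R, F>(jobs: Vec<J>, workers: usize, process: F) -> Vec<R>
where
    J: Send + 'static,
    R: Send + 'static,
    F: Fn(J) -> R + Send + Sync + 'static,
{
    let (job_tx, job_rx) = mpsc::channel::<J>();
    let (result_tx, result_rx) = mpsc::channel::<R>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let process = Arc::new(process);

    for job in jobs {
        job_tx.send(job).expect("receiver is still alive");
    }
    drop(job_tx); // closing the queue lets workers exit once it is drained

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            let process = Arc::clone(&process);
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, so it is
                // held only while taking the next job, not while processing.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => {
                        let _ = result_tx.send(process(job));
                    }
                    Err(_) => break, // queue drained and closed
                }
            })
        })
        .collect();
    drop(result_tx); // only the workers hold result senders now

    for handle in handles {
        handle.join().expect("worker panicked");
    }
    result_rx.into_iter().collect()
}
```

Capping the number of workers bounds memory and CPU use no matter how many files arrive at once, which is the trade-off a concurrency setting like `CONCURRENT_OCR_JOBS` exposes.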
|
||||||
|
|
||||||
|
## Data Flow
|
||||||
|
|
||||||
|
### Document Upload Flow
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
sequenceDiagram
|
||||||
|
User->>Frontend: Upload Document
|
||||||
|
Frontend->>API: POST /api/documents
|
||||||
|
API->>FileStorage: Save File
|
||||||
|
API->>Database: Create Document Record
|
||||||
|
API->>OCRQueue: Add to Queue
|
||||||
|
API-->>Frontend: Document Created
|
||||||
|
OCRWorker->>OCRQueue: Poll for Jobs
|
||||||
|
OCRWorker->>FileStorage: Read File
|
||||||
|
OCRWorker->>Tesseract: Extract Text
|
||||||
|
OCRWorker->>Database: Update with Content
|
||||||
|
OCRWorker->>Frontend: WebSocket Update
|
||||||
|
```
|
||||||
|
|
||||||
|
### Search Flow
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
sequenceDiagram
|
||||||
|
User->>Frontend: Enter Search Query
|
||||||
|
Frontend->>API: GET /api/search
|
||||||
|
API->>Database: Full-text Search
|
||||||
|
Database->>API: Ranked Results
|
||||||
|
API->>Frontend: Search Results
|
||||||
|
Frontend->>User: Display Results
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security Architecture
|
||||||
|
|
||||||
|
### Authentication & Authorization
|
||||||
|
|
||||||
|
- **JWT Tokens**: Stateless authentication
|
||||||
|
- **Role-Based Access**: Admin, User roles
|
||||||
|
- **Token Refresh**: Automatic token renewal
|
||||||
|
- **Password Security**: Bcrypt with salt rounds
|
||||||
|
|
||||||
|
### API Security
|
||||||
|
|
||||||
|
- **CORS**: Configurable allowed origins
|
||||||
|
- **Rate Limiting**: Prevent abuse
|
||||||
|
- **Input Validation**: Comprehensive validation
|
||||||
|
- **SQL Injection**: Parameterized queries via SQLx
|
||||||
|
|
||||||
|
### File Security
|
||||||
|
|
||||||
|
- **Upload Validation**: File type and size checks
|
||||||
|
- **Virus Scanning**: Optional ClamAV integration
|
||||||
|
- **Access Control**: Document-level permissions
|
||||||
|
- **Secure Storage**: Filesystem permissions
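
As a rough illustration of the upload validation item above, the sketch below checks a file against a comma-separated allow-list and a size limit in the style of `ALLOWED_FILE_TYPES` and `MAX_FILE_SIZE_MB`. `validate_upload` and `UploadError` are hypothetical names, not Readur's API, and an extension check alone is a weak filter; a production check would likely also inspect the file content.

```rust
use std::path::Path;

/// Reasons an upload can be rejected in this sketch.
#[derive(Debug, PartialEq)]
pub enum UploadError {
    TooLarge { size_bytes: u64, limit_bytes: u64 },
    DisallowedType(String),
}

/// Rejects files above `max_file_size_mb` or whose extension is not in the
/// comma-separated `allowed_types` list (compared case-insensitively).
pub fn validate_upload(
    filename: &str,
    size_bytes: u64,
    allowed_types: &str,
    max_file_size_mb: u64,
) -> Result<(), UploadError> {
    let limit_bytes = max_file_size_mb.saturating_mul(1024 * 1024);
    if size_bytes > limit_bytes {
        return Err(UploadError::TooLarge { size_bytes, limit_bytes });
    }

    // Files without an extension yield an empty string and are rejected.
    let extension = Path::new(filename)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase())
        .unwrap_or_default();

    let allowed = allowed_types
        .split(',')
        .map(|t| t.trim().to_ascii_lowercase())
        .any(|t| !t.is_empty() && t == extension);

    if allowed {
        Ok(())
    } else {
        Err(UploadError::DisallowedType(extension))
    }
}
```

Checking the size first avoids any further work on files that would be rejected anyway.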
|
||||||
|
|
||||||
|
## Performance Considerations
|
||||||
|
|
||||||
|
### Backend Optimization
|
||||||
|
|
||||||
|
- **Connection Pooling**: Database connection reuse
|
||||||
|
- **Async I/O**: Non-blocking operations
|
||||||
|
- **Caching**: In-memory caching for hot data
|
||||||
|
- **Query Optimization**: Indexed searches
|
||||||
|
|
||||||
|
### Frontend Optimization
|
||||||
|
|
||||||
|
- **Code Splitting**: Lazy loading of routes
|
||||||
|
- **Virtual Scrolling**: Large document lists
|
||||||
|
- **Memoization**: Prevent unnecessary re-renders
|
||||||
|
- **Service Workers**: Offline capability
|
||||||
|
|
||||||
|
### OCR Optimization
|
||||||
|
|
||||||
|
- **Parallel Processing**: Multiple concurrent jobs
|
||||||
|
- **Image Pre-processing**: Enhance OCR accuracy
|
||||||
|
- **Resource Limits**: Memory and CPU constraints
|
||||||
|
- **Queue Priority**: Smart job scheduling
|
||||||
|
|
||||||
|
## Scalability
|
||||||
|
|
||||||
|
### Horizontal Scaling
|
||||||
|
|
||||||
|
```yaml
# Multiple backend instances
backend-1:
  image: readur:latest
  environment:
    - INSTANCE_ID=1

backend-2:
  image: readur:latest
  environment:
    - INSTANCE_ID=2

# Load balancer: the upstream block below belongs in the Nginx
# configuration, not in the Compose file; it is shown as a comment
# so that this snippet remains valid YAML
#
# upstream backend {
#     server backend-1:8000;
#     server backend-2:8000;
# }
```
|
||||||
|
|
||||||
|
### Database Scaling
|
||||||
|
|
||||||
|
- **Read Replicas**: Distribute read load
|
||||||
|
- **Connection Pooling**: PgBouncer
|
||||||
|
- **Partitioning**: Time-based partitions
|
||||||
|
- **Archival**: Move old documents
|
||||||
|
|
||||||
|
### Storage Scaling
|
||||||
|
|
||||||
|
- **S3 Compatible**: Object storage support
|
||||||
|
- **CDN Integration**: Static file delivery
|
||||||
|
- **Distributed Storage**: GlusterFS/Ceph
|
||||||
|
- **Archive Tiering**: Hot/cold storage
|
||||||
|
|
||||||
|
## Design Patterns
|
||||||
|
|
||||||
|
### Backend Patterns
|
||||||
|
|
||||||
|
1. **Repository Pattern**: Database abstraction
|
||||||
|
2. **Service Layer**: Business logic separation
|
||||||
|
3. **Middleware Chain**: Request processing
|
||||||
|
4. **Error Boundaries**: Graceful error handling
|
||||||
|
|
||||||
|
### Frontend Patterns
|
||||||
|
|
||||||
|
1. **Container/Presenter**: Component separation
|
||||||
|
2. **Custom Hooks**: Logic reuse
|
||||||
|
3. **Context Provider**: State management
|
||||||
|
4. **HOCs**: Cross-cutting concerns
|
||||||
|
|
||||||
|
### Database Patterns
|
||||||
|
|
||||||
|
1. **Soft Deletes**: Data preservation
|
||||||
|
2. **Audit Trails**: Change tracking
|
||||||
|
3. **Materialized Views**: Performance
|
||||||
|
4. **Event Sourcing**: Optional audit log
|
||||||
|
|
||||||
|
## Future Architecture Considerations
|
||||||
|
|
||||||
|
### Microservices Migration
|
||||||
|
|
||||||
|
Potential service boundaries:
|
||||||
|
- Authentication Service
|
||||||
|
- Document Service
|
||||||
|
- OCR Service
|
||||||
|
- Search Service
|
||||||
|
- Notification Service
|
||||||
|
|
||||||
|
### Event-Driven Architecture
|
||||||
|
|
||||||
|
- Message Queue (RabbitMQ/Kafka)
|
||||||
|
- Event Sourcing
|
||||||
|
- CQRS Pattern
|
||||||
|
- Async communication
|
||||||
|
|
||||||
|
### Cloud-Native Features
|
||||||
|
|
||||||
|
- Kubernetes deployment
|
||||||
|
- Service mesh (Istio)
|
||||||
|
- Distributed tracing
|
||||||
|
- Cloud storage integration
|
||||||
|
|
||||||
|
## Monitoring and Observability
|
||||||
|
|
||||||
|
### Metrics
|
||||||
|
|
||||||
|
- Prometheus metrics endpoint
|
||||||
|
- Custom business metrics
|
||||||
|
- Performance counters
|
||||||
|
- Resource utilization
|
||||||
|
|
||||||
|
### Logging
|
||||||
|
|
||||||
|
- Structured logging (JSON)
|
||||||
|
- Log aggregation ready
|
||||||
|
- Correlation IDs
|
||||||
|
- Debug levels
|
||||||
|
|
||||||
|
### Tracing
|
||||||
|
|
||||||
|
- OpenTelemetry support
|
||||||
|
- Distributed tracing
|
||||||
|
- Performance profiling
|
||||||
|
- Request tracking
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- Review [deployment options](deployment.md)
|
||||||
|
- Explore [performance tuning](OCR_OPTIMIZATION_GUIDE.md)
|
||||||
|
- Understand [database design](DATABASE_GUARDRAILS.md)
|
||||||
|
- Learn about [testing strategy](TESTING.md)
|
||||||
|
|
@ -0,0 +1,434 @@
|
||||||
|
# Development Guide
|
||||||
|
|
||||||
|
This guide covers contributing to Readur, setting up a development environment, testing, and code style guidelines.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Development Setup](#development-setup)
|
||||||
|
- [Prerequisites](#prerequisites)
|
||||||
|
- [Local Development](#local-development)
|
||||||
|
- [Development with Docker](#development-with-docker)
|
||||||
|
- [Project Structure](#project-structure)
|
||||||
|
- [Testing](#testing)
|
||||||
|
- [Backend Tests](#backend-tests)
|
||||||
|
- [Frontend Tests](#frontend-tests)
|
||||||
|
- [Integration Tests](#integration-tests)
|
||||||
|
- [E2E Tests](#e2e-tests)
|
||||||
|
- [Code Style](#code-style)
|
||||||
|
- [Rust Guidelines](#rust-guidelines)
|
||||||
|
- [Frontend Guidelines](#frontend-guidelines)
|
||||||
|
- [Contributing](#contributing)
|
||||||
|
- [Getting Started](#getting-started)
|
||||||
|
- [Pull Request Process](#pull-request-process)
|
||||||
|
- [Commit Guidelines](#commit-guidelines)
|
||||||
|
- [Debugging](#debugging)
|
||||||
|
- [Performance Profiling](#performance-profiling)
|
||||||
|
|
||||||
|
## Development Setup
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- Rust 1.70+ and Cargo
|
||||||
|
- Node.js 18+ and npm
|
||||||
|
- PostgreSQL 14+
|
||||||
|
- Tesseract OCR 4.0+
|
||||||
|
- Git
|
||||||
|
|
||||||
|
### Local Development
|
||||||
|
|
||||||
|
1. **Clone the repository**:
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/perfectra1n/readur.git
|
||||||
|
cd readur
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Set up the database**:
|
||||||
|
```bash
# Create development database and user
# (making readur_dev the owner lets it create tables in the public schema,
# which GRANT ALL ON DATABASE alone does not allow on PostgreSQL 15+)
sudo -u postgres psql <<'SQL'
CREATE USER readur_dev WITH ENCRYPTED PASSWORD 'dev_password';
CREATE DATABASE readur_dev OWNER readur_dev;
SQL
```
|
||||||
|
|
||||||
|
3. **Configure environment**:
|
||||||
|
```bash
|
||||||
|
# Copy example environment
|
||||||
|
cp .env.example .env.development
|
||||||
|
|
||||||
|
# Edit with your settings
|
||||||
|
DATABASE_URL=postgresql://readur_dev:dev_password@localhost/readur_dev
|
||||||
|
JWT_SECRET=dev-secret-key
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Run database migrations**:
|
||||||
|
```bash
|
||||||
|
# Install sqlx-cli if needed
|
||||||
|
cargo install sqlx-cli
|
||||||
|
|
||||||
|
# Run migrations
|
||||||
|
sqlx migrate run
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Start the backend**:
|
||||||
|
```bash
|
||||||
|
# Development mode with auto-reload
|
||||||
|
cargo watch -x run
|
||||||
|
|
||||||
|
# Or without auto-reload
|
||||||
|
cargo run
|
||||||
|
```
|
||||||
|
|
||||||
|
6. **Start the frontend**:
|
||||||
|
```bash
|
||||||
|
cd frontend
|
||||||
|
npm install
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
### Development with Docker
|
||||||
|
|
||||||
|
For a consistent development environment:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start all services
|
||||||
|
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
|
||||||
|
|
||||||
|
# Backend available at: http://localhost:8000
|
||||||
|
# Frontend dev server at: http://localhost:5173
|
||||||
|
# PostgreSQL at: localhost:5433
|
||||||
|
```
|
||||||
|
|
||||||
|
The development compose file includes:
|
||||||
|
- Volume mounts for hot reloading
|
||||||
|
- Exposed database port
|
||||||
|
- Debug logging enabled
|
||||||
|
|
||||||
|
## Project Structure
|
||||||
|
|
||||||
|
```
readur/
├── src/                    # Rust backend source
│   ├── main.rs             # Application entry point
│   ├── config.rs           # Configuration management
│   ├── models.rs           # Database models
│   ├── routes/             # API route handlers
│   ├── db/                 # Database operations
│   ├── ocr.rs              # OCR processing
│   └── tests/              # Integration tests
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── pages/          # Page components
│   │   ├── services/       # API services
│   │   └── App.tsx         # Main app component
│   └── tests/              # Frontend tests
├── migrations/             # Database migrations
├── docs/                   # Documentation
└── tests/                  # E2E and integration tests
```
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
Readur has comprehensive test coverage across unit, integration, and end-to-end tests.
|
||||||
|
|
||||||
|
### Backend Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run all tests
|
||||||
|
cargo test
|
||||||
|
|
||||||
|
# Run with output
|
||||||
|
cargo test -- --nocapture
|
||||||
|
|
||||||
|
# Run specific test
|
||||||
|
cargo test test_document_upload
|
||||||
|
|
||||||
|
# Run tests with coverage
|
||||||
|
cargo install cargo-tarpaulin
|
||||||
|
cargo tarpaulin --out Html
|
||||||
|
```
|
||||||
|
|
||||||
|
Test categories:
|
||||||
|
- **Unit tests**: In `src/tests/`
|
||||||
|
- **Integration tests**: In `tests/`
|
||||||
|
- **Database tests**: Require `TEST_DATABASE_URL`
|
||||||
|
|
||||||
|
Example test:
|
||||||
|
```rust
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_document_creation() {
|
||||||
|
let doc = Document::new("test.pdf", "application/pdf");
|
||||||
|
assert_eq!(doc.filename, "test.pdf");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Frontend Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd frontend
|
||||||
|
|
||||||
|
# Run unit tests
|
||||||
|
npm test
|
||||||
|
|
||||||
|
# Run with coverage
|
||||||
|
npm run test:coverage
|
||||||
|
|
||||||
|
# Run in watch mode
|
||||||
|
npm run test:watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Example test:
|
||||||
|
```typescript
|
||||||
|
import { render, screen } from '@testing-library/react';
|
||||||
|
import DocumentList from './DocumentList';
|
||||||
|
|
||||||
|
test('renders document list', () => {
|
||||||
|
render(<DocumentList documents={[]} />);
|
||||||
|
expect(screen.getByText(/No documents/i)).toBeInTheDocument();
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
### Integration Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run integration tests
|
||||||
|
docker compose -f docker-compose.test.yml up --abort-on-container-exit
|
||||||
|
|
||||||
|
# Or manually
|
||||||
|
cargo test --test '*' -- --test-threads=1
|
||||||
|
```
|
||||||
|
|
||||||
|
### E2E Tests
|
||||||
|
|
||||||
|
End-to-end tests use Playwright:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd frontend
|
||||||
|
|
||||||
|
# Install Playwright
|
||||||
|
npm run e2e:install
|
||||||
|
|
||||||
|
# Run E2E tests
|
||||||
|
npm run e2e
|
||||||
|
|
||||||
|
# Run in UI mode
|
||||||
|
npm run e2e:ui
|
||||||
|
```
|
||||||
|
|
||||||
|
## Code Style
|
||||||
|
|
||||||
|
### Rust Guidelines
|
||||||
|
|
||||||
|
We follow the official Rust style guide with some additions:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Format code
|
||||||
|
cargo fmt
|
||||||
|
|
||||||
|
# Check linting
|
||||||
|
cargo clippy -- -D warnings
|
||||||
|
|
||||||
|
# Check before committing
|
||||||
|
cargo fmt --check && cargo clippy -- -D warnings
|
||||||
|
```
|
||||||
|
|
||||||
|
Style preferences:
|
||||||
|
- Use descriptive variable names
|
||||||
|
- Add documentation comments for public APIs
|
||||||
|
- Keep functions small and focused
|
||||||
|
- Use `Result` for error handling
|
||||||
|
- Prefer `&str` over `String` for function parameters
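
The preferences above can be illustrated with a small sketch. The function and error type here are hypothetical and are not taken from the Readur codebase; they only show a documented public function that borrows its input as `&str` and reports failure through `Result`.

```rust
use std::fmt;

/// Error returned when a file name has no usable extension.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq)]
pub struct MissingExtension;

impl fmt::Display for MissingExtension {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "file name has no extension")
    }
}

impl std::error::Error for MissingExtension {}

/// Returns the lowercase extension of `filename`.
///
/// Taking `&str` lets callers pass a string literal or a borrowed `String`
/// without giving up ownership.
pub fn file_extension(filename: &str) -> Result<String, MissingExtension> {
    match filename.rsplit_once('.') {
        // Require text on both sides of the last dot, so ".env" and "a." fail.
        Some((stem, ext)) if !stem.is_empty() && !ext.is_empty() => {
            Ok(ext.to_ascii_lowercase())
        }
        _ => Err(MissingExtension),
    }
}
```

A caller can propagate the error with `?` instead of unwrapping, which keeps the function small and leaves error handling to the call site.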
|
||||||
|
|
||||||
|
### Frontend Guidelines
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Format code
|
||||||
|
npm run format
|
||||||
|
|
||||||
|
# Lint check
|
||||||
|
npm run lint
|
||||||
|
|
||||||
|
# Type check
|
||||||
|
npm run type-check
|
||||||
|
```
|
||||||
|
|
||||||
|
Style preferences:
|
||||||
|
- Use functional components with hooks
|
||||||
|
- TypeScript for all new code
|
||||||
|
- Descriptive component and variable names
|
||||||
|
- Extract reusable logic into custom hooks
|
||||||
|
- Keep components focused and small
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
|
We welcome contributions! Please see our [Contributing Guide](../CONTRIBUTING.md) for details.
|
||||||
|
|
||||||
|
### Getting Started
|
||||||
|
|
||||||
|
1. **Fork the repository**
|
||||||
|
2. **Create a feature branch**:
|
||||||
|
```bash
|
||||||
|
git checkout -b feature/amazing-feature
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Make your changes**
|
||||||
|
4. **Add tests** for new functionality
|
||||||
|
5. **Ensure all tests pass**:
|
||||||
|
```bash
|
||||||
|
cargo test
|
||||||
|
cd frontend && npm test
|
||||||
|
```
|
||||||
|
|
||||||
|
6. **Commit your changes** (see commit guidelines below)
|
||||||
|
7. **Push to your fork**:
|
||||||
|
```bash
|
||||||
|
git push origin feature/amazing-feature
|
||||||
|
```
|
||||||
|
|
||||||
|
8. **Open a Pull Request**
|
||||||
|
|
||||||
|
### Pull Request Process
|
||||||
|
|
||||||
|
1. **Update documentation** for any changed functionality
|
||||||
|
2. **Add tests** covering new code
|
||||||
|
3. **Ensure CI passes** (automated checks)
|
||||||
|
4. **Request review** from maintainers
|
||||||
|
5. **Address feedback** promptly
|
||||||
|
6. **Squash commits** if requested
|
||||||
|
|
||||||
|
### Commit Guidelines
|
||||||
|
|
||||||
|
We use conventional commits for clear history:
|
||||||
|
|
||||||
|
```
|
||||||
|
feat: add bulk document export
|
||||||
|
fix: resolve OCR timeout on large files
|
||||||
|
docs: update API authentication section
|
||||||
|
test: add coverage for search filters
|
||||||
|
refactor: simplify document processing pipeline
|
||||||
|
perf: optimize database queries for search
|
||||||
|
chore: update dependencies
|
||||||
|
```
|
||||||
|
|
||||||
|
Format:
|
||||||
|
```
|
||||||
|
<type>(<scope>): <subject>
|
||||||
|
|
||||||
|
<body>
|
||||||
|
|
||||||
|
<footer>
|
||||||
|
```
|
||||||
|
|
||||||
|
Types:
|
||||||
|
- `feat`: New feature
|
||||||
|
- `fix`: Bug fix
|
||||||
|
- `docs`: Documentation only
|
||||||
|
- `style`: Code style changes
|
||||||
|
- `refactor`: Code refactoring
|
||||||
|
- `perf`: Performance improvements
|
||||||
|
- `test`: Test additions/changes
|
||||||
|
- `chore`: Build process/auxiliary tool changes
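
As a rough illustration of the format above, the sketch below checks whether a commit subject line matches `<type>(<scope>): <subject>` with an optional scope. The function name is hypothetical and this is not a tool shipped with Readur; a real project would more likely rely on an existing commit linter.

```rust
/// Commit types listed in the guidelines above.
const COMMIT_TYPES: [&str; 8] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
];

/// Returns true when `line` looks like `<type>(<scope>): <subject>`.
/// The `(<scope>)` part is treated as optional.
pub fn is_conventional_subject(line: &str) -> bool {
    // The type/scope prefix and the subject are separated by ": ".
    let Some((prefix, subject)) = line.split_once(": ") else {
        return false;
    };
    if subject.trim().is_empty() {
        return false;
    }
    // Split an optional "(scope)" off the type.
    let commit_type = match prefix.split_once('(') {
        Some((ty, rest)) => match rest.strip_suffix(')') {
            // A scope, when present, must be non-empty and closed by ')'.
            Some(scope) if !scope.is_empty() => ty,
            _ => return false,
        },
        None => prefix,
    };
    COMMIT_TYPES.iter().any(|known| *known == commit_type)
}
```

The check only looks at the first line; the body and footer are free-form and are not validated here.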
|
||||||
|
|
||||||
|
## Debugging
|
||||||
|
|
||||||
|
### Backend Debugging
|
||||||
|
|
||||||
|
1. **Enable debug logging**:
|
||||||
|
```bash
|
||||||
|
RUST_LOG=debug cargo run
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Use VS Code debugger**:
|
||||||
|
```json
|
||||||
|
// .vscode/launch.json
|
||||||
|
{
|
||||||
|
"version": "0.2.0",
|
||||||
|
"configurations": [
|
||||||
|
{
|
||||||
|
"type": "lldb",
|
||||||
|
"request": "launch",
|
||||||
|
"name": "Debug Readur",
|
||||||
|
"cargo": {
|
||||||
|
"args": ["build", "--bin=readur"],
|
||||||
|
"filter": {
|
||||||
|
"name": "readur",
|
||||||
|
"kind": "bin"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"args": [],
|
||||||
|
"cwd": "${workspaceFolder}"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Database query logging**:
|
||||||
|
```bash
|
||||||
|
RUST_LOG=sqlx=debug cargo run
|
||||||
|
```
|
||||||
|
|
||||||
|
### Frontend Debugging
|
||||||
|
|
||||||
|
1. **React DevTools**: Install browser extension
|
||||||
|
2. **Redux DevTools**: For state debugging
|
||||||
|
3. **Network tab**: Monitor API calls
|
||||||
|
4. **Console debugging**: Strategic `console.log`
|
||||||
|
|
||||||
|
## Performance Profiling
|
||||||
|
|
||||||
|
### Backend Profiling
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# CPU profiling with flamegraph
|
||||||
|
cargo install flamegraph
|
||||||
|
cargo flamegraph --bin readur
|
||||||
|
|
||||||
|
# Memory profiling
|
||||||
|
valgrind --tool=massif target/release/readur
|
||||||
|
```
|
||||||
|
|
||||||
|
### Frontend Profiling
|
||||||
|
|
||||||
|
1. Use Chrome DevTools Performance tab
|
||||||
|
2. React Profiler for component performance
|
||||||
|
3. Lighthouse for overall performance audit
|
||||||
|
|
||||||
|
### Database Profiling
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Enable query timing
|
||||||
|
\timing on
|
||||||
|
|
||||||
|
-- Analyze query plan
|
||||||
|
EXPLAIN ANALYZE SELECT * FROM documents WHERE ...;
|
||||||
|
|
||||||
|
-- Check slow queries (requires the pg_stat_statements extension)
|
||||||
|
SELECT * FROM pg_stat_statements
|
||||||
|
ORDER BY total_exec_time DESC -- this column is named total_time before PostgreSQL 13
|
||||||
|
LIMIT 10;
|
||||||
|
```
|
||||||
|
|
||||||
|
## Additional Resources
|
||||||
|
|
||||||
|
- [Rust Book](https://doc.rust-lang.org/book/)
|
||||||
|
- [React Documentation](https://react.dev/)
|
||||||
|
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
|
||||||
|
- [Tesseract Documentation](https://tesseract-ocr.github.io/)
|
||||||
|
- [Testing Guide](TESTING.md)
|
||||||
|
|
||||||
|
## Getting Help
|
||||||
|
|
||||||
|
- **GitHub Issues**: For bug reports and feature requests
|
||||||
|
- **GitHub Discussions**: For questions and community support
|
||||||
|
- **Discord**: Join our community server (link in README)
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
By contributing to Readur, you agree that your contributions will be licensed under the MIT License.
|
||||||
|
|
@@ -0,0 +1,175 @@
|
||||||
|
# Installation Guide
|
||||||
|
|
||||||
|
This guide covers various methods to install and run Readur, from quick Docker deployment to manual installation.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Quick Start with Docker Compose](#quick-start-with-docker-compose)
|
||||||
|
- [System Requirements](#system-requirements)
|
||||||
|
- [Manual Installation](#manual-installation)
|
||||||
|
- [Prerequisites](#prerequisites)
|
||||||
|
- [Backend Setup](#backend-setup)
|
||||||
|
- [Frontend Setup](#frontend-setup)
|
||||||
|
- [Verifying Installation](#verifying-installation)
|
||||||
|
|
||||||
|
## Quick Start with Docker Compose
|
||||||
|
|
||||||
|
The fastest way to get Readur running:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Clone the repository
|
||||||
|
git clone https://github.com/perfectra1n/readur
|
||||||
|
cd readur
|
||||||
|
|
||||||
|
# Start all services
|
||||||
|
docker compose up --build -d
|
||||||
|
|
||||||
|
# Access the application
|
||||||
|
open http://localhost:8000
|
||||||
|
```
|
||||||
|
|
||||||
|
**Default login credentials:**
|
||||||
|
- Username: `admin`
|
||||||
|
- Password: `readur2024`
|
||||||
|
|
||||||
|
> ⚠️ **Important**: Change the default admin password immediately after first login!
|
||||||
|
|
||||||
|
### What You Get
|
||||||
|
|
||||||
|
After deployment, you'll have:
|
||||||
|
- **Web Interface**: Modern document management UI at `http://localhost:8000`
|
||||||
|
- **PostgreSQL Database**: Document metadata and full-text search indexes
|
||||||
|
- **File Storage**: Persistent document storage with OCR processing
|
||||||
|
- **Watch Folder**: Automatic file ingestion from mounted directories
|
||||||
|
- **REST API**: Full API access for integrations
|
||||||
|
|
||||||
|
## System Requirements
|
||||||
|
|
||||||
|
### Minimum Requirements
|
||||||
|
- **CPU**: 2 cores
|
||||||
|
- **RAM**: 2GB
|
||||||
|
- **Storage**: 10GB free space
|
||||||
|
- **OS**: Linux, macOS, or Windows with Docker
|
||||||
|
|
||||||
|
### Recommended for Production
|
||||||
|
- **CPU**: 4+ cores
|
||||||
|
- **RAM**: 4GB+
|
||||||
|
- **Storage**: 50GB+ SSD
|
||||||
|
- **Network**: Stable internet connection for OCR processing
|
||||||
|
|
||||||
|
## Manual Installation
|
||||||
|
|
||||||
|
For development or custom deployments without Docker:
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
Install these dependencies on your system:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Ubuntu/Debian
|
||||||
|
sudo apt-get update
|
||||||
|
sudo apt-get install -y \
|
||||||
|
tesseract-ocr tesseract-ocr-eng \
|
||||||
|
libtesseract-dev libleptonica-dev \
|
||||||
|
postgresql postgresql-contrib \
|
||||||
|
pkg-config libclang-dev
|
||||||
|
|
||||||
|
# macOS (requires Homebrew)
|
||||||
|
brew install tesseract leptonica postgresql rust nodejs npm
|
||||||
|
|
||||||
|
# Install Rust (if not already installed)
|
||||||
|
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
|
||||||
|
```
|
||||||
|
|
||||||
|
### Backend Setup
|
||||||
|
|
||||||
|
1. **Configure Database**:
|
||||||
|
```bash
|
||||||
|
# Create database and user (enter the SQL statements below at the psql prompt)
|
||||||
|
sudo -u postgres psql
|
||||||
|
CREATE DATABASE readur;
|
||||||
|
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
|
||||||
|
GRANT ALL PRIVILEGES ON DATABASE readur TO readur_user;
|
||||||
|
\q
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Environment Configuration**:
|
||||||
|
```bash
|
||||||
|
# Copy environment template
|
||||||
|
cp .env.example .env
|
||||||
|
|
||||||
|
# Edit configuration
|
||||||
|
nano .env
|
||||||
|
```
|
||||||
|
|
||||||
|
Required environment variables:
|
||||||
|
```env
|
||||||
|
DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
|
||||||
|
JWT_SECRET=your-super-secret-jwt-key-change-this
|
||||||
|
SERVER_ADDRESS=0.0.0.0:8000
|
||||||
|
UPLOAD_PATH=./uploads
|
||||||
|
WATCH_FOLDER=./watch
|
||||||
|
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Build and Run Backend**:
|
||||||
|
```bash
|
||||||
|
# Build in release mode (this also fetches dependencies), then run
|
||||||
|
cargo build --release
|
||||||
|
cargo run --release
|
||||||
|
```
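
To make the role of the variables from step 2 concrete, here is a minimal sketch of how a backend might validate them at startup. The `Settings` struct and `settings_from` function are hypothetical and are not Readur's actual `config.rs`. The function takes a map instead of reading the process environment directly so that it is easy to test; a real caller could build the map with `std::env::vars().collect()`.

```rust
use std::collections::HashMap;

/// Subset of the settings from the `.env` example (hypothetical struct).
#[derive(Debug)]
pub struct Settings {
    pub database_url: String,
    pub jwt_secret: String,
    pub server_address: String,
    pub allowed_file_types: Vec<String>,
}

/// Looks up `key` and reports which variable is missing or empty.
fn required(vars: &HashMap<String, String>, key: &str) -> Result<String, String> {
    match vars.get(key) {
        Some(value) if !value.trim().is_empty() => Ok(value.clone()),
        _ => Err(format!("missing required variable {key}")),
    }
}

/// Builds `Settings` from a map of environment variables.
pub fn settings_from(vars: &HashMap<String, String>) -> Result<Settings, String> {
    Ok(Settings {
        database_url: required(vars, "DATABASE_URL")?,
        jwt_secret: required(vars, "JWT_SECRET")?,
        server_address: required(vars, "SERVER_ADDRESS")?,
        // "pdf,png,jpg" becomes ["pdf", "png", "jpg"]; blanks are dropped.
        allowed_file_types: required(vars, "ALLOWED_FILE_TYPES")?
            .split(',')
            .map(|ext| ext.trim().to_ascii_lowercase())
            .filter(|ext| !ext.is_empty())
            .collect(),
    })
}
```

Failing fast with the name of the missing variable tends to be easier to diagnose than a connection error later in startup.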
|
||||||
|
|
||||||
|
### Frontend Setup
|
||||||
|
|
||||||
|
1. **Install Dependencies**:
|
||||||
|
```bash
|
||||||
|
cd frontend
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Development Mode**:
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
# Frontend available at http://localhost:5173
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Production Build**:
|
||||||
|
```bash
|
||||||
|
npm run build
|
||||||
|
# Built files in frontend/dist/
|
||||||
|
```
|
||||||
|
|
||||||
|
## Verifying Installation
|
||||||
|
|
||||||
|
After installation, verify everything is working:
|
||||||
|
|
||||||
|
1. **Check Backend Health**:
|
||||||
|
```bash
|
||||||
|
curl http://localhost:8000/api/health
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Access Web Interface**:
|
||||||
|
- Navigate to `http://localhost:8000`
|
||||||
|
- Log in with default credentials
|
||||||
|
- Upload a test document
|
||||||
|
|
||||||
|
3. **Verify Database Connection**:
|
||||||
|
```bash
|
||||||
|
# For Docker installation
|
||||||
|
docker exec -it readur-postgres-1 psql -U readur -c "\dt"
|
||||||
|
|
||||||
|
# For manual installation
|
||||||
|
psql -U readur_user -d readur -c "\dt"
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Check OCR Functionality**:
|
||||||
|
- Upload a PDF or image file
|
||||||
|
- Wait for processing to complete
|
||||||
|
- Search for text content from the uploaded file
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- [Configure Readur](configuration.md) for your specific needs
|
||||||
|
- Set up [production deployment](deployment.md) with SSL and proper security
|
||||||
|
- Read the [User Guide](user-guide.md) to learn about all features
|
||||||
|
- Explore the [API Reference](api-reference.md) for integrations
|
||||||
|
|
@@ -0,0 +1,282 @@
|
||||||
|
# User Guide
|
||||||
|
|
||||||
|
A comprehensive guide to using Readur's features for document management, OCR processing, and search.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
- [Getting Started](#getting-started)
|
||||||
|
- [Supported File Types](#supported-file-types)
|
||||||
|
- [Using the Interface](#using-the-interface)
|
||||||
|
- [Dashboard](#dashboard)
|
||||||
|
- [Document Management](#document-management)
|
||||||
|
- [Advanced Search](#advanced-search)
|
||||||
|
- [Folder Watching](#folder-watching)
|
||||||
|
- [Document Upload](#document-upload)
|
||||||
|
- [OCR Processing](#ocr-processing)
|
||||||
|
- [Search Features](#search-features)
|
||||||
|
- [Tags and Organization](#tags-and-organization)
|
||||||
|
- [User Settings](#user-settings)
|
||||||
|
- [Tips for Best Results](#tips-for-best-results)
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
1. **First Login**:
|
||||||
|
- Navigate to `http://localhost:8000` (or your configured URL)
|
||||||
|
- Use the default admin credentials (username: `admin`, password: `readur2024`)
|
||||||
|
- **Important**: Change the default password immediately
|
||||||
|
|
||||||
|
2. **Initial Setup**:
|
||||||
|
- Configure your user preferences
|
||||||
|
- Set OCR language if different from English
|
||||||
|
- Adjust search and display settings
|
||||||
|
|
||||||
|
3. **Quick Start**:
|
||||||
|
- Upload your first document using drag-and-drop or the upload button
|
||||||
|
- Wait for OCR processing to complete
|
||||||
|
- Search for content within your documents
|
||||||
|
|
||||||
|
## Supported File Types
|
||||||
|
|
||||||
|
| Type | Extensions | OCR Support | Notes |
|
||||||
|
|------|-----------|-------------|-------|
|
||||||
|
| **PDF** | `.pdf` | ✅ | Text extraction + OCR for scanned pages |
|
||||||
|
| **Images** | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | ✅ | Full OCR text extraction |
|
||||||
|
| **Text** | `.txt`, `.rtf` | ❌ | Direct text indexing |
|
||||||
|
| **Office** | `.doc`, `.docx` | ⚠️ | Limited support |
|
||||||
|
|
||||||
|
## Using the Interface
|
||||||
|
|
||||||
|
### Dashboard
|
||||||
|
|
||||||
|
The dashboard provides an overview of your document system:
|
||||||
|
|
||||||
|
- **Document Statistics**:
|
||||||
|
- Total documents in the system
|
||||||
|
- Storage usage breakdown
|
||||||
|
- OCR processing status
|
||||||
|
- Recent activity timeline
|
||||||
|
|
||||||
|
- **Quick Actions**:
|
||||||
|
- Upload new documents
|
||||||
|
- Quick search bar
|
||||||
|
- Access to recent documents
|
||||||
|
- System notifications
|
||||||
|
|
||||||
|
### Document Management
|
||||||
|
|
||||||
|
#### List/Grid View
|
||||||
|
- **List View**: Detailed document information in a table format
|
||||||
|
- **Grid View**: Visual thumbnails for quick browsing
|
||||||
|
- Toggle between views using the view selector in the top toolbar
|
||||||
|
|
||||||
|
#### Sorting Options
|
||||||
|
- Upload date (newest/oldest first)
|
||||||
|
- File name (A-Z/Z-A)
|
||||||
|
- File size (largest/smallest)
|
||||||
|
- Document type
|
||||||
|
- OCR status
|
||||||
|
|
||||||
|
#### Filtering
|
||||||
|
- By file type (PDF, images, text)
|
||||||
|
- By OCR status (completed, pending, failed)
|
||||||
|
- By date range
|
||||||
|
- By tags
|
||||||
|
- By source (uploaded, watched folder)
|
||||||
|
|
||||||
|
#### Bulk Actions
|
||||||
|
1. Select multiple documents using checkboxes
|
||||||
|
2. Available bulk actions:
|
||||||
|
- Delete selected documents
|
||||||
|
- Add/remove tags
|
||||||
|
- Export document list
|
||||||
|
- Reprocess OCR
|
||||||
|
|
||||||
|
### Advanced Search
|
||||||
|
|
||||||
|
Readur offers powerful search capabilities:
|
||||||
|
|
||||||
|
#### Full-Text Search
|
||||||
|
- Search within document content
|
||||||
|
- Automatic stemming and fuzzy matching
|
||||||
|
- Phrase search with quotes: `"exact phrase"`
|
||||||
|
- Exclude terms with minus: `-excluded`
|
||||||
|
|
||||||
|
#### Search Filters
|
||||||
|
- **Date Range**: Find documents from specific time periods
|
||||||
|
- **File Type**: Limit search to specific formats
|
||||||
|
- **File Size**: Filter by document size
|
||||||
|
- **OCR Status**: Only search processed documents
|
||||||
|
- **Tags**: Search within tagged documents
|
||||||
|
|
||||||
|
#### Search Syntax
|
||||||
|
```
|
||||||
|
invoice 2024 # Find documents with both terms
|
||||||
|
"quarterly report" # Exact phrase search
|
||||||
|
invoice -draft # Exclude drafts
|
||||||
|
tag:important invoice # Search within tagged documents
|
||||||
|
type:pdf contract # Search only PDFs
|
||||||
|
```
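
The syntax above can be read as four kinds of terms: plain words, quoted phrases, excluded words, and `field:value` filters. The sketch below splits a query into those kinds. It is an illustration only; the `Term` enum and `parse_query` function are hypothetical, and Readur's actual query handling is not described in this guide.

```rust
/// One parsed piece of a search query (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Term {
    Word(String),
    Phrase(String),
    Exclude(String),
    Field { name: String, value: String },
}

/// Splits a query such as `"quarterly report" type:pdf -draft` into terms.
pub fn parse_query(query: &str) -> Vec<Term> {
    let mut terms = Vec::new();
    let mut rest = query.trim_start();
    while !rest.is_empty() {
        if let Some(after_quote) = rest.strip_prefix('"') {
            // Quoted phrase: read up to the closing quote, or to the end
            // of the input when the quote is never closed.
            let end = after_quote.find('"').unwrap_or(after_quote.len());
            terms.push(Term::Phrase(after_quote[..end].to_string()));
            rest = after_quote.get(end + 1..).unwrap_or("");
        } else {
            // Unquoted token: read up to the next whitespace.
            let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
            terms.push(classify(&rest[..end]));
            rest = &rest[end..];
        }
        rest = rest.trim_start();
    }
    terms
}

/// Decides whether an unquoted token is an exclusion, a field filter, or a word.
fn classify(token: &str) -> Term {
    if let Some(word) = token.strip_prefix('-') {
        if !word.is_empty() {
            return Term::Exclude(word.to_string());
        }
    }
    if let Some((name, value)) = token.split_once(':') {
        if !name.is_empty() && !value.is_empty() {
            return Term::Field {
                name: name.to_string(),
                value: value.to_string(),
            };
        }
    }
    Term::Word(token.to_string())
}
```

Under this reading, `tag:important invoice` yields a `tag` filter followed by the word `invoice`.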
|
||||||
|
|
||||||
|
### Folder Watching
|
||||||
|
|
||||||
|
The folder watching feature automatically imports documents:
|
||||||
|
|
||||||
|
1. **Non-destructive**: Source files remain untouched
|
||||||
|
2. **Automatic Processing**: New files are detected and processed
|
||||||
|
3. **Configurable Intervals**: Adjust scan frequency
|
||||||
|
4. **Multiple Sources**: Watch local folders, network drives, cloud storage
|
||||||
|
|
||||||
|
#### Setting Up Watch Folders
|
||||||
|
1. Go to Settings → Sources
|
||||||
|
2. Add a new source with type "Local Folder"
|
||||||
|
3. Configure the path and scan interval
|
||||||
|
4. Enable/disable the source as needed
|
||||||
|
|
||||||
|
## Document Upload
|
||||||
|
|
||||||
|
### Manual Upload
|
||||||
|
1. Click the upload button or drag files to the upload area
|
||||||
|
2. Select one or multiple files
|
||||||
|
3. Add tags during upload (optional)
|
||||||
|
4. Click "Upload" to start processing
|
||||||
|
|
||||||
|
### Drag and Drop
|
||||||
|
- Drag files directly from your file manager
|
||||||
|
- Drop anywhere on the document list page
|
||||||
|
- Multiple files can be dropped at once
|
||||||
|
|
||||||
|
### Upload Limits
|
||||||
|
- Maximum file size: Configurable (default 50MB)
|
||||||
|
- Supported formats: See [Supported File Types](#supported-file-types)
|
||||||
|
- Batch upload: Up to 100 files at once
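
The limits above amount to three checks per upload. The following sketch shows one way they could be expressed. All names are hypothetical, and treating "50MB" as 50 × 1024 × 1024 bytes is an assumption, since the guide does not say which unit convention applies.

```rust
/// Reasons an upload might be rejected (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UploadError {
    TooLarge { size_bytes: u64, max_bytes: u64 },
    UnsupportedType(String),
    TooManyFiles(usize),
}

/// Default size limit; interpreting 50MB as mebibytes is an assumption.
pub const DEFAULT_MAX_BYTES: u64 = 50 * 1024 * 1024;
/// Batch limit from the list above.
pub const MAX_BATCH: usize = 100;

/// Checks a single file against the size limit and the allowed extensions.
pub fn check_file(
    filename: &str,
    size_bytes: u64,
    max_bytes: u64,
    allowed: &[&str],
) -> Result<(), UploadError> {
    if size_bytes > max_bytes {
        return Err(UploadError::TooLarge { size_bytes, max_bytes });
    }
    // Text after the last dot, lowercased; empty when there is no dot.
    let ext = filename
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase())
        .unwrap_or_default();
    if !allowed.iter().any(|a| a.eq_ignore_ascii_case(&ext)) {
        return Err(UploadError::UnsupportedType(ext));
    }
    Ok(())
}

/// Checks the number of files in one batch.
pub fn check_batch(file_count: usize) -> Result<(), UploadError> {
    if file_count > MAX_BATCH {
        return Err(UploadError::TooManyFiles(file_count));
    }
    Ok(())
}
```

Passing `max_bytes` as a parameter reflects that the limit is configurable rather than fixed at the default.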
|
||||||
|
|
||||||
|
## OCR Processing
|
||||||
|
|
||||||
|
### Automatic OCR
|
||||||
|
- Starts automatically after upload
|
||||||
|
- Processes documents in background
|
||||||
|
- Priority queue for smaller files
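
"Priority queue for smaller files" can be pictured as a min-heap keyed on file size. The sketch below is one possible shape, using only the standard library. `OcrQueue` is a hypothetical name, and the tie-breaking by arrival order is an assumption, not documented Readur behavior.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Queue that hands out the smallest pending file first (hypothetical type).
#[derive(Default)]
pub struct OcrQueue {
    // `Reverse` turns the standard max-heap into a min-heap. The sequence
    // number keeps files of equal size in arrival order.
    heap: BinaryHeap<Reverse<(u64, u64, String)>>,
    next_seq: u64,
}

impl OcrQueue {
    /// Adds a file to the queue.
    pub fn push(&mut self, filename: &str, size_bytes: u64) {
        self.heap
            .push(Reverse((size_bytes, self.next_seq, filename.to_string())));
        self.next_seq += 1;
    }

    /// Removes and returns the smallest queued file, if any.
    pub fn pop(&mut self) -> Option<String> {
        self.heap.pop().map(|Reverse((_, _, filename))| filename)
    }
}
```

Serving small files first keeps quick jobs from waiting behind a single large scan, at the cost of large files waiting longer when the queue is busy.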
|
||||||
|
|
||||||
|
### OCR Settings
|
||||||
|
- **Language**: Select from 100+ languages
|
||||||
|
- **Preprocessing**: Enable image enhancement
|
||||||
|
- **Auto-rotation**: Correct document orientation
|
||||||
|
- **Quality**: Balance between speed and accuracy
|
||||||
|
|
||||||
|
### OCR Status Indicators
|
||||||
|
- 🟢 **Completed**: Full text extracted
|
||||||
|
- 🟡 **Processing**: OCR in progress
|
||||||
|
- 🔴 **Failed**: Error during processing
|
||||||
|
- ⚪ **Pending**: Waiting in queue
|
||||||
|
|
||||||
|
## Search Features
|
||||||
|
|
||||||
|
### Quick Search
|
||||||
|
- Available in the header on all pages
|
||||||
|
- Instant results as you type
|
||||||
|
- Shows top 5 matches with snippets
|
||||||
|
|
||||||
|
### Advanced Search Page
|
||||||
|
- Full search interface with all filters
|
||||||
|
- Export search results
|
||||||
|
- Save frequently used searches
|
||||||
|
- Search history
|
||||||
|
|
||||||
|
### Search Tips
|
||||||
|
1. Use quotes for exact phrases
|
||||||
|
2. Combine filters for precise results
|
||||||
|
3. Use wildcards: `inv*` matches invoice, inventory
|
||||||
|
4. Search in specific fields: `filename:report`
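
The wildcard tip can be read as prefix matching. As a hedged sketch, the hypothetical function below handles only a trailing `*` and compares case-insensitively; whether Readur supports wildcards in other positions, or matches case-sensitively, is not stated in this guide.

```rust
/// Returns true when `word` matches `pattern`, where a trailing `*`
/// in `pattern` means "any suffix". Other wildcard positions are not handled.
pub fn wildcard_match(pattern: &str, word: &str) -> bool {
    match pattern.strip_suffix('*') {
        // "inv*" matches any word that starts with "inv".
        Some(prefix) => word
            .to_lowercase()
            .starts_with(prefix.to_lowercase().as_str()),
        // Without a wildcard, require the whole word to match.
        None => word.to_lowercase() == pattern.to_lowercase(),
    }
}
```

With this reading, `inv*` matches both `invoice` and `inventory`, as in the tip above, but not `contract`.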
|
||||||
|
|
||||||
|
## Tags and Organization
|
||||||
|
|
||||||
|
### Creating Tags
|
||||||
|
1. Select document(s)
|
||||||
|
2. Click "Add Tag"
|
||||||
|
3. Enter tag name or select existing
|
||||||
|
4. Tags are color-coded for easy identification
|
||||||
|
|
||||||
|
### Tag Management
|
||||||
|
- Rename tags globally
|
||||||
|
- Merge similar tags
|
||||||
|
- Delete unused tags
|
||||||
|
- Set tag colors
|
||||||
|
|
||||||
|
### Smart Collections
|
||||||
|
Create saved searches based on:
|
||||||
|
- Tag combinations
|
||||||
|
- Date ranges
|
||||||
|
- File types
|
||||||
|
- Custom criteria
|
||||||
|
|
||||||
|
## User Settings
|
||||||
|
|
||||||
|
### Personal Preferences
|
||||||
|
- **Display**: List/grid default view
|
||||||
|
- **Language**: Interface language
|
||||||
|
- **Time Zone**: For accurate timestamps
|
||||||
|
- **Notifications**: Email/in-app alerts
|
||||||
|
|
||||||
|
### OCR Preferences
|
||||||
|
- Default OCR language
|
||||||
|
- Processing priority
|
||||||
|
- Image preprocessing options
|
||||||
|
- Batch size limits
|
||||||
|
|
||||||
|
### Search Settings
|
||||||
|
- Results per page
|
||||||
|
- Default sort order
|
||||||
|
- Snippet length
|
||||||
|
- Fuzzy search threshold
|
||||||
|
|
||||||
|
## Tips for Best Results
|
||||||
|
|
||||||
|
### OCR Quality
|
||||||
|
1. **Higher Resolution**: 300+ DPI produces better OCR results
|
||||||
|
2. **Clean Scans**: Avoid skewed or dirty documents
|
||||||
|
3. **Good Lighting**: For photo captures, ensure even lighting
|
||||||
|
4. **Text Contrast**: Black text on white background works best
|
||||||
|
|
||||||
|
### File Organization
|
||||||
|
1. **Consistent Naming**: Use descriptive, consistent file names
|
||||||
|
2. **Regular Uploads**: Don't let documents pile up
|
||||||
|
3. **Use Tags**: Tag documents immediately after upload
|
||||||
|
4. **Folder Structure**: Organize watch folders logically
|
||||||
|
|
||||||
|
### Search Optimization
|
||||||
|
1. **Use Filters**: Combine text search with filters
|
||||||
|
2. **Save Searches**: Save frequently used search queries
|
||||||
|
3. **Learn Syntax**: Master search operators for better results
|
||||||
|
4. **Index Regularly**: Ensure all documents are processed
|
||||||
|
|
||||||
|
### Performance Tips
|
||||||
|
1. **Batch Processing**: Upload similar documents together
|
||||||
|
2. **Off-Peak Hours**: Schedule large uploads during low-usage times
|
||||||
|
3. **Monitor Queue**: Check OCR queue status regularly
|
||||||
|
4. **Clean Up**: Remove outdated documents periodically
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**OCR Not Starting**
|
||||||
|
- Check file size limits
|
||||||
|
- Verify supported file format
|
||||||
|
- Ensure OCR service is running
|
||||||
|
|
||||||
|
**Search Not Finding Documents**
|
||||||
|
- Confirm OCR completed successfully
|
||||||
|
- Check search syntax
|
||||||
|
- Try broader search terms
|
||||||
|
|
||||||
|
**Slow Performance**
|
||||||
|
- Review concurrent OCR job settings
|
||||||
|
- Check system resources
|
||||||
|
- Consider increasing memory limits
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- Explore the [API Reference](api-reference.md) for automation
|
||||||
|
- Learn about [advanced configuration](configuration.md)
|
||||||
|
- Set up [automated workflows](WATCH_FOLDER.md)
|
||||||
|
- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md)
|
||||||