fix(tests): resolve issue in tests due to new s3 config

This commit is contained in:
perf3ct 2025-08-01 19:17:22 +00:00
parent 862c36aa72
commit 21bc1167e8
No known key found for this signature in database
GPG Key ID: 569C4EEC436F5232
5 changed files with 1039 additions and 1 deletions


@@ -0,0 +1,340 @@
# Command-Line Tools Reference
## Overview
Readur includes several command-line utilities for system administration and maintenance. These tools are designed for system administrators and DevOps teams managing Readur deployments.
## migrate_to_s3
**Purpose:** Migrate document storage between backends (Local ↔ S3)
### Usage
```bash
migrate_to_s3 [OPTIONS]
```
### Command Options
| Option | Description | Example |
|--------|-------------|---------|
| `--dry-run` | Test migration without making changes | `--dry-run` |
| `--enable-rollback` | Enable rollback capabilities with state tracking | `--enable-rollback` |
| `--user-id <UUID>` | Migrate documents for specific user only | `--user-id "123e4567-..."` |
| `--resume-from <FILE>` | Resume migration from saved state file | `--resume-from /tmp/state.json` |
| `--rollback <FILE>` | Rollback previous migration using state file | `--rollback /tmp/state.json` |
| `--batch-size <NUM>` | Number of documents to process per batch | `--batch-size 1000` |
| `--parallel-uploads <NUM>` | Maximum concurrent S3 uploads | `--parallel-uploads 5` |
| `--verbose` | Enable detailed output and progress logging | `--verbose` |
| `--audit-files` | Check file system consistency before migration | `--audit-files` |
| `--status` | Show status of current/recent migrations | `--status` |
| `--help` | Display help information | `--help` |
### Examples
#### Basic Migration
```bash
# Test migration first
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run
# Run actual migration with safety features
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
# Verbose migration with custom batch size
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --verbose --batch-size 500
```
#### User-Specific Migration
```bash
# Get user IDs from database
docker exec readur-app psql -d readur -c \
"SELECT id, email FROM users WHERE email LIKE '%@company.com';"
# Migrate specific user's documents
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --user-id "uuid-from-above"
```
#### Recovery Operations
```bash
# Resume interrupted migration
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--resume-from /tmp/migration_state_20241201_143022.json
# Rollback completed migration
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--rollback /tmp/migration_state_20241201_143022.json
# Check migration status
docker exec readur-app cargo run --bin migrate_to_s3 -- --status
```
#### Performance Optimization
```bash
# High-performance migration for large datasets
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback \
--batch-size 2000 \
--parallel-uploads 10 \
--verbose
# Conservative migration for limited resources
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback \
--batch-size 100 \
--parallel-uploads 2
```
### State Files
The migration tool creates state files to track progress and enable recovery:
**Location:** `/tmp/migration_state_YYYYMMDD_HHMMSS.json`
**Contents:**
```json
{
"migration_id": "uuid",
"started_at": "2024-12-01T14:30:22Z",
"completed_migrations": [
{
"document_id": "uuid",
"original_path": "/app/uploads/doc.pdf",
"s3_key": "documents/user123/doc.pdf",
"migrated_at": "2024-12-01T14:31:15Z"
}
],
"failed_migrations": [],
"total_files": 2500,
"processed_files": 1247,
"rollback_enabled": true
}
```
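Progress can be read straight out of a state file with `jq` (assuming it is installed in the container; the field names match the example above, and `summarize_state` is a hypothetical helper name):

```bash
# Report progress from any migration state file
# (field names match the example state file shown above)
summarize_state() {
  jq -r '"\(.processed_files)/\(.total_files) processed, \(.failed_migrations | length) failed"' "$1"
}
# usage: summarize_state /tmp/migration_state_20241201_143022.json
```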
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Database connection error |
| 4 | S3 access error |
| 5 | File system error |
| 10 | Migration already in progress |
| 11 | State file not found |
| 12 | Rollback failed |
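When scripting migrations, the exit codes above can be mapped to actionable messages. This is a minimal sketch; `explain_exit` is a hypothetical helper name, and the messages paraphrase the table:

```bash
# Map migrate_to_s3 exit codes (table above) to actionable messages
explain_exit() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "General error: check logs" ;;
    2)  echo "Configuration error: check environment variables" ;;
    3)  echo "Database connection error: verify DATABASE_URL" ;;
    4)  echo "S3 access error: verify credentials and bucket policy" ;;
    5)  echo "File system error: check permissions and disk space" ;;
    10) echo "Migration already in progress" ;;
    11) echo "State file not found" ;;
    12) echo "Rollback failed" ;;
    *)  echo "Unknown exit code: $1" ;;
  esac
}
# usage: docker exec readur-app cargo run --bin migrate_to_s3 -- ... ; explain_exit $?
```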
## enqueue_pending_ocr
**Purpose:** Add documents with pending OCR status to the processing queue
### Usage
```bash
docker exec readur-app cargo run --bin enqueue_pending_ocr
```
### Description
This utility addresses situations where documents are marked as pending OCR in the database but haven't been added to the OCR processing queue. This can happen after:
- Database restoration
- System crashes during OCR processing
- Migration from older versions
### Example Output
```
🔍 Scanning for documents with pending OCR status...
📊 Found 45 documents with pending OCR status
🚀 Enqueuing documents for OCR processing...
✅ Successfully enqueued 45 documents
⏱️ Average queue priority: 5
📈 Current queue size: 127 items
```
### When to Use
- After restoring from database backup
- When OCR queue appears empty but documents show "pending" status
- Following system recovery or migration
- As part of maintenance procedures
## test_runner
**Purpose:** Execute comprehensive test suites with detailed reporting
### Usage
```bash
docker exec readur-app cargo run --bin test_runner [OPTIONS]
```
### Options
| Option | Description |
|--------|-------------|
| `--unit` | Run unit tests only |
| `--integration` | Run integration tests only |
| `--e2e` | Run end-to-end tests only |
| `--verbose` | Detailed test output |
| `--parallel <N>` | Number of parallel test threads |
### Examples
```bash
# Run all tests
docker exec readur-app cargo run --bin test_runner
# Run only integration tests with verbose output
docker exec readur-app cargo run --bin test_runner -- --integration --verbose
# Run tests with limited parallelism
docker exec readur-app cargo run --bin test_runner -- --parallel 2
```
## General Usage Patterns
### Docker Deployments
For Docker-based Readur deployments:
```bash
# General pattern
docker exec readur-app cargo run --bin <tool-name> -- [OPTIONS]
# With environment variables
docker exec -e S3_BUCKET_NAME=my-bucket readur-app \
cargo run --bin migrate_to_s3 -- --dry-run
# Interactive mode (if needed)
docker exec -it readur-app cargo run --bin migrate_to_s3 -- --help
```
### Direct Deployments
For direct server deployments:
```bash
# Ensure proper working directory
cd /path/to/readur
# Run with production environment
RUST_ENV=production ./target/release/migrate_to_s3 --dry-run
# With custom configuration
DATABASE_URL="postgresql://..." ./target/release/migrate_to_s3 --status
```
### Kubernetes Deployments
For Kubernetes environments:
```bash
# Find the pod name
kubectl get pods -l app=readur
# Execute tool in pod
kubectl exec deployment/readur -- \
cargo run --bin migrate_to_s3 -- --dry-run
# With an environment variable override (kubectl exec has no -e flag; use env)
kubectl exec deployment/readur -- \
  env S3_REGION=eu-west-1 cargo run --bin migrate_to_s3 -- --status
```
## Best Practices
### Before Running Tools
1. **Backup data** - Always backup database and files
2. **Test in staging** - Try commands in non-production first
3. **Check resources** - Ensure sufficient CPU, memory, disk space
4. **Verify access** - Confirm database and S3 connectivity
### During Execution
1. **Monitor progress** - Watch logs and system resources
2. **Keep sessions active** - Use `screen` or `tmux` for long operations
3. **Save output** - Redirect output to files for later analysis
4. **Document actions** - Keep notes of commands and results
### After Completion
1. **Verify results** - Check that operations completed successfully
2. **Clean up** - Remove temporary files and state data if appropriate
3. **Update documentation** - Record any configuration changes
4. **Monitor application** - Watch for any issues after changes
## Environment Variables
Common environment variables used by CLI tools:
| Variable | Purpose | Example |
|----------|---------|---------|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql://user:pass@host:5432/readur` |
| `S3_BUCKET_NAME` | Target S3 bucket | `my-company-readur` |
| `S3_ACCESS_KEY_ID` | AWS access key | `AKIA...` |
| `S3_SECRET_ACCESS_KEY` | AWS secret key | `...` |
| `S3_REGION` | AWS region | `us-east-1` |
| `S3_ENDPOINT` | Custom S3 endpoint | `https://minio.company.com` |
| `RUST_LOG` | Logging level | `debug`, `info`, `warn`, `error` |
| `RUST_BACKTRACE` | Error backtraces | `1` or `full` |
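Before running a migration it is worth failing fast on missing configuration. The sketch below checks the S3 variables from the table; `check_s3_env` is a hypothetical helper name:

```bash
# Preflight: verify the S3 variables from the table above are set
check_s3_env() {
  rc=0
  for var in S3_BUCKET_NAME S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY S3_REGION; do
    eval "val=\${$var}"          # indirect lookup of each variable name
    if [ -z "$val" ]; then
      echo "missing: $var"
      rc=1
    fi
  done
  return $rc
}
# usage: check_s3_env || exit 2
```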
## Troubleshooting
### Common Issues
1. **Permission Denied**
```bash
# Check container user
docker exec readur-app whoami
# Fix file permissions if needed
docker exec readur-app chown -R readur:readur /app/uploads
```
2. **Tool Not Found**
```bash
# List available binaries
docker exec readur-app find target/release -name "*migrate*" -type f
# Build tools if missing
docker exec readur-app cargo build --release --bins
```
3. **Database Connection Issues**
```bash
# Test database connectivity
docker exec readur-app psql -d readur -c "SELECT version();"
# Check environment variables
docker exec readur-app env | grep DATABASE_URL
```
### Getting Help
For each tool, use the `--help` flag:
```bash
docker exec readur-app cargo run --bin migrate_to_s3 -- --help
docker exec readur-app cargo run --bin enqueue_pending_ocr -- --help
```
### Logging and Debugging
Enable detailed logging:
```bash
# Debug level logging
docker exec -e RUST_LOG=debug readur-app \
cargo run --bin migrate_to_s3 -- --verbose
# With backtrace for errors
docker exec -e RUST_BACKTRACE=1 readur-app \
cargo run --bin migrate_to_s3 -- --status
```
## Security Considerations
### Access Control
- CLI tools should only be run by system administrators
- Use proper Docker user contexts
- Limit access to state files containing sensitive information
### Credential Handling
- Never log full credentials or API keys
- Use environment variables instead of command-line parameters
- Rotate credentials after major operations
### Network Security
- Ensure TLS/HTTPS for all S3 communications
- Use VPN or private networks when possible
- Monitor network traffic during migrations
Remember: These tools have a significant impact on your Readur deployment. Always test in non-production environments first and maintain proper backups.


@@ -0,0 +1,401 @@
# Migration Troubleshooting Guide
## Common Issues and Solutions
### S3 Access Issues
#### "Access Denied" Errors
**Symptoms:**
- Migration fails with "Access Denied" messages
- S3 uploads return 403 errors
**Causes:**
- Insufficient IAM permissions
- Incorrect bucket policy
- Wrong AWS credentials
**Solutions:**
1. **Verify IAM Policy**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::your-bucket-name",
"arn:aws:s3:::your-bucket-name/*"
]
}
]
}
```
2. **Test S3 Access**
```bash
# Test bucket access
aws s3 ls s3://your-bucket-name/
# Test upload
echo "test" | aws s3 cp - s3://your-bucket-name/test.txt
# Clean up test file
aws s3 rm s3://your-bucket-name/test.txt
```
3. **Check Environment Variables**
```bash
# Verify credentials are set
docker exec readur-app env | grep S3_
# Should show:
# S3_BUCKET_NAME=your-bucket
# S3_ACCESS_KEY_ID=AKIA...
# S3_SECRET_ACCESS_KEY=... (hidden)
# S3_REGION=us-east-1
```
#### "Bucket Does Not Exist" Errors
**Solution:**
```bash
# Create the bucket
aws s3 mb s3://your-bucket-name --region us-east-1
# Or use different region
aws s3 mb s3://your-bucket-name --region eu-west-1
```
### Migration Interruption Issues
#### Network Timeout Errors
**Symptoms:**
- Migration stops with network timeout messages
- "Connection reset by peer" errors
**Solutions:**
1. **Resume Migration**
```bash
# Find the latest state file
docker exec readur-app ls -la /tmp/migration_state_*.json
# Resume from state
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--resume-from /tmp/migration_state_20241201_143022.json
```
2. **Reduce Batch Size**
```bash
# Process smaller batches
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --batch-size 500
```
3. **Check Network Stability**
```bash
# Test S3 connectivity (ICMP is often blocked; use HTTPS instead of ping)
curl -sI https://s3.amazonaws.com | head -n 1
# Test a sustained transfer with a bounded stream (/dev/zero alone never ends)
head -c 10485760 /dev/zero | aws s3 cp - s3://your-bucket/test-10mb --expected-size 10485760
aws s3 rm s3://your-bucket/test-10mb
```
#### Server Restart During Migration
**Solution:**
```bash
# Check for existing state files
docker exec readur-app find /tmp -name "migration_state_*.json" -mtime -1
# Resume from most recent state
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--resume-from /tmp/migration_state_LATEST.json
```
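Rather than typing the timestamped filename by hand, the newest state file can be selected automatically. A small sketch, assuming the default `/tmp` location from above (`latest_state` is a hypothetical helper name):

```bash
# Pick the most recently modified state file
latest_state() {
  ls -t /tmp/migration_state_*.json 2>/dev/null | head -n 1
}
# usage: ... migrate_to_s3 -- --resume-from "$(latest_state)"
```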
### Database Issues
#### "Database Connection Lost"
**Symptoms:**
- Migration fails with database connection errors
- PostgreSQL timeout messages
**Solutions:**
1. **Check Database Status**
```bash
# Test database connection
docker exec readur-app psql -d readur -c "SELECT version();"
# Check connection limits
docker exec readur-app psql -d readur -c \
"SELECT setting FROM pg_settings WHERE name = 'max_connections';"
```
2. **Increase Connection Timeout**
```bash
# Add to environment variables
export DATABASE_TIMEOUT=300 # 5 minutes
```
3. **Check Transaction Locks**
```bash
# Look for long-running transactions
docker exec readur-app psql -d readur -c \
"SELECT pid, state, query_start, query FROM pg_stat_activity WHERE state != 'idle';"
```
#### "Transaction Rollback" Errors
**Solution:**
```bash
# Check for conflicting processes
docker exec readur-app psql -d readur -c \
"SELECT * FROM pg_locks WHERE NOT granted;"
# Restart migration with fresh transaction
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --fresh-start
```
### File System Issues
#### "File Not Found" Errors
**Symptoms:**
- Migration reports files in database that don't exist on disk
- Inconsistent file counts
**Solutions:**
1. **Audit File Consistency**
```bash
# Check for orphaned database records
docker exec readur-app cargo run --bin migrate_to_s3 -- --audit-files
```
2. **Clean Up Database**
```bash
# Remove orphaned records identified by the audit (BE CAREFUL!)
# The IDs below are placeholders: substitute the document IDs that
# --audit-files reported as missing on disk
docker exec readur-app psql -d readur -c \
"DELETE FROM documents WHERE id IN ('uuid-1', 'uuid-2');"
```
#### "Permission Denied" on Local Files
**Solution:**
```bash
# Check file permissions
docker exec readur-app ls -la /app/uploads/
# Fix permissions if needed
docker exec readur-app chown -R readur:readur /app/uploads/
```
### Performance Issues
#### Very Slow Migration
**Symptoms:**
- Migration takes much longer than expected
- Low upload speeds to S3
**Solutions:**
1. **Check Network Performance**
```bash
# Test upload speed to S3
dd if=/dev/zero bs=1M count=100 | aws s3 cp - s3://your-bucket/speedtest.dat
# Check result and clean up
aws s3 rm s3://your-bucket/speedtest.dat
```
2. **Optimize Migration Settings**
```bash
# Increase parallel uploads
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --parallel-uploads 10
```
3. **Use Multipart Upload Threshold**
```bash
# Lower threshold for multipart uploads
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --multipart-threshold 50MB
```
#### High Memory Usage
**Solution:**
```bash
# Monitor container memory
docker stats readur-app
# Reduce batch size if needed
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --batch-size 100
```
### Rollback Issues
#### "Cannot Rollback - State File Missing"
**Solution:**
```bash
# Look for backup state files
docker exec readur-app find /tmp -name "*migration*" -type f
# Manual rollback (use with caution; adjust the bucket prefix to yours)
docker exec readur-app psql -d readur -c \
"UPDATE documents SET file_path = REPLACE(file_path, 's3://your-bucket-name/', '/app/uploads/');"
```
#### "Partial Rollback Completed"
**Symptoms:**
- Some files rolled back, others still in S3
- Database in inconsistent state
**Solution:**
```bash
# Complete the rollback manually
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--force-rollback --state-file /tmp/migration_state_backup.json
# Verify database consistency
docker exec readur-app cargo run --bin migrate_to_s3 -- --verify-state
```
## Validation Commands
### Pre-Migration Checks
```bash
# Check database connectivity
docker exec readur-app psql -d readur -c "SELECT COUNT(*) FROM documents;"
# Verify S3 access
aws s3 ls s3://your-bucket-name/
# Check disk space
docker exec readur-app df -h /app/uploads/
```
### Post-Migration Validation
```bash
# Compare document counts
LOCAL_COUNT=$(docker exec readur-app find /app/uploads -type f | wc -l)
DB_COUNT=$(docker exec readur-app psql -d readur -t -A -c "SELECT COUNT(*) FROM documents;")
S3_COUNT=$(aws s3 ls s3://your-bucket/documents/ --recursive | wc -l)
echo "Local files: $LOCAL_COUNT"
echo "Database records: $DB_COUNT"
echo "S3 objects: $S3_COUNT"
# Test random document access
RANDOM_DOC=$(docker exec readur-app psql -d readur -t -A -c \
  "SELECT id FROM documents ORDER BY RANDOM() LIMIT 1;")
curl -I "https://your-readur-instance.com/api/documents/$RANDOM_DOC/download"
```
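The counts gathered above can be compared automatically so a mismatch fails loudly instead of being skimmed over. A minimal sketch (`verify_counts` is a hypothetical helper name):

```bash
# Fail loudly if two backend counts diverge
verify_counts() {
  if [ "$1" -ne "$2" ]; then
    echo "MISMATCH: database=$1 s3=$2"
    return 1
  fi
  echo "OK: $1 documents in both backends"
}
# usage: verify_counts "$DB_COUNT" "$S3_COUNT"
```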
### Health Checks
```bash
# Application health
curl -f https://your-readur-instance.com/health
# Storage backend test
docker exec readur-app cargo run --bin migrate_to_s3 -- --test-storage
# Database integrity
docker exec readur-app psql -d readur -c \
"SELECT COUNT(*) FROM documents WHERE file_path IS NULL OR file_path = '';"
```
## Recovery Procedures
### Emergency Stop
```bash
# Stop running migration
docker exec readur-app pkill -f migrate_to_s3
# Check for partial state
ls /tmp/migration_state_*.json
# Decide: resume, rollback, or restart
```
### Data Recovery
```bash
# Restore from backup (if needed)
docker exec -i readur-db psql -U readur -d readur < readur_backup.sql
# Restore files from backup
docker exec readur-app tar -xzf documents_backup.tar.gz -C /
```
### Clean Start
```bash
# Remove all migration state files
docker exec readur-app rm -f /tmp/migration_state_*.json
# Start fresh migration
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --fresh-start
```
## Getting Help
### Log Analysis
```bash
# View migration logs
docker logs readur-app | grep -i migration
# Check application logs
docker exec readur-app tail -f /var/log/readur/app.log
# Database logs
docker logs readur-db | tail -100
```
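If a run left entries in `failed_migrations`, the affected document IDs can be pulled from the state file for targeted debugging (assumes `jq` is available; `failed_docs` is a hypothetical helper name):

```bash
# List the document IDs that failed, straight from a state file
failed_docs() {
  jq -r '.failed_migrations[].document_id' "$1"
}
# usage: failed_docs /tmp/migration_state_20241201_143022.json
```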
### Debug Information
When reporting issues, include:
1. **Migration command used**
2. **Error messages** (full text)
3. **State file contents** (if available)
4. **System information**:
```bash
docker --version
docker exec readur-app cargo --version
docker exec readur-app psql --version
```
5. **Environment configuration** (sanitized):
```bash
docker exec readur-app env | grep -E "(S3_|DATABASE_)" | sed 's/=.*/=***/'
```
### Support Checklist
Before requesting support:
- [ ] Checked this troubleshooting guide
- [ ] Reviewed application logs
- [ ] Verified S3 credentials and permissions
- [ ] Tested basic S3 connectivity
- [ ] Confirmed database is accessible
- [ ] Have backup of data before migration
- [ ] Can provide error messages and command used
## Prevention Tips
1. **Always test in staging first**
2. **Use dry-run mode before real migration**
3. **Ensure adequate disk space and memory**
4. **Verify S3 permissions before starting**
5. **Keep multiple backup copies**
6. **Monitor migration progress actively**
7. **Have rollback plan ready**
Remember: When in doubt, it's better to rollback and investigate than to continue with a problematic migration.


@@ -0,0 +1,286 @@
# Storage Migration Guide
## Overview
Readur supports migrating documents between storage backends (Local ↔ S3) using a built-in migration tool. This enterprise-grade utility ensures safe, reliable data migration with comprehensive rollback capabilities.
## When You Need This
- **Moving from local filesystem to S3 cloud storage**
- **Switching between S3 buckets or regions**
- **Disaster recovery scenarios**
- **Infrastructure upgrades or server migrations**
- **Scaling to cloud-based storage**
## Migration Tool Features
- **Dry-run mode** - Test migration without making any changes
- **Progress tracking** - Resume interrupted migrations from saved state
- **Rollback capability** - Complete undo functionality if needed
- **Batch processing** - Efficiently handle large datasets
- **Associated files** - Automatically migrates thumbnails & processed images
- **Data integrity** - Verifies successful uploads before cleanup
- **Selective migration** - Migrate specific users or document sets
## Prerequisites
### System Requirements
- Admin access to your Readur deployment
- Ability to run commands on the server (Docker exec or direct access)
- Sufficient disk space for temporary files during migration
- Network connectivity to target storage (S3)
### Before You Start
1. **Complete database backup**
```bash
pg_dump readur > readur_backup_$(date +%Y%m%d).sql
```
2. **File system backup** (if migrating from local storage)
```bash
tar -czf documents_backup_$(date +%Y%m%d).tar.gz /path/to/readur/uploads
```
3. **S3 credentials configured** (for S3 migrations)
- Verify bucket access and permissions
- Test connectivity with AWS CLI
## Step-by-Step Migration Process
### Step 1: Configure Target Storage
For S3 migrations, ensure environment variables are set:
```bash
# Required S3 configuration
export S3_BUCKET_NAME="your-readur-bucket"
export S3_ACCESS_KEY_ID="your-access-key"
export S3_SECRET_ACCESS_KEY="your-secret-key"
export S3_REGION="us-east-1"
# Optional: Custom endpoint for S3-compatible services
export S3_ENDPOINT="https://s3.amazonaws.com"
```
### Step 2: Test with Dry Run
**Always start with a dry run** to validate the migration plan:
```bash
# Docker deployment
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run
# Direct deployment
./target/release/migrate_to_s3 --dry-run
# Dry run for specific user
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run --user-id "uuid-here"
```
The dry run will show:
- Number of documents to migrate
- Estimated data transfer size
- Potential issues or conflicts
- Expected migration time
### Step 3: Run the Migration
Once dry run looks good, execute the actual migration:
```bash
# Full migration with rollback enabled (recommended)
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
# Migration with progress tracking
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --verbose
# User-specific migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --user-id "uuid-here"
```
### Step 4: Monitor Progress
The migration tool provides real-time progress updates:
```
📊 Migration Progress:
┌─────────────────────────────────────────────────────────────┐
│ Documents: 1,247 / 2,500 (49.9%) │
│ Data Transferred: 2.3 GB / 4.7 GB │
│ Time Elapsed: 00:15:32 │
│ ETA: 00:16:12 │
│ Current: uploading user_documents/report_2024.pdf │
└─────────────────────────────────────────────────────────────┘
```
### Step 5: Verify Migration
After completion, verify the migration was successful:
```bash
# Check migration status
docker exec readur-app cargo run --bin migrate_to_s3 -- --status
# Verify document count matches
docker exec readur-app psql -d readur -c "SELECT COUNT(*) FROM documents;"
# Test document access through API
curl -H "Authorization: Bearer YOUR_TOKEN" \
"https://your-readur-instance.com/api/documents/sample-uuid/download"
```
### Step 6: Update Configuration
Update your deployment configuration to use the new storage backend:
```yaml
# docker-compose.yml
environment:
- STORAGE_BACKEND=s3
- S3_BUCKET_NAME=your-readur-bucket
- S3_ACCESS_KEY_ID=your-access-key
- S3_SECRET_ACCESS_KEY=your-secret-key
- S3_REGION=us-east-1
```
Restart the application to use the new storage configuration.
## Advanced Usage
### Resuming Interrupted Migrations
If a migration is interrupted, you can resume from the saved state:
```bash
# Resume from automatically saved state
docker exec readur-app cargo run --bin migrate_to_s3 -- --resume-from /tmp/migration_state.json
# Check what migrations are available to resume
ls /tmp/migration_state_*.json
```
### Rolling Back a Migration
If you need to undo a migration:
```bash
# Rollback using saved state file
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback /tmp/migration_state.json
# Verify rollback completion
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback-status
```
### Batch Processing Large Datasets
For very large document collections:
```bash
# Process in smaller batches
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback \
--batch-size 1000 \
--parallel-uploads 5
```
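For planning, the number of batches a run will need is simple ceiling division (illustrative arithmetic only; `batches` is a hypothetical helper name):

```bash
# How many batches a run will take for a given document count
batches() {
  echo $(( ($1 + $2 - 1) / $2 ))   # ceiling division: documents / batch size
}
batches 2500 1000   # → 3
```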
## Migration Scenarios
### Scenario 1: Local to S3 (Most Common)
```bash
# 1. Configure S3 credentials
export S3_BUCKET_NAME="company-readur-docs"
export S3_ACCESS_KEY_ID="AKIA..."
export S3_SECRET_ACCESS_KEY="..."
# 2. Test the migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run
# 3. Run migration with safety features
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
# 4. Update docker-compose.yml to use S3
# 5. Restart application
```
### Scenario 2: S3 to Different S3 Bucket
```bash
# 1. Configure new bucket credentials
export S3_BUCKET_NAME="new-bucket-name"
# 2. Migrate to new bucket
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
# 3. Update configuration
```
### Scenario 3: Migrating Specific Users
```bash
# Get user IDs that need migration (one per line, unaligned output)
user_ids=$(docker exec readur-app psql -d readur -t -A -c \
  "SELECT id FROM users WHERE created_at > '2024-01-01';")
# Migrate each user individually
for user_id in $user_ids; do
  docker exec readur-app cargo run --bin migrate_to_s3 -- \
    --enable-rollback --user-id "$user_id"
done
```
## Performance Considerations
### Optimization Tips
1. **Network Bandwidth**: Migration speed depends on upload bandwidth to S3
2. **Parallel Processing**: The tool automatically optimizes concurrent uploads
3. **Large Files**: Files over 100MB use multipart uploads for better performance
4. **Memory Usage**: Migration is designed to use minimal memory regardless of file sizes
### Expected Performance
| Document Count | Typical Time | Network Impact |
|---------------|--------------|----------------|
| < 1,000 | 5-15 minutes | Low |
| 1,000-10,000 | 30-90 minutes| Medium |
| 10,000+ | 2-8 hours | High |
## Security Considerations
### Data Protection
- All transfers use HTTPS/TLS encryption
- Original files remain until migration is verified
- Database transactions ensure consistency
- Rollback preserves original state
### Access Control
- Migration tool respects existing file permissions
- S3 bucket policies should match security requirements
- Consider enabling S3 server-side encryption
### Audit Trail
- All migration operations are logged
- State files contain complete operation history
- Failed operations are tracked for debugging
## Next Steps
After successful migration:
1. **Monitor the application** for any storage-related issues
2. **Update backup procedures** to include S3 data
3. **Configure S3 lifecycle policies** for cost optimization
4. **Set up monitoring** for S3 usage and costs
5. **Clean up local files** once confident in migration success
## Support
If you encounter issues during migration:
1. Check the [troubleshooting guide](./migration-troubleshooting.md)
2. Review application logs for detailed error messages
3. Use the `--verbose` flag for detailed migration output
4. Keep state files for support debugging
Remember: **Always test migrations in a staging environment first** when possible.


@@ -190,15 +190,24 @@ async fn create_test_app_state() -> Arc<AppState> {
         oidc_client_secret: None,
         oidc_issuer_url: None,
         oidc_redirect_uri: None,
+        s3_enabled: false,
+        s3_config: None,
     };
     // Use smaller connection pool for tests to avoid exhaustion
     let db = Database::new_with_pool_config(&database_url, 10, 2).await.unwrap();
-    let queue_service = std::sync::Arc::new(readur::ocr::queue::OcrQueueService::new(db.clone(), db.pool.clone(), 2));
+    // Create test file service
+    let storage_config = readur::storage::StorageConfig::Local { upload_path: "/tmp/test_uploads".to_string() };
+    let storage_backend = readur::storage::factory::create_storage_backend(storage_config).await.unwrap();
+    let file_service = std::sync::Arc::new(readur::services::file_service::FileService::with_storage("/tmp/test_uploads".to_string(), storage_backend));
+    let queue_service = std::sync::Arc::new(readur::ocr::queue::OcrQueueService::new(db.clone(), db.pool.clone(), 2, file_service.clone()));
     Arc::new(AppState {
         db: db.clone(),
         config,
+        file_service,
         webdav_scheduler: None,
         source_scheduler: None,
         queue_service,


@@ -26,6 +26,8 @@ fn create_test_config_with_oidc(issuer_url: &str) -> Config {
         oidc_client_secret: Some("test-client-secret".to_string()),
         oidc_issuer_url: Some(issuer_url.to_string()),
         oidc_redirect_uri: Some("http://localhost:8000/auth/oidc/callback".to_string()),
+        s3_enabled: false,
+        s3_config: None,
     }
 }