# Storage Migration Guide
## Overview

Readur supports migrating documents between storage backends (Local ↔ S3) using a built-in migration tool. The utility is built for safe, reliable data migration, with comprehensive rollback support if anything goes wrong.
## When You Need This

- **Moving from local filesystem to S3 cloud storage**
- **Switching between S3 buckets or regions**
- **Disaster recovery scenarios**
- **Infrastructure upgrades or server migrations**
- **Scaling to cloud-based storage**
## Migration Tool Features

✅ **Dry-run mode** - Test migration without making any changes
✅ **Progress tracking** - Resume interrupted migrations from saved state
✅ **Rollback capability** - Complete undo functionality if needed
✅ **Batch processing** - Efficiently handle large datasets
✅ **Associated files** - Automatically migrates thumbnails & processed images
✅ **Data integrity** - Verifies successful uploads before cleanup
✅ **Selective migration** - Migrate specific users or document sets
## Prerequisites

### System Requirements

- Admin access to your Readur deployment
- Ability to run commands on the server (Docker exec or direct access)
- Sufficient disk space for temporary files during migration
- Network connectivity to target storage (S3)
### Before You Start

1. **Complete database backup**

   ```bash
   pg_dump readur > readur_backup_$(date +%Y%m%d).sql
   ```

2. **File system backup** (if migrating from local storage)

   ```bash
   tar -czf documents_backup_$(date +%Y%m%d).tar.gz /path/to/readur/uploads
   ```

3. **S3 credentials configured** (for S3 migrations)
   - Verify bucket access and permissions
   - Test connectivity with the AWS CLI (see the sketch below)
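A quick way to test that connectivity (a minimal sketch assuming the AWS CLI is installed and configured with the same credentials; replace the bucket name with yours):

```bash
# Returns exit code 0 if the bucket exists and is reachable with these credentials
aws s3api head-bucket --bucket your-readur-bucket

# Spot-check read access by listing a few objects
aws s3 ls s3://your-readur-bucket | head
```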
## Step-by-Step Migration Process

### Step 1: Configure Target Storage

For S3 migrations, ensure environment variables are set:

```bash
# Required S3 configuration
export S3_BUCKET_NAME="your-readur-bucket"
export S3_ACCESS_KEY_ID="your-access-key"
export S3_SECRET_ACCESS_KEY="your-secret-key"
export S3_REGION="us-east-1"

# Optional: Custom endpoint for S3-compatible services
export S3_ENDPOINT="https://s3.amazonaws.com"
```
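Before running anything, it's worth confirming that the Readur container actually sees these variables; a minimal check (the `readur-app` container name matches the commands used throughout this guide):

```bash
# List the S3-related variables visible inside the container
# (note: this prints the secret key; clear your scrollback if that matters)
docker exec readur-app env | grep '^S3_'
```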
### Step 2: Test with Dry Run

**Always start with a dry run** to validate the migration plan:

```bash
# Docker deployment
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run

# Direct deployment
./target/release/migrate_to_s3 --dry-run

# Dry run for specific user
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run --user-id "uuid-here"
```
The dry run will show:

- Number of documents to migrate
- Estimated data transfer size
- Potential issues or conflicts
- Expected migration time
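It can be useful to keep the dry-run report for comparison after the real migration; one simple way, piping the same command through `tee`:

```bash
# Save the dry-run report with a timestamp for later reference
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run \
  | tee "dry_run_$(date +%Y%m%d_%H%M%S).log"
```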
### Step 3: Run the Migration

Once the dry run looks good, execute the actual migration:

```bash
# Full migration with rollback enabled (recommended)
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# Migration with verbose progress output
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --verbose

# User-specific migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --user-id "uuid-here"
```
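For hands-on runs it can be safer to require explicit confirmation between the dry run and the real migration; a minimal wrapper along those lines (the container name and flags are taken from the commands above):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Show the plan first
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run

# Ask before touching any data
read -r -p "Proceed with the real migration? [y/N] " answer
if [[ "$answer" == "y" ]]; then
  docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
else
  echo "Aborted; no changes made."
fi
```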
### Step 4: Monitor Progress

The migration tool provides real-time progress updates:

```
📊 Migration Progress:
┌────────────────────────────────────────────────────┐
│ Documents: 1,247 / 2,500 (49.9%)                   │
│ Data Transferred: 2.3 GB / 4.7 GB                  │
│ Time Elapsed: 00:15:32                             │
│ ETA: 00:16:12                                      │
│ Current: uploading user_documents/report_2024.pdf  │
└────────────────────────────────────────────────────┘
```
### Step 5: Verify Migration

After completion, verify the migration was successful:

```bash
# Check migration status
docker exec readur-app cargo run --bin migrate_to_s3 -- --status

# Verify document count matches
docker exec readur-app psql -d readur -c "SELECT COUNT(*) FROM documents;"

# Test document access through API
curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-readur-instance.com/api/documents/sample-uuid/download"
```
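You can also cross-check the database count against what actually landed in the bucket; a rough sketch with the AWS CLI (this assumes all migrated objects live in the configured bucket, and object count may exceed document count because thumbnails and processed images migrate too, so treat a mismatch as a prompt to investigate rather than proof of failure):

```bash
# Count objects in the target bucket
aws s3 ls "s3://$S3_BUCKET_NAME" --recursive | wc -l
```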
### Step 6: Update Configuration

Update your deployment configuration to use the new storage backend:

```yaml
# docker-compose.yml
environment:
  - STORAGE_BACKEND=s3
  - S3_BUCKET_NAME=your-readur-bucket
  - S3_ACCESS_KEY_ID=your-access-key
  - S3_SECRET_ACCESS_KEY=your-secret-key
  - S3_REGION=us-east-1
```
Restart the application to use the new storage configuration.
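With Docker Compose, that restart might look like the following (the `readur` service name is an assumption; use the name from your compose file):

```bash
# Recreate the container so it picks up the new environment
docker compose up -d --force-recreate readur
```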
## Advanced Usage

### Resuming Interrupted Migrations

If a migration is interrupted, you can resume from the saved state:

```bash
# Resume from automatically saved state
docker exec readur-app cargo run --bin migrate_to_s3 -- --resume-from /tmp/migration_state.json

# Check what migrations are available to resume
ls /tmp/migration_state_*.json
```
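If several state files have accumulated, one way to resume from the most recent (reusing the `/tmp/migration_state_*.json` naming from above, and assuming the state files live inside the container):

```bash
# Find the newest saved state file inside the container
latest_state=$(docker exec readur-app sh -c 'ls -t /tmp/migration_state_*.json | head -n 1')

# Resume from it
docker exec readur-app cargo run --bin migrate_to_s3 -- --resume-from "$latest_state"
```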
### Rolling Back a Migration

If you need to undo a migration:

```bash
# Rollback using saved state file
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback /tmp/migration_state.json

# Verify rollback completion
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback-status
```
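After a rollback, it's worth spot-checking that the original local files are all present; a rough comparison against the backup archive from the prerequisites (same placeholder paths as above; matching counts are only a sanity check, not byte-level verification):

```bash
# Files currently in the upload directory
find /path/to/readur/uploads -type f | wc -l

# Files recorded in the pre-migration backup (directory entries filtered out)
tar -tzf documents_backup_*.tar.gz | grep -v '/$' | wc -l
```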
### Batch Processing Large Datasets

For very large document collections:

```bash
# Process in smaller batches
docker exec readur-app cargo run --bin migrate_to_s3 -- \
  --enable-rollback \
  --batch-size 1000 \
  --parallel-uploads 5
```
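Since large migrations can run for hours (see the performance table below), it's sensible to detach them from your terminal session; a minimal approach with `nohup` (tmux or screen work equally well):

```bash
# Keep the migration running even if the SSH session drops
nohup docker exec readur-app cargo run --bin migrate_to_s3 -- \
  --enable-rollback --batch-size 1000 > migration.log 2>&1 &

# Follow progress
tail -f migration.log
```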
## Migration Scenarios

### Scenario 1: Local to S3 (Most Common)

```bash
# 1. Configure S3 credentials
export S3_BUCKET_NAME="company-readur-docs"
export S3_ACCESS_KEY_ID="AKIA..."
export S3_SECRET_ACCESS_KEY="..."

# 2. Test the migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run

# 3. Run migration with safety features
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# 4. Update docker-compose.yml to use S3
# 5. Restart application
```
### Scenario 2: S3 to Different S3 Bucket

```bash
# 1. Configure new bucket credentials
export S3_BUCKET_NAME="new-bucket-name"

# 2. Migrate to new bucket
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# 3. Update configuration
```
### Scenario 3: Migrating Specific Users

```bash
# Inspect which users need migration
docker exec readur-app psql -d readur -c \
  "SELECT id, email FROM users WHERE created_at > '2024-01-01';"

# Capture just the IDs for scripting (-t: tuples only, -A: unaligned output)
user_ids=$(docker exec readur-app psql -d readur -t -A -c \
  "SELECT id FROM users WHERE created_at > '2024-01-01';")

# Migrate each user individually
for user_id in $user_ids; do
  docker exec readur-app cargo run --bin migrate_to_s3 -- \
    --enable-rollback --user-id "$user_id"
done
```
## Performance Considerations

### Optimization Tips

1. **Network Bandwidth**: Migration speed depends on upload bandwidth to S3
2. **Parallel Processing**: The tool automatically optimizes concurrent uploads
3. **Large Files**: Files over 100MB use multipart uploads for better performance
4. **Memory Usage**: Migration is designed to use minimal memory regardless of file sizes
### Expected Performance

| Document Count | Typical Time  | Network Impact |
|----------------|---------------|----------------|
| < 1,000        | 5-15 minutes  | Low            |
| 1,000-10,000   | 30-90 minutes | Medium         |
| 10,000+        | 2-8 hours     | High           |
## Security Considerations

### Data Protection

- All transfers use HTTPS/TLS encryption
- Original files remain until migration is verified
- Database transactions ensure consistency
- Rollback preserves original state

### Access Control

- Migration tool respects existing file permissions
- S3 bucket policies should match security requirements
- Consider enabling S3 server-side encryption

### Audit Trail

- All migration operations are logged
- State files contain complete operation history
- Failed operations are tracked for debugging
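To inspect that history, the saved state file can be pretty-printed; a quick look with `jq` (assuming `jq` is available on the host and the file lives at the `/tmp` path used earlier; the exact fields will depend on the tool's version):

```bash
# Pretty-print the migration state for auditing or support requests
docker exec readur-app cat /tmp/migration_state.json | jq '.'
```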
## Next Steps

After successful migration:

1. **Monitor the application** for any storage-related issues
2. **Update backup procedures** to include S3 data
3. **Configure S3 lifecycle policies** for cost optimization (see the sketch below)
4. **Set up monitoring** for S3 usage and costs
5. **Clean up local files** once confident in migration success
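As one example of a lifecycle policy, the following moves objects to the cheaper Glacier storage class after 90 days; the bucket name, prefix, and retention period are placeholders to adapt (this uses the standard AWS CLI, not the migration tool):

```bash
# Define a rule that transitions objects to Glacier after 90 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-documents",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

# Apply it to the bucket
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-readur-bucket \
  --lifecycle-configuration file://lifecycle.json
```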
## Support

If you encounter issues during migration:

1. Check the [troubleshooting guide](./migration-troubleshooting.md)
2. Review application logs for detailed error messages
3. Use the `--verbose` flag for detailed migration output
4. Keep state files for support debugging

Remember: **Always test migrations in a staging environment first** when possible.