Migration Guide: Local Storage to S3

Overview

This guide provides step-by-step instructions for migrating your Readur installation from local filesystem storage to S3 storage. The migration process is designed to be safe, resumable, and reversible.

Pre-Migration Checklist

1. System Requirements

  • Readur compiled with S3 feature: cargo build --release --features s3
  • Sufficient disk space for temporary operations (at least 2x largest file)
  • Network bandwidth for uploading all documents to S3
  • AWS CLI installed and configured (for verification)
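
A quick pre-flight check for these requirements (a sketch assuming GNU find/coreutils, a non-empty ./uploads directory, and the default local layout; adjust paths to your installation):

# Pre-flight check: AWS CLI available and at least 2x the largest stored file free on disk
command -v aws >/dev/null || echo "WARNING: AWS CLI not found in PATH"

LARGEST_BYTES=$(find ./uploads -type f -printf '%s\n' | sort -n | tail -1)
FREE_BYTES=$(( $(df -k . | awk 'NR==2 {print $4}') * 1024 ))
if [ "$FREE_BYTES" -lt "$((LARGEST_BYTES * 2))" ]; then
    echo "WARNING: free space is below 2x the largest stored file"
fi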

2. S3 Prerequisites

  • S3 bucket created and accessible
  • IAM user with appropriate permissions
  • Access keys generated and tested
  • Bucket region identified
  • Encryption settings configured (if required)
  • Lifecycle policies reviewed
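
These can be verified up front with the AWS CLI (bucket name as created later in this guide; swap in your own):

# Confirm the credentials work and the bucket is reachable
aws sts get-caller-identity
aws s3api head-bucket --bucket readur-production
aws s3api get-bucket-location --bucket readur-production
aws s3api get-bucket-encryption --bucket readur-production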

3. Backup Requirements

  • Database backed up
  • Local files backed up (optional but recommended)
  • Configuration files saved
  • Document count and total size recorded

Migration Process

Step 1: Prepare Environment

1.1 Backup Database

# Create timestamped backup
BACKUP_DATE=$(date +%Y%m%d_%H%M%S)
pg_dump $DATABASE_URL > readur_backup_${BACKUP_DATE}.sql

# Verify backup (plain-format dumps are inspected directly and restored with psql, not pg_restore)
head -20 readur_backup_${BACKUP_DATE}.sql
grep -c "CREATE TABLE" readur_backup_${BACKUP_DATE}.sql

1.2 Document Current State

-- Record current statistics
SELECT 
    COUNT(*) as total_documents,
    SUM(file_size) / 1024.0 / 1024.0 / 1024.0 as total_size_gb,
    COUNT(DISTINCT user_id) as unique_users
FROM documents;

-- Save document list
\copy (SELECT id, filename, file_path, file_size FROM documents) TO 'documents_pre_migration.csv' CSV HEADER;
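
It also helps to capture a filesystem-level baseline to cross-check these totals after the migration (directory names assume the default local layout used later in this guide):

# On-disk totals and file count for later comparison
du -sh ./uploads ./thumbnails ./processed_images
find ./uploads -type f | wc -l

# Row count of the exported document list (includes the CSV header line)
wc -l documents_pre_migration.csv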

1.3 Calculate Migration Time

# Estimate migration duration
TOTAL_SIZE_GB=100  # From query above
UPLOAD_SPEED_MBPS=100  # Your upload speed
ESTIMATED_HOURS=$(echo "scale=2; ($TOTAL_SIZE_GB * 1024 * 8) / ($UPLOAD_SPEED_MBPS * 3600)" | bc)
echo "Estimated migration time: $ESTIMATED_HOURS hours"

Step 2: Configure S3

2.1 Create S3 Bucket

# Create bucket (us-east-1 is the default location and must not be passed as a
# LocationConstraint; for any other region, add
# --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket \
    --bucket readur-production \
    --region us-east-1

# Enable versioning
aws s3api put-bucket-versioning \
    --bucket readur-production \
    --versioning-configuration Status=Enabled

# Enable encryption
aws s3api put-bucket-encryption \
    --bucket readur-production \
    --server-side-encryption-configuration '{
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "AES256"
            }
        }]
    }'

2.2 Set Up IAM User

# Create policy file
cat > readur-s3-policy.json << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::readur-production"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetObjectVersion",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::readur-production/*"
        }
    ]
}
EOF

# Create IAM user and attach policy
aws iam create-user --user-name readur-s3-user
aws iam put-user-policy \
    --user-name readur-s3-user \
    --policy-name ReadurS3Access \
    --policy-document file://readur-s3-policy.json

# Generate access keys
aws iam create-access-key --user-name readur-s3-user > s3-credentials.json

2.3 Configure Readur for S3

# Add to .env file (the keys below are AWS documentation placeholders;
# use the real values from s3-credentials.json generated in 2.2)
cat >> .env << 'EOF'
# S3 Configuration
S3_ENABLED=true
S3_BUCKET_NAME=readur-production
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_REGION=us-east-1
EOF

# Test configuration
source .env
aws s3 ls s3://$S3_BUCKET_NAME --region $S3_REGION

Step 3: Run Migration

3.1 Dry Run

# Preview migration without making changes
cargo run --bin migrate_to_s3 --features s3 -- --dry-run

# Review output
# Expected output:
# 🔍 DRY RUN - Would migrate the following files:
#   - document1.pdf (User: 123e4567..., Size: 2.5 MB)
#   - report.docx (User: 987fcdeb..., Size: 1.2 MB)
# 💡 Run without --dry-run to perform actual migration

3.2 Partial Migration (Testing)

# Migrate only 10 files first
cargo run --bin migrate_to_s3 --features s3 -- --limit 10

# Verify migrated files
aws s3 ls s3://$S3_BUCKET_NAME/documents/ --recursive | head -20

# Check database updates
psql $DATABASE_URL -c "SELECT id, filename, file_path FROM documents WHERE file_path LIKE 's3://%' LIMIT 10;"

3.3 Full Migration

# Run full migration with progress tracking
cargo run --bin migrate_to_s3 --features s3 -- \
    --enable-rollback \
    2>&1 | tee migration_$(date +%Y%m%d_%H%M%S).log

# Monitor progress in another terminal
watch -n 5 'cat migration_state.json | jq "{processed: .processed_files, total: .total_files, failed: .failed_migrations | length}"'

3.4 Migration with Local File Deletion

# Only after verifying successful migration
cargo run --bin migrate_to_s3 --features s3 -- \
    --delete-local \
    --enable-rollback

Step 4: Verify Migration

4.1 Database Verification

-- Check migration completeness
SELECT 
    COUNT(*) FILTER (WHERE file_path LIKE 's3://%') as s3_documents,
    COUNT(*) FILTER (WHERE file_path NOT LIKE 's3://%') as local_documents,
    COUNT(*) as total_documents
FROM documents;

-- Find any failed migrations
SELECT id, filename, file_path 
FROM documents 
WHERE file_path NOT LIKE 's3://%'
ORDER BY created_at DESC
LIMIT 20;

-- Verify path format
SELECT
    substring(file_path from 1 for 50) as path_prefix,
    COUNT(*) as document_count
FROM documents
GROUP BY path_prefix
ORDER BY document_count DESC;

4.2 S3 Verification

# Count objects in S3
aws s3 ls s3://$S3_BUCKET_NAME/documents/ --recursive --summarize | grep "Total Objects"

# Verify file structure
aws s3 ls s3://$S3_BUCKET_NAME/ --recursive | head -50

# Check specific document
DOCUMENT_ID="123e4567-e89b-12d3-a456-426614174000"
aws s3 ls s3://$S3_BUCKET_NAME/documents/ --recursive | grep $DOCUMENT_ID

4.3 Application Testing

# Restart Readur with S3 configuration
systemctl restart readur

# Test document upload
curl -X POST https://readur.example.com/api/documents \
    -H "Authorization: Bearer $TOKEN" \
    -F "file=@test-document.pdf"

# Test document retrieval
curl -X GET https://readur.example.com/api/documents/$DOCUMENT_ID/download \
    -H "Authorization: Bearer $TOKEN" \
    -o downloaded-test.pdf

# Verify downloaded file
md5sum test-document.pdf downloaded-test.pdf

Step 5: Post-Migration Tasks

5.1 Update Backup Procedures

# Create S3 backup script
cat > backup-s3.sh << 'EOF'
#!/bin/bash
# Backup S3 data to another bucket
BACKUP_BUCKET="readur-backup-$(date +%Y%m%d)"
aws s3api create-bucket --bucket $BACKUP_BUCKET --region us-east-1
aws s3 sync s3://readur-production s3://$BACKUP_BUCKET --storage-class GLACIER
EOF

chmod +x backup-s3.sh

5.2 Set Up Monitoring

# Create CloudWatch dashboard
aws cloudwatch put-dashboard \
    --dashboard-name ReadurS3 \
    --dashboard-body file://cloudwatch-dashboard.json
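
The cloudwatch-dashboard.json referenced above is not shown in this guide; a minimal example that charts bucket size and object count (using the bucket and region from Step 2.1) could look like this:

cat > cloudwatch-dashboard.json << 'EOF'
{
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [
                    ["AWS/S3", "BucketSizeBytes", "BucketName", "readur-production", "StorageType", "StandardStorage"],
                    ["AWS/S3", "NumberOfObjects", "BucketName", "readur-production", "StorageType", "AllStorageTypes"]
                ],
                "period": 86400,
                "stat": "Average",
                "region": "us-east-1",
                "title": "Readur S3 storage"
            }
        }
    ]
}
EOF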

5.3 Clean Up Local Storage

# After confirming successful migration
# Remove old upload directories (CAREFUL!)
du -sh ./uploads ./thumbnails ./processed_images

# Archive before deletion
tar -czf pre_migration_files_$(date +%Y%m%d).tar.gz ./uploads ./thumbnails ./processed_images

# Remove directories
rm -rf ./uploads/* ./thumbnails/* ./processed_images/*

Rollback Procedures

Automatic Rollback

If a migration started with --enable-rollback fails, the tool automatically:

  1. Restores database paths to their original values
  2. Deletes the S3 objects that were uploaded
  3. Saves the rollback state to rollback_errors.json
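
If anything could not be rolled back, review rollback_errors.json before retrying. The exact structure of that file is determined by the migration tool; a generic inspection:

# Pretty-print the rollback error report and count its entries
jq '.' rollback_errors.json
jq 'length' rollback_errors.json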

Manual Rollback

Step 1: Restore Database

-- Revert file paths to local (adjust the replacement prefix if your original
-- local layout differs from ./uploads/)
UPDATE documents 
SET file_path = regexp_replace(file_path, '^s3://[^/]+/', './uploads/')
WHERE file_path LIKE 's3://%';

-- Or restore the database from the backup taken in Step 1.1
psql $DATABASE_URL < readur_backup_${BACKUP_DATE}.sql

Step 2: Remove S3 Objects

# Delete all migrated objects
aws s3 rm s3://$S3_BUCKET_NAME/documents/ --recursive
aws s3 rm s3://$S3_BUCKET_NAME/thumbnails/ --recursive
aws s3 rm s3://$S3_BUCKET_NAME/processed_images/ --recursive
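
Because versioning was enabled in Step 2.1, aws s3 rm only adds delete markers; older versions remain billable until removed. A quick check of what is left:

# List remaining current objects
aws s3 ls s3://$S3_BUCKET_NAME --recursive --summarize | grep "Total Objects"

# Older versions and delete markers are listed separately
aws s3api list-object-versions --bucket $S3_BUCKET_NAME --max-items 10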

Step 3: Restore Configuration

# Disable S3 in configuration
sed -i 's/S3_ENABLED=true/S3_ENABLED=false/' .env

# Restart application
systemctl restart readur

Troubleshooting Migration Issues

Issue: Migration Hangs

# Check current progress
tail -f migration_*.log

# View migration state
cat migration_state.json | jq '.processed_files, .failed_migrations'

# Resume from last successful
LAST_ID=$(cat migration_state.json | jq -r '.completed_migrations[-1].document_id')
cargo run --bin migrate_to_s3 --features s3 -- --resume-from $LAST_ID

Issue: Permission Errors

# Verify IAM permissions with a test upload
echo "permission test" > /tmp/test.txt
aws s3api put-object \
    --bucket $S3_BUCKET_NAME \
    --key test.txt \
    --body /tmp/test.txt
aws s3api delete-object --bucket $S3_BUCKET_NAME --key test.txt

# Check bucket policy (an error here is expected when access is granted only
# through the IAM user policy rather than a bucket policy)
aws s3api get-bucket-policy --bucket $S3_BUCKET_NAME
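
Permissions can also be checked without touching the bucket by simulating the IAM policy (run with credentials that have iam:SimulatePrincipalPolicy; replace 123456789012 with your account ID):

aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::123456789012:user/readur-s3-user \
    --action-names s3:PutObject s3:GetObject s3:DeleteObject \
    --resource-arns "arn:aws:s3:::readur-production/*" \
    --query 'EvaluationResults[].{Action: EvalActionName, Decision: EvalDecision}'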

Issue: Network Timeouts

# Use screen/tmux for long migrations
screen -S migration
cargo run --bin migrate_to_s3 --features s3

# Detach: Ctrl+A, D
# Reattach: screen -r migration

Migration Optimization

Parallel Upload

# Split migration by user (check first that the migration tool can safely run
# in parallel, e.g. that migration_state.json is not shared between instances)
for USER_ID in $(psql $DATABASE_URL -t -A -c "SELECT DISTINCT user_id FROM documents"); do
    cargo run --bin migrate_to_s3 --features s3 -- --user-id $USER_ID &
done
wait

Bandwidth Management

# Limit upload bandwidth if needed (trickle takes a rate in KB/s; 10240 ≈ 10 MB/s)
trickle -u 10240 cargo run --bin migrate_to_s3 --features s3

Progress Monitoring

# Real-time statistics
watch -n 10 'echo "=== Migration Progress ===" && \
    cat migration_state.json | jq "{
        progress_pct: ((.processed_files / .total_files) * 100),
        processed: .processed_files,
        total: .total_files,
        failed: .failed_migrations | length,
        elapsed: now - (.started_at | fromdate),
        rate_per_hour: (.processed_files / ((now - (.started_at | fromdate)) / 3600))
    }"'

Post-Migration Validation

Data Integrity Check

# Generate checksums for S3 objects
aws s3api list-objects-v2 --bucket $S3_BUCKET_NAME --prefix documents/ \
    --query 'Contents[].{Key:Key, ETag:ETag}' \
    --output json > s3_checksums.json

# Compare with database
psql $DATABASE_URL -c "SELECT id, file_path, file_hash FROM documents" > db_checksums.txt

Performance Testing

# Benchmark document retrieval through the API (substitute real document IDs
# from your instance if a 'random' endpoint is not available)
time for i in {1..100}; do
    curl -s https://readur.example.com/api/documents/random/download > /dev/null
done

Success Criteria

Migration is considered successful when:

  • All documents have S3 paths in database
  • No failed migrations in migration_state.json
  • Application can upload new documents to S3
  • Application can retrieve existing documents from S3
  • Thumbnails and processed images are accessible
  • Performance meets acceptable thresholds
  • Backup procedures are updated and tested
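
The database and S3 criteria can be checked with a short script (a sketch; the object count covers only the documents/ prefix, so thumbnails and processed images should be checked separately):

# Quick success-criteria check
LOCAL_LEFT=$(psql $DATABASE_URL -t -A -c \
    "SELECT COUNT(*) FROM documents WHERE file_path NOT LIKE 's3://%'")
FAILED=$(jq '.failed_migrations | length' migration_state.json)
S3_DOCS=$(aws s3 ls s3://$S3_BUCKET_NAME/documents/ --recursive --summarize |
    awk '/Total Objects/ {print $3}')

echo "Documents still on local paths: $LOCAL_LEFT (expect 0)"
echo "Failed migrations recorded:     $FAILED (expect 0)"
echo "Objects under documents/ in S3: $S3_DOCS"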

Next Steps

  1. Monitor S3 costs and usage
  2. Implement CloudFront CDN if needed
  3. Set up cross-region replication for disaster recovery
  4. Configure S3 lifecycle policies for cost optimization
  5. Update documentation and runbooks