# S3 Storage Backend Guide for Readur

## Overview

Starting with version 2.5.4, Readur supports Amazon S3 and S3-compatible storage services as an alternative to local filesystem storage. The implementation covers AWS S3, MinIO, Wasabi, Backblaze B2, and other S3-compatible services, and includes automatic multipart upload for files larger than 100MB, structured storage paths organized by year and month, and automatic retries with exponential backoff.

This guide provides comprehensive instructions for configuring, deploying, and managing Readur with S3 storage.

### Key Benefits

- **Scalability**: Unlimited storage capacity without local disk constraints
- **Durability**: 99.999999999% (11 9's) durability with AWS S3
- **Cost-Effective**: Pay only for what you use across various storage tiers
- **Global Access**: Access documents from anywhere with proper credentials
- **Backup**: Built-in versioning and cross-region replication capabilities

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Configuration](#configuration)
3. [Migration from Local Storage](#migration-from-local-storage)
4. [Storage Structure](#storage-structure)
5. [Performance Optimization](#performance-optimization)
6. [Troubleshooting](#troubleshooting)
7. [Best Practices](#best-practices)

## Prerequisites

Before configuring S3 storage, ensure you have:

1. **S3 Bucket Access**
   - An AWS S3 bucket or S3-compatible service (MinIO, Wasabi, Backblaze B2, etc.)
   - An Access Key ID and Secret Access Key with appropriate permissions
   - The bucket name and region

2. **Required S3 Permissions**

   The actions below cover uploads, downloads, deletes, listing, and multipart uploads. (`HeadObject`, `HeadBucket`, and the individual multipart-upload calls are authorized through `s3:GetObject`, `s3:ListBucket`, and `s3:PutObject`, so they need no separate entries.)

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:DeleteObject",
           "s3:ListBucket",
           "s3:AbortMultipartUpload",
           "s3:ListMultipartUploadParts"
         ],
         "Resource": [
           "arn:aws:s3:::your-bucket-name/*",
           "arn:aws:s3:::your-bucket-name"
         ]
       }
     ]
   }
   ```

3. **Readur Build Requirements**
   - Readur must be compiled with the `s3` feature flag enabled
   - Build command: `cargo build --release --features s3`

## Configuration

### Environment Variables

Configure S3 storage by setting the following environment variables:

```bash
# Enable S3 storage backend
S3_ENABLED=true

# Required S3 credentials
S3_BUCKET_NAME=readur-documents
S3_ACCESS_KEY_ID=your-access-key-id
S3_SECRET_ACCESS_KEY=your-secret-access-key
S3_REGION=us-east-1

# Optional: For S3-compatible services (MinIO, Wasabi, etc.)
S3_ENDPOINT=https://s3-compatible-endpoint.com
```
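Before starting Readur, it can help to confirm that the credentials and bucket are reachable from the same host. A minimal sketch using the AWS CLI (assumed to be installed; the `--endpoint-url` variant is only needed for S3-compatible services):

```bash
# Reuse the values from the environment variables above
export AWS_ACCESS_KEY_ID="$S3_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$S3_SECRET_ACCESS_KEY"

# HeadBucket succeeds only if the bucket exists and the credentials can reach it
aws s3api head-bucket --bucket "$S3_BUCKET_NAME" --region "$S3_REGION"

# For MinIO, Wasabi, Backblaze B2, etc., point the CLI at the custom endpoint
aws s3api head-bucket --bucket "$S3_BUCKET_NAME" --endpoint-url "$S3_ENDPOINT"
```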
### Configuration File Example (.env)

```bash
# Database Configuration
DATABASE_URL=postgresql://readur:password@localhost/readur

# Server Configuration
SERVER_ADDRESS=0.0.0.0:8000
JWT_SECRET=your-secure-jwt-secret

# S3 Storage Configuration
S3_ENABLED=true
S3_BUCKET_NAME=readur-production
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_REGION=us-west-2

# Optional S3 endpoint for compatible services
# S3_ENDPOINT=https://minio.example.com

# Upload Configuration
UPLOAD_PATH=./temp_uploads
MAX_FILE_SIZE_MB=500
```

### S3-Compatible Services Configuration

#### MinIO

```bash
S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=minioadmin
S3_SECRET_ACCESS_KEY=minioadmin
S3_REGION=us-east-1
S3_ENDPOINT=http://localhost:9000
```

#### Wasabi

```bash
S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=your-wasabi-key
S3_SECRET_ACCESS_KEY=your-wasabi-secret
S3_REGION=us-east-1
S3_ENDPOINT=https://s3.wasabisys.com
```

#### Backblaze B2

```bash
S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=your-b2-key-id
S3_SECRET_ACCESS_KEY=your-b2-application-key
S3_REGION=us-west-002
S3_ENDPOINT=https://s3.us-west-002.backblazeb2.com
```

## Migration from Local Storage

### Using the Migration Tool

Readur includes a migration utility to transfer existing local files to S3:

1. **Prepare for Migration**

   ```bash
   # Back up your database first
   pg_dump readur > readur_backup.sql

   # Set S3 configuration
   export S3_ENABLED=true
   export S3_BUCKET_NAME=readur-production
   export S3_ACCESS_KEY_ID=your-key
   export S3_SECRET_ACCESS_KEY=your-secret
   export S3_REGION=us-east-1
   ```

2. **Run a Dry Run First**

   ```bash
   # Preview what will be migrated
   cargo run --bin migrate_to_s3 --features s3 -- --dry-run
   ```

3. **Execute the Migration**

   ```bash
   # Migrate all files
   cargo run --bin migrate_to_s3 --features s3

   # Migrate with options:
   #   --delete-local     delete local files after successful upload
   #   --limit 100        limit to 100 files (for testing)
   #   --enable-rollback  enable automatic rollback on failure
   cargo run --bin migrate_to_s3 --features s3 -- \
     --delete-local \
     --limit 100 \
     --enable-rollback
   ```

4. **Migrate a Specific User's Files**

   ```bash
   cargo run --bin migrate_to_s3 --features s3 -- \
     --user-id 550e8400-e29b-41d4-a716-446655440000
   ```

5. **Resume a Failed Migration**

   ```bash
   # Resume from a specific document ID
   cargo run --bin migrate_to_s3 --features s3 -- \
     --resume-from 550e8400-e29b-41d4-a716-446655440001
   ```

### Migration Process Details

The migration tool performs the following steps:

1. Connects to the database and S3
2. Identifies all documents with local file paths
3. For each document:
   - Reads the local file
   - Uploads it to S3 with a structured path
   - Updates the database with the S3 path
   - Migrates associated thumbnails and processed images
   - Optionally deletes the local file
4. Tracks migration state for recovery
5. Supports rollback on failure
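Once the dry run looks correct, the preview, confirmation, and real migration can be wrapped in a short script. This is only a sketch built on the flags documented above; the log file names and the confirmation prompt are illustrative additions:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Preview the migration and keep the output for review
cargo run --bin migrate_to_s3 --features s3 -- --dry-run | tee migration_dry_run.log

# Ask for explicit confirmation before touching any files
read -rp "Dry run complete. Proceed with the real migration? [y/N] " answer
if [[ "${answer}" != "y" ]]; then
    echo "Aborted before migrating."
    exit 1
fi

# Run the actual migration with automatic rollback enabled, logging as we go
cargo run --bin migrate_to_s3 --features s3 -- --enable-rollback | tee migration.log
```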
### Post-Migration Verification

```sql
-- Check migrated documents
SELECT
    COUNT(*) FILTER (WHERE file_path LIKE 's3://%') AS s3_documents,
    COUNT(*) FILTER (WHERE file_path NOT LIKE 's3://%') AS local_documents
FROM documents;

-- Find any remaining local files
SELECT id, filename, file_path
FROM documents
WHERE file_path NOT LIKE 's3://%'
LIMIT 10;
```

## Storage Structure

### S3 Path Organization

Readur uses a structured path format in S3:

```
bucket-name/
├── documents/
│   └── {user_id}/
│       └── {year}/
│           └── {month}/
│               └── {document_id}.{extension}
├── thumbnails/
│   └── {user_id}/
│       └── {document_id}_thumb.jpg
└── processed_images/
    └── {user_id}/
        └── {document_id}_processed.png
```

### Example Paths

```
readur-production/
├── documents/
│   └── 550e8400-e29b-41d4-a716-446655440000/
│       └── 2024/
│           └── 03/
│               ├── 123e4567-e89b-12d3-a456-426614174000.pdf
│               └── 987fcdeb-51a2-43f1-b321-123456789abc.docx
├── thumbnails/
│   └── 550e8400-e29b-41d4-a716-446655440000/
│       ├── 123e4567-e89b-12d3-a456-426614174000_thumb.jpg
│       └── 987fcdeb-51a2-43f1-b321-123456789abc_thumb.jpg
└── processed_images/
    └── 550e8400-e29b-41d4-a716-446655440000/
        ├── 123e4567-e89b-12d3-a456-426614174000_processed.png
        └── 987fcdeb-51a2-43f1-b321-123456789abc_processed.png
```

## Performance Optimization

### Multipart Upload

Readur automatically uses multipart upload for files larger than 100MB:

- **Chunk Size**: 16MB per part
- **Automatic Retry**: Exponential backoff with up to 3 retries
- **Progress Tracking**: Real-time upload progress via WebSocket

### Network Optimization

1. **Region Selection**: Choose the S3 region closest to your Readur server
2. **Transfer Acceleration**: Enable S3 Transfer Acceleration for globally distributed users
3. **CloudFront CDN**: Use CloudFront to serve frequently accessed documents

### Caching Strategy

```nginx
# Nginx caching configuration for S3-backed documents
# (assumes a proxy_cache zone is declared with proxy_cache_path/keys_zone in the http block)
location /api/documents/ {
    proxy_cache_valid 200 1h;
    proxy_cache_valid 404 1m;
    proxy_cache_bypass $http_authorization;
    add_header X-Cache-Status $upstream_cache_status;
}
```

## Troubleshooting

### Common Issues and Solutions

#### 1. S3 Connection Errors

**Error**: "Failed to access S3 bucket"

**Solution**:

```bash
# Verify credentials
aws s3 ls s3://your-bucket-name --profile readur

# Check IAM permissions
aws iam get-user-policy --user-name readur-user --policy-name ReadurS3Policy

# Test connectivity
curl -I https://s3.amazonaws.com/your-bucket-name
```

#### 2. Upload Failures

**Error**: "Failed to store file: RequestTimeout"

**Solution**:

- Check network connectivity
- Verify the S3 endpoint configuration
- Increase timeout values if using an S3-compatible service
- Monitor S3 request metrics in AWS CloudWatch

#### 3. Permission Denied

**Error**: "AccessDenied: Access Denied"

**Solution**:

```bash
# Verify bucket policy
aws s3api get-bucket-policy --bucket your-bucket-name

# Check object ACLs
aws s3api get-object-acl --bucket your-bucket-name --key test-object

# Ensure CORS configuration for web access
aws s3api put-bucket-cors --bucket your-bucket-name --cors-configuration file://cors.json
```
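The `put-bucket-cors` call above expects a `cors.json` file. A minimal configuration along the following lines is usually enough for browser access; the allowed origin is an assumption to replace with your own Readur domain:

```bash
# Illustrative cors.json for the put-bucket-cors call above; adjust the origin to your deployment
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://readur.example.com"],
      "AllowedMethods": ["GET", "HEAD", "PUT", "POST"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

aws s3api put-bucket-cors --bucket your-bucket-name --cors-configuration file://cors.json
```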
#### 4. Migration Stuck

**Problem**: Migration process hangs or fails repeatedly

**Solution**:

```bash
# Check migration state
jq '.failed_migrations' migration_state.json

# Resume from the last successful migration
LAST_SUCCESS=$(jq -r '.completed_migrations[-1].document_id' migration_state.json)
cargo run --bin migrate_to_s3 --features s3 -- --resume-from "$LAST_SUCCESS"

# Force rollback if needed
cargo run --bin migrate_to_s3 --features s3 -- --rollback
```

### Debugging S3 Operations

Enable detailed S3 logging:

```bash
# Set environment variables for debugging
export RUST_LOG=readur=debug,aws_sdk_s3=debug
export AWS_SDK_LOAD_CONFIG=true

# Run Readur with debug logging
cargo run --features s3
```

### Performance Monitoring

Monitor S3 performance metrics:

```sql
-- Query document upload times
SELECT
    DATE(created_at) AS upload_date,
    AVG(file_size / 1024.0 / 1024.0) AS avg_size_mb,
    COUNT(*) AS documents_uploaded,
    AVG(EXTRACT(EPOCH FROM (updated_at - created_at))) AS avg_processing_time_seconds
FROM documents
WHERE file_path LIKE 's3://%'
GROUP BY DATE(created_at)
ORDER BY upload_date DESC;
```

## Best Practices

### 1. Security

- **Encryption**: Enable S3 server-side encryption (SSE-S3 or SSE-KMS)
- **Access Control**: Use IAM roles instead of access keys when possible
- **Bucket Policies**: Implement least-privilege bucket policies
- **VPC Endpoints**: Use VPC endpoints for private S3 access

```bash
# Enable default encryption on the bucket
aws s3api put-bucket-encryption \
  --bucket readur-production \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'
```

### 2. Cost Optimization

- **Lifecycle Policies**: Archive old documents to Glacier
- **Intelligent-Tiering**: Enable it for automatic cost optimization
- **Request Metrics**: Monitor and optimize S3 request patterns

Example lifecycle configuration (save as `lifecycle.json` and apply with `aws s3api put-bucket-lifecycle-configuration`):

```json
{
  "Rules": [{
    "ID": "ArchiveOldDocuments",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [{
      "Days": 90,
      "StorageClass": "GLACIER"
    }],
    "NoncurrentVersionTransitions": [{
      "NoncurrentDays": 30,
      "StorageClass": "GLACIER"
    }]
  }]
}
```

### 3. Reliability

- **Versioning**: Enable S3 versioning for document recovery
- **Cross-Region Replication**: Set up replication for disaster recovery
- **Backup Strategy**: Take regular backups to a separate bucket or region

```bash
# Enable versioning
aws s3api put-bucket-versioning \
  --bucket readur-production \
  --versioning-configuration Status=Enabled

# Set up replication
aws s3api put-bucket-replication \
  --bucket readur-production \
  --replication-configuration file://replication.json
```

### 4. Monitoring

Set up CloudWatch alarms for:

- High error rates
- Unusual request patterns
- Storage quota approaching its limit
- Failed multipart uploads

```bash
# Create a CloudWatch alarm for S3 4xx errors on this bucket's request metrics
aws cloudwatch put-metric-alarm \
  --alarm-name readur-s3-errors \
  --alarm-description "Alert on S3 4xx errors" \
  --metric-name 4xxErrors \
  --namespace AWS/S3 \
  --dimensions Name=BucketName,Value=readur-production Name=FilterId,Value=EntireBucket \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold
```
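The `4xxErrors` metric above is part of S3 request metrics, which CloudWatch only publishes once a request metrics configuration is enabled on the bucket (an opt-in, billed feature). A sketch of enabling it, using the same `EntireBucket` filter ID referenced in the alarm's dimensions:

```bash
# Opt the bucket into CloudWatch request metrics so 4xxErrors/5xxErrors are published;
# "EntireBucket" is an arbitrary filter ID and must match the FilterId dimension on the alarm
aws s3api put-bucket-metrics-configuration \
  --bucket readur-production \
  --id EntireBucket \
  --metrics-configuration '{"Id": "EntireBucket"}'
```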
### 5. Compliance

- **Data Residency**: Ensure the S3 region meets your data residency requirements
- **Audit Logging**: Enable S3 access logging and AWS CloudTrail
- **Retention Policies**: Implement compliant data retention policies
- **GDPR Compliance**: Implement proper data deletion procedures

```bash
# Enable access logging
aws s3api put-bucket-logging \
  --bucket readur-production \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "readur-logs",
      "TargetPrefix": "s3-access/"
    }
  }'
```

## Next Steps

- Review the [Configuration Reference](./configuration-reference.md) for all S3 options
- Explore the [S3 Troubleshooting Guide](./s3-troubleshooting.md) for common issues and solutions
- Check the [Migration Guide](./migration-guide.md) for moving from local to S3 storage
- Read the [Deployment Guide](./deployment.md) for production deployment best practices