# Storage Configuration Guide
## Overview
Readur supports multiple storage backends for document management. Choose the backend that best fits your infrastructure and scaling needs.
## Storage Backends
### Local Storage
Best for single-server deployments and small installations.
#### Configuration
```bash
# In .env file
STORAGE_BACKEND=local
LOCAL_STORAGE_PATH=/data/readur/documents
LOCAL_STORAGE_MAX_SIZE_GB=500 # Optional: limit storage usage
```
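If Readur runs in a container, the path must also be mounted and writable inside it. A quick sanity check, assuming the `readur` service name used elsewhere in this guide and that the variable is passed into the container's environment:
```bash
# Should report the configured path as writable
docker-compose exec readur sh -c \
  'test -w "$LOCAL_STORAGE_PATH" && echo "writable: $LOCAL_STORAGE_PATH" || echo "not writable"'
```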
#### Directory Structure
```
/data/readur/documents/
├── users/
│   ├── user1/
│   │   ├── uploads/
│   │   └── processed/
│   └── user2/
├── temp/
└── cache/
```
#### Permissions
Set proper ownership and permissions:
```bash
# Create directory structure
sudo mkdir -p /data/readur/documents
sudo chown -R readur:readur /data/readur/documents
sudo chmod 750 /data/readur/documents
```
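To confirm the service account can actually write there:
```bash
# Create and remove a marker file as the readur user
sudo -u readur touch /data/readur/documents/.write-test && \
  sudo -u readur rm /data/readur/documents/.write-test && \
  echo "write OK"
```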
#### Backup Considerations
- Use filesystem snapshots (LVM, ZFS, Btrfs)
- Rsync to a backup location, excluding temp/ and cache/ (see the sketch below)
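A minimal rsync sketch along those lines (the backup host and destination path are placeholders):
```bash
# Mirror documents to a backup host, skipping transient data
rsync -a --delete \
  --exclude='temp/' \
  --exclude='cache/' \
  /data/readur/documents/ backup-host:/backups/readur/documents/
```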
### S3-Compatible Storage
Recommended for production deployments requiring scalability.
#### AWS S3 Configuration
```bash
# In .env file
STORAGE_BACKEND=s3
S3_BUCKET=readur-documents
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Optional settings
S3_ENDPOINT= # Leave empty for AWS
S3_USE_SSL=true
S3_VERIFY_SSL=true
S3_SIGNATURE_VERSION=s3v4
```
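Before starting Readur, the credentials and bucket can be sanity-checked with the AWS CLI; `head-bucket` exits non-zero if the bucket is missing or inaccessible:
```bash
# Uses the same key pair as the .env file above
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
aws s3api head-bucket --bucket readur-documents --region us-east-1 \
  && echo "bucket reachable"
```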
#### MinIO Configuration
For self-hosted S3-compatible storage:
```bash
STORAGE_BACKEND=s3
S3_BUCKET=readur
S3_ENDPOINT=https://minio.company.com:9000
S3_ACCESS_KEY_ID=minioadmin
S3_SECRET_ACCESS_KEY=minioadmin123
S3_USE_SSL=true
S3_VERIFY_SSL=false # For self-signed certificates
S3_REGION=us-east-1 # MinIO default
```
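If the bucket does not exist yet, it can be created with MinIO's `mc` client (the alias name is arbitrary; add `--insecure` for self-signed certificates):
```bash
# Register the endpoint under an alias, then create and list the bucket
mc alias set readur-minio https://minio.company.com:9000 minioadmin minioadmin123
mc mb readur-minio/readur
mc ls readur-minio
```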
#### Bucket Setup
Create and configure your S3 bucket:
```bash
# AWS CLI
aws s3api create-bucket --bucket readur-documents --region us-east-1
# Set lifecycle policy for temp files
aws s3api put-bucket-lifecycle-configuration \
  --bucket readur-documents \
  --lifecycle-configuration file://lifecycle.json
```
Lifecycle policy (lifecycle.json):
```json
{
  "Rules": [
    {
      "ID": "DeleteTempFiles",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "temp/"
      },
      "Expiration": {
        "Days": 7
      }
    }
  ]
}
```
#### IAM Permissions
Minimum required S3 permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::readur-documents",
        "arn:aws:s3:::readur-documents/*"
      ]
    }
  ]
}
```
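Assuming a dedicated IAM user named `readur`, the policy can be attached inline with the AWS CLI:
```bash
# Save the JSON above as readur-s3-policy.json first
aws iam put-user-policy \
  --user-name readur \
  --policy-name readur-s3-access \
  --policy-document file://readur-s3-policy.json
```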
### WebDAV Storage
For integration with existing WebDAV servers.
#### Configuration
```bash
STORAGE_BACKEND=webdav
WEBDAV_URL=https://webdav.company.com/readur
WEBDAV_USERNAME=readur_user
WEBDAV_PASSWORD=secure_password
WEBDAV_VERIFY_SSL=true
```
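Connectivity and credentials can be checked with a PROPFIND request before starting Readur; a 207 response indicates a working WebDAV endpoint, while 401 points to rejected credentials:
```bash
# Prints only the HTTP status code
curl -u readur_user:secure_password \
  -X PROPFIND -H "Depth: 1" \
  -s -o /dev/null -w "%{http_code}\n" \
  https://webdav.company.com/readur
```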
#### Nextcloud Integration
```bash
STORAGE_BACKEND=webdav
WEBDAV_URL=https://nextcloud.company.com/remote.php/dav/files/readur/
WEBDAV_USERNAME=readur
WEBDAV_PASSWORD=app-password-here
```
## Storage Migration
### Migrating Between Backends
Use the built-in Rust migration tool:
```bash
# Migrate from local to S3
docker-compose exec readur cargo run --bin migrate_to_s3 -- \
  --batch-size 100 \
  --enable-rollback \
  --verbose
# Or using the compiled binary in production
docker-compose exec readur /app/migrate_to_s3 \
  --batch-size 100 \
  --enable-rollback
```
### Progressive Migration
Migrate in stages to minimize downtime:
```bash
# Stage 1: Test migration with dry run
docker-compose exec readur cargo run --bin migrate_to_s3 -- --dry-run
# Stage 2: Migrate a specific user's documents
docker-compose exec readur cargo run --bin migrate_to_s3 -- \
  --user-id "user-uuid" \
  --enable-rollback
# Stage 3: Full migration with rollback capability
docker-compose exec readur cargo run --bin migrate_to_s3 -- \
  --enable-rollback \
  --batch-size 500 \
  --parallel-uploads 5
# Stage 4: Update configuration
# Update .env: STORAGE_BACKEND=s3
docker-compose restart readur
```
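After migration, a rough consistency check is to compare the local file count with the object count in the bucket (the numbers may differ slightly if temp/ and cache/ data is not migrated):
```bash
# Local document count
find /data/readur/documents/users -type f | wc -l
# Object count in the bucket
aws s3 ls s3://readur-documents --recursive | wc -l
```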
## Performance Optimization
### Local Storage
#### Filesystem Choice
- **ext4**: Good general performance
- **XFS**: Better for large files
- **ZFS**: Advanced features, snapshots, compression
- **Btrfs**: Copy-on-write, snapshots
#### Mount Options
```bash
# /etc/fstab optimization
/dev/sdb1 /data/readur ext4 defaults,noatime,nodiratime 0 2
```
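The updated options can be applied and verified without a reboot:
```bash
# Remount with the new fstab options, then confirm they are active
sudo mount -o remount /data/readur
findmnt -no OPTIONS /data/readur   # should list noatime,nodiratime
```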
#### RAID Configuration
For better performance and redundancy:
```bash
# RAID 10 for balanced performance/redundancy
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
  /dev/sdb /dev/sdc /dev/sdd /dev/sde
```
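The new array still needs a filesystem and a persisted definition so it reassembles at boot. A typical follow-up on Debian/Ubuntu (paths and initramfs handling differ on other distributions):
```bash
# Create the filesystem and record the array in mdadm.conf
sudo mkfs.ext4 /dev/md0
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u
```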
### S3 Storage
#### Connection Pooling
```bash
# Optimize S3 connections
S3_MAX_CONNECTIONS=100
S3_CONNECTION_TIMEOUT=30
S3_READ_TIMEOUT=300
```
#### Transfer Acceleration
```bash
# Enable for AWS S3
S3_USE_ACCELERATE_ENDPOINT=true
S3_MULTIPART_THRESHOLD=64MB
S3_MULTIPART_CHUNKSIZE=16MB
```
#### CDN Integration
Use CloudFront for read-heavy workloads:
```bash
# Serve documents through CDN
CDN_ENABLED=true
CDN_URL=https://d1234567890.cloudfront.net
CDN_PRIVATE_KEY=/etc/readur/cloudfront-key.pem
```
## Storage Monitoring
### Disk Usage Monitoring
```bash
# Check local storage usage
df -h /data/readur/documents
# Monitor growth rate
du -sh /data/readur/documents/* | sort -rh
# Set up an hourly alert (mail -E skips sending when the body is empty)
cat > /etc/cron.d/storage-alert << 'EOF'
0 * * * * root df /data/readur | awk 'NR==2 && $5+0 > 80' | mail -E -s "Storage Alert" admin@company.com
EOF
```
### S3 Metrics
Monitor S3 usage and costs:
```bash
# Get bucket size
aws s3 ls s3://readur-documents --recursive --summarize | grep "Total Size"
# CloudWatch metrics (BucketSizeBytes requires the StorageType dimension)
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=readur-documents Name=StorageType,Value=StandardStorage \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-31T23:59:59Z \
  --period 86400 \
  --statistics Average
```
## Troubleshooting
### Local Storage Issues
#### Disk Full
```bash
# Find large files
find /data/readur -type f -size +100M -exec ls -lh {} \;
# Clean temporary files older than 7 days
find /data/readur/documents/temp -type f -mtime +7 -delete
# Check for orphaned files
# This is typically done via database queries or custom scripts;
# the query below is illustrative and depends on Readur's schema
docker-compose exec readur psql -U readur -d readur -c \
  "SELECT * FROM documents WHERE file_path NOT IN (SELECT path FROM files)"
```
#### Permission Errors
```bash
# Fix ownership
sudo chown -R readur:readur /data/readur/documents
# Fix permissions
find /data/readur/documents -type d -exec chmod 755 {} \;
find /data/readur/documents -type f -exec chmod 644 {} \;
```
### S3 Issues
#### Connection Errors
```bash
# Test S3 connectivity
aws s3 ls s3://readur-documents --debug
# Check credentials
aws sts get-caller-identity
# Verify S3 environment variables are set correctly
# Should use S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY
docker-compose exec readur env | grep -E '^S3_' | sed 's/=.*/=***/'
# Verify bucket policy
aws s3api get-bucket-policy --bucket readur-documents
```
#### Slow Uploads
```bash
# Increase multipart settings
S3_MULTIPART_THRESHOLD=32MB
S3_MULTIPART_CHUNKSIZE=8MB
S3_MAX_CONCURRENCY=10
# Enable transfer acceleration (AWS only)
aws s3api put-bucket-accelerate-configuration \
  --bucket readur-documents \
  --accelerate-configuration Status=Enabled
```
## Best Practices
### Security
1. **Encryption at rest**:
   - Local: Use encrypted filesystems (LUKS)
   - S3: Enable SSE-S3 or SSE-KMS (see the example after this list)
2. **Encryption in transit**:
   - Always use HTTPS/TLS
   - Verify SSL certificates
3. **Access control**:
   - Principle of least privilege
   - Regular credential rotation
   - IP whitelisting where possible
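For the S3 item above, default bucket encryption can be turned on with a single call (SSE-S3 shown; substitute KMS parameters for SSE-KMS):
```bash
# All new objects will be encrypted with AES-256 (SSE-S3)
aws s3api put-bucket-encryption \
  --bucket readur-documents \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```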
### Backup Strategy
1. **3-2-1 Rule**:
   - 3 copies of data
   - 2 different storage types
   - 1 offsite backup
2. **Testing**:
   - Regular restore tests
   - Document recovery procedures
   - Monitor backup completion
3. **Retention**:
   - Define retention policies
   - Automate old backup cleanup
   - Comply with regulations
### Capacity Planning
1. **Growth estimation**:
   ```
   Daily documents × Average size × Retention days = Required storage
   ```
   For example, 200 documents/day × 2 MB × 365 days ≈ 146 GB.
2. **Buffer space**:
   - Keep 20% free space minimum
   - Monitor growth trends
   - Plan upgrades proactively
3. **Cost optimization**:
   - Use lifecycle policies
   - Archive old documents
   - Compress where appropriate
## Related Documentation
- [S3 Storage Guide](../s3-storage-guide.md)
- [Storage Migration](../administration/storage-migration.md)
- [Backup Strategies](./backup.md)
- [Performance Tuning](./performance.md)