# Readur 📄
A powerful, modern document management system built with Rust and React. Readur provides intelligent document processing with OCR capabilities, full-text search, and a modern, responsive web interface.
## ✨ Features
- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing
- 📤 **Smart File Upload**: Drag-and-drop support for PDF, images, text files, and Office documents
- 🔍 **Advanced OCR**: Automatic text extraction using Tesseract for searchable document content
- 🔎 **Powerful Search**: PostgreSQL full-text search with advanced filtering and ranking
- 👁️ **Folder Monitoring**: Non-destructive file watching (unlike paperless-ngx, doesn't consume source files)
- 🎨 **Modern UI**: Beautiful React frontend with Material-UI components and responsive design
- 🐳 **Docker Ready**: Complete containerization with production-ready multi-stage builds
- ⚡ **High Performance**: Rust backend for speed and reliability
- 📊 **Analytics Dashboard**: Document statistics and processing status overview
## 🚀 Quick Start
### Using Docker Compose (Recommended)
The fastest way to get Readur running:
```bash
# Clone the repository
git clone <repository-url>
cd readur
# Start all services
docker compose up --build
# Access the application (`open` is macOS-only; on Linux use xdg-open or visit the URL in a browser)
open http://localhost:8000
```
**Default login credentials:**
- Username: `admin`
- Password: `admin123`
> ⚠️ **Important**: Change the default admin password immediately after first login!
### What You Get
After deployment, you'll have:
- **Web Interface**: Modern document management UI at `http://localhost:8000`
- **PostgreSQL Database**: Document metadata and full-text search indexes
- **File Storage**: Persistent document storage with OCR processing
- **REST API**: Full API access for integrations
## 🏗️ Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  React Frontend │────│   Rust Backend  │────│  PostgreSQL DB  │
│   (Port 8000)   │    │   (Axum API)    │    │   (Port 5433)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         └─────────────│  File Storage   │─────────────┘
                       │  + OCR Engine   │
                       └─────────────────┘
```
## 📋 System Requirements
### Minimum Requirements
- **CPU**: 2 cores
- **RAM**: 2GB
- **Storage**: 10GB free space
- **OS**: Linux, macOS, or Windows with Docker
### Recommended for Production
- **CPU**: 4+ cores
- **RAM**: 4GB+
- **Storage**: 50GB+ SSD
- **Network**: Stable internet connection for OCR processing
## 🛠️ Manual Installation
For development or custom deployments without Docker:
### Prerequisites
Install these dependencies on your system:
```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr tesseract-ocr-eng \
    libtesseract-dev libleptonica-dev \
    postgresql postgresql-contrib \
    pkg-config libclang-dev
# macOS (requires Homebrew)
brew install tesseract leptonica postgresql rust nodejs npm
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
### Backend Setup
1. **Configure Database**:
```bash
# Create database and user (the SQL is fed to psql via a here-document)
sudo -u postgres psql <<'SQL'
CREATE DATABASE readur;
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE readur TO readur_user;
SQL
```
2. **Environment Configuration**:
```bash
# Copy environment template
cp .env.example .env
# Edit configuration
nano .env
```
Required environment variables:
```env
DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
JWT_SECRET=your-super-secret-jwt-key-change-this
SERVER_ADDRESS=0.0.0.0:8000
UPLOAD_PATH=./uploads
WATCH_FOLDER=./watch
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
```
3. **Build and Run Backend**:
```bash
# Fetch dependencies and build an optimized binary, then run it
cargo build --release
cargo run --release
```
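Before starting the backend, it can be useful to confirm that every variable from step 2 is actually present in `.env`. The following is a minimal sketch rather than anything shipped with Readur: the function name `check_env_file` is hypothetical, and it assumes plain `KEY=value` lines with no quoting or `export` prefix.

```bash
# Hypothetical helper (not part of Readur): report which of the variables
# listed in step 2 are missing or empty in an env file.
check_env_file() {
  local file="$1" missing=0 key
  for key in DATABASE_URL JWT_SECRET SERVER_ADDRESS UPLOAD_PATH WATCH_FOLDER ALLOWED_FILE_TYPES; do
    # A key counts as present only if it has a non-empty value
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_env_file .env` prints one `missing: NAME` line per absent key and returns a non-zero status if any are missing, so it can gate a start script.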
### Frontend Setup
1. **Install Dependencies**:
```bash
cd frontend
npm install
```
2. **Development Mode**:
```bash
npm run dev
# Frontend available at http://localhost:5173
```
3. **Production Build**:
```bash
npm run build
# Built files in frontend/dist/
```
## 📖 User Guide
### Getting Started
1. **First Login**: Use the default admin credentials to access the system
2. **Upload Documents**: Drag and drop files or use the upload button
3. **Wait for Processing**: OCR processing happens automatically in the background
4. **Search and Organize**: Use the powerful search features to find your documents
### Supported File Types
| Type | Extensions | OCR Support | Notes |
|------|-----------|-------------|-------|
| **PDF** | `.pdf` | ✅ | Text extraction + OCR for scanned pages |
| **Images** | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | ✅ | Full OCR text extraction |
| **Text** | `.txt`, `.rtf` | ❌ | Direct text indexing |
| **Office** | `.doc`, `.docx` | ⚠️ | Limited support |
### Using the Interface
#### Dashboard
- **Document Statistics**: Total documents, storage usage, OCR status
- **Recent Activity**: Latest uploads and processing status
- **Quick Actions**: Fast access to upload and search
#### Document Management
- **List/Grid View**: Toggle between different viewing modes
- **Sorting**: Sort by date, name, size, or file type
- **Filtering**: Filter by tags, file types, and OCR status
- **Bulk Actions**: Select multiple documents for batch operations
#### Advanced Search
- **Full-text Search**: Search within document content
- **Metadata Filters**: Filter by upload date, file size, type
- **Tag System**: Organize documents with custom tags
- **OCR Status**: Find processed vs. pending documents
#### Folder Watching
- **Non-destructive**: Unlike paperless-ngx, source files remain untouched
- **Automatic Processing**: New files are detected and processed automatically
- **Configurable**: Set custom watch directories
### Tips for Best Results
1. **OCR Quality**: Higher resolution images (300+ DPI) produce better OCR results
2. **File Organization**: Use consistent naming conventions for easier searching
3. **Regular Backups**: Backup both database and file storage regularly
4. **Performance**: For large document collections, consider increasing server resources
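
Tip 3 can be scripted. The sketch below is one possible approach, not a tool that ships with Readur: `readur_backup` is a hypothetical name, and it assumes `DATABASE_URL` and `UPLOAD_PATH` are exported with the same values the server uses and that `pg_dump` and `tar` are on the `PATH`.

```bash
# Hypothetical backup helper (not part of Readur): dumps the database and
# archives the upload directory into timestamped files under a target folder.
readur_backup() {
  local dest=${1:-./backups}
  local uploads=${UPLOAD_PATH:-./uploads}
  local stamp
  stamp=$(date +%Y%m%d-%H%M%S)

  mkdir -p "$dest" || return 1
  # pg_dump accepts a connection URI through --dbname
  pg_dump --dbname="$DATABASE_URL" > "$dest/readur-db-$stamp.sql" || return 1
  # -C makes archive paths relative to the parent of the uploads directory
  tar -czf "$dest/readur-uploads-$stamp.tar.gz" \
    -C "$(dirname "$uploads")" "$(basename "$uploads")" || return 1

  # Print the paths of the two files that were written
  echo "$dest/readur-db-$stamp.sql"
  echo "$dest/readur-uploads-$stamp.tar.gz"
}
```

Calling `readur_backup /mnt/backups` from a cron job would produce one SQL dump and one archive per run; restoring and pruning old backups are left out of this sketch.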
## 🔧 Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | - | PostgreSQL connection string |
| `JWT_SECRET` | - | Secret key for JWT tokens (required) |
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address |
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
| `WATCH_FOLDER` | `./watch` | Folder monitoring directory |
| `ALLOWED_FILE_TYPES` | `pdf,png,jpg,jpeg,txt,doc,docx` | Allowed file extensions |
### Docker Configuration
Customize `docker-compose.yml` for your environment:
```yaml
services:
  readur:
    environment:
      - JWT_SECRET=change-this-secret-key
      - UPLOAD_PATH=/app/uploads
    volumes:
      - ./data/uploads:/app/uploads
      - ./data/watch:/app/watch
    ports:
      - "8000:8000"
```
### Database Tuning
For better search performance with large document collections:
```sql
-- Increase shared_buffers for better caching
ALTER SYSTEM SET shared_buffers = '256MB';
-- Optimize for full-text search
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
-- Restart PostgreSQL after changes
```
## 🔌 API Reference
### Authentication Endpoints
```bash
# Register new user
POST /api/auth/register
Content-Type: application/json
{
"username": "john_doe",
"email": "john@example.com",
"password": "secure_password"
}
# Login
POST /api/auth/login
Content-Type: application/json
{
"username": "john_doe",
"password": "secure_password"
}
# Get current user
GET /api/auth/me
Authorization: Bearer <jwt_token>
```
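The login request above can be issued from a shell with `curl`. This is a hedged sketch: the helper names (`json_escape`, `login_payload`, `readur_login`) and the `READUR_URL` variable are made up for illustration, the base URL assumes the Docker Compose default, and the shape of the response body is not documented here, so the function simply prints whatever the server returns.

```bash
# Base URL is an assumption; override READUR_URL for other deployments.
READUR_URL="${READUR_URL:-http://localhost:8000}"

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters are not handled, which is fine for typical credentials.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the JSON body expected by POST /api/auth/login
login_payload() {
  printf '{"username":"%s","password":"%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the credentials and print the raw response body.
# -d implies POST; -f makes curl fail on HTTP error statuses.
readur_login() {
  curl -fsS "$READUR_URL/api/auth/login" \
    -H 'Content-Type: application/json' \
    -d "$(login_payload "$1" "$2")"
}
```

If the response turns out to carry the JWT in a field named `token` (an assumption to verify against a real response), something like `TOKEN=$(readur_login john_doe secure_password | jq -r '.token')` would capture it for the `Authorization: Bearer` header.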
### Document Management
```bash
# Upload document
POST /api/documents
Authorization: Bearer <jwt_token>
Content-Type: multipart/form-data
file: <binary_file_data>
# List documents
GET /api/documents?limit=50&offset=0
Authorization: Bearer <jwt_token>
# Download document
GET /api/documents/{id}/download
Authorization: Bearer <jwt_token>
```
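As a rough illustration, the three document requests above map onto `curl` as follows. The function names and `READUR_URL` are hypothetical, the base URL assumes the Docker Compose default, and the token is whatever the login endpoint issued.

```bash
# Base URL is an assumption; override READUR_URL for other deployments.
READUR_URL="${READUR_URL:-http://localhost:8000}"

# Upload one file. -F sends multipart/form-data and lets curl set the
# Content-Type header (including the boundary) itself.
readur_upload() {
  local token=$1 path=$2
  curl -fsS "$READUR_URL/api/documents" \
    -H "Authorization: Bearer $token" \
    -F "file=@$path"
}

# List one page of documents; -G turns the -d pairs into a query string.
readur_list() {
  local token=$1 limit=${2:-50} offset=${3:-0}
  curl -fsS -G "$READUR_URL/api/documents" \
    -H "Authorization: Bearer $token" \
    -d "limit=$limit" -d "offset=$offset"
}

# Download a document by id into a local file.
readur_download() {
  local token=$1 id=$2 out=$3
  curl -fsS -o "$out" "$READUR_URL/api/documents/$id/download" \
    -H "Authorization: Bearer $token"
}
```

For example, `readur_upload "$TOKEN" ./scan.pdf` followed by `readur_list "$TOKEN" 10 0` would upload a file and then fetch the first ten entries.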
### Search
```bash
# Search documents
GET /api/search?query=contract&limit=20
Authorization: Bearer <jwt_token>
# Advanced search with filters
GET /api/search?query=invoice&mime_types=application/pdf&tags=important
Authorization: Bearer <jwt_token>
```
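Search terms often contain spaces or characters such as `&` that must be URL-encoded. A minimal sketch of a search call that lets `curl` do the encoding; `readur_search` and `READUR_URL` are hypothetical names, and the filter parameters are simply the ones shown above.

```bash
# Base URL is an assumption; override READUR_URL for other deployments.
READUR_URL="${READUR_URL:-http://localhost:8000}"

# Usage: readur_search TOKEN QUERY [key=value ...]
# Extra key=value pairs (limit, mime_types, tags) are passed through as
# additional query parameters.
readur_search() {
  local token=$1 query=$2
  shift 2
  local args=(--data-urlencode "query=$query") kv
  for kv in "$@"; do
    args+=(--data-urlencode "$kv")
  done
  # -G appends the encoded pairs to the URL instead of sending a request body
  curl -fsS -G "$READUR_URL/api/search" \
    -H "Authorization: Bearer $token" \
    "${args[@]}"
}
```

For instance, `readur_search "$TOKEN" 'invoice & receipt' limit=20 tags=important` sends the query with the space and ampersand percent-encoded.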
## 🧪 Testing
### Run All Tests
```bash
# Backend tests
cargo test
# Frontend tests
cd frontend && npm test
# Integration tests with Docker
docker compose -f docker-compose.test.yml up --build
```
### Test Coverage
```bash
# Install cargo-tarpaulin for coverage
cargo install cargo-tarpaulin
# Generate coverage report
cargo tarpaulin --out Html
```
## 🔒 Security Considerations
### Production Deployment
1. **Change Default Credentials**: Update admin password immediately
2. **Use Strong JWT Secret**: Generate a secure random key
3. **Enable HTTPS**: Use a reverse proxy with SSL/TLS
4. **Database Security**: Use strong passwords and restrict network access
5. **File Permissions**: Ensure proper file system permissions
6. **Regular Updates**: Keep dependencies and base images updated
### Recommended Production Setup
```bash
# Use environment-specific secrets
# (tr strips the line break openssl inserts after 64 base64 characters)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
# Restrict database access
# Only allow connections from application container
# Use read-only file system where possible
# Mount uploads and watch folders as separate volumes
```
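One way to keep a generated secret out of world-readable files is to write it with restrictive permissions from the start. A minimal sketch, assuming `openssl` is installed; the helper name `write_jwt_secret` and the default file name `.jwt_secret` are hypothetical.

```bash
# Hypothetical helper: write a random secret to a file only the owner can read.
write_jwt_secret() {
  local file=${1:-.jwt_secret}
  # The subshell keeps the umask change local; 077 gives new files mode 600.
  # An existing file keeps its current permissions, so start from a fresh path.
  # tr strips the line break openssl inserts into longer base64 output.
  ( umask 077 && openssl rand -base64 64 | tr -d '\n' > "$file" )
}
```

After `write_jwt_secret .jwt_secret`, the value can be loaded with `export JWT_SECRET="$(cat .jwt_secret)"` or mounted into the container as a secret file.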
## 🚀 Deployment Options
### Docker Swarm
```yaml
version: '3.8'
services:
  readur:
    image: readur:latest
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
    networks:
      - readur-network
    secrets:
      - jwt_secret
      - db_password
```
### Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: readur
          image: readur:latest
          env:
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: readur-secrets
                  key: jwt-secret
```
### Cloud Platforms
- **AWS**: Use ECS with RDS PostgreSQL
- **Google Cloud**: Deploy to Cloud Run with Cloud SQL
- **Azure**: Use Container Instances with Azure Database
- **DigitalOcean**: App Platform with Managed Database
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Fork and clone the repository
git clone https://github.com/yourusername/readur.git
cd readur
# Create a feature branch
git checkout -b feature/amazing-feature
# Make your changes and test
cargo test
cd frontend && npm test
# Submit a pull request
```
### Code Style
- **Rust**: Follow `rustfmt` and `clippy` recommendations
- **Frontend**: Use Prettier and ESLint configurations
- **Commits**: Use conventional commit format
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) for text extraction
- [Axum](https://github.com/tokio-rs/axum) for the web framework
- [Material-UI](https://mui.com/) for the beautiful frontend components
- [PostgreSQL](https://www.postgresql.org/) for robust full-text search
## 📞 Support
- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs and request features on GitHub Issues
- **Discussions**: Join community discussions on GitHub Discussions
---
**Made with ❤️ and ☕ by the Readur team**