Quick, painless, intuitive OCR platform written in Rust and TypeScript. Modern UI with modern API, with an emphasis on intuitive user experience.
perf3ct 5d5a586669 feat(readme): update readme 2025-06-12 02:25:22 +00:00

Readur 📄

A powerful, modern document management system built with Rust and React. Readur provides intelligent document processing with OCR capabilities, full-text search, and a beautiful web interface designed for 2026 tech standards.

Features

  • 🔐 Secure Authentication: JWT-based user authentication with bcrypt password hashing
  • 📤 Smart File Upload: Drag-and-drop support for PDF, images, text files, and Office documents
  • 🔍 Advanced OCR: Automatic text extraction using Tesseract for searchable document content
  • 🔎 Powerful Search: PostgreSQL full-text search with advanced filtering and ranking
  • 👁️ Folder Monitoring: Non-destructive file watching (unlike paperless-ngx, doesn't consume source files)
  • 🎨 Modern UI: Beautiful React frontend with Material-UI components and responsive design
  • 🐳 Docker Ready: Complete containerization with production-ready multi-stage builds
  • ⚡ High Performance: Rust backend for speed and reliability
  • 📊 Analytics Dashboard: Document statistics and processing status overview

🚀 Quick Start

The fastest way to get Readur running:

# Clone the repository
git clone <repository-url>
cd readur

# Start all services
docker compose up --build

# Access the application (`open` is macOS-only; on other systems, visit the URL in a browser)
open http://localhost:8000

Default login credentials:

  • Username: admin
  • Password: admin123

⚠️ Important: Change the default admin password immediately after first login!

What You Get

After deployment, you'll have:

  • Web Interface: Modern document management UI at http://localhost:8000
  • PostgreSQL Database: Document metadata and full-text search indexes
  • File Storage: Persistent document storage with OCR processing
  • REST API: Full API access for integrations

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  React Frontend │────│   Rust Backend  │────│  PostgreSQL DB  │
│   (Port 8000)   │    │   (Axum API)    │    │   (Port 5433)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │              ┌─────────────────┐             │
         └──────────────│  File Storage   │─────────────┘
                        │  + OCR Engine   │
                        └─────────────────┘

📋 System Requirements

Minimum Requirements

  • CPU: 2 cores
  • RAM: 2GB
  • Storage: 10GB free space
  • OS: Linux, macOS, or Windows with Docker

Recommended Requirements

  • CPU: 4+ cores
  • RAM: 4GB+
  • Storage: 50GB+ SSD
  • Network: Stable internet connection for OCR processing

🛠️ Manual Installation

For development or custom deployments without Docker:

Prerequisites

Install these dependencies on your system:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
    tesseract-ocr tesseract-ocr-eng \
    libtesseract-dev libleptonica-dev \
    postgresql postgresql-contrib \
    pkg-config libclang-dev

# macOS (requires Homebrew)
brew install tesseract leptonica postgresql rust nodejs npm

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Backend Setup

  1. Configure Database:
# Create database and user
sudo -u postgres psql
CREATE DATABASE readur;
CREATE USER readur_user WITH ENCRYPTED PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE readur TO readur_user;
\q
  2. Environment Configuration:
# Copy environment template
cp .env.example .env

# Edit configuration
nano .env

Required environment variables:

DATABASE_URL=postgresql://readur_user:your_password@localhost/readur
JWT_SECRET=your-super-secret-jwt-key-change-this
SERVER_ADDRESS=0.0.0.0:8000
UPLOAD_PATH=./uploads
WATCH_FOLDER=./watch
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,gif,bmp,tiff,txt,rtf,doc,docx
  3. Build and Run Backend:
# Build an optimized binary, then run it
cargo build --release
cargo run --release

Frontend Setup

  1. Install Dependencies:
cd frontend
npm install
  2. Development Mode:
npm run dev
# Frontend available at http://localhost:5173
  3. Production Build:
npm run build
# Built files in frontend/dist/

📖 User Guide

Getting Started

  1. First Login: Use the default admin credentials to access the system
  2. Upload Documents: Drag and drop files or use the upload button
  3. Wait for Processing: OCR processing happens automatically in the background
  4. Search and Organize: Use the powerful search features to find your documents

Supported File Types

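| Type   | Extensions                           | OCR Support | Notes                                   |
|--------|--------------------------------------|-------------|-----------------------------------------|
| PDF    | .pdf                                 | Yes         | Text extraction + OCR for scanned pages |
| Images | .png, .jpg, .jpeg, .tiff, .bmp, .gif | Yes         | Full OCR text extraction                |
| Text   | .txt, .rtf                           | N/A         | Direct text indexing                    |
| Office | .doc, .docx                          | ⚠️          | Limited support                         |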

Using the Interface

Dashboard

  • Document Statistics: Total documents, storage usage, OCR status
  • Recent Activity: Latest uploads and processing status
  • Quick Actions: Fast access to upload and search

Document Management

  • List/Grid View: Toggle between different viewing modes
  • Sorting: Sort by date, name, size, or file type
  • Filtering: Filter by tags, file types, and OCR status
  • Bulk Actions: Select multiple documents for batch operations

Search Features

  • Full-text Search: Search within document content
  • Metadata Filters: Filter by upload date, file size, type
  • Tag System: Organize documents with custom tags
  • OCR Status: Find processed vs. pending documents

Folder Watching

  • Non-destructive: Unlike paperless-ngx, source files remain untouched
  • Automatic Processing: New files are detected and processed automatically
  • Configurable: Set custom watch directories
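
To make "non-destructive" concrete, here is a minimal polling sketch in Python. It is not Readur's watcher (the backend is written in Rust and may work quite differently); the function name `scan_new_files` and the `seen` bookkeeping are hypothetical and exist only for illustration. The point is that source files are inspected and recorded, never moved or deleted.

```python
from pathlib import Path

# Hypothetical helper for illustration only; not part of Readur.
def scan_new_files(watch_dir, seen, allowed_exts):
    """Return files under watch_dir that are new or changed since the last scan.

    `seen` maps path -> (mtime_ns, size) and is updated in place.
    Files are only stat()'ed: nothing is moved, renamed, or deleted.
    """
    new_files = []
    for path in sorted(Path(watch_dir).rglob("*")):
        if not path.is_file():
            continue
        # Compare the extension without its dot, case-insensitively.
        if path.suffix.lower().lstrip(".") not in allowed_exts:
            continue
        stat = path.stat()
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        if seen.get(path) != fingerprint:
            seen[path] = fingerprint
            new_files.append(path)
    return new_files
```

Calling `scan_new_files("./watch", seen, {"pdf", "png"})` in a loop with the same `seen` dictionary yields each file once, and again only if its size or modification time changes.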

Tips for Best Results

  1. OCR Quality: Higher resolution images (300+ DPI) produce better OCR results
  2. File Organization: Use consistent naming conventions for easier searching
  3. Regular Backups: Backup both database and file storage regularly
  4. Performance: For large document collections, consider increasing server resources
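
The backup tip can be scripted. The sketch below is one possible approach, not something shipped with Readur: it assumes `pg_dump` is installed and on the PATH, and the helper names and file names are made up for illustration. Pass the same values you use for `DATABASE_URL` and `UPLOAD_PATH`.

```python
import tarfile
from pathlib import Path

# Hypothetical helpers for illustration; adjust paths to your deployment.
def pg_dump_command(database_url, dump_file):
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        f"--dbname={database_url}",
        "--format=custom",
        f"--file={dump_file}",
    ]

def archive_uploads(upload_path, archive_file):
    """Pack the upload directory into a gzip-compressed tarball."""
    with tarfile.open(archive_file, "w:gz") as tar:
        tar.add(upload_path, arcname="uploads")
    return Path(archive_file)
```

Run the database step with `subprocess.run(pg_dump_command(url, "readur.dump"), check=True)`; a custom-format dump can later be restored with `pg_restore`.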

🔧 Configuration

Environment Variables

| Variable           | Default                       | Description                          |
|--------------------|-------------------------------|--------------------------------------|
| DATABASE_URL       | -                             | PostgreSQL connection string         |
| JWT_SECRET         | -                             | Secret key for JWT tokens (required) |
| SERVER_ADDRESS     | 0.0.0.0:8000                  | Server bind address                  |
| UPLOAD_PATH        | ./uploads                     | Document storage directory           |
| WATCH_FOLDER       | ./watch                       | Folder monitoring directory          |
| ALLOWED_FILE_TYPES | pdf,png,jpg,jpeg,txt,doc,docx | Allowed file extensions              |
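
A quick pre-flight check of a `.env` file can catch missing settings before the server starts. This is a rough sketch, not Readur's own configuration loader: it assumes the two variables listed without a default are the ones that must be set, and that extensions are matched case-insensitively. Both are assumptions, and the helper names are hypothetical.

```python
# Assumption: the variables with no default in the table must be set.
REQUIRED_VARS = ("DATABASE_URL", "JWT_SECRET")
DEFAULT_FILE_TYPES = "pdf,png,jpg,jpeg,txt,doc,docx"

def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env

def missing_required(env):
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

def is_allowed(filename, env):
    """Check a file name against ALLOWED_FILE_TYPES (case-insensitive)."""
    allowed = env.get("ALLOWED_FILE_TYPES", DEFAULT_FILE_TYPES)
    exts = {e.strip().lower() for e in allowed.split(",") if e.strip()}
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in exts
```

For example, `missing_required(parse_env(open(".env").read()))` returns an empty list when both required values are present.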

Docker Configuration

Customize docker-compose.yml for your environment:

services:
  readur:
    environment:
      - JWT_SECRET=change-this-secret-key
      - UPLOAD_PATH=/app/uploads
    volumes:
      - ./data/uploads:/app/uploads
      - ./data/watch:/app/watch
    ports:
      - "8000:8000"

Database Tuning

For better search performance with large document collections:

-- Increase shared_buffers for better caching
ALTER SYSTEM SET shared_buffers = '256MB';

-- Optimize for full-text search
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';

-- Restart PostgreSQL after changes

🔌 API Reference

Authentication Endpoints

# Register new user
POST /api/auth/register
Content-Type: application/json
{
  "username": "john_doe",
  "email": "john@example.com",
  "password": "secure_password"
}

# Login
POST /api/auth/login
Content-Type: application/json
{
  "username": "john_doe",
  "password": "secure_password"
}

# Get current user
GET /api/auth/me
Authorization: Bearer <jwt_token>
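
For integrations, the same calls can be made from a script. The following is a minimal client sketch using only the Python standard library, assuming the server is reachable at `http://localhost:8000`. The helper names are hypothetical, and the shape of the login response is not documented here, so inspect a real response before relying on a particular key for the JWT.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default SERVER_ADDRESS

def json_post(path, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def login_request(username, password):
    """Request for POST /api/auth/login."""
    return json_post("/api/auth/login", {"username": username, "password": password})

def me_request(token):
    """Request for GET /api/auth/me with a bearer token."""
    return urllib.request.Request(
        BASE_URL + "/api/auth/me",
        headers={"Authorization": f"Bearer {token}"},
    )

def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical flow is `reply = send(login_request("john_doe", "secure_password"))`, followed by `send(me_request(token))` once the token has been extracted from `reply`.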

Document Management

# Upload document
POST /api/documents
Authorization: Bearer <jwt_token>
Content-Type: multipart/form-data
file: <binary_file_data>

# List documents
GET /api/documents?limit=50&offset=0
Authorization: Bearer <jwt_token>

# Download document
GET /api/documents/{id}/download
Authorization: Bearer <jwt_token>
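
The upload call is the least obvious one to script, because the body must be `multipart/form-data`. Here is a sketch that builds such a request by hand with the standard library. The form field name `file` comes from the example above; the function name and the default MIME type are assumptions made for illustration.

```python
import uuid
import urllib.request

def upload_request(base_url, token, filename, content,
                   mime_type="application/octet-stream"):
    """Build (but do not send) a multipart POST to /api/documents.

    `content` is the raw file as bytes. The file name is inserted as-is,
    so this sketch does not handle names containing double quotes.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/documents",
        data=head + content + tail,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to perform the upload.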

Search

# Search documents
GET /api/search?query=contract&limit=20
Authorization: Bearer <jwt_token>

# Advanced search with filters
GET /api/search?query=invoice&mime_types=application/pdf&tags=important
Authorization: Bearer <jwt_token>
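
Search URLs are easiest to build programmatically so that values are percent-encoded correctly. The sketch below uses the parameter names from the examples above. How the API expects multiple MIME types or tags to be passed is not stated here, so joining them with commas is an assumption, as is the helper name.

```python
from urllib.parse import urlencode

def search_url(base_url, query, limit=None, mime_types=None, tags=None):
    """Build a GET /api/search URL with percent-encoded parameters."""
    params = {"query": query}
    if limit is not None:
        params["limit"] = limit
    if mime_types:
        # Assumption: multiple values are comma-separated.
        params["mime_types"] = ",".join(mime_types)
    if tags:
        params["tags"] = ",".join(tags)
    return f"{base_url}/api/search?{urlencode(params)}"
```

Note that `urlencode` escapes the slash in a MIME type, so `application/pdf` is sent as `application%2Fpdf`; servers decode this back to the original value.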

🧪 Testing

Run All Tests

# Backend tests
cargo test

# Frontend tests
cd frontend && npm test

# Integration tests with Docker
docker compose -f docker-compose.test.yml up --build

Test Coverage

# Install cargo-tarpaulin for coverage
cargo install cargo-tarpaulin

# Generate coverage report
cargo tarpaulin --out Html

🔒 Security Considerations

Production Deployment

  1. Change Default Credentials: Update admin password immediately
  2. Use Strong JWT Secret: Generate a secure random key
  3. Enable HTTPS: Use a reverse proxy with SSL/TLS
  4. Database Security: Use strong passwords and restrict network access
  5. File Permissions: Ensure proper file system permissions
  6. Regular Updates: Keep dependencies and base images updated

# Use environment-specific secrets
# (tr removes the line break that openssl inserts into long base64 output)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')

# Restrict database access
# Only allow connections from application container

# Use read-only file system where possible
# Mount uploads and watch folders as separate volumes
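
Where `openssl` is not available, a secret of similar strength can be generated with Python's standard `secrets` module. This is only an alternative sketch; the function name is hypothetical, and the 64-byte size simply mirrors the command above.

```python
import secrets

def generate_jwt_secret(num_bytes=64):
    """Return a URL-safe random string derived from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)
```

Print the value once with `print(generate_jwt_secret())` and store it in your secret manager or `.env` file rather than regenerating it on every start, since changing the secret invalidates existing tokens.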

🚀 Deployment Options

Docker Swarm

version: '3.8'
services:
  readur:
    image: readur:latest
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
    networks:
      - readur-network
    secrets:
      - jwt_secret
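      - db_password

# Top-level definitions for the network and secrets referenced above.
# Assumes the secrets were created beforehand with `docker secret create`.
networks:
  readur-network:
    driver: overlay
secrets:
  jwt_secret:
    external: true
  db_password:
    external: true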

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: readur
        image: readur:latest
        env:
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: readur-secrets
              key: jwt-secret

Cloud Platforms

  • AWS: Use ECS with RDS PostgreSQL
  • Google Cloud: Deploy to Cloud Run with Cloud SQL
  • Azure: Use Container Instances with Azure Database
  • DigitalOcean: App Platform with Managed Database

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Fork and clone the repository
git clone https://github.com/yourusername/readur.git
cd readur

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes and test
cargo test
cd frontend && npm test

# Submit a pull request

Code Style

  • Rust: Follow rustfmt and clippy recommendations
  • Frontend: Use Prettier and ESLint configurations
  • Commits: Use conventional commit format

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

  • Documentation: Check this README and inline code comments
  • Issues: Report bugs and request features on GitHub Issues
  • Discussions: Join community discussions on GitHub Discussions

Made with ❤️ by the Readur team