Readur

A Rust-based document management system similar to paperless-ngx, featuring OCR, full-text search, and a modern React frontend.

Features

  • Authentication: JWT-based user authentication
  • File Upload: Support for PDF, text, images, and Office documents
  • OCR Processing: Automatic text extraction using Tesseract
  • Full-text Search: PostgreSQL-powered search with ranking
  • Folder Monitoring: Automatic processing of files in a watch folder
  • Web Interface: Modern React frontend with drag-and-drop uploads
  • Docker Support: Complete containerization with multi-stage builds

Quick Start

  1. Clone the repository:
git clone <repository-url>
cd readur
  2. Start the services:
docker-compose up -d
  3. Access the application at http://localhost:8000

Manual Setup

Prerequisites

  • Rust 1.75+
  • PostgreSQL 15+
  • Tesseract OCR
  • Node.js 18+

Backend Setup

  1. Install system dependencies:
# Ubuntu/Debian
sudo apt-get install tesseract-ocr tesseract-ocr-eng libtesseract-dev libleptonica-dev

# macOS
brew install tesseract leptonica
  2. Set up environment variables:
cp .env.example .env
# Edit .env with your database URL and other settings
  3. Run database migrations:
cargo run
  4. Start the backend:
cargo run

Frontend Setup

  1. Navigate to the frontend directory:
cd frontend
  2. Install dependencies:
npm install
  3. Start the development server:
npm run dev
  4. Build for production:
npm run build

API Endpoints

Authentication

  • POST /api/auth/register - User registration
  • POST /api/auth/login - User login
  • GET /api/auth/me - Get current user

Documents

  • POST /api/documents - Upload document
  • GET /api/documents - List user documents
  • GET /api/documents/:id/download - Download document
  • GET /api/search?query=text - Search documents
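
As an illustration of calling the search endpoint listed above, the following is a minimal client-side sketch that builds the request URL and percent-encodes the query text. The function names (search_url, encode_query_value) are hypothetical and not part of Readur; only the /api/search path and the query parameter come from the endpoint list.

```rust
/// Percent-encodes a value for use in a URL query string.
/// Only the RFC 3986 "unreserved" characters are left as-is.
fn encode_query_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (spaces, '&', '=', non-ASCII bytes) becomes %XX.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Builds the URL for `GET /api/search?query=...` (hypothetical helper).
/// `base` is the server origin, e.g. the Quick Start address.
pub fn search_url(base: &str, query: &str) -> String {
    format!(
        "{}/api/search?query={}",
        base.trim_end_matches('/'),
        encode_query_value(query)
    )
}
```

For example, search_url("http://localhost:8000", "tax invoice 2024") yields http://localhost:8000/api/search?query=tax%20invoice%202024. Encoding the query keeps characters such as & and = in the search text from being read as URL syntax.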

Configuration

Environment variables:

  • DATABASE_URL - PostgreSQL connection string
  • JWT_SECRET - Secret key for JWT tokens
  • SERVER_ADDRESS - Server bind address (default: 0.0.0.0:8000)
  • UPLOAD_PATH - Directory for uploaded files (default: ./uploads)
  • WATCH_FOLDER - Directory to monitor for new files (default: ./watch)
  • ALLOWED_FILE_TYPES - Comma-separated list of allowed extensions
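
The variables above could be read into a settings struct along the following lines. This is a sketch under stated assumptions, not Readur's actual configuration code: the Config struct and its methods are hypothetical, DATABASE_URL and JWT_SECRET are treated as required only because no default is documented for them, and an unset ALLOWED_FILE_TYPES is treated as an empty list.

```rust
use std::env;

/// Hypothetical settings struct mirroring the documented variables.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub database_url: String,
    pub jwt_secret: String,
    pub server_address: String,
    pub upload_path: String,
    pub watch_folder: String,
    pub allowed_file_types: Vec<String>,
}

impl Config {
    /// Builds a Config from any key -> value lookup, so the logic can be
    /// exercised without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Config, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Assumption: variables without a documented default are required.
        let required = |key: &str| lookup(key).ok_or_else(|| format!("{key} must be set"));
        // Documented defaults are applied when a variable is unset.
        let or_default =
            |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());

        // Split the comma-separated list, normalising " .PNG" to "png"
        // and dropping empty entries.
        let allowed_file_types: Vec<String> = lookup("ALLOWED_FILE_TYPES")
            .unwrap_or_default()
            .split(',')
            .map(|ext| ext.trim().trim_start_matches('.').to_ascii_lowercase())
            .filter(|ext| !ext.is_empty())
            .collect();

        Ok(Config {
            database_url: required("DATABASE_URL")?,
            jwt_secret: required("JWT_SECRET")?,
            server_address: or_default("SERVER_ADDRESS", "0.0.0.0:8000"),
            upload_path: or_default("UPLOAD_PATH", "./uploads"),
            watch_folder: or_default("WATCH_FOLDER", "./watch"),
            allowed_file_types,
        })
    }

    /// Reads the same settings from the process environment.
    pub fn from_env() -> Result<Config, String> {
        Config::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling Config::from_env() at startup would fail early with a message such as "DATABASE_URL must be set" when a required variable is missing, rather than failing later on the first database call.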

File Processing

The system supports:

  • PDFs: Text extraction and OCR for scanned documents
  • Images: OCR text extraction (PNG, JPG, JPEG, TIFF, BMP)
  • Text files: Direct content indexing
  • Office documents: DOC, DOCX support
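
The four categories above can be pictured as a dispatch on file extension. The sketch below is illustrative only: the Processing enum and classify function are hypothetical names, the image and Office extensions are taken from the list above, and mapping "text files" to .txt is an assumption.

```rust
use std::path::Path;

/// How a file would be handled, following the categories listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Processing {
    /// PDFs: text extraction, with OCR for scanned documents.
    Pdf,
    /// Images: OCR text extraction.
    ImageOcr,
    /// Text files: content indexed directly.
    PlainText,
    /// Office documents (DOC, DOCX).
    Office,
}

/// Picks a processing category from the file extension (case-insensitive).
/// Returns None for files with no extension or an unsupported one.
pub fn classify(path: &Path) -> Option<Processing> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(Processing::Pdf),
        "png" | "jpg" | "jpeg" | "tiff" | "bmp" => Some(Processing::ImageOcr),
        // Assumption: ".txt" stands in for "text files".
        "txt" => Some(Processing::PlainText),
        "doc" | "docx" => Some(Processing::Office),
        _ => None,
    }
}
```

Returning an Option leaves the caller to decide what to do with unsupported files, for example skipping them in the watch folder or rejecting them at upload.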

Testing

Backend Tests

cargo test

Frontend Tests

cd frontend
npm test

Development

Project Structure

readur/
├── src/                    # Rust backend source
│   ├── auth.rs             # Authentication logic
│   ├── db.rs               # Database operations
│   ├── models.rs           # Data models
│   ├── ocr.rs              # OCR processing
│   ├── routes/             # API routes
│   └── tests/              # Unit tests
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── contexts/       # React contexts
│   │   └── services/       # API services
│   └── package.json
├── Dockerfile              # Multi-stage Docker build
└── docker-compose.yml      # Development environment

Adding New Features

  1. Backend changes go in src/
  2. Frontend changes go in frontend/src/
  3. Add tests for new functionality
  4. Update API documentation

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is licensed under the MIT License.