Readur/TESTING.md

5.7 KiB

Testing Guide

This document describes the testing strategy for the Readur OCR document management system.

🧪 Testing Strategy

We have a clean three-tier testing approach:

  1. Unit Tests (Rust) - Fast, no dependencies, test individual components
  2. Integration Tests (Python) - Test against running services, user workflow validation
  3. Frontend Tests (JavaScript) - Component and API integration testing

🚀 Quick Start

# Run all tests
cargo run --bin test_runner

# Run specific test types
cargo run --bin test_runner unit         # Unit tests only
cargo run --bin test_runner integration  # Integration tests only  
cargo run --bin test_runner frontend     # Frontend tests only

Manual Test Execution

# Unit tests (fast, no dependencies)
cargo test --test unit_tests

# Integration tests (requires running server)
# 1. Start server: cargo run
# 2. Run tests: cargo test --test integration_tests

# Frontend tests
cd frontend && npm test -- --run

📋 Test Categories

Unit Tests (tests/unit_tests.rs)

Rust-based tests for core data structures and conversions without external dependencies:

  • Document response conversion (with/without OCR)
  • OCR field validation (confidence, word count, processing time)
  • User response conversion (security - no password leaks)
  • Search mode defaults and enums

Run with: cargo test --test unit_tests or cargo run --bin test_runner unit

Integration Tests (tests/integration_tests.rs)

Rust-based tests for complete user workflows against running services:

  • User registration and authentication (using CreateUser, LoginRequest types)
  • Document upload via multipart form (returns DocumentResponse)
  • OCR processing completion (with timeout and type validation)
  • OCR text retrieval via API endpoint (validates response structure)
  • Error handling (401, 404 responses)
  • Health endpoint validation

Run with: cargo test --test integration_tests or cargo run --bin test_runner integration

Advantages of Rust Integration Tests:

  • 🔒 Type Safety - Uses same models/types as main application
  • 🚀 Performance - Faster execution than Python scripts
  • 🛠️ IDE Support - Full autocomplete and refactoring support
  • 🔗 Code Reuse - Can import validation logic and test helpers

Frontend Tests

Located in frontend/src/:

  • Document details page with OCR functionality
  • API service mocking and integration
  • Component behavior and user interactions

Run with: cd frontend && npm test

🔧 Test Configuration

Server Requirements

Integration tests expect the server running at:

  • URL: http://localhost:8080
  • Health endpoint: /api/health returns {"status": "ok"}

Test Data

Integration tests use:

  • Test user: integrationtest@test.com
  • Test document: Simple text file with known content
  • Timeout: 30 seconds for OCR processing

📊 Test Coverage

What We Test

OCR Functionality:

  • Document upload → OCR processing → text retrieval
  • OCR metadata validation (confidence, word count, timing)
  • Error handling for failed OCR processing

API Endpoints:

  • Authentication flow (register/login)
  • Document management (upload/list)
  • OCR text retrieval (/api/documents/{id}/ocr)
  • Error responses (401, 404, 500)

Data Models:

  • Type safety and field validation
  • Response structure consistency
  • Security (no password leaks)

Frontend Components:

  • OCR dialog behavior
  • API integration and error handling
  • User interaction flows

What We Don't Test

  • Tesseract OCR accuracy (external library)
  • Database schema migrations (handled by SQLx)
  • File system operations (handled by OS)
  • Network failures (covered by error handling)

🐛 Debugging Test Failures

Integration Test Failures

  1. "Server is not running"

    # Start the server first
    cargo run
    # Then run tests
    ./run_user_tests.sh
    
  2. "OCR processing timed out"

    • Check server logs for OCR errors
    • Ensure Tesseract is installed and configured
    • Increase timeout in test if needed
  3. "Authentication failed"

    • Check JWT secret configuration
    • Verify database is accessible

Unit Test Failures

Unit tests should never fail due to external dependencies. If they do:

  1. Check for compilation errors in models
  2. Verify type definitions match expectations
  3. Review recent changes to data structures

🔄 Continuous Integration

For CI/CD pipelines:

# Example GitHub Actions workflow
- name: Run Unit Tests
  run: cargo test --lib

- name: Start Services  
  run: docker-compose up -d

- name: Wait for Health
  run: timeout 60s bash -c 'until curl -s http://localhost:8080/api/health | grep -q "ok"; do sleep 2; done'

- name: Run Integration Tests
  run: cargo test --test integration_ocr_test

📈 Adding New Tests

For New API Endpoints

  1. Add unit tests for data models in tests/unit_tests.rs
  2. Add integration test in tests/integration_ocr_test.rs
  3. Add frontend tests if UI components involved

For New OCR Features

  1. Test the happy path (document → processing → retrieval)
  2. Test error conditions (file format, processing failures)
  3. Test performance/timeout scenarios
  4. Validate response structure changes

🎯 Test Philosophy

Fast Feedback: Unit tests run in milliseconds, integration tests in seconds.

Real User Scenarios: Integration tests simulate actual user workflows.

Maintainable: Tests are simple, focused, and well-documented.

Reliable: Tests pass consistently and fail for good reasons.

Comprehensive: Critical paths are covered, edge cases are handled.