# Architecture Overview

This document provides a comprehensive overview of Readur's system architecture, component interactions, data flows, and design decisions.

## System Components

### High-Level Architecture

**Important:** Readur is designed as a single-instance, monolithic application. It does NOT support multiple server instances, clustering, or high availability configurations.

```
┌────────────────────────────────────────────────────────────────┐
│                     Readur Single Instance                     │
│                                                                │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐    │
│  │ Web Server  │  │  Business   │  │ Background Services  │    │
│  │   (Axum)    │  │   Logic     │  │ - OCR Worker         │    │
│  └─────────────┘  └─────────────┘  │ - File Watcher       │    │
│                                    │ - Queue Processor    │    │
│                                    └──────────────────────┘    │
└────────────────────────────────┬───────────────────────────────┘
                                 │
           ┌─────────────────────▼──────────────────────┐
           │                Data Layer                  │
           │  ┌────────┐  ┌────────┐  ┌───────────┐     │
           │  │Database│  │Storage │  │   Queue   │     │
           │  │  (PG)  │  │(S3/FS) │  │(DB-based) │     │
           │  └────────┘  └────────┘  └───────────┘     │
           └────────────────────────────────────────────┘
```

### Component Breakdown

```
Readur Application Instance
├── Web Server (Axum)
│   ├── HTTP API Endpoints
│   ├── WebSocket Server
│   ├── Static File Server
│   └── Middleware Stack
├── Business Logic Layer
│   ├── Document Management
│   ├── Search Engine
│   ├── User Management
│   ├── OCR Processing
│   └── Source Synchronization
├── Data Access Layer
│   ├── Database Connection Pool
│   ├── File Storage Interface
│   ├── Cache Layer
│   └── External API Clients
└── Background Services
    ├── OCR Queue Worker
    ├── File Watcher
    ├── Source Scheduler
    └── Cleanup Tasks
```

## Data Flow Architecture

### Document Upload Flow

```
User Upload Request
    │
    ▼
[1] Nginx/Reverse Proxy
    │
    ├─── Rate Limiting
    ├─── Request Validation
    └─── Load Balancing
    │
    ▼
[2] Authentication Middleware
    │
    ├─── JWT Validation
    └─── Permission Check
    │
    ▼
[3] File Upload Handler
    │
    ├─── File Type Validation
    ├─── Size Validation
    └─── Virus Scanning (optional)
    │
    ▼
[4] Storage Service
    │
    ├─── Generate UUID
    ├─── Calculate Hash
    └─── Store File
    │
    ▼
[5] Database Transaction
    │
    ├─── Create Document Record
    ├─── Add Metadata
    └─── Queue for OCR
    │
    ▼
[6] OCR Queue
    │
    ├─── Priority Assignment
    └─── Worker Notification
    │
    ▼
[7] Response to Client
    │
    └─── Document ID + Status
```

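The same flow expressed as code: a minimal sketch of the storage, database, and queue stages (steps 4 to 6), not the actual Readur handler. `AppState`, `handle_upload`, the column subset, and the storage key layout are assumptions for the example; error handling is collapsed into `anyhow::Result`.

```rust
use std::sync::Arc;
use sha2::{Digest, Sha256};
use sqlx::PgPool;
use uuid::Uuid;

// Assumed application state; the real struct lives elsewhere in the codebase.
pub struct AppState {
    pub db: PgPool,
    pub storage: Arc<dyn StorageBackend>,
}

pub async fn handle_upload(
    state: &AppState,
    user_id: Uuid,
    filename: &str,
    data: Vec<u8>,
) -> anyhow::Result<Uuid> {
    // [4] Storage Service: generate UUID, hash content, persist the file
    //     (key simplified; see the date-based layout under File Organization)
    let doc_id = Uuid::new_v4();
    let file_hash = hex::encode(Sha256::digest(&data));
    let key = format!("documents/{doc_id}");
    state.storage.store(&key, &data).await?;

    // [5] Database Transaction: create the document record and queue OCR atomically
    let mut tx = state.db.begin().await?;
    sqlx::query(
        "INSERT INTO documents (id, user_id, filename, file_path, file_hash, file_size, ocr_status)
         VALUES ($1, $2, $3, $4, $5, $6, 'pending')",
    )
    .bind(doc_id)
    .bind(user_id)
    .bind(filename)
    .bind(&key)
    .bind(&file_hash)
    .bind(data.len() as i64)
    .execute(&mut *tx)
    .await?;

    // [6] OCR Queue: enqueue with default priority; a background worker picks it up
    sqlx::query(
        "INSERT INTO ocr_queue (id, document_id, status, priority, created_at)
         VALUES ($1, $2, 'pending', 5, NOW())",
    )
    .bind(Uuid::new_v4())
    .bind(doc_id)
    .execute(&mut *tx)
    .await?;
    tx.commit().await?;

    // [7] The web layer returns the document ID and initial status to the client
    Ok(doc_id)
}
```
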
### OCR Processing Pipeline

```
OCR Queue Entry
    │
    ▼
[1] Queue Worker Pickup
    │
    ├─── Lock Document
    └─── Update Status
    │
    ▼
[2] File Retrieval
    │
    ├─── Load from Storage
    └─── Verify Integrity
    │
    ▼
[3] Preprocessing
    │
    ├─── Image Enhancement
    ├─── Format Conversion
    └─── Page Splitting
    │
    ▼
[4] OCR Engine (Tesseract)
    │
    ├─── Language Detection
    ├─── Text Extraction
    └─── Confidence Scoring
    │
    ▼
[5] Post-processing
    │
    ├─── Text Cleaning
    ├─── Format Normalization
    └─── Metadata Extraction
    │
    ▼
[6] Database Update
    │
    ├─── Store Extracted Text
    ├─── Update Search Index
    └─── Record Metrics
    │
    ▼
[7] Notification
    │
    ├─── WebSocket Update
    └─── Email (if configured)
```

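A worker-side sketch of stages 2 through 6, purely illustrative: `OcrJob`, `run_tesseract`, and the explicit `storage` handle are assumed names, and the real pipeline (preprocessing, language detection, notifications) is reduced to its essentials.

```rust
use sqlx::PgPool;

pub async fn process_ocr_job(
    job: OcrJob,                    // assumed row type mapped from ocr_queue
    pool: &PgPool,
    storage: &dyn StorageBackend,
) -> anyhow::Result<()> {
    // [2] File retrieval: load the original bytes from the storage backend
    let data = storage.retrieve(&job.file_path).await?;

    // [3]+[4] Preprocess and run Tesseract; run_tesseract is an assumed wrapper
    // returning extracted text plus a confidence score
    let (text, confidence) = run_tesseract(&data, &job.language).await?;

    // [5] Post-processing: collapse whitespace as a stand-in for text cleaning
    let cleaned = text.split_whitespace().collect::<Vec<_>>().join(" ");

    // [6] Database update: store the text; PostgreSQL refreshes the generated
    // content_vector column automatically, which updates the search index
    sqlx::query(
        "UPDATE documents
         SET content = $1, ocr_confidence = $2, ocr_status = 'completed', processed_at = NOW()
         WHERE id = $3",
    )
    .bind(&cleaned)
    .bind(confidence)
    .bind(job.document_id)
    .execute(pool)
    .await?;

    sqlx::query("UPDATE ocr_queue SET status = 'completed', completed_at = NOW() WHERE id = $1")
        .bind(job.id)
        .execute(pool)
        .await?;

    Ok(())
}
```
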
### Search Request Flow

```
Search Query
    │
    ▼
[1] Query Parser
    │
    ├─── Tokenization
    ├─── Stemming
    └─── Query Expansion
    │
    ▼
[2] Search Executor
    │
    ├─── Full-Text Search (PostgreSQL)
    ├─── Filter Application
    └─── Ranking Algorithm
    │
    ▼
[3] Result Processing
    │
    ├─── Snippet Generation
    ├─── Highlighting
    └─── Facet Calculation
    │
    ▼
[4] Permission Filter
    │
    └─── User Access Check
    │
    ▼
[5] Response Assembly
    │
    ├─── Pagination
    ├─── Metadata Enrichment
    └─── JSON Serialization
```

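A sketch of the response-assembly step (5): rows from the executor are wrapped in a serializable envelope with pagination metadata. The struct and field names are illustrative, not the actual API schema.

```rust
use serde::Serialize;
use uuid::Uuid;

#[derive(Serialize)]
pub struct SearchHit {
    pub id: Uuid,
    pub title: String,
    pub snippet: String, // ts_headline output with <mark> tags from step [3]
    pub rank: f32,
}

#[derive(Serialize)]
pub struct SearchResponse {
    pub query: String,
    pub page: usize,
    pub per_page: usize,
    pub total: usize, // count for this page only; a full implementation would also run a COUNT query
    pub results: Vec<SearchHit>,
}

pub fn assemble_response(query: &str, page: usize, per_page: usize, hits: Vec<SearchHit>) -> SearchResponse {
    SearchResponse {
        query: query.to_string(),
        page,
        per_page,
        total: hits.len(),
        results: hits,
    }
}
```
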
## Queue Architecture

### OCR Queue System

```sql
-- Queue table structure
CREATE TABLE ocr_queue (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    status VARCHAR(20),           -- pending, processing, completed, failed
    priority INTEGER DEFAULT 5,
    retry_count INTEGER DEFAULT 0,
    max_retries INTEGER DEFAULT 3,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    worker_id VARCHAR(100)
);

-- Efficient queue fetching with SKIP LOCKED
SELECT * FROM ocr_queue
WHERE status = 'pending'
ORDER BY priority DESC, created_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 1;
```

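A sketch of how a worker might claim the next entry atomically, combining the SELECT above with the status update in one statement. It assumes sqlx and an `OcrJob` type deriving `FromRow`; errors are treated as "no job available" for brevity.

```rust
use sqlx::PgPool;

pub async fn fetch_next_job(pool: &PgPool) -> Option<OcrJob> {
    // SKIP LOCKED lets concurrent workers claim different rows without blocking;
    // the UPDATE marks the chosen row as processing in the same round trip.
    sqlx::query_as::<_, OcrJob>(
        "UPDATE ocr_queue
         SET status = 'processing', started_at = NOW(), worker_id = $1
         WHERE id = (
             SELECT id FROM ocr_queue
             WHERE status = 'pending'
             ORDER BY priority DESC, created_at ASC
             FOR UPDATE SKIP LOCKED
             LIMIT 1
         )
         RETURNING *",
    )
    .bind(format!("worker-{}", std::process::id()))
    .fetch_optional(pool)
    .await
    .ok()
    .flatten()
}
```
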
### Queue Worker Architecture

```rust
// Queue processing with fixed thread pools
pub struct OcrQueueService {
    pool: PgPool,
    workers: Vec<JoinHandle<()>>,
    shutdown: Arc<AtomicBool>,
}

impl OcrQueueService {
    pub async fn start_workers(&mut self) {
        // Fixed thread allocation:
        // - OCR runtime: 3 threads
        // - Background runtime: 2 threads
        // - Database runtime: 2 threads
        let ocr_workers = 3;

        for _worker_id in 0..ocr_workers {
            let pool = self.pool.clone();
            let shutdown = self.shutdown.clone();

            let handle = tokio::spawn(async move {
                while !shutdown.load(Ordering::Relaxed) {
                    if let Some(job) = fetch_next_job(&pool).await {
                        process_ocr_job(job, &pool).await;
                    } else {
                        tokio::time::sleep(Duration::from_secs(1)).await;
                    }
                }
            });

            // Pushing the handle requires &mut self (or interior mutability)
            self.workers.push(handle);
        }
    }
}
```

## Storage Architecture

### Storage Abstraction Layer

```rust
// Storage trait for multiple backends
#[async_trait]
pub trait StorageBackend: Send + Sync {
    async fn store(&self, key: &str, data: &[u8]) -> Result<()>;
    async fn retrieve(&self, key: &str) -> Result<Vec<u8>>;
    async fn delete(&self, key: &str) -> Result<()>;
    async fn exists(&self, key: &str) -> Result<bool>;
    async fn list(&self, prefix: &str) -> Result<Vec<String>>;
}

// Implementations
pub struct LocalStorage { base_path: PathBuf }
pub struct S3Storage { bucket: String, client: S3Client }
pub struct AzureStorage { container: String, client: BlobClient }
```

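One possible filesystem implementation of the trait, sketched with `tokio::fs`; the `Result` alias is assumed to be anyhow-style, and `list` walks only one directory level to keep the example short.

```rust
use async_trait::async_trait;
use tokio::fs;

#[async_trait]
impl StorageBackend for LocalStorage {
    async fn store(&self, key: &str, data: &[u8]) -> Result<()> {
        let path = self.base_path.join(key);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent).await?; // create documents/{date}/ folders on demand
        }
        fs::write(path, data).await?;
        Ok(())
    }

    async fn retrieve(&self, key: &str) -> Result<Vec<u8>> {
        Ok(fs::read(self.base_path.join(key)).await?)
    }

    async fn delete(&self, key: &str) -> Result<()> {
        Ok(fs::remove_file(self.base_path.join(key)).await?)
    }

    async fn exists(&self, key: &str) -> Result<bool> {
        Ok(fs::try_exists(self.base_path.join(key)).await?)
    }

    async fn list(&self, prefix: &str) -> Result<Vec<String>> {
        // Minimal sketch: list direct children of the prefix directory only
        let mut out = Vec::new();
        let mut dir = fs::read_dir(self.base_path.join(prefix)).await?;
        while let Some(entry) = dir.next_entry().await? {
            out.push(entry.file_name().to_string_lossy().into_owned());
        }
        Ok(out)
    }
}
```
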
### File Organization

```
Storage Root/
├── documents/
│   ├── {year}/{month}/{day}/
│   │   └── {uuid}.{extension}
├── thumbnails/
│   ├── {year}/{month}/{day}/
│   │   └── {uuid}_thumb.jpg
├── processed/
│   ├── ocr/
│   │   └── {uuid}_ocr.txt
│   └── metadata/
│       └── {uuid}_meta.json
└── temp/
    └── {session_id}/
        └── {temp_files}
```

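A small helper sketch for deriving the date-based document key shown above, assuming `chrono` and `uuid`; the real naming scheme may differ.

```rust
use chrono::Utc;
use uuid::Uuid;

/// Build a key such as "documents/2025/06/14/<uuid>.pdf".
pub fn document_key(id: Uuid, extension: &str) -> String {
    format!("documents/{}/{id}.{extension}", Utc::now().format("%Y/%m/%d"))
}
```
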
## Database Schema

### Core Tables

```sql
-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(100) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(20) DEFAULT 'viewer',
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login TIMESTAMP,
    settings JSONB DEFAULT '{}'::jsonb
);

-- Documents table
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    title VARCHAR(500),
    filename VARCHAR(255) NOT NULL,
    file_path VARCHAR(500) NOT NULL,
    file_hash VARCHAR(64),
    file_size BIGINT,
    mime_type VARCHAR(100),
    content TEXT,
    content_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    ocr_status VARCHAR(20) DEFAULT 'pending',
    ocr_confidence FLOAT,
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    processed_at TIMESTAMP,
    source_id UUID REFERENCES sources(id),
    CONSTRAINT unique_file_hash UNIQUE(file_hash, user_id)
);

-- Create indexes for performance
CREATE INDEX idx_documents_content_vector ON documents USING gin(content_vector);
CREATE INDEX idx_documents_user_created ON documents(user_id, created_at DESC);
CREATE INDEX idx_documents_metadata ON documents USING gin(metadata jsonb_path_ops);
CREATE INDEX idx_documents_file_hash ON documents(file_hash) WHERE file_hash IS NOT NULL;
```

### Search Optimization

```sql
-- Full-text search function
CREATE OR REPLACE FUNCTION search_documents(
    query_text TEXT,
    user_id_param UUID,
    limit_param INT DEFAULT 20,
    offset_param INT DEFAULT 0
) RETURNS TABLE (
    id UUID,
    title TEXT,
    content TEXT,
    rank REAL,
    snippet TEXT
) AS $$
BEGIN
    RETURN QUERY
    WITH search_query AS (
        SELECT plainto_tsquery('english', query_text) AS q
    ),
    ranked_results AS (
        SELECT
            d.id,
            d.title::TEXT,   -- cast VARCHAR to TEXT to match the declared return type
            d.content,
            ts_rank_cd(d.content_vector, sq.q) AS rank,
            ts_headline(
                'english',
                d.content,
                sq.q,
                'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>'
            ) AS snippet
        FROM documents d, search_query sq
        WHERE
            d.user_id = user_id_param
            AND d.content_vector @@ sq.q
    )
    SELECT * FROM ranked_results r
    ORDER BY r.rank DESC   -- qualified to avoid ambiguity with the rank output column
    LIMIT limit_param
    OFFSET offset_param;
END;
$$ LANGUAGE plpgsql;
```

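From the Rust layer, the function can be invoked like any other query; a sketch assuming sqlx with a `FromRow` result type (names illustrative):

```rust
use sqlx::PgPool;
use uuid::Uuid;

#[derive(sqlx::FromRow)]
struct SearchRow {
    id: Uuid,
    title: String,
    content: String,
    rank: f32,
    snippet: String,
}

async fn run_search(pool: &PgPool, user_id: Uuid, query: &str) -> anyhow::Result<Vec<SearchRow>> {
    let rows = sqlx::query_as::<_, SearchRow>(
        "SELECT * FROM search_documents($1, $2, $3, $4)",
    )
    .bind(query)    // query_text
    .bind(user_id)  // user_id_param
    .bind(20i32)    // limit_param
    .bind(0i32)     // offset_param
    .fetch_all(pool)
    .await?;
    Ok(rows)
}
```
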
## Synchronization Architecture

### WebDAV Sync

```rust
pub struct WebDavSync {
    client: WebDavClient,
    db: Arc<DbConnection>,
    progress: Arc<Mutex<SyncProgress>>,
}

impl WebDavSync {
    pub async fn smart_sync(&self) -> Result<SyncResult> {
        // 1. Fetch remote file list with ETags
        let remote_files = self.client.list_files().await?;

        // 2. Compare with local database
        let local_files = self.db.get_source_files().await?;

        // 3. Determine changes (added / updated / deleted)
        let changes = self.calculate_changes(&remote_files, &local_files);

        // 4. Process added and updated files in batches
        let pending: Vec<_> = changes.added.iter().chain(changes.updated.iter()).collect();
        for batch in pending.chunks(100) {
            self.process_batch(batch).await?;
            self.update_progress().await?;
        }

        // 5. Clean up deleted files
        self.process_deletions(&remote_files, &local_files).await?;

        Ok(SyncResult {
            added: changes.added.len(),
            updated: changes.updated.len(),
            deleted: changes.deleted.len(),
        })
    }
}
```

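A sketch of the ETag comparison behind `calculate_changes`, assuming both sides are reduced to path-to-ETag maps; conflict resolution and rename detection are out of scope.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Changes {
    pub added: Vec<String>,
    pub updated: Vec<String>,
    pub deleted: Vec<String>,
}

/// Hypothetical change detection: compare remote ETags against what was
/// recorded during the previous sync.
pub fn calculate_changes(
    remote: &HashMap<String, String>, // path -> ETag reported by the WebDAV server
    local: &HashMap<String, String>,  // path -> ETag stored in the database
) -> Changes {
    let mut changes = Changes::default();
    for (path, etag) in remote {
        match local.get(path) {
            None => changes.added.push(path.clone()),
            Some(old) if old != etag => changes.updated.push(path.clone()),
            _ => {} // unchanged
        }
    }
    for path in local.keys() {
        if !remote.contains_key(path) {
            changes.deleted.push(path.clone());
        }
    }
    changes
}
```
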
### Source Scheduler

```rust
pub struct SourceScheduler {
    sources: Arc<RwLock<Vec<Source>>>,
    executor: Arc<ThreadPool>,
}

impl SourceScheduler {
    pub async fn run(&self) {
        loop {
            let now = Utc::now();
            let sources = self.sources.read().await;

            for source in sources.iter() {
                if source.should_sync(now) {
                    let source_clone = source.clone();
                    self.executor.spawn(async move {
                        match source_clone.sync().await {
                            Ok(result) => log::info!("Sync completed: {:?}", result),
                            Err(e) => log::error!("Sync failed: {}", e),
                        }
                    });
                }
            }

            // Release the read lock before sleeping so writers are not blocked
            drop(sources);
            tokio::time::sleep(Duration::from_secs(60)).await;
        }
    }
}
```

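How `should_sync` might be evaluated; the `last_synced_at` and `sync_interval_minutes` fields are assumptions for the sketch, not the actual `Source` model.

```rust
use chrono::{DateTime, Duration, Utc};

impl Source {
    pub fn should_sync(&self, now: DateTime<Utc>) -> bool {
        // Sync when the configured interval has elapsed since the last successful run
        match self.last_synced_at {
            Some(last) => now - last >= Duration::minutes(self.sync_interval_minutes),
            None => true, // never synced before
        }
    }
}
```
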
## Performance Optimization

### Connection Pooling

```rust
// Database connection pool configuration
let pool = PgPoolOptions::new()
    .max_connections(32)
    .min_connections(5)
    .acquire_timeout(Duration::from_secs(10))
    .idle_timeout(Duration::from_secs(600))
    .max_lifetime(Duration::from_secs(1800))
    .connect(&database_url)
    .await?;
```

### Caching Strategy

```rust
// Multi-level caching
pub struct CacheManager {
    l1_cache: Arc<DashMap<String, CachedItem>>, // In-memory
    l2_cache: Option<RedisClient>,              // Redis (optional)
}

impl CacheManager {
    pub async fn get<T: DeserializeOwned>(&self, key: &str) -> Option<T> {
        // Check L1 cache; CachedItem is assumed to hold serialized JSON bytes
        if let Some(item) = self.l1_cache.get(key) {
            if !item.is_expired() {
                return serde_json::from_slice(&item.value).ok();
            }
        }

        // Check L2 cache (assumed to return raw bytes) and promote the hit into L1
        if let Some(redis) = &self.l2_cache {
            if let Ok(bytes) = redis.get(key).await {
                self.l1_cache.insert(key.to_string(), CachedItem::new(bytes.clone()));
                return serde_json::from_slice(&bytes).ok();
            }
        }

        None
    }
}
```

### Batch Processing

```rust
// Batch document processing
pub async fn batch_process_documents(
    documents: Vec<Document>,
    batch_size: usize,
) -> Result<Vec<ProcessResult>> {
    // The semaphore caps how many documents are processed concurrently
    let semaphore = Arc::new(Semaphore::new(batch_size));
    let mut tasks = Vec::new();

    for doc in documents {
        let permit = semaphore.clone().acquire_owned().await?;
        let task = tokio::spawn(async move {
            let result = process_document(doc).await;
            drop(permit); // release the slot as soon as this document is done
            result
        });
        tasks.push(task);
    }

    // Keep successful results; skip panicked tasks and per-document failures
    let results = futures::future::join_all(tasks).await;
    Ok(results
        .into_iter()
        .filter_map(|joined| joined.ok().and_then(|res| res.ok()))
        .collect())
}
```

## Security Architecture

### Authentication Flow

```
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Client  │─────►│   API   │─────►│  Auth   │
└─────────┘      └─────────┘      └─────────┘
     │                │                │
     │  POST /login   │   Validate     │
     │  {user,pass}   │   Credentials  │
     │                │                │
     │◄───────────────┼────────────────┤
     │  JWT Token     │   Generate     │
     │                │   Token        │
     │                │                │
     │  GET /api/*    │   Verify       │
     │  Auth: Bearer  │   JWT          │
     │                │                │
     │◄───────────────┼────────────────┤
     │  API Response  │   Authorized   │
```

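The verification step sketched with the `jsonwebtoken` crate; the claim fields and secret handling are illustrative, and the wiring into Axum middleware is omitted.

```rust
use jsonwebtoken::{decode, DecodingKey, Validation};
use serde::Deserialize;

#[derive(Deserialize)]
pub struct Claims {
    pub sub: String,  // user id
    pub role: String, // maps onto the Role enum below
    pub exp: usize,   // expiry, enforced by jsonwebtoken during decode
}

/// Check performed for every /api/* request before the handler runs.
pub fn verify_bearer(auth_header: &str, secret: &[u8]) -> Result<Claims, jsonwebtoken::errors::Error> {
    let token = auth_header.strip_prefix("Bearer ").unwrap_or(auth_header);
    let data = decode::<Claims>(token, &DecodingKey::from_secret(secret), &Validation::default())?;
    Ok(data.claims)
}
```
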
### Permission Model

```rust
// Role-based access control
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    Admin,   // Full system access
    Editor,  // Create, read, update, delete own documents
    Viewer,  // Read-only access to own documents
}

impl Role {
    pub fn can_upload(&self) -> bool {
        matches!(self, Role::Admin | Role::Editor)
    }

    pub fn can_delete(&self) -> bool {
        matches!(self, Role::Admin | Role::Editor)
    }

    pub fn can_manage_users(&self) -> bool {
        matches!(self, Role::Admin)
    }

    pub fn can_configure_system(&self) -> bool {
        matches!(self, Role::Admin)
    }
}
```

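A handler-side guard sketch showing how these checks might be consulted before any work is done; `ApiError` is an assumed error type.

```rust
/// Reject the request up front if the role lacks the permission.
pub fn authorize_delete(role: &Role) -> Result<(), ApiError> {
    if role.can_delete() {
        Ok(())
    } else {
        Err(ApiError::Forbidden("role may not delete documents".into()))
    }
}
```
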
## Scalability Considerations

### Single-Instance Optimization

Since Readur is a single-instance application, scaling is achieved through:

1. **Vertical Scaling**: Increase CPU, RAM, and storage on the single server
2. **Storage Offloading**: Use S3 or compatible object storage
3. **Database Optimization**: Tune PostgreSQL for better performance
4. **Queue Management**: Optimize OCR queue processing

```yaml
# Docker Compose single-instance configuration
version: '3.8'
services:
  readur:
    image: readur:latest
    # Single instance only - do NOT use replicas
    deploy:
      replicas: 1  # MUST be 1
      resources:
        limits:
          cpus: '4'    # Increase for better performance
          memory: 4G   # Increase for larger workloads
    environment:
      - DATABASE_URL=postgresql://db:5432/readur
      - CONCURRENT_OCR_JOBS=3  # Fixed thread pool
    depends_on:
      - db
```

### Database Partitioning Strategy

```sql
-- Partition documents by user_id to keep large tables manageable
-- within the single PostgreSQL instance
CREATE TABLE documents_partition_template (
    LIKE documents INCLUDING ALL
) PARTITION BY HASH (user_id);

-- Create partitions
CREATE TABLE documents_part_0 PARTITION OF documents_partition_template
    FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE documents_part_1 PARTITION OF documents_partition_template
    FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE documents_part_2 PARTITION OF documents_partition_template
    FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE documents_part_3 PARTITION OF documents_partition_template
    FOR VALUES WITH (modulus 4, remainder 3);
```

## Monitoring and Observability

### Metrics Collection

```rust
// Prometheus metrics
lazy_static! {
    static ref HTTP_REQUESTS: IntCounterVec = register_int_counter_vec!(
        "http_requests_total",
        "Total HTTP requests",
        &["method", "endpoint", "status"]
    ).unwrap();

    static ref OCR_PROCESSING_TIME: HistogramVec = register_histogram_vec!(
        "ocr_processing_duration_seconds",
        "OCR processing time",
        &["language", "status"]
    ).unwrap();

    static ref ACTIVE_USERS: IntGauge = register_int_gauge!(
        "active_users_total",
        "Number of active users"
    ).unwrap();
}
```

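How the collectors might be updated around a single OCR job; a sketch that assumes the metrics above are in scope and that `process_ocr_job` returns a `Result` (the language label is hardcoded for brevity).

```rust
use std::time::Instant;

pub async fn process_with_metrics(job: OcrJob, pool: &sqlx::PgPool, storage: &dyn StorageBackend) {
    let start = Instant::now();
    let status = match process_ocr_job(job, pool, storage).await {
        Ok(_) => "success",
        Err(_) => "failure",
    };
    // Record the observation under the language/status label pair
    OCR_PROCESSING_TIME
        .with_label_values(&["eng", status])
        .observe(start.elapsed().as_secs_f64());
}
```
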
### Distributed Tracing

```rust
// OpenTelemetry integration
use opentelemetry::{Context, trace::{TraceContextExt, Tracer}};

pub async fn process_document_traced(doc: Document) -> Result<()> {
    let tracer = opentelemetry::global::tracer("readur");
    let span = tracer.start("process_document");
    let cx = Context::current_with_span(span);

    // Trace document loading
    let _load_span = tracer.start_with_context("load_document", &cx);
    let file_data = load_file(&doc.file_path).await?;

    // Trace OCR processing
    let _ocr_span = tracer.start_with_context("ocr_processing", &cx);
    let text = extract_text(&file_data).await?;

    // Trace database update
    let _db_span = tracer.start_with_context("update_database", &cx);
    update_document_content(&doc.id, &text).await?;

    Ok(())
}
```

## Deployment Architecture

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  replicas: 1  # Single-instance application; must remain 1
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur
    spec:
      containers:
      - name: readur
        image: readur:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: readur-secrets
              key: database-url
```

## Development Workflow

### Local Development Setup

```bash
# Development environment
docker-compose -f docker-compose.dev.yml up -d

# Database migrations
cargo run --bin migrate

# Run with hot reload
cargo watch -x run

# Frontend development
cd frontend && npm run dev
```

### Testing Strategy

```rust
// Unit test example
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_document_processing() {
        let doc = create_test_document();
        let result = process_document(doc).await;
        assert!(result.is_ok());
        assert_eq!(result.unwrap().status, "completed");
    }

    #[tokio::test]
    async fn test_search_functionality() {
        let pool = create_test_pool().await;
        seed_test_data(&pool).await;

        let results = search_documents("test query", &pool).await;
        assert!(!results.is_empty());
    }
}
```

## Future Architecture Considerations

### Planned Enhancements

1. **Elasticsearch Integration**: For advanced search capabilities
2. **Machine Learning Pipeline**: For document classification and smart tagging
3. **Microservices Migration**: Separate OCR, search, and storage services
4. **GraphQL API**: Alternative to REST for flexible querying
5. **Event Sourcing**: For audit trail and time-travel debugging
6. **Multi-tenancy**: Support for multiple organizations

### Technology Roadmap

- **Q1 2025**: Redis caching layer
- **Q2 2025**: Elasticsearch integration
- **Q3 2025**: ML-based document classification
- **Q4 2025**: Microservices architecture

## Architecture Decision Records (ADRs)

### ADR-001: Use Rust for Backend

- **Status**: Accepted
- **Context**: Need high performance and memory safety
- **Decision**: Use Rust with the Axum framework
- **Consequences**: Steep learning curve but excellent performance

### ADR-002: PostgreSQL for Primary Database

- **Status**: Accepted
- **Context**: Need reliable ACID compliance and full-text search
- **Decision**: Use PostgreSQL with built-in FTS
- **Consequences**: Single point of failure without replication

### ADR-003: Monolithic Single-Instance Architecture

- **Status**: Accepted
- **Context**: Simpler architecture, easier deployment and maintenance
- **Decision**: Single-instance monolithic application without clustering support
- **Consequences**:
  - Pros: Simple deployment, no distributed system complexity, easier debugging
  - Cons: No high availability, scaling limited to vertical scaling
  - Note: This is a deliberate design choice for simplicity and reliability