diff --git a/README.md b/README.md
index 57ee228..a9c5e73 100644
--- a/README.md
+++ b/README.md
@@ -7,11 +7,16 @@ A powerful, modern document management system built with Rust and React. Readur
 
 ## ✨ Features
 
-- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing
+- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing + OIDC/SSO support
+- 👥 **User Management**: Role-based access control with Admin and User roles
 - 📤 **Smart File Upload**: Drag-and-drop support for PDF, images, text files, and Office documents
 - 🔍 **Advanced OCR**: Automatic text extraction using Tesseract for searchable document content
-- 🔎 **Powerful Search**: PostgreSQL full-text search with advanced filtering and ranking
-- 👁️ **Folder Monitoring**: Non-destructive file watching (unlike paperless-ngx, doesn't consume source files)
+- 🔎 **Powerful Search**: PostgreSQL full-text search with multiple modes (simple, phrase, fuzzy, boolean)
+- 🔗 **Multi-Source Sync**: WebDAV, Local Folders, and S3-compatible storage integration
+- 🏷️ **Labels & Organization**: Comprehensive tagging system with color-coding and hierarchical structure
+- 👁️ **Folder Monitoring**: Non-destructive file watching with intelligent sync scheduling
+- 📊 **Health Monitoring**: Proactive source validation and system health tracking
+- 🔔 **Notifications**: Real-time alerts for sync events, OCR completion, and system status
 - 🎨 **Modern UI**: Beautiful React frontend with Material-UI components and responsive design
 - 🐳 **Docker Ready**: Complete containerization with production-ready multi-stage builds
 - ⚡ **High Performance**: Rust backend for speed and reliability
@@ -44,6 +49,13 @@ open http://localhost:8000
 - [🔧 Configuration](docs/configuration.md) - Environment variables and settings
 - [📖 User Guide](docs/user-guide.md) - How to use Readur effectively
 
+### Core Features
+- [🔗 Sources Guide](docs/sources-guide.md) - WebDAV, Local Folders, and S3 integration
+- [👥 User Management](docs/user-management-guide.md) - Authentication, roles, and administration
+- [🏷️ Labels & Organization](docs/labels-and-organization.md) - Document tagging and categorization
+- [🔎 Advanced Search](docs/advanced-search.md) - Search modes, syntax, and optimization
+- [🔐 OIDC Setup](docs/oidc-setup.md) - Single Sign-On integration
+
 ### Deployment & Operations
 - [🚀 Deployment Guide](docs/deployment.md) - Production deployment, SSL, monitoring
 - [🔄 Reverse Proxy Setup](docs/REVERSE_PROXY.md) - Nginx, Traefik, and more
diff --git a/docs/advanced-search.md b/docs/advanced-search.md
new file mode 100644
index 0000000..fb9c51b
--- /dev/null
+++ b/docs/advanced-search.md
@@ -0,0 +1,687 @@
+# Advanced Search Guide
+
+Readur provides powerful search capabilities that go far beyond simple text matching. This comprehensive guide covers all search modes, advanced filtering, query syntax, and optimization techniques.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Search Modes](#search-modes)
+- [Query Syntax](#query-syntax)
+- [Advanced Filtering](#advanced-filtering)
+- [Search Interface](#search-interface)
+- [Search Optimization](#search-optimization)
+- [Saved Searches](#saved-searches)
+- [Search Analytics](#search-analytics)
+- [API Search](#api-search)
+- [Troubleshooting](#troubleshooting)
+
+## Overview
+
+Readur's search system is built on PostgreSQL's full-text search capabilities with additional enhancements for document-specific requirements.
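For orientation, PostgreSQL ships several functions that turn user input into a `tsquery`. The sketch below shows one way the simple and phrase modes could map onto `plainto_tsquery` and `phraseto_tsquery`. The mapping itself, the `build_search_sql` helper, the `documents` table, the `search_vector` column, and the fixed `english` configuration are all assumptions made for illustration; the guide does not show Readur's actual queries. Fuzzy and boolean modes are left out because they would need extra handling.

```python
# Hypothetical mapping from a search mode to a PostgreSQL tsquery constructor.
# plainto_tsquery and phraseto_tsquery are standard PostgreSQL functions;
# whether Readur uses them this way is an assumption.
MODE_TO_TSQUERY = {
    "simple": "plainto_tsquery",
    "phrase": "phraseto_tsquery",
}


def build_search_sql(mode: str) -> str:
    """Return a parameterized SQL string for the given search mode.

    The `documents` table and `search_vector` column are made-up names.
    The user's query text is never interpolated; it is bound as $1.
    """
    if mode not in MODE_TO_TSQUERY:
        raise ValueError(f"unsupported mode: {mode}")
    fn = MODE_TO_TSQUERY[mode]
    return (
        "SELECT id, filename, "
        f"ts_rank(search_vector, {fn}('english', $1)) AS score "
        "FROM documents "
        f"WHERE search_vector @@ {fn}('english', $1) "
        "ORDER BY score DESC"
    )
```

Keeping the query text as a bind parameter means only the function name, which comes from a fixed dictionary, is ever formatted into the SQL string.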
+
+### Search Capabilities
+
+- **Full-Text Search**: Search within document content and OCR-extracted text
+- **Multiple Search Modes**: Simple, phrase, fuzzy, and boolean search options
+- **Advanced Filtering**: Filter by file type, date, size, labels, and source
+- **Real-Time Suggestions**: Auto-complete and query suggestions as you type
+- **Faceted Search**: Browse documents by categories and properties
+- **Cross-Language Support**: Search in multiple languages with OCR text
+- **Relevance Ranking**: Intelligent scoring and result ordering
+
+### Search Sources
+
+Readur searches across multiple content sources:
+
+1. **Document Content**: Original text from text files and PDFs
+2. **OCR Text**: Extracted text from images and scanned documents
+3. **Metadata**: File names, descriptions, and document properties
+4. **Labels**: User-created and system-generated tags
+5. **Source Information**: Upload source and file paths
+
+## Search Modes
+
+### Simple Search (Smart Search)
+
+**Best for**: General purpose searching and quick document discovery
+
+**How it works**:
+- Automatically applies stemming and fuzzy matching
+- Searches across all text content and metadata
+- Provides intelligent relevance scoring
+- Handles common typos and variations
+
+**Example**:
+```
+invoice 2024
+```
+Finds: "Invoice Q1 2024", "invoicing for 2024", "2024 invoice data"
+
+**Features**:
+- **Auto-stemming**: "running" matches "run", "runs", "runner"
+- **Fuzzy tolerance**: "recieve" matches "receive"
+- **Partial matching**: "doc" matches "document", "documentation"
+- **Relevance ranking**: More relevant matches appear first
+
+### Phrase Search (Exact Match)
+
+**Best for**: Finding exact phrases or specific terminology
+
+**How it works**:
+- Searches for the exact sequence of words
+- Case-insensitive but order-sensitive
+- Useful for finding specific quotes, names, or technical terms
+
+**Syntax**: Use quotes around the phrase
+```
+"quarterly financial report"
+"John Smith"
+"error code 404"
+```
+
+**Features**:
+- **Exact word order**: Only matches the precise sequence
+- **Case insensitive**: "John Smith" matches "john smith"
+- **Punctuation ignored**: "error-code" matches "error code"
+
+### Fuzzy Search (Approximate Matching)
+
+**Best for**: Handling typos, OCR errors, and spelling variations
+
+**How it works**:
+- Uses trigram similarity to find approximate matches
+- Configurable similarity threshold (default: 0.8)
+- Particularly useful for OCR-processed documents with errors
+
+**Syntax**: Use the `~` operator
+```
+invoice~     # Finds "invoice", "invoce", "invoise"
+contract~    # Finds "contract", "contarct", "conract"
+```
+
+**Configuration**:
+- **Threshold adjustment**: Configure sensitivity via user settings
+- **Language-specific**: Different languages may need different thresholds
+- **OCR optimization**: Higher tolerance for OCR-processed documents
+
+### Boolean Search (Logical Operators)
+
+**Best for**: Complex queries with multiple conditions and precise control
+
+**Operators**:
+- **AND**: Both terms must be present
+- **OR**: Either term can be present
+- **NOT**: Exclude documents with the term
+- **Parentheses**: Group conditions
+
+**Examples**:
+```
+budget AND 2024                        # Both "budget" and "2024"
+invoice OR receipt                     # Either "invoice" or "receipt"
+contract NOT draft                     # "contract" but not "draft"
+(budget OR financial) AND 2024         # Complex grouping
+marketing AND (campaign OR strategy)   # Marketing documents about campaigns or strategy
+```
+
+**Advanced Boolean Examples**:
+```
+# Find completed project documents
+project AND (final OR completed OR approved) NOT draft
+
+# Financial documents excluding personal items
+(invoice OR receipt OR budget) NOT personal
+
+# Recent important documents
+(urgent OR priority OR critical) AND label:"this month"
+```
+
+## Query Syntax
+
+### Field-Specific Search
+
+Search within specific document fields for precise targeting.
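One way to picture field-specific search is as a pre-processing step that splits a query into `field:value` filters and free-text terms. The sketch below does this for the field prefixes this guide documents (`filename:`, `content:`, `label:`, `type:`, `source:`, `size:`, `date:`). The `split_query` helper and its regular expression are illustrative assumptions, not Readur's parser, and boolean operators such as `AND` are simply passed through as ordinary terms.

```python
import re
from typing import List, Tuple

# Field names taken from the "Available Fields" table in this guide.
KNOWN_FIELDS = {"filename", "content", "label", "type", "source", "size", "date"}

# Alternatives, in order: field:"quoted value" | field:value | "quoted phrase" | bare term
TOKEN_RE = re.compile(r'(\w+):"([^"]*)"|(\w+):(\S+)|"([^"]*)"|(\S+)')


def split_query(query: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate field filters from free-text terms (hypothetical helper)."""
    filters: List[Tuple[str, str]] = []
    terms: List[str] = []
    for match in TOKEN_RE.finditer(query):
        qfield, qvalue, field, value, phrase, _term = match.groups()
        if qfield is not None and qfield in KNOWN_FIELDS:
            filters.append((qfield, qvalue))
        elif field is not None and field in KNOWN_FIELDS:
            filters.append((field, value))
        elif phrase is not None:
            terms.append(phrase)
        else:
            # Unknown prefixes and plain words are kept verbatim as free text.
            terms.append(match.group(0))
    return filters, terms
```

A real parser would also need to handle grouping, negation, and range values such as `size:1MB..10MB`; this sketch only shows the split between filters and text.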
+
+#### Available Fields
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| `filename:` | Search in file names | `filename:invoice` |
+| `content:` | Search in document text | `content:"project status"` |
+| `label:` | Search by labels | `label:urgent` |
+| `type:` | Search by file type | `type:pdf` |
+| `source:` | Search by upload source | `source:webdav` |
+| `size:` | Search by file size | `size:>10MB` |
+| `date:` | Search by date | `date:2024-01-01` |
+
+#### Field Search Examples
+
+```
+filename:contract AND date:2024          # Contracts from 2024
+label:"high priority" OR label:urgent    # Priority documents
+type:pdf AND content:budget              # PDF files containing "budget"
+source:webdav AND label:approved         # Approved docs from WebDAV
+```
+
+### Range Queries
+
+#### Date Ranges
+```
+date:2024-01-01..2024-03-31    # Q1 2024 documents
+date:>2024-01-01               # After January 1, 2024
+date:<2024-12-31               # Before December 31, 2024
+```
+
+#### Size Ranges
+```
+size:1MB..10MB    # Between 1MB and 10MB
+size:>50MB        # Larger than 50MB
+size:<1KB         # Smaller than 1KB
+```
+
+### Wildcard Search
+
+Use wildcards for partial matching:
+
+```
+proj*       # Matches "project", "projects", "projection"
+*report     # Matches "annual report", "status report"
+doc?ment    # Matches "document" (? = exactly one character)
+```
+
+### Exclusion Operators
+
+Exclude unwanted results:
+
+```
+invoice -draft             # Invoices but not drafts
+budget NOT personal        # Budget documents excluding personal
+-label:archive proposal    # Proposals not in archive
+```
+
+## Advanced Filtering
+
+### File Type Filters
+
+Filter by specific file formats:
+
+**Common File Types**:
+- **Documents**: PDF, DOC, DOCX, TXT, RTF
+- **Images**: PNG, JPG, JPEG, TIFF, BMP, GIF
+- **Spreadsheets**: XLS, XLSX, CSV
+- **Presentations**: PPT, PPTX
+
+**Filter Interface**:
+1. **Checkbox Filters**: Select multiple file types
+2. **MIME Type Groups**: Filter by general categories
+3. **Custom Extensions**: Add specific file extensions
+
+**Search Syntax**:
+```
+type:pdf             # Only PDF files
+type:(pdf OR doc)    # PDF or Word documents
+-type:image          # Exclude all images
+```
+
+### Date and Time Filters
+
+**Predefined Ranges**:
+- Today, Yesterday, This Week, Last Week
+- This Month, Last Month, This Quarter, Last Quarter
+- This Year, Last Year
+
+**Custom Date Ranges**:
+- **Start Date**: Documents uploaded after specific date
+- **End Date**: Documents uploaded before specific date
+- **Date Range**: Documents within specific period
+
+**Advanced Date Syntax**:
+```
+created:today           # Documents uploaded today
+modified:>2024-01-01    # Modified after January 1st
+accessed:last-week      # Accessed in the last week
+```
+
+### Size Filters
+
+**Size Categories**:
+- **Small**: < 1MB
+- **Medium**: 1MB - 10MB
+- **Large**: 10MB - 50MB
+- **Very Large**: > 50MB
+
+**Custom Size Ranges**:
+```
+size:>10MB       # Larger than 10MB
+size:1MB..5MB    # Between 1MB and 5MB
+size:<100KB      # Smaller than 100KB
+```
+
+### Label Filters
+
+**Label Selection**:
+- **Multiple Labels**: Select multiple labels with AND/OR logic
+- **Label Hierarchy**: Navigate nested label structures
+- **Label Suggestions**: Auto-complete based on existing labels
+
+**Label Search Syntax**:
+```
+label:project                 # Documents with "project" label
+label:"high priority"         # Multi-word labels in quotes
+label:(urgent OR critical)    # Documents with either label
+-label:archive                # Exclude archived documents
+```
+
+### Source Filters
+
+Filter by document source or origin:
+
+**Source Types**:
+- **Manual Upload**: Documents uploaded directly
+- **WebDAV Sync**: Documents from WebDAV sources
+- **Local Folder**: Documents from watched folders
+- **S3 Sync**: Documents from S3 buckets
+
+**Source-Specific Filters**:
+```
+source:webdav            # WebDAV synchronized documents
+source:manual            # Manually uploaded documents
+source:"My Nextcloud"    # Specific named source
+```
+
+### OCR Status Filters
+
+Filter by OCR processing status:
+
+**Status Options**:
+- **Completed**: OCR successfully completed
+- **Pending**: Waiting for OCR processing
+- **Failed**: OCR processing failed
+- **Not Applicable**: Text documents that don't need OCR
+
+**OCR Quality Filters**:
+- **High Confidence**: OCR confidence > 90%
+- **Medium Confidence**: OCR confidence 70-90%
+- **Low Confidence**: OCR confidence < 70%
+
+## Search Interface
+
+### Global Search Bar
+
+**Location**: Available in the header on all pages
+**Features**:
+- **Real-time suggestions**: Shows results as you type
+- **Quick results**: Top 5 matches with snippets
+- **Fast navigation**: Direct access to documents
+- **Search history**: Recent searches for quick access
+
+**Usage**:
+1. Click on the search bar in the header
+2. Start typing your query
+3. View instant suggestions and results
+4. Click a result to navigate directly to the document
+
+### Advanced Search Page
+
+**Location**: Dedicated search page with full interface
+**Features**:
+- **Multiple search modes**: Toggle between search types
+- **Filter sidebar**: All filtering options in one place
+- **Result options**: Sorting, pagination, view modes
+- **Export capabilities**: Export search results
+
+**Interface Sections**:
+
+#### Search Input Area
+- **Query builder**: Visual query construction
+- **Mode selector**: Choose search type (simple, phrase, fuzzy, boolean)
+- **Suggestions**: Auto-complete and query recommendations
+
+#### Filter Sidebar
+- **File type filters**: Checkboxes for different formats
+- **Date range picker**: Calendar interface for date selection
+- **Size sliders**: Visual size range selection
+- **Label selector**: Hierarchical label browser
+- **Source filters**: Filter by upload source
+
+#### Results Area
+- **Sort options**: Relevance, date, filename, size
+- **View modes**: List view, grid view, detail view
+- **Pagination**: Navigate through result pages
+- **Export options**: CSV, JSON export of results
+
+### Search Results
+
+#### Result Display Elements
+
+**Document Cards**:
+- **Filename**: Primary document identifier
+- **Snippet**: Highlighted text excerpt showing search matches
+- **Metadata**: File size, type, upload date, labels
+- **Relevance Score**: Numerical relevance ranking
+- **Quick Actions**: Download, view, edit labels
+
+**Highlighting**:
+- **Search terms**: Highlighted in yellow
+- **Context**: Surrounding text for context
+- **Multiple matches**: All instances highlighted
+- **Snippet length**: Configurable in user settings
+
+#### Result Sorting
+
+**Sort Options**:
+- **Relevance**: Best matches first (default)
+- **Date**: Newest or oldest first
+- **Filename**: Alphabetical order
+- **Size**: Largest or smallest first
+- **Score**: Highest search score first
+
+**Secondary Sorting**:
+- Apply secondary criteria when primary sort values are equal
+- Example: Sort by relevance, then by date
+
+### Search Configuration
+
+#### User Preferences
+
+**Search Settings** (accessible via Settings → Search):
+- **Results per page**: 10, 25, 50, 100
+- **Snippet length**: 100, 200, 300, 500 characters
+- **Fuzzy threshold**: Sensitivity for approximate matching
+- **Default sort**: Preferred default sorting option
+- **Search history**: Enable/disable query history
+
+#### Search Behavior
+- **Auto-complete**: Enable search suggestions
+- **Real-time search**: Search as you type
+- **Search highlighting**: Highlight search terms in results
+- **Context snippets**: Show surrounding text in results
+
+## Search Optimization
+
+### Query Optimization
+
+#### Best Practices
+
+1. **Use Specific Terms**: More specific queries yield better results
+   ```
+   Good: "quarterly sales report Q1"
+   Poor: "document"
+   ```
+
+2. **Combine Search Modes**: Use appropriate mode for your needs
+   ```
+   Exact phrases: "status update"
+   Flexible terms: project~
+   Complex logic: (budget OR financial) AND 2024
+   ```
+
+3. **Leverage Filters**: Combine text search with filters
+   ```
+   Query: budget
+   Filters: Type = PDF, Date = This Quarter, Label = Finance
+   ```
+
+4. **Use Field Search**: Target specific document aspects
+   ```
+   filename:invoice date:2024
+   content:"project milestone" label:important
+   ```
+
+### Performance Tips
+
+#### Efficient Searching
+
+1. **Start Broad, Then Narrow**: Begin with general terms, then add filters
+2. **Use Filters Early**: Apply filters before complex text queries
+3. **Avoid Wildcards at Start**: `*report` is slower than `report*`
+4. **Combine Short Queries**: Use multiple short terms rather than long phrases
+
+#### Search Index Optimization
+
+The search system automatically optimizes for:
+- **Frequent Terms**: Common words are indexed for fast retrieval
+- **Document Updates**: New documents are indexed immediately
+- **Language Support**: Multi-language stemming and analysis
+- **Cache Management**: Frequent searches are cached
+
+### OCR Search Optimization
+
+#### Handling OCR Text
+
+OCR-extracted text may contain errors that affect search:
+
+**Strategies**:
+1. **Use Fuzzy Search**: Handle OCR errors with approximate matching
+2. **Try Variations**: Search for common OCR mistakes
+3. **Use Context**: Include surrounding words for better matches
+4. **Check Original**: Compare with original document when possible
+
+**Common OCR Issues**:
+- **Character confusion**: "m" vs "rn", "cl" vs "d"
+- **Word boundaries**: "some thing" vs "something"
+- **Special characters**: Missing or incorrect punctuation
+
+**Optimization Examples**:
+```
+# Original: "invoice"
+# OCR might produce: "irwoice", "invoce", "mvoice"
+# Solution: Use fuzzy search
+invoice~
+
+# Or search for context
+"invoice number" OR "irwoice number" OR "invoce number"
+```
+
+## Saved Searches
+
+### Creating Saved Searches
+
+1. **Build Your Query**: Create a search with desired parameters
+2. **Test Results**: Verify the search returns expected documents
+3. **Save Search**: Click "Save Search" button
+4. **Name Search**: Provide descriptive name
+5. **Configure Options**: Set update frequency and notifications
+
+### Managing Saved Searches
+
+**Saved Search Features**:
+- **Quick Access**: Available in sidebar or dashboard
+- **Automatic Updates**: Results update as new documents are added
+- **Shared Access**: Share searches with other users (future feature)
+- **Export Options**: Export results automatically
+
+**Search Organization**:
+- **Categories**: Group related searches
+- **Favorites**: Mark frequently used searches
+- **Recent**: Quick access to recently used searches
+
+### Smart Collections
+
+Saved searches that automatically include new documents:
+
+**Examples**:
+- **"This Month's Reports"**: `type:pdf AND content:report AND date:this-month`
+- **"Pending Review"**: `label:"needs review" AND -label:completed`
+- **"High Priority Items"**: `label:(urgent OR critical OR "high priority")`
+
+## Search Analytics
+
+### Search Performance Metrics
+
+**Available Metrics**:
+- **Query Performance**: Average search response times
+- **Popular Searches**: Most frequently used search terms
+- **Result Quality**: Click-through rates and user engagement
+- **Search Patterns**: Common search behaviors and trends
+
+### User Search History
+
+**History Features**:
+- **Recent Searches**: Quick access to previous queries
+- **Search Suggestions**: Based on search history
+- **Query Refinement**: Improve searches based on past patterns
+- **Export History**: Download search history for analysis
+
+## API Search
+
+### Basic Search API
+
+```bash
+GET /api/search?query=invoice&limit=20
+Authorization: Bearer <token>
+```
+
+**Query Parameters**:
+- `query`: Search query string
+- `limit`: Number of results (default: 50, max: 100)
+- `offset`: Pagination offset
+- `sort`: Sort order (relevance, date, filename, size)
+
+### Advanced Search API
+
+```bash
+POST /api/search/advanced
+Authorization: Bearer <token>
+Content-Type: application/json
+
+{
+  "query": "budget report",
+  "mode": "phrase",
+  "filters": {
+    "file_types": ["pdf", "docx"],
+    "labels": ["Q1 2024", "Finance"],
+    "date_range": {
+      "start": "2024-01-01",
+      "end": "2024-03-31"
+    },
+    "size_range": {
+      "min": 1048576,
+      "max": 52428800
+    }
+  },
+  "options": {
+    "fuzzy_threshold": 0.8,
+    "snippet_length": 200,
+    "highlight": true
+  }
+}
+```
+
+### Search Response Format
+
+```json
+{
+  "results": [
+    {
+      "id": "550e8400-e29b-41d4-a716-446655440000",
+      "filename": "Q1_Budget_Report.pdf",
+      "snippet": "The quarterly budget report shows a 10% increase in revenue...",
+      "score": 0.95,
+      "highlights": ["budget", "report"],
+      "metadata": {
+        "size": 2048576,
+        "type": "application/pdf",
+        "uploaded_at": "2024-01-15T10:30:00Z",
+        "labels": ["Q1 2024", "Finance", "Budget"],
+        "source": "WebDAV Sync"
+      }
+    }
+  ],
+  "total": 42,
+  "limit": 20,
+  "offset": 0,
+  "query_time": 0.085
+}
+```
+
+## Troubleshooting
+
+### Common Search Issues
+
+#### No Results Found
+
+**Possible Causes**:
+1. **Typos**: Check spelling in search query
+2. **Too Specific**: Query might be too restrictive
+3. **Wrong Mode**: Using exact search when fuzzy would be better
+4. **Filters**: Remove filters to check if they're excluding results
+
+**Solutions**:
+1. **Simplify Query**: Start with broader terms
+2. **Check Spelling**: Use fuzzy search for typo tolerance
+3. **Remove Filters**: Test without date, type, or label filters
+4. **Try Synonyms**: Use alternative terms for the same concept
+
+#### Irrelevant Results
+
+**Possible Causes**:
+1. **Too Broad**: Query matches too many unrelated documents
+2. **Common Terms**: Using very common words that appear everywhere
+3. **Wrong Mode**: Using fuzzy when exact match is needed
+
+**Solutions**:
+1. **Add Specificity**: Include more specific terms or context
+2. **Use Filters**: Add file type, date, or label filters
+3. **Phrase Search**: Use quotes for exact phrases
+4. **Boolean Logic**: Use AND/OR/NOT for better control
+
+#### Slow Search Performance
+
+**Possible Causes**:
+1. **Complex Queries**: Very complex boolean queries
+2. **Large Result Sets**: Queries matching many documents
+3. **Wildcard Overuse**: Starting queries with wildcards
+
+**Solutions**:
+1. **Simplify Queries**: Break complex queries into simpler ones
+2. **Add Filters**: Use filters to reduce result set size
+3. **Avoid Leading Wildcards**: Use `term*` instead of `*term`
+4. **Use Pagination**: Request smaller result sets
+
+### OCR Search Issues
+
+#### OCR Text Not Searchable
+
+**Symptoms**: Can't find text that's visible in document images
+**Solutions**:
+1. **Check OCR Status**: Verify OCR processing completed
+2. **Retry OCR**: Manually retry OCR processing
+3. **Use Fuzzy Search**: OCR might have character recognition errors
+4. **Check Language Settings**: Ensure correct OCR language is configured
+
+#### Poor OCR Search Quality
+
+**Symptoms**: Fuzzy search required for most queries on scanned documents
+**Solutions**:
+1. **Improve Source Quality**: Use higher resolution scans (300+ DPI)
+2. **OCR Language**: Verify correct language setting for documents
+3. **Image Enhancement**: Enable OCR preprocessing options
+4. **Manual Correction**: Consider manual text correction for important documents
+
+### Search Configuration Issues
+
+#### Settings Not Applied
+
+**Symptoms**: Search settings changes don't take effect
+**Solutions**:
+1. **Reload Page**: Refresh browser to apply settings
+2. **Clear Cache**: Clear browser cache and cookies
+3. **Check Permissions**: Ensure user has permission to modify settings
+4. **Database Issues**: Check if settings are being saved to database
+
+#### Filter Problems
+
+**Symptoms**: Filters not working as expected
+**Solutions**:
+1. **Clear All Filters**: Reset filters and apply one at a time
+2. **Check Filter Logic**: Ensure AND/OR logic is correct
+3. **Label Validation**: Verify labels exist and are spelled correctly
+4. **Date Format**: Ensure dates are in correct format
+
+## Next Steps
+
+- Explore [labels and organization](labels-and-organization.md) for better search categorization
+- Set up [sources](sources-guide.md) for automatic content ingestion
+- Review [user guide](user-guide.md) for general search tips
+- Check [API reference](api-reference.md) for programmatic search integration
+- Configure [OCR optimization](dev/OCR_OPTIMIZATION_GUIDE.md) for better text extraction
\ No newline at end of file
diff --git a/docs/labels-and-organization.md b/docs/labels-and-organization.md
new file mode 100644
index 0000000..8c1003f
--- /dev/null
+++ b/docs/labels-and-organization.md
@@ -0,0 +1,501 @@
+# Labels and Organization Guide
+
+Readur's labeling system provides powerful document organization and categorization capabilities. This guide covers creating, managing, and using labels to organize your document collection effectively.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Label Types](#label-types)
+- [Creating and Managing Labels](#creating-and-managing-labels)
+- [Assigning Labels to Documents](#assigning-labels-to-documents)
+- [Label-Based Search and Filtering](#label-based-search-and-filtering)
+- [Label Organization Strategies](#label-organization-strategies)
+- [Advanced Label Features](#advanced-label-features)
+- [Best Practices](#best-practices)
+- [API Integration](#api-integration)
+
+## Overview
+
+Labels in Readur provide a flexible tagging system that allows you to:
+
+- **Categorize Documents**: Organize documents by type, project, department, or any custom criteria
+- **Enhanced Search**: Filter search results by specific labels for precise document discovery
+- **Visual Organization**: Color-coded labels provide instant visual categorization
+- **Bulk Operations**: Apply or remove labels from multiple documents simultaneously
+- **Project Management**: Track documents across projects, workflows, or time periods
+
+### Key Features
+
+- **Hierarchical Organization**: Create nested label structures for complex categorization
+- **Color Coding**: Visual identification with customizable label colors
+- **System Labels**: Automatic labels generated by Readur for administrative purposes
+- **User Labels**: Custom labels created and managed by users
+- **Smart Collections**: Save searches that automatically include documents with specific labels
+- **Label Statistics**: Track document counts and usage analytics per label
+
+## Label Types
+
+### User Labels
+
+**Custom labels** created and managed by users for personal or organizational categorization.
+
+**Features:**
+- **Full Control**: Create, edit, rename, and delete user-created labels
+- **Color Customization**: Choose from a wide range of colors for visual organization
+- **Flexible Naming**: Use any descriptive names that fit your workflow
+- **Sharing**: Labels are visible to all users with access to labeled documents
+
+**Common Use Cases:**
+- Project names (e.g., "Project Alpha", "Q1 Budget")
+- Document types (e.g., "Invoices", "Contracts", "Reports")
+- Departments (e.g., "HR", "Engineering", "Marketing")
+- Priority levels (e.g., "Urgent", "Review Needed", "Archive")
+- Status indicators (e.g., "Draft", "Final", "Approved")
+
+### System Labels
+
+**Automatic labels** generated by Readur based on document properties and processing status.
+
+**Examples:**
+- **OCR Status**: "OCR Completed", "OCR Failed", "OCR Pending"
+- **File Type**: "PDF", "Image", "Text Document"
+- **Source Origin**: "WebDAV Upload", "Local Folder", "Manual Upload"
+- **Processing Status**: "Recently Added", "High Confidence OCR", "Needs Review"
+- **Size Categories**: "Large File", "Small File"
+- **Date-based**: "This Week", "This Month", "This Year"
+
+**Characteristics:**
+- **Read-only**: Cannot be edited or deleted by users
+- **Automatic Assignment**: Applied automatically based on document properties
+- **System Managed**: Updated automatically when document properties change
+- **Consistent Formatting**: Standardized naming and color scheme
+
+## Creating and Managing Labels
+
+### Creating New Labels
+
+#### Via Label Management Page
+
+1. **Navigate to Labels**: Go to Settings → Labels
+2. **Click "Create Label"**
+3. **Configure Label Properties**:
+   ```
+   Name: Project Documentation
+   Color: Blue (#2196F3)
+   Description: Documents related to current projects
+   ```
+4. **Save** to create the label
+
+#### During Document Upload
+
+1. **Upload Document(s)**: Use the upload interface
+2. **Add Labels Field**: In the upload form
+3. **Create New Label**: Type a new label name
+4. **Assign Color**: Choose color for the new label
+5. **Complete Upload**: Label is created and assigned automatically
+
+#### Quick Label Creation
+
+- **Search Interface**: Create labels while filtering search results
+- **Document Details**: Add new labels directly from document pages
+- **Bulk Operations**: Create labels during bulk document operations
+
+### Editing Labels
+
+#### Renaming Labels
+
+1. **Access Label Management**: Settings → Labels
+2. **Find Target Label**: Use search or browse the label list
+3. **Click "Edit"** or double-click the label name
+4. **Modify Name**: Change to new descriptive name
+5. **Save Changes**: Updates all documents using this label
+
+#### Changing Colors
+
+1. **Edit Label**: Follow renaming steps above
+2. **Select New Color**: Choose from color palette or enter hex code
+3. **Preview Changes**: See how the color looks in different contexts
+4. **Apply**: Color updates immediately across all interfaces
+
+#### Merging Labels
+
+1. **Identify Similar Labels**: Find labels with overlapping purposes
+2. **Select Target Label**: Choose the label to keep
+3. **Merge Operation**: Use "Merge with..." option
+4. **Confirm Merge**: All documents transfer to target label
+5. **Source Label Deletion**: Original label is removed after merge
+
+### Deleting Labels
+
+#### Individual Label Deletion
+
+1. **Label Management Page**: Access via Settings → Labels
+2. **Select Label**: Find the label to delete
+3. **Delete Action**: Click delete button or menu option
+4. **Confirm Deletion**: Confirm removal (this cannot be undone)
+5. **Document Update**: Label is removed from all associated documents
+
+#### Bulk Label Cleanup
+
+- **Unused Labels**: Automatically identify and remove labels with no documents
+- **Duplicate Labels**: Find and merge labels with similar names
+- **Batch Deletion**: Select multiple labels for simultaneous removal
+
+## Assigning Labels to Documents
+
+### Single Document Labeling
+
+#### Document Details Page
+
+1. **Open Document**: Click on any document to view details
+2. **Labels Section**: Find the labels area in document metadata
+3. **Add Labels**: Click "+" or "Add Label" button
+4. **Select or Create**: Choose existing labels or create new ones
+5. **Apply Changes**: Labels are assigned immediately
+
+#### Quick Label Assignment
+
+- **Hover Actions**: Quick label buttons appear when hovering over documents
+- **Right-Click Menu**: Context menu with common label operations
+- **Keyboard Shortcuts**: Assign frequently used labels with key combinations
+
+### Bulk Label Operations
+
+#### Multi-Document Selection
+
+1. **Document Browser**: Navigate to documents page
+2. **Select Documents**: Use checkboxes to select multiple documents
+3. **Bulk Actions**: Click "Actions" or "Labels" in the toolbar
+4. **Apply Labels**: Choose labels to add or remove
+5. **Execute**: Apply changes to all selected documents
+
+#### Search-Based Labeling
+
+1. **Search for Documents**: Use search to find specific document sets
+2. **Select All Results**: Choose all documents matching criteria
+3. **Bulk Label Assignment**: Apply labels to entire result set
+4. **Confirmation**: Review and confirm bulk changes
+
+### Label Assignment During Upload
+
+#### Upload Interface Labeling
+
+1. **File Selection**: Choose files to upload
+2. **Label Assignment**: Add labels before starting upload
+3. **Label Creation**: Create new labels during upload process
+4. **Automatic Application**: Labels assigned to all uploaded files
+
+#### Drag and Drop Labeling
+
+- **Pre-configured Areas**: Drag files to labeled drop zones
+- **Automatic Tagging**: Labels applied based on drop location
+- **Batch Processing**: Assign labels to multiple files simultaneously
+
+## Label-Based Search and Filtering
+
+### Label Filters in Search
+
+#### Basic Label Filtering
+
+1. **Search Interface**: Access the main search page
+2. **Label Filter Section**: Find label filters in the sidebar
+3. **Select Labels**: Check boxes for desired labels
+4. **Apply Filter**: Search results automatically update
+5. **Multiple Labels**: Combine multiple labels with AND/OR logic
+
+#### Advanced Label Queries
+
+**Search Syntax Examples:**
+```
+label:urgent                     # Documents with "urgent" label
+label:"project alpha"            # Documents with multi-word label
+label:urgent AND label:review    # Documents with both labels
+label:draft OR label:final       # Documents with either label
+-label:archive                   # Exclude archived documents
+```
+
+### Smart Collections
+
+#### Creating Smart Collections
+
+1. **Build Search Query**: Create search with label filters
+2. **Save Search**: Use "Save Search" option
+3. **Name Collection**: Give descriptive name (e.g., "Active Projects")
+4. **Automatic Updates**: Collection updates as documents are labeled
+5. **Quick Access**: Access collections from sidebar or dashboard
+
+#### Collection Examples
+
+**Project-Based Collections:**
+- "Q1 Budget Documents": `label:"Q1 budget" OR label:"financial planning"`
+- "Marketing Materials": `label:marketing AND (label:final OR label:approved)`
+- "Pending Review": `label:"needs review" AND -label:completed`
+
+**Status-Based Collections:**
+- "Recent Uploads": `label:"this month" AND -label:processed`
+- "High Priority": `label:urgent OR label:critical`
+- "Archive Ready": `label:completed AND label:final`
+
+### Label-Based Dashboard Views
+
+#### Custom Dashboard Widgets
+
+- **Label Statistics**: Show document counts per label
+- **Recent Activity**: Display recently labeled documents
+- **Label Trends**: Track labeling patterns over time
+- **Quick Access**: Direct links to frequently used label filters
+
+## Label Organization Strategies
+
+### Hierarchical Labeling
+
+#### Category-Based Organization
+
+**Structure Example:**
+```
+Projects/
+├── Project Alpha/
+│   ├── Requirements
+│   ├── Design
+│   └── Implementation
+├── Project Beta/
+│   ├── Research
+│   ├── Proposals
+│   └── Contracts
+└── Infrastructure/
+    ├── Servers
+    ├── Network
+    └── Security
+```
+
+#### Implementation Approach
+
+1. **Top-Level Categories**: Create broad organizational labels
+2. **Subcategories**: Use descriptive naming for specific areas
+3. **Consistent Naming**: Establish naming conventions across categories
+4. **Cross-References**: Documents can belong to multiple hierarchies
+
+### Functional Organization
+
+#### Document Lifecycle Labels
+
+**Workflow Stages:**
+- **Creation**: "Draft", "In Progress", "Under Review"
+- **Approval**: "Pending Approval", "Approved", "Rejected"
+- **Distribution**: "Published", "Distributed", "Archived"
+- **Maintenance**: "Current", "Outdated", "Superseded"
+
+#### Department-Based Labeling
+
+**Organizational Structure:**
+- **Human Resources**: "HR Policy", "Employee Records", "Benefits"
+- **Finance**: "Invoices", "Budget", "Audit", "Tax Documents"
+- **Legal**: "Contracts", "Compliance", "IP Documents"
+- **Operations**: "Procedures", "Manuals", "Incident Reports"
+
+### Time-Based Organization
+
+#### Date-Driven Labels
+
+- **Fiscal Periods**: "Q1 2024", "FY2024", "H1 2024"
+- **Project Phases**: "Phase 1", "Phase 2", "Final Phase"
+- **Event-Based**: "Pre-Launch", "Launch", "Post-Launch"
+- **Seasonal**: "Annual Review", "Budget Season", "Audit Period"
+
+## Advanced Label Features
+
+### Label Analytics
+
+#### Usage Statistics
+
+**Metrics Available:**
+- **Document Count**: Number of documents per label
+- **Recent Activity**: Labels used in recent uploads or assignments
+- **Growth Trends**: How label usage changes over time
+- **Popular Labels**: Most frequently used labels
+- **Unused Labels**: Labels with no current document assignments
+
+#### Label Performance
+
+- **Search Frequency**: How often labels are used in searches
+- **Click-Through Rates**: User engagement with labeled content
+- **Organization Effectiveness**: How labels improve document discovery
+
+### Label Automation
+
+#### Auto-Labeling Rules
+
+**OCR-Based Labeling:**
+- **Content Detection**: Automatically label documents based on detected text
+- **Template Recognition**: Recognize document types and apply appropriate labels
+- **Entity Extraction**: Label documents based on detected entities (names, dates, amounts)
+
+**Source-Based Labeling:**
+- 
**Upload Location**: Apply labels based on upload source or folder +- **File Type**: Automatic labels based on file format and structure +- **Metadata**: Labels derived from file properties and EXIF data + +#### Workflow Integration + +- **Process Triggers**: Apply labels based on workflow stage completion +- **Approval Status**: Automatic labeling based on approval workflows +- **Time-Based Rules**: Apply labels based on document age or schedule + +### Label Import/Export + +#### Bulk Label Operations + +**Import Scenarios:** +- **Migration**: Import existing label structures from other systems +- **Template Application**: Apply predefined label sets to document collections +- **Organizational Standards**: Implement company-wide labeling standards + +**Export Capabilities:** +- **Backup**: Export label definitions for backup purposes +- **Reporting**: Generate reports of label usage and document organization +- **Integration**: Share label structures with other systems + +## Best Practices + +### Label Design + +#### Naming Conventions + +1. **Descriptive Names**: Use clear, self-explanatory label names +2. **Consistent Format**: Establish and follow naming patterns +3. **Avoid Ambiguity**: Choose names that won't be confused with similar concepts +4. **Length Consideration**: Keep names concise but informative +5. **Special Characters**: Avoid special characters that may cause issues + +**Good Examples:** +- "Q1-2024-Budget" ✅ +- "Legal-Contract-Template" ✅ +- "Marketing-Campaign-Assets" ✅ + +**Poor Examples:** +- "Stuff" ❌ (too vague) +- "Q1 Budget Documents for 2024 Financial Planning" ❌ (too long) +- "Legal/Contract#Template@2024" ❌ (special characters) + +#### Color Strategy + +1. **Consistent Color Families**: Use similar colors for related label categories +2. **High Contrast**: Ensure labels are readable against various backgrounds +3. **Color Meaning**: Establish color conventions (e.g., red for urgent, green for completed) +4. 
**Accessibility**: Consider color-blind users when choosing colors +5. **Limited Palette**: Don't use too many different colors + +### Organization Strategy + +#### Start Simple + +1. **Basic Categories**: Begin with broad, obvious categories +2. **Organic Growth**: Add labels as needs become apparent +3. **User Feedback**: Incorporate user suggestions for new labels +4. **Regular Review**: Periodically assess and refine label structure + +#### Maintain Consistency + +1. **Documentation**: Document labeling standards and conventions +2. **Training**: Educate users on proper labeling practices +3. **Regular Cleanup**: Remove unused or redundant labels +4. **Standardization**: Ensure consistent application across teams + +### Performance Optimization + +#### Label Management + +1. **Avoid Over-Labeling**: Don't create too many similar labels +2. **Regular Cleanup**: Remove unused labels to reduce clutter +3. **Search Optimization**: Focus on labels that improve searchability +4. **User Training**: Educate users on effective labeling practices + +#### System Performance + +- **Index Optimization**: Labels are indexed for fast search performance +- **Bulk Operations**: Use bulk assignment for better efficiency +- **Caching**: Frequently used labels are cached for quick access + +## API Integration + +### Label Management API + +#### Creating Labels + +```bash +POST /api/labels +Authorization: Bearer <token> +Content-Type: application/json + +{ + "name": "Project Documentation", + "color": "#2196F3" +} +``` + +#### Listing Labels + +```bash +GET /api/labels +Authorization: Bearer <token> +``` + +Response: +```json +{ + "labels": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "Project Documentation", + "color": "#2196F3", + "document_count": 42, + "created_at": "2024-01-01T00:00:00Z" + } + ] +} +``` + +#### Assigning Labels to Documents + +```bash +PATCH /api/documents/{document_id} +Authorization: Bearer <token> +Content-Type: application/json + +{ + "labels": ["Project 
Documentation", "Q1 2024", "High Priority"] +} +``` + +### Search Integration + +#### Label-Based Search + +```bash +GET /api/search?query=invoice&labels=urgent,review +Authorization: Bearer <token> +``` + +#### Advanced Label Queries + +```bash +POST /api/search/advanced +Authorization: Bearer <token> +Content-Type: application/json + +{ + "query": "budget", + "filters": { + "labels": ["Q1 2024", "Finance"], + "label_logic": "AND" + } +} +``` + +## Next Steps + +- Configure [advanced search](advanced-search.md) with label-based filtering +- Set up [sources](sources-guide.md) with automatic labeling rules +- Explore [user management](user-management-guide.md) for collaborative labeling +- Review [API reference](api-reference.md) for programmatic label management +- Check [best practices](user-guide.md#tips-for-best-results) for document organization \ No newline at end of file diff --git a/docs/sources-guide.md b/docs/sources-guide.md new file mode 100644 index 0000000..4cf551d --- /dev/null +++ b/docs/sources-guide.md @@ -0,0 +1,498 @@ +# Sources Guide + +Readur's Sources feature provides powerful automated document ingestion from multiple external storage systems. This comprehensive guide covers all supported source types and their configuration. + +## Table of Contents + +- [Overview](#overview) +- [Source Types](#source-types) + - [WebDAV Sources](#webdav-sources) + - [Local Folder Sources](#local-folder-sources) + - [S3 Sources](#s3-sources) +- [Getting Started](#getting-started) +- [Configuration](#configuration) +- [Sync Operations](#sync-operations) +- [Health Monitoring](#health-monitoring) +- [Troubleshooting](#troubleshooting) +- [Best Practices](#best-practices) + +## Overview + +Sources allow Readur to automatically discover, download, and process documents from external storage systems. 
Key features include: + +- **Multi-Protocol Support**: WebDAV, Local Folders, and S3-compatible storage +- **Automated Syncing**: Scheduled synchronization with configurable intervals +- **Health Monitoring**: Proactive monitoring and validation of source connections +- **Intelligent Processing**: Duplicate detection, incremental syncs, and OCR integration +- **Real-time Status**: Live sync progress and comprehensive statistics + +### How Sources Work + +1. **Configuration**: Set up a source with connection details and preferences +2. **Discovery**: Readur scans the source for supported file types +3. **Synchronization**: New and changed files are downloaded and processed +4. **OCR Processing**: Documents are automatically queued for text extraction +5. **Search Integration**: Processed documents become searchable in your collection + +## Source Types + +### WebDAV Sources + +WebDAV sources connect to cloud storage services and self-hosted servers that support the WebDAV protocol. + +#### Supported WebDAV Servers + +| Server Type | Status | Notes | +|-------------|--------|-------| +| **Nextcloud** | ✅ Fully Supported | Optimized discovery and authentication | +| **ownCloud** | ✅ Fully Supported | Native integration with server detection | +| **Apache WebDAV** | ✅ Supported | Generic WebDAV implementation | +| **nginx WebDAV** | ✅ Supported | Works with nginx dav module | +| **Box.com** | ⚠️ Limited | Basic WebDAV support | +| **Other WebDAV** | ✅ Supported | Generic WebDAV protocol compliance | + +#### WebDAV Configuration + +**Required Fields:** +- **Name**: Descriptive name for the source +- **Server URL**: Full WebDAV server URL (e.g., `https://cloud.example.com/remote.php/dav/files/username/`) +- **Username**: WebDAV authentication username +- **Password**: WebDAV authentication password or app password + +**Optional Configuration:** +- **Watch Folders**: Specific directories to monitor (leave empty to sync entire accessible space) +- **File Extensions**: 
Limit to specific file types (default: all supported types) +- **Auto Sync**: Enable automatic scheduled synchronization +- **Sync Interval**: How often to check for changes (15 minutes to 24 hours) +- **Server Type**: Specify server type for optimizations (auto-detected) + +#### Setting Up WebDAV Sources + +1. **Navigate to Sources**: Go to Settings → Sources in the Readur interface +2. **Add New Source**: Click "Add Source" and select "WebDAV" +3. **Configure Connection**: + ``` + Name: My Nextcloud Documents + Server URL: https://cloud.mycompany.com/remote.php/dav/files/john/ + Username: john + Password: app-password-here + ``` +4. **Test Connection**: Use the "Test Connection" button to verify credentials +5. **Configure Folders**: Specify directories to monitor: + ``` + Watch Folders: + - Documents/ + - Projects/2024/ + - Invoices/ + ``` +6. **Set Sync Schedule**: Choose automatic sync interval (recommended: 30 minutes) +7. **Save and Sync**: Save configuration and trigger initial sync + +#### WebDAV Best Practices + +- **Use App Passwords**: Create dedicated app passwords instead of using main account passwords +- **Limit Scope**: Specify watch folders to avoid syncing unnecessary files +- **Server Optimization**: Let Readur auto-detect server type for optimal performance +- **Network Considerations**: Use longer sync intervals for slow connections + +### Local Folder Sources + +Local folder sources monitor directories on the Readur server's filesystem, including mounted network drives. 
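As a rough illustration of what scheduled folder monitoring involves, the sketch below walks a directory, keeps only files with allowed extensions, and reports files that are new or changed since the previous scan. This is not Readur's implementation (Readur's backend is written in Rust); the function name `scan_folder`, its parameters, and the size-plus-modification-time change check are assumptions made for illustration.

```python
from pathlib import Path


def scan_folder(root, extensions, recursive=True, seen=None):
    """Return paths under `root` that are new or changed since the last scan.

    `seen` maps path -> (size, mtime_ns) from earlier scans and is updated
    in place. Hypothetical helper, not part of Readur.
    """
    seen = {} if seen is None else seen
    # Normalise "pdf" and ".PDF" to the same form: ".pdf"
    allowed = {"." + ext.lower().lstrip(".") for ext in extensions}
    walker = Path(root).rglob("*") if recursive else Path(root).glob("*")

    changed = []
    for path in sorted(walker):
        if not path.is_file() or path.suffix.lower() not in allowed:
            continue
        stat = path.stat()
        fingerprint = (stat.st_size, stat.st_mtime_ns)
        # A file counts as changed if it is unknown or its fingerprint differs
        if seen.get(str(path)) != fingerprint:
            seen[str(path)] = fingerprint
            changed.append(str(path))
    return changed
```

Calling the function repeatedly with the same `seen` dictionary, for example once per sync interval, returns only the files added or modified since the previous call.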
+ +#### Use Cases + +- **Watch Folders**: Monitor directories where documents are dropped +- **Network Mounts**: Sync from NFS, SMB/CIFS, or other mounted filesystems +- **Batch Processing**: Automatically process documents placed in specific folders +- **Archive Integration**: Monitor existing document archives + +#### Local Folder Configuration + +**Required Fields:** +- **Name**: Descriptive name for the source +- **Watch Folders**: Absolute paths of the directories to monitor + +**Optional Configuration:** +- **File Extensions**: Filter by specific file types +- **Auto Sync**: Enable scheduled monitoring +- **Sync Interval**: Frequency of directory scans +- **Recursive**: Include subdirectories in scans +- **Follow Symlinks**: Follow symbolic links (use with caution) + +#### Setting Up Local Folder Sources + +1. **Prepare Directory**: Ensure the directory exists and is accessible + ```bash + # Create watch folder + mkdir -p /mnt/documents/inbox + + # Set permissions (if needed) + chmod 755 /mnt/documents/inbox + ``` + +2. **Configure Source**: + ``` + Name: Document Inbox + Watch Folders: /mnt/documents/inbox + File Extensions: pdf,jpg,png,txt,docx + Auto Sync: Enabled + Sync Interval: 5 minutes + Recursive: Yes + ``` + +3. **Test Setup**: Place a test document in the folder and verify detection + +#### Network Mount Examples + +**NFS Mount:** +```bash +# Mount NFS share +sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs-docs + +# Then set in Readur (a source setting, not a shell command): +# Watch Folders: /mnt/nfs-docs/inbox +``` + +**SMB/CIFS Mount:** +```bash +# Mount SMB share +sudo mount -t cifs //server/documents /mnt/smb-docs -o username=user + +# Then set in Readur (a source setting, not a shell command): +# Watch Folders: /mnt/smb-docs/processing +``` + +### S3 Sources + +S3 sources connect to Amazon S3 or S3-compatible storage services for document synchronization. 
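S3 has no real directories: objects live under flat keys, and a "folder" is simply a shared key prefix. The sketch below shows one way a bucket prefix, a list of watch folders, and an extension filter could narrow a key listing. It is an illustration under stated assumptions, not Readur's actual logic: the function name `matching_keys`, its parameters, and the rule that watch folders are resolved relative to the prefix are all hypothetical.

```python
import posixpath


def matching_keys(keys, prefix="", watch_folders=(), extensions=()):
    """Select object keys that fall under the configured scope.

    Hypothetical helper: assumes watch folders are relative to `prefix`
    and that an empty `extensions` list means "accept every file type".
    """
    prefix = prefix.lstrip("/")
    # With no watch folders, the whole prefix is in scope
    scopes = [prefix + folder.lstrip("/") for folder in watch_folders] or [prefix]
    allowed = {ext.lower().lstrip(".") for ext in extensions}

    selected = []
    for key in keys:
        if key.endswith("/"):
            # Zero-byte placeholder objects that some tools create for "folders"
            continue
        if not any(key.startswith(scope) for scope in scopes):
            continue
        ext = posixpath.splitext(key)[1].lower().lstrip(".")
        if allowed and ext not in allowed:
            continue
        selected.append(key)
    return selected
```

Because the filter works on plain key strings, it can be applied to the output of any listing call without depending on a particular S3 client library.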
+ +#### Supported S3 Services + +| Service | Status | Configuration | +|---------|--------|---------------| +| **Amazon S3** | ✅ Fully Supported | Standard AWS configuration | +| **MinIO** | ✅ Fully Supported | Custom endpoint URL | +| **DigitalOcean Spaces** | ✅ Supported | S3-compatible API | +| **Wasabi** | ✅ Supported | Custom endpoint configuration | +| **Google Cloud Storage** | ⚠️ Limited | S3-compatible mode only | + +#### S3 Configuration + +**Required Fields:** +- **Name**: Descriptive name for the source +- **Bucket Name**: S3 bucket to monitor +- **Region**: AWS region (e.g., `us-east-1`) +- **Access Key ID**: AWS/S3 access key +- **Secret Access Key**: AWS/S3 secret key + +**Optional Configuration:** +- **Endpoint URL**: Custom endpoint for S3-compatible services +- **Prefix**: Bucket path prefix to limit scope +- **Watch Folders**: Specific S3 "directories" to monitor +- **File Extensions**: Filter by file types +- **Auto Sync**: Enable scheduled synchronization +- **Sync Interval**: Frequency of bucket scans + +#### Setting Up S3 Sources + +1. **Prepare S3 Bucket**: Ensure bucket exists and credentials have access +2. **Configure Source**: + ``` + Name: Company Documents S3 + Bucket Name: company-documents + Region: us-west-2 + Access Key ID: AKIAIOSFODNN7EXAMPLE + Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY + Prefix: documents/ + Watch Folders: + - invoices/ + - contracts/ + - reports/ + ``` + +3. **Test Connection**: Verify credentials and bucket access + +#### S3-Compatible Services + +**MinIO Configuration:** +``` +Endpoint URL: https://minio.example.com:9000 +Bucket Name: documents +Region: us-east-1 (can be any value for MinIO) +``` + +**DigitalOcean Spaces:** +``` +Endpoint URL: https://nyc3.digitaloceanspaces.com +Bucket Name: my-documents +Region: nyc3 +``` + +## Getting Started + +### Adding Your First Source + +1. **Access Sources Management**: Navigate to Settings → Sources +2. 
**Choose Source Type**: Select WebDAV, Local Folder, or S3 based on your needs +3. **Configure Connection**: Enter required credentials and connection details +4. **Test Connection**: Verify connectivity before saving +5. **Configure Sync**: Set up folders to monitor and sync schedule +6. **Initial Sync**: Trigger first synchronization to import existing documents + +### Quick Setup Examples + +#### Nextcloud WebDAV +``` +Name: Nextcloud Documents +Server URL: https://cloud.company.com/remote.php/dav/files/username/ +Username: username +Password: app-password +Watch Folders: Documents/, Shared/ +Auto Sync: Every 30 minutes +``` + +#### Local Network Drive +``` +Name: Network Archive +Watch Folders: /mnt/network/documents +File Extensions: pdf,doc,docx,txt +Recursive: Yes +Auto Sync: Every 15 minutes +``` + +#### AWS S3 Bucket +``` +Name: AWS Document Bucket +Bucket: company-docs-bucket +Region: us-east-1 +Access Key: [AWS Access Key] +Secret Key: [AWS Secret Key] +Prefix: active-documents/ +Auto Sync: Every 1 hour +``` + +## Configuration + +### Sync Settings + +**Sync Intervals:** +- **Real-time**: Immediate processing (local folders only) +- **5-15 minutes**: High-frequency monitoring +- **30-60 minutes**: Standard monitoring (recommended) +- **2-24 hours**: Low-frequency, large dataset sync + +**File Filtering:** +- **File Extensions**: `pdf,jpg,jpeg,png,txt,doc,docx,rtf` +- **Size Limits**: Configurable maximum file size (default: 50MB) +- **Path Exclusions**: Skip specific directories or file patterns + +### Advanced Configuration + +**Concurrency Settings:** +- **Concurrent Files**: Number of files processed simultaneously (default: 5) +- **Network Timeout**: Connection timeout for network sources +- **Retry Logic**: Automatic retry for failed downloads + +**Deduplication:** +- **Hash-based**: SHA-256 content hashing prevents duplicate storage +- **Cross-source**: Duplicates detected across all sources +- **Metadata Preservation**: Tracks file origins while 
avoiding storage duplication + +## Sync Operations + +### Manual Sync + +**Trigger Immediate Sync:** +1. Navigate to the Sources page +2. Find the source to sync +3. Click the "Sync Now" button +4. Monitor progress in real-time + +**Deep Scan:** +- Forces a complete re-scan of the entire source +- Useful for detecting changes in large directories +- Automatically triggered periodically + +### Sync Status + +**Status Indicators:** +- 🟢 **Idle**: Source ready, no sync in progress +- 🟡 **Syncing**: Active synchronization in progress +- 🔴 **Error**: Sync failed, requires attention +- ⚪ **Disabled**: Source disabled, no automatic sync + +**Progress Information:** +- Files discovered vs. processed +- Current operation (scanning, downloading, processing) +- Estimated completion time +- Transfer speeds and statistics + +### Stopping Sync + +**Graceful Cancellation:** +1. Click the "Stop Sync" button during an active sync +2. Current file processing completes +3. Sync stops cleanly without corruption +4. Partial progress is saved + +## Health Monitoring + +### Health Scores + +Sources are continuously monitored and assigned health scores (0-100): + +- **90-100**: ✅ Excellent - No issues detected +- **75-89**: ⚠️ Good - Minor issues or warnings +- **50-74**: ⚠️ Fair - Moderate issues requiring attention +- **25-49**: ❌ Poor - Significant problems +- **0-24**: ❌ Critical - Severe issues, manual intervention required + +### Health Checks + +**Automatic Validation** (every 30 minutes): +- Connection testing +- Credential verification +- Configuration validation +- Sync pattern analysis +- Error rate monitoring + +**Common Health Issues:** +- Authentication failures +- Network connectivity problems +- Permission or access issues +- Configuration errors +- Rate limiting or throttling + +### Health Notifications + +**Alert Types:** +- Connection failures +- Authentication expiry +- Sync errors +- Performance degradation +- Configuration warnings + +## Troubleshooting + +### Common Issues + +#### 
WebDAV Connection Problems + +**Symptom**: "Connection failed" or authentication errors +**Solutions**: +1. Verify server URL format: + - Nextcloud: `https://server.com/remote.php/dav/files/username/` + - ownCloud: `https://server.com/remote.php/dav/files/username/` + - Generic: `https://server.com/webdav/` + +2. Check credentials: + - Use app passwords instead of main passwords + - Verify username/password combination + - Test credentials in web browser or WebDAV client + +3. Network issues: + - Verify server is accessible from Readur + - Check firewall and SSL certificate issues + - Test with curl: `curl -u username:password https://server.com/webdav/` + +#### Local Folder Issues + +**Symptom**: "Permission denied" or "Directory not found" +**Solutions**: +1. Check directory permissions: + ```bash + ls -la /path/to/watch/folder + chmod 755 /path/to/watch/folder # If needed + ``` + +2. Verify path exists: + ```bash + stat /path/to/watch/folder + ``` + +3. For network mounts: + ```bash + mount | grep /path/to/mount # Verify mount + ls -la /path/to/mount # Test access + ``` + +#### S3 Access Problems + +**Symptom**: "Access denied" or "Bucket not found" +**Solutions**: +1. Verify credentials and permissions: + ```bash + aws s3 ls s3://bucket-name --profile your-profile + ``` + +2. Check bucket policy and IAM permissions +3. Verify region configuration matches bucket region +4. For S3-compatible services, ensure correct endpoint URL + +### Performance Issues + +#### Slow Sync Performance + +**Causes and Solutions**: +1. **Large file sizes**: Increase timeout values, consider file size limits +2. **Network latency**: Reduce concurrent connections, increase intervals +3. **Server throttling**: Implement longer delays between requests +4. **Large directories**: Use watch folders to limit scope + +#### High Resource Usage + +**Optimization Strategies**: +1. **Reduce concurrency**: Lower concurrent file processing +2. **Increase intervals**: Less frequent sync checks +3. 
**Filter files**: Limit to specific file types and sizes +4. **Stagger syncs**: Avoid multiple sources syncing simultaneously + +### Error Recovery + +**Automatic Recovery:** +- Failed files are automatically retried +- Temporary network issues are handled gracefully +- Sync resumes from last successful point + +**Manual Recovery:** +1. Check source health status +2. Review error logs in source details +3. Test connection manually +4. Trigger deep scan to reset sync state + +## Best Practices + +### Security + +1. **Use Dedicated Credentials**: Create app-specific passwords and access keys +2. **Limit Permissions**: Grant minimum required access to source accounts +3. **Regular Rotation**: Periodically update passwords and access keys +4. **Network Security**: Use HTTPS/TLS for all connections + +### Performance + +1. **Strategic Scheduling**: Stagger sync times for multiple sources +2. **Scope Limitation**: Use watch folders to limit sync scope +3. **File Filtering**: Exclude unnecessary file types and large files +4. **Monitor Resources**: Watch CPU, memory, and network usage + +### Organization + +1. **Descriptive Names**: Use clear, descriptive source names +2. **Consistent Structure**: Maintain consistent folder organization +3. **Documentation**: Document source purposes and configurations +4. **Regular Maintenance**: Periodically review and clean up sources + +### Reliability + +1. **Health Monitoring**: Regularly check source health scores +2. **Backup Configuration**: Document source configurations +3. **Test Scenarios**: Periodically test sync and recovery procedures +4. 
**Monitor Logs**: Review sync logs for patterns or issues + +## Next Steps + +- Configure [notifications](notifications.md) for sync events +- Set up [advanced search](advanced-search.md) to find synced documents +- Review [OCR optimization](dev/OCR_OPTIMIZATION_GUIDE.md) for processing improvements +- Explore [labels and organization](labels-and-organization.md) for document management \ No newline at end of file diff --git a/docs/user-guide.md b/docs/user-guide.md index 598b3cb..a73928f 100644 --- a/docs/user-guide.md +++ b/docs/user-guide.md @@ -10,11 +10,12 @@ A comprehensive guide to using Readur's features for document management, OCR pr - [Dashboard](#dashboard) - [Document Management](#document-management) - [Advanced Search](#advanced-search) - - [Folder Watching](#folder-watching) + - [Sources and Synchronization](#sources-and-synchronization) - [Document Upload](#document-upload) - [OCR Processing](#ocr-processing) - [Search Features](#search-features) -- [Tags and Organization](#tags-and-organization) +- [Labels and Organization](#labels-and-organization) +- [User Management](#user-management) - [User Settings](#user-settings) - [Tips for Best Results](#tips-for-best-results) @@ -117,20 +118,30 @@ tag:important invoice # Search within tagged documents type:pdf contract # Search only PDFs ``` -### Folder Watching +### Sources and Synchronization -The folder watching feature automatically imports documents: +Readur's Sources feature provides automated document ingestion from multiple external storage systems: -1. **Non-destructive**: Source files remain untouched -2. **Automatic Processing**: New files are detected and processed -3. **Configurable Intervals**: Adjust scan frequency -4. **Multiple Sources**: Watch local folders, network drives, cloud storage +1. **Multi-Protocol Support**: WebDAV, Local Folders, and S3-compatible storage +2. **Non-destructive**: Source files remain untouched in their original locations +3. 
**Automated Syncing**: Scheduled synchronization with configurable intervals +4. **Health Monitoring**: Proactive monitoring and validation of source connections +5. **Intelligent Processing**: Duplicate detection, incremental syncs, and OCR integration -#### Setting Up Watch Folders -1. Go to Settings → Sources -2. Add a new source with type "Local Folder" -3. Configure the path and scan interval -4. Enable/disable the source as needed +#### Supported Source Types + +- **WebDAV Sources**: Nextcloud, ownCloud, generic WebDAV servers +- **Local Folder Sources**: Local filesystem directories and network mounts +- **S3 Sources**: Amazon S3 and S3-compatible storage (MinIO, DigitalOcean Spaces) + +#### Setting Up Sources +1. Navigate to Settings → Sources +2. Click "Add Source" and select source type +3. Configure connection details and credentials +4. Test connection and configure sync settings +5. Set up folders to monitor and sync schedule + +> 📖 **For comprehensive source configuration**, see the [Sources Guide](sources-guide.md) ## Document Upload @@ -171,43 +182,147 @@ The folder watching feature automatically imports documents: ## Search Features -### Quick Search +Readur provides powerful search capabilities with multiple modes and advanced filtering options. 
+ +### Search Modes + +- **Simple Search**: General purpose searching with automatic stemming and fuzzy matching +- **Phrase Search**: Find exact phrases using quotes (e.g., `"quarterly report"`) +- **Fuzzy Search**: Handle typos and OCR errors with approximate matching (e.g., `invoice~`) +- **Boolean Search**: Complex queries with AND, OR, NOT operators + +### Search Interface + +#### Quick Search - Available in the header on all pages - Instant results as you type - Shows top 5 matches with snippets +- Real-time suggestions -### Advanced Search Page +#### Advanced Search Page - Full search interface with all filters +- Multiple search modes selector +- Comprehensive filtering options - Export search results - Save frequently used searches -- Search history +- Search history and analytics + +### Advanced Filtering + +- **File Types**: Filter by PDF, images, documents, etc. +- **Date Ranges**: Search within specific time periods +- **Labels**: Filter by document tags and categories +- **Sources**: Search within specific sync sources +- **File Size**: Filter by document size ranges +- **OCR Status**: Filter by text extraction status ### Search Tips -1. Use quotes for exact phrases -2. Combine filters for precise results -3. Use wildcards: `inv*` matches invoice, inventory -4. Search in specific fields: `filename:report` +1. Use quotes for exact phrases: `"project status"` +2. Combine text search with filters for precision +3. Use wildcards: `proj*` matches project, projects, projection +4. Search specific fields: `filename:report`, `label:urgent` +5. Use boolean logic: `(budget OR financial) AND 2024` -## Tags and Organization +> 🔍 **For detailed search techniques**, see the [Advanced Search Guide](advanced-search.md) -### Creating Tags -1. Select document(s) -2. Click "Add Tag" -3. Enter tag name or select existing -4. 
Tags are color-coded for easy identification +## Labels and Organization -### Tag Management -- Rename tags globally -- Merge similar tags -- Delete unused tags -- Set tag colors +Readur's labeling system provides comprehensive document organization and categorization capabilities. + +### Label Types + +- **User Labels**: Custom labels created and managed by users with full control +- **System Labels**: Automatic labels generated by Readur (OCR status, file type, etc.) +- **Color Coding**: Visual identification with customizable label colors +- **Hierarchical Structure**: Organize labels in categories and subcategories + +### Creating and Managing Labels + +#### Creating Labels +1. **Via Settings**: Go to Settings → Labels and click "Create Label" +2. **During Upload**: Add labels while uploading documents +3. **Document Details**: Add labels directly from document pages +4. **Bulk Operations**: Create and assign labels to multiple documents + +#### Label Operations +- **Rename**: Change label names (updates all documents) +- **Merge**: Combine similar labels into one +- **Color Management**: Customize label colors for visual organization +- **Bulk Assignment**: Apply labels to multiple documents at once + +### Organization Strategies + +#### Category-Based Organization +- **Projects**: "Project Alpha", "Q1 Budget", "Infrastructure" +- **Departments**: "HR", "Finance", "Legal", "Marketing" +- **Document Types**: "Invoices", "Contracts", "Reports", "Policies" +- **Status**: "Draft", "Final", "Approved", "Archived" + +#### Time-Based Organization +- **Fiscal Periods**: "Q1 2024", "FY2024", "Annual Review" +- **Project Phases**: "Planning", "Implementation", "Review" +- **Event-Based**: "Pre-Launch", "Launch", "Post-Launch" ### Smart Collections -Create saved searches based on: -- Tag combinations -- Date ranges -- File types -- Custom criteria +Create saved searches that automatically include documents with specific labels: +- **Active Projects**: Documents with 
current project labels +- **Pending Review**: Documents labeled for review +- **High Priority**: Documents with urgent or critical labels + +> 🏷️ **For comprehensive labeling strategies**, see the [Labels and Organization Guide](labels-and-organization.md) + +## User Management + +Readur provides comprehensive user management with support for both local authentication and enterprise SSO integration. + +### Authentication Methods + +#### Local Authentication +- **Traditional Login**: Username and password authentication +- **Secure Storage**: Passwords hashed with bcrypt for security +- **Self Registration**: Users can create their own accounts (if enabled) + +#### OIDC/SSO Authentication +- **Enterprise Integration**: Single Sign-On with corporate identity providers +- **Supported Providers**: Microsoft Azure AD, Google Workspace, Okta, Auth0, Keycloak +- **Automatic Provisioning**: User accounts created automatically on first login +- **Seamless Experience**: Users authenticate with existing corporate credentials + +### User Roles and Permissions + +#### User Role +Standard users with access to core document management functionality: +- Upload and manage documents +- Search and view documents +- Configure personal settings +- Create and manage labels +- Set up personal sources + +#### Admin Role +Administrators with full system access and user management capabilities: +- **User Management**: Create, modify, and delete user accounts +- **System Settings**: Configure global system parameters +- **Role Management**: Assign and modify user roles +- **System Monitoring**: View system health and performance metrics + +### Administrative Features + +Administrators can access user management via Settings → Users: +- **Create Users**: Add new user accounts with role assignment +- **Modify Users**: Update user information, roles, and passwords +- **User Overview**: View all users with creation dates and roles +- **Authentication Methods**: Manage both local and OIDC users 
+- **Bulk Operations**: Perform operations on multiple users
+
+### Mixed Authentication Environments
+
+Readur supports both local and OIDC users in the same installation:
+- Local admin accounts for system management
+- OIDC user accounts for regular enterprise users
+- Flexible role assignment regardless of authentication method
+
+> 👥 **For detailed user administration**, see the [User Management Guide](user-management-guide.md)
+> 🔐 **For OIDC configuration**, see the [OIDC Setup Guide](oidc-setup.md)
 ## User Settings
@@ -276,7 +391,21 @@ Create saved searches based on:
 ## Next Steps
-- Explore the [API Reference](api-reference.md) for automation
-- Learn about [advanced configuration](configuration.md)
-- Set up [automated workflows](WATCH_FOLDER.md)
-- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md)
\ No newline at end of file
+### Explore Advanced Features
+- [🔗 Sources Guide](sources-guide.md) - Set up WebDAV, Local Folder, and S3 synchronization
+- [🔎 Advanced Search](advanced-search.md) - Master search modes, syntax, and optimization
+- [🏷️ Labels & Organization](labels-and-organization.md) - Implement effective document organization
+- [👥 User Management](user-management-guide.md) - Configure authentication and user administration
+- [🔐 OIDC Setup](oidc-setup.md) - Integrate with enterprise identity providers
+
+### System Administration
+- [📦 Installation Guide](installation.md) - Full installation and setup instructions
+- [🔧 Configuration](configuration.md) - Environment variables and advanced configuration
+- [🚀 Deployment Guide](deployment.md) - Production deployment with SSL and monitoring
+- [📁 Watch Folder Guide](WATCH_FOLDER.md) - Legacy folder watching setup
+
+### Development and Integration
+- [🔌 API Reference](api-reference.md) - REST API for automation and integration
+- [🏗️ Developer Documentation](dev/) - Architecture and development setup
+- [🔍 OCR Optimization](dev/OCR_OPTIMIZATION_GUIDE.md) - Improve OCR performance
+- [📊 
Queue Architecture](dev/QUEUE_IMPROVEMENTS.md) - Background processing optimization
\ No newline at end of file
diff --git a/docs/user-management-guide.md b/docs/user-management-guide.md
new file mode 100644
index 0000000..214f508
--- /dev/null
+++ b/docs/user-management-guide.md
@@ -0,0 +1,440 @@
+# User Management Guide
+
+This comprehensive guide covers user administration, authentication, role-based access control, and user preferences in Readur.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Authentication Methods](#authentication-methods)
+- [User Roles and Permissions](#user-roles-and-permissions)
+- [Admin User Management](#admin-user-management)
+- [User Settings and Preferences](#user-settings-and-preferences)
+- [OIDC/SSO Integration](#oidcsso-integration)
+- [Security Best Practices](#security-best-practices)
+- [Troubleshooting](#troubleshooting)
+
+## Overview
+
+Readur provides a comprehensive user management system with support for both local authentication and enterprise SSO integration. The system features:
+
+- **Dual Authentication**: Local accounts and OIDC/SSO support
+- **Role-Based Access Control**: Admin and User roles with distinct permissions
+- **User Preferences**: Extensive per-user configuration options
+- **Enterprise Integration**: OIDC support for corporate identity providers
+- **Security Features**: JWT tokens, bcrypt password hashing, and session management
+
+## Authentication Methods
+
+### Local Authentication
+
+Local authentication uses traditional username/password combinations stored securely in Readur's database.
+
+#### Features:
+- **Secure Storage**: Passwords hashed with bcrypt (cost factor 12)
+- **JWT Tokens**: 24-hour token validity with secure signing
+- **User Registration**: Self-service account creation (if enabled)
+- **Password Requirements**: Configurable complexity requirements
+
+#### Creating Local Users:
+1. 
**Admin Creation** (via Settings):
+   - Navigate to Settings → Users (Admin only)
+   - Click "Add User"
+   - Enter username, email, and initial password
+   - Assign user role (Admin or User)
+
+2. **Self Registration** (if enabled):
+   - Visit the registration page
+   - Provide username, email, and password
+   - Account created with default User role
+
+### OIDC/SSO Authentication
+
+OIDC (OpenID Connect) authentication integrates with enterprise identity providers for single sign-on.
+
+#### Supported Features:
+- **Standard OIDC Flow**: Authorization code flow with PKCE
+- **Automatic Discovery**: Reads provider configuration from `.well-known/openid-configuration`
+- **User Provisioning**: Automatic user creation on first login
+- **Identity Linking**: Maps OIDC identities to local user accounts
+- **Profile Sync**: Updates user information from OIDC provider
+
+#### Supported Providers:
+- **Microsoft Azure AD**: Enterprise identity management
+- **Google Workspace**: Google's enterprise SSO
+- **Okta**: Popular enterprise identity provider
+- **Auth0**: Developer-friendly authentication platform
+- **Keycloak**: Open-source identity management
+- **Generic OIDC**: Any standards-compliant OIDC provider
+
+See the [OIDC Setup Guide](oidc-setup.md) for detailed configuration instructions.
+
+## User Roles and Permissions
+
+### User Role
+
+**Standard Users** have access to core document management functionality:
+
+**Permissions:**
+- ✅ Upload and manage own documents
+- ✅ Search all documents (based on sharing settings)
+- ✅ Configure personal settings and preferences
+- ✅ Create and manage personal labels
+- ✅ Use OCR processing features
+- ✅ Access personal sources (WebDAV, local folders, S3)
+- ✅ View personal notifications
+- ❌ User management (cannot create/modify other users)
+- ❌ System-wide settings or configuration
+- ❌ Access to other users' private documents
+
+### Admin Role
+
+**Administrators** have full system access and user management capabilities:
+
+**Additional Permissions:**
+- ✅ **User Management**: Create, modify, and delete user accounts
+- ✅ **System Settings**: Configure global system parameters
+- ✅ **User Impersonation**: Access other users' documents (if needed)
+- ✅ **System Monitoring**: View system health and performance metrics
+- ✅ **Advanced Configuration**: OCR settings, source configurations
+- ✅ **Security Management**: Token management, authentication settings
+
+**Default Admin Account:**
+- Username: `admin`
+- Default Password: `readur2024` ⚠️ **Change immediately in production!**
+
+## Admin User Management
+
+### Accessing User Management
+
+1. Log in as an administrator
+2. Navigate to **Settings** → **Users**
+3. The user management interface displays all system users
+
+### User Management Operations
+
+#### Creating Users
+
+1. **Click "Add User"** in the Users section
+2. **Fill out user information**:
+   ```
+   Username: john.doe
+   Email: john.doe@company.com
+   Password: [secure-password]
+   Role: User (or Admin)
+   ```
+3. **Save** to create the account
+4. **Notify the user** of their credentials
+
+#### Modifying Users
+
+1. **Find the user** in the user list
+2. **Click "Edit"** or the user row
+3. 
**Update information**:
+   - Change email address
+   - Reset password
+   - Modify role (User ↔ Admin)
+   - Update username (if needed)
+4. **Save changes**
+
+#### Deleting Users
+
+1. **Select the user** to delete
+2. **Click "Delete"**
+3. **Confirm deletion** (this action cannot be undone)
+
+**Important Notes:**
+- Users cannot delete their own accounts
+- Deleting a user removes all their documents and settings
+- Consider disabling instead of deleting for user retention
+
+#### Bulk Operations
+
+**Future Feature**: Bulk user operations for enterprise deployments:
+- Bulk user import from CSV
+- Bulk role changes
+- Bulk user deactivation
+
+### User Information Display
+
+The user management interface shows:
+- **Username and Email**: Primary identification
+- **Role**: Current role assignment
+- **Created Date**: Account creation timestamp
+- **Last Login**: Recent activity indicator
+- **Auth Provider**: Local or OIDC authentication method
+- **Status**: Active/disabled status (future feature)
+
+## User Settings and Preferences
+
+### Personal Settings Access
+
+Users can configure their preferences via:
+1. **User Menu** → **Settings** (top-right corner)
+2. 
**Settings Page** → **Personal** tab
+
+### Settings Categories
+
+#### OCR Preferences
+
+**Language Settings:**
+- **OCR Language**: Primary language for text recognition (25+ languages)
+- **Fallback Languages**: Secondary languages for mixed documents
+- **Auto-Detection**: Automatic language detection (if supported)
+
+**Processing Options:**
+- **Image Enhancement**: Enable preprocessing for better OCR results
+- **Auto-Rotation**: Automatically rotate images for optimal text recognition
+- **Confidence Threshold**: Minimum confidence level for OCR acceptance
+- **Processing Priority**: User's OCR queue priority level
+
+#### Search Preferences
+
+**Display Settings:**
+- **Results Per Page**: Number of search results to display (10-100)
+- **Snippet Length**: Length of text previews in search results
+- **Fuzzy Search Threshold**: Sensitivity for fuzzy/approximate matching
+- **Search History**: Enable/disable search query history
+
+**Search Behavior:**
+- **Default Sort Order**: Relevance, date, filename, size
+- **Auto-Complete**: Enable search suggestions
+- **Real-time Search**: Search as you type functionality
+
+#### File Processing
+
+**Upload Settings:**
+- **Default File Types**: Preferred file types for uploads
+- **Auto-OCR**: Automatically queue uploads for OCR processing
+- **Duplicate Handling**: How to handle duplicate file uploads
+- **File Size Limits**: Personal file size restrictions
+
+**Storage Preferences:**
+- **Compression**: Enable compression for storage savings
+- **Retention Period**: How long to keep documents (if configured)
+- **Archive Behavior**: Automatic archiving of old documents
+
+#### Interface Preferences
+
+**Display Options:**
+- **Theme**: Light/dark mode preference
+- **Timezone**: Local timezone for timestamp display
+- **Date Format**: Preferred date/time display format
+- **Language**: Interface language (separate from OCR language)
+
+**Navigation:**
+- **Default View**: List or grid view for document browser
+- **Sidebar Collapsed**: Default sidebar state
+- **Items Per Page**: Default pagination size
+
+#### Notification Settings
+
+**Notification Types:**
+- **OCR Completion**: Notify when document processing completes
+- **Source Sync**: Notifications for source synchronization events
+- **System Alerts**: Important system messages and warnings
+- **Storage Warnings**: Alerts for storage space or quota issues
+
+**Delivery Methods:**
+- **In-App Notifications**: Browser notifications within Readur
+- **Email Notifications**: Email delivery for important events (future)
+- **Desktop Notifications**: Browser push notifications (future)
+
+### Source-Specific Settings
+
+**WebDAV Preferences:**
+- **Connection Timeout**: How long to wait for WebDAV responses
+- **Retry Attempts**: Number of retries for failed downloads
+- **Sync Schedule**: Preferred automatic sync frequency
+
+**Local Folder Settings:**
+- **Watch Interval**: How often to scan local directories
+- **File Permissions**: Permission handling for processed files
+- **Symlink Handling**: Follow symbolic links during scans
+
+### Saving and Applying Settings
+
+1. **Modify preferences** in the settings interface
+2. **Click "Save Settings"** to apply changes
+3. **Settings take effect immediately** for most options
+4. **Some settings** may require logout/login to fully apply
+
+## OIDC/SSO Integration
+
+### Overview
+
+OIDC integration allows users to authenticate using their corporate credentials without creating separate passwords for Readur.
+
+### User Experience with OIDC
+
+#### First-Time Login
+
+1. **User clicks "Login with SSO"** on login page
+2. **Redirected to corporate identity provider** (e.g., Azure AD, Okta)
+3. **User authenticates** with corporate credentials
+4. **Readur creates user account automatically** with information from OIDC provider
+5. **User is logged in** and can immediately start using Readur
+
+#### Subsequent Logins
+
+1. **Click "Login with SSO"**
+2. 
**Automatic redirect** to identity provider
+3. **Single sign-on** (may not require re-authentication)
+4. **Immediate access** to Readur
+
+### OIDC User Account Details
+
+**Automatic Account Creation:**
+- **Username**: Derived from OIDC `preferred_username` or `sub` claim
+- **Email**: Uses OIDC `email` claim
+- **Role**: Default "User" role (admins can promote later)
+- **Auth Provider**: Marked as "OIDC" in user management
+
+**Identity Mapping:**
+- **OIDC Subject**: Unique identifier from identity provider
+- **OIDC Issuer**: Identity provider URL
+- **Linked Accounts**: Maps OIDC identity to Readur user
+
+### Mixed Authentication Environments
+
+Readur supports both local and OIDC users in the same installation:
+
+- **Local Admin Accounts**: For initial setup and emergency access
+- **OIDC User Accounts**: For regular enterprise users
+- **Role Management**: Admins can promote OIDC users to admin role
+- **Account Linking**: Future feature to link local and OIDC accounts
+
+### OIDC Configuration
+
+See the detailed [OIDC Setup Guide](oidc-setup.md) for complete configuration instructions.
+
+## Security Best Practices
+
+### Password Security
+
+**For Local Accounts:**
+1. **Use Strong Passwords**: Minimum 12 characters with mixed case, numbers, symbols
+2. **Regular Rotation**: Change passwords periodically
+3. **Unique Passwords**: Don't reuse passwords from other systems
+4. **Admin Passwords**: Use extra-strong passwords for administrator accounts
+
+### JWT Token Security
+
+**Token Management:**
+- **Secure Storage**: Tokens stored securely in browser localStorage
+- **Automatic Expiration**: 24-hour token lifetime
+- **Secure Transmission**: HTTPS required for production
+- **Token Rotation**: Regular token refresh (future feature)
+
+### Access Control
+
+**Role Management:**
+1. **Principle of Least Privilege**: Grant minimum necessary permissions
+2. **Regular Review**: Periodically audit user roles and permissions
+3. 
**Admin Accounts**: Limit number of administrator accounts
+4. **Account Deactivation**: Disable accounts for departed users
+
+### OIDC Security
+
+**Provider Configuration:**
+1. **Use HTTPS**: Ensure all OIDC endpoints use HTTPS
+2. **Client Secret Protection**: Secure storage of OIDC client secrets
+3. **Scope Limitation**: Request only necessary OIDC scopes
+4. **Token Validation**: Proper verification of OIDC tokens
+
+### Monitoring and Auditing
+
+**Access Monitoring:**
+- **Login Tracking**: Monitor successful and failed login attempts
+- **Role Changes**: Audit administrator role assignments
+- **Account Activity**: Track user document access patterns
+- **Security Events**: Log authentication and authorization events
+
+## Troubleshooting
+
+### Common Authentication Issues
+
+#### Local Login Problems
+
+**Symptom**: "Invalid username or password"
+**Solutions**:
+1. **Verify credentials**: Check username/password carefully
+2. **Account existence**: Confirm account exists in user management
+3. **Password reset**: Admin can reset user password
+4. **Account status**: Ensure account is active/enabled
+
+#### OIDC Login Problems
+
+**Symptom**: OIDC login fails or redirects incorrectly
+**Solutions**:
+1. **Check OIDC configuration**: Verify client ID, secret, and issuer URL
+2. **Redirect URI**: Ensure redirect URI is registered with OIDC provider
+3. **Provider status**: Confirm OIDC provider is operational
+4. **Network connectivity**: Verify Readur can reach OIDC endpoints
+
+#### JWT Token Issues
+
+**Symptom**: "Invalid token" or frequent logouts
+**Solutions**:
+1. **Check system time**: Ensure server time is accurate
+2. **JWT secret**: Verify JWT_SECRET environment variable
+3. **Token expiration**: Tokens expire after 24 hours
+4. **Browser storage**: Clear localStorage and re-login
+
+### User Management Issues
+
+#### Cannot Create Users
+
+**Symptom**: User creation fails
+**Solutions**:
+1. 
**Admin permissions**: Ensure logged in as administrator
+2. **Duplicate usernames**: Check for existing username/email
+3. **Database connectivity**: Verify database connection
+4. **Input validation**: Ensure all required fields are provided
+
+#### User Settings Not Saving
+
+**Symptom**: Settings changes don't persist
+**Solutions**:
+1. **Check permissions**: Ensure user has permission to modify settings
+2. **Database issues**: Verify database write permissions
+3. **Browser issues**: Try clearing browser cache
+4. **Network connectivity**: Ensure stable connection during save
+
+### Role and Permission Issues
+
+#### Users Cannot Access Features
+
+**Symptom**: User reports missing functionality
+**Solutions**:
+1. **Check user role**: Verify user has appropriate role assignment
+2. **Permission scope**: Confirm feature is available to user role
+3. **Session refresh**: User may need to logout/login after role change
+4. **Feature availability**: Ensure feature is enabled in system configuration
+
+#### Admin Access Problems
+
+**Symptom**: Admin cannot access management features
+**Solutions**:
+1. **Role verification**: Confirm user has Admin role
+2. **Token validity**: Ensure JWT token contains correct role information
+3. **Database consistency**: Verify role is correctly stored in database
+4. **Login refresh**: Try logging out and logging back in
+
+### Performance Issues
+
+#### Slow User Operations
+
+**Symptom**: User management operations are slow
+**Solutions**:
+1. **Database performance**: Check database query performance
+2. **User count**: Large user counts may require pagination
+3. **Network latency**: OIDC operations may be affected by provider latency
+4. 
**System resources**: Monitor CPU and memory usage
+
+## Next Steps
+
+- Configure [OIDC integration](oidc-setup.md) for enterprise authentication
+- Set up [sources](sources-guide.md) for document synchronization
+- Review [security best practices](deployment.md#security-considerations)
+- Explore [advanced search](advanced-search.md) capabilities
+- Configure [labels and organization](labels-and-organization.md) for document management
\ No newline at end of file