feat(docs): update documentation for quite a few things

parent fb8e61b0e4
commit 6b64cc0b42

README.md | 18

@@ -7,11 +7,16 @@ A powerful, modern document management system built with Rust and React. Readur
 
 ## ✨ Features
 
-- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing
+- 🔐 **Secure Authentication**: JWT-based user authentication with bcrypt password hashing + OIDC/SSO support
+- 👥 **User Management**: Role-based access control with Admin and User roles
 - 📤 **Smart File Upload**: Drag-and-drop support for PDF, images, text files, and Office documents
 - 🔍 **Advanced OCR**: Automatic text extraction using Tesseract for searchable document content
-- 🔎 **Powerful Search**: PostgreSQL full-text search with advanced filtering and ranking
-- 👁️ **Folder Monitoring**: Non-destructive file watching (unlike paperless-ngx, doesn't consume source files)
+- 🔎 **Powerful Search**: PostgreSQL full-text search with multiple modes (simple, phrase, fuzzy, boolean)
+- 🔗 **Multi-Source Sync**: WebDAV, Local Folders, and S3-compatible storage integration
+- 🏷️ **Labels & Organization**: Comprehensive tagging system with color-coding and hierarchical structure
+- 👁️ **Folder Monitoring**: Non-destructive file watching with intelligent sync scheduling
+- 📊 **Health Monitoring**: Proactive source validation and system health tracking
+- 🔔 **Notifications**: Real-time alerts for sync events, OCR completion, and system status
 - 🎨 **Modern UI**: Beautiful React frontend with Material-UI components and responsive design
 - 🐳 **Docker Ready**: Complete containerization with production-ready multi-stage builds
 - ⚡ **High Performance**: Rust backend for speed and reliability
@@ -44,6 +49,13 @@ open http://localhost:8000
 - [🔧 Configuration](docs/configuration.md) - Environment variables and settings
 - [📖 User Guide](docs/user-guide.md) - How to use Readur effectively
 
+### Core Features
+- [🔗 Sources Guide](docs/sources-guide.md) - WebDAV, Local Folders, and S3 integration
+- [👥 User Management](docs/user-management-guide.md) - Authentication, roles, and administration
+- [🏷️ Labels & Organization](docs/labels-and-organization.md) - Document tagging and categorization
+- [🔎 Advanced Search](docs/advanced-search.md) - Search modes, syntax, and optimization
+- [🔐 OIDC Setup](docs/oidc-setup.md) - Single Sign-On integration
+
 ### Deployment & Operations
 - [🚀 Deployment Guide](docs/deployment.md) - Production deployment, SSL, monitoring
 - [🔄 Reverse Proxy Setup](docs/REVERSE_PROXY.md) - Nginx, Traefik, and more

@@ -0,0 +1,687 @@
# Advanced Search Guide

Readur provides powerful search capabilities that go far beyond simple text matching. This comprehensive guide covers all search modes, advanced filtering, query syntax, and optimization techniques.

## Table of Contents

- [Overview](#overview)
- [Search Modes](#search-modes)
- [Query Syntax](#query-syntax)
- [Advanced Filtering](#advanced-filtering)
- [Search Interface](#search-interface)
- [Search Optimization](#search-optimization)
- [Saved Searches](#saved-searches)
- [Search Analytics](#search-analytics)
- [API Search](#api-search)
- [Troubleshooting](#troubleshooting)

## Overview

Readur's search system is built on PostgreSQL's full-text search capabilities with additional enhancements for document-specific requirements.
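
As a rough illustration of how the four search modes could sit on top of PostgreSQL, the sketch below returns a parameterized SQL query for each mode. The table and column names (`documents`, `search_vector`, `content`) and the mode-to-function mapping are assumptions, not Readur's actual schema or code. The functions themselves are standard PostgreSQL: `plainto_tsquery`, `phraseto_tsquery`, `to_tsquery` and `ts_rank` are built in, and `word_similarity` comes from the `pg_trgm` extension.

```python
# Hypothetical sketch: how each search mode *could* map onto PostgreSQL.
# The table and column names (documents, search_vector, content) are assumptions.
TSQUERY_FUNCTIONS = {
    "simple": "plainto_tsquery",    # terms ANDed together after stemming
    "phrase": "phraseto_tsquery",   # terms must be adjacent and in order
    "boolean": "to_tsquery",        # expects &, |, ! so AND/OR/NOT need translating first
}


def build_search_sql(mode: str) -> str:
    """Return parameterized SQL; bind %(q)s, %(config)s and (fuzzy only) %(threshold)s."""
    if mode == "fuzzy":
        # pg_trgm's word_similarity() scores the query against the closest
        # stretch of the document text, from 0 (no overlap) to 1 (exact).
        return (
            "SELECT id, word_similarity(%(q)s, content) AS score "
            "FROM documents "
            "WHERE word_similarity(%(q)s, content) >= %(threshold)s "
            "ORDER BY score DESC"
        )
    if mode not in TSQUERY_FUNCTIONS:
        raise ValueError(f"unknown search mode: {mode!r}")
    tsquery = f"{TSQUERY_FUNCTIONS[mode]}(%(config)s::regconfig, %(q)s)"
    return (
        f"SELECT id, ts_rank(search_vector, {tsquery}) AS score "
        "FROM documents "
        f"WHERE search_vector @@ {tsquery} "
        "ORDER BY score DESC"
    )
```

With a driver such as psycopg2, the result would be run as `cursor.execute(build_search_sql("phrase"), {"q": "quarterly financial report", "config": "english"})`. Binding the user's text as a parameter, rather than formatting it into the SQL string, keeps query input from being interpreted as SQL.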

### Search Capabilities

- **Full-Text Search**: Search within document content and OCR-extracted text
- **Multiple Search Modes**: Simple, phrase, fuzzy, and boolean search options
- **Advanced Filtering**: Filter by file type, date, size, labels, and source
- **Real-Time Suggestions**: Auto-complete and query suggestions as you type
- **Faceted Search**: Browse documents by categories and properties
- **Cross-Language Support**: Search in multiple languages with OCR text
- **Relevance Ranking**: Intelligent scoring and result ordering

### Search Sources

Readur searches across multiple content sources:

1. **Document Content**: Original text from text files and PDFs
2. **OCR Text**: Extracted text from images and scanned documents
3. **Metadata**: File names, descriptions, and document properties
4. **Labels**: User-created and system-generated tags
5. **Source Information**: Upload source and file paths

## Search Modes

### Simple Search (Smart Search)

**Best for**: General-purpose searching and quick document discovery

**How it works**:
- Automatically applies stemming and fuzzy matching
- Searches across all text content and metadata
- Provides intelligent relevance scoring
- Handles common typos and variations

**Example**:
```
invoice 2024
```
Finds: "Invoice Q1 2024", "invoicing for 2024", "2024 invoice data"

**Features**:
- **Auto-stemming**: "running" matches "run" and "runs"
- **Fuzzy tolerance**: "recieve" matches "receive"
- **Partial matching**: "doc" matches "document", "documentation"
- **Relevance ranking**: More relevant matches appear first

### Phrase Search (Exact Match)

**Best for**: Finding exact phrases or specific terminology

**How it works**:
- Searches for the exact sequence of words
- Case-insensitive but order-sensitive
- Useful for finding specific quotes, names, or technical terms

**Syntax**: Use quotes around the phrase
```
"quarterly financial report"
"John Smith"
"error code 404"
```

**Features**:
- **Exact word order**: Only matches the precise sequence
- **Case insensitive**: "John Smith" matches "john smith"
- **Punctuation ignored**: "error-code" matches "error code"

### Fuzzy Search (Approximate Matching)

**Best for**: Handling typos, OCR errors, and spelling variations

**How it works**:
- Uses trigram similarity to find approximate matches
- Configurable similarity threshold (default: 0.8)
- Particularly useful for OCR-processed documents with errors

**Syntax**: Use the `~` operator
```
invoice~ # Finds "invoice", "invoce", "invoise"
contract~ # Finds "contract", "contarct", "conract"
```

**Configuration**:
- **Threshold adjustment**: Configure sensitivity via user settings
- **Language-specific**: Different languages may need different thresholds
- **OCR optimization**: Higher tolerance for OCR-processed documents
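
To see how a threshold interacts with real typos, the sketch below reproduces trigram similarity in the style of PostgreSQL's `pg_trgm` extension: each word is lowercased and padded with two leading spaces and one trailing space, and the score is shared trigrams divided by all distinct trigrams. Whether Readur computes fuzzy scores exactly this way is an assumption; `fuzzy_match` and its default of 0.8 simply mirror the threshold stated above.

```python
import re


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigrams: lowercase, split into alphanumeric words,
    and pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (a Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def fuzzy_match(query: str, candidate: str, threshold: float = 0.8) -> bool:
    return similarity(query, candidate) >= threshold
```

Under this measure `similarity("invoice", "invoce")` is exactly 0.5 (5 shared trigrams out of 10 distinct ones), so a single dropped letter in a seven-letter word falls well below 0.8. How many of the typo examples above are caught therefore depends heavily on the configured threshold; for comparison, `pg_trgm`'s own default `similarity_threshold` is 0.3.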

### Boolean Search (Logical Operators)

**Best for**: Complex queries with multiple conditions and precise control

**Operators**:
- **AND**: Both terms must be present
- **OR**: Either term can be present
- **NOT**: Exclude documents with the term
- **Parentheses**: Group conditions

**Examples**:
```
budget AND 2024 # Both "budget" and "2024"
invoice OR receipt # Either "invoice" or "receipt"
contract NOT draft # "contract" but not "draft"
(budget OR financial) AND 2024 # "budget" or "financial", together with "2024"
marketing AND (campaign OR strategy) # Marketing documents about campaigns or strategy
```

**Advanced Boolean Examples**:
```
# Find completed project documents
project AND (final OR completed OR approved) NOT draft

# Financial documents excluding personal items
(invoice OR receipt OR budget) NOT personal

# Recent important documents
(urgent OR priority OR critical) AND date:this-month
```
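
To make operator precedence concrete, here is a minimal recursive-descent evaluator for this syntax. It is an illustrative sketch, not Readur's parser: it assumes NOT binds tighter than AND and AND tighter than OR, treats `contract NOT draft` as `contract AND NOT draft`, and matches whole words only (no field prefixes, quoted phrases, or stemming).

```python
import re

# Minimal boolean query evaluator (illustrative sketch, not Readur's parser).
# Precedence: NOT > AND > OR; adjacent terms are implicitly ANDed.


def tokenize(query: str) -> list[str]:
    # Parentheses become their own tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)


class Parser:
    def __init__(self, query: str):
        self.tokens = tokenize(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token: {self.peek()!r}")
        return node

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == "OR":
            self.take()
            node = ("or", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.unary()
        # An explicit AND is optional: "contract NOT draft" == "contract AND NOT draft".
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            node = ("and", node, self.unary())
        return node

    def unary(self):
        if self.peek() == "NOT":
            self.take()
            return ("not", self.unary())
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if token in (None, "AND", "OR", ")"):
            raise ValueError(f"expected a search term, got {token!r}")
        return ("term", token.lower())


def evaluate(node, words: set[str]) -> bool:
    kind = node[0]
    if kind == "term":
        return node[1] in words
    if kind == "not":
        return not evaluate(node[1], words)
    left, right = evaluate(node[1], words), evaluate(node[2], words)
    return (left and right) if kind == "and" else (left or right)


def matches(query: str, text: str) -> bool:
    words = set(re.findall(r"\w+", text.lower()))
    return evaluate(Parser(query).parse(), words)
```

Building a small tree first and evaluating it separately keeps precedence rules in one place (the grammar methods) and makes malformed queries, such as an unclosed parenthesis, fail with a clear error instead of a silent mismatch.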

## Query Syntax

### Field-Specific Search

Search within specific document fields for precise targeting.

#### Available Fields

| Field | Description | Example |
|-------|-------------|---------|
| `filename:` | Search in file names | `filename:invoice` |
| `content:` | Search in document text | `content:"project status"` |
| `label:` | Search by labels | `label:urgent` |
| `type:` | Search by file type | `type:pdf` |
| `source:` | Search by upload source | `source:webdav` |
| `size:` | Search by file size | `size:>10MB` |
| `date:` | Search by date | `date:2024-01-01` |

#### Field Search Examples

```
filename:contract AND date:2024 # Contracts from 2024
label:"high priority" OR label:urgent # Priority documents
type:pdf AND content:budget # PDF files containing "budget"
source:webdav AND label:approved # Approved docs from WebDAV
```
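
Before a query can be run, `field:value` terms have to be separated from free text. The sketch below does that with a regular expression; the field set comes from the table above, while `split_fields` and `FieldTerm` are hypothetical names. It handles quoted values and a leading `-` for exclusion, but not grouped values such as `label:(urgent OR critical)`, and it leaves any AND/OR between field terms in the free text for the caller to interpret.

```python
import re
from typing import NamedTuple

# Field names taken from the "Available Fields" table; everything else is illustrative.
KNOWN_FIELDS = {"filename", "content", "label", "type", "source", "size", "date"}

# Optional "-" (exclude), a field name, ":", then a "quoted value" or a bare token.
FIELD_TERM = re.compile(r'(-?)([a-z]+):("[^"]*"|\S+)')


class FieldTerm(NamedTuple):
    field: str
    value: str
    negated: bool


def split_fields(query: str) -> tuple[list[FieldTerm], str]:
    """Pull known field:value terms out of a query; return (terms, free_text)."""
    terms: list[FieldTerm] = []

    def capture(match: re.Match) -> str:
        minus, field, value = match.groups()
        if field not in KNOWN_FIELDS:
            return match.group(0)  # leave unknown prefixes in the free text
        terms.append(FieldTerm(field, value.strip('"'), minus == "-"))
        return " "

    free_text = FIELD_TERM.sub(capture, query)
    return terms, " ".join(free_text.split())
```

For example, `split_fields('invoice -label:archive label:"high priority" type:pdf')` yields three field terms (the first one negated) and the free text `invoice`.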

### Range Queries

#### Date Ranges
```
date:2024-01-01..2024-03-31 # Q1 2024 documents
date:>2024-01-01 # After January 1, 2024
date:<2024-12-31 # Before December 31, 2024
```

#### Size Ranges
```
size:1MB..10MB # Between 1MB and 10MB
size:>50MB # Larger than 50MB
size:<1KB # Smaller than 1KB
```
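
The `..`, `>` and `<` forms can each be turned into a simple predicate. This sketch assumes binary units (1MB = 1,048,576 bytes, which matches the byte values in the advanced search API example), inclusive bounds for `..`, and strict comparisons for `>` and `<`. It only accepts full ISO dates such as `2024-01-01`, not shorthands like `date:2024`. None of these choices are confirmed Readur behavior, and `parse_size` and `range_predicate` are hypothetical helpers.

```python
import re
from datetime import date
from typing import Callable, TypeVar

T = TypeVar("T")

# Binary units are an assumption (1KB = 1024 bytes).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)", text.strip().upper())
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(float(m.group(1)) * UNITS[m.group(2)])


def range_predicate(spec: str, parse: Callable[[str], T]) -> Callable[[T], bool]:
    """Turn 'a..b', '>a', '<b' or 'a' into a test function."""
    if ".." in spec:
        low, high = (parse(part) for part in spec.split("..", 1))
        return lambda x: low <= x <= high  # inclusive on both ends
    if spec.startswith(">"):
        bound = parse(spec[1:])
        return lambda x: x > bound
    if spec.startswith("<"):
        bound = parse(spec[1:])
        return lambda x: x < bound
    exact = parse(spec)
    return lambda x: x == exact
```

Passing the parser in (`date.fromisoformat` for dates, `parse_size` for sizes) lets one function serve both kinds of range.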

### Wildcard Search

Use wildcards for partial matching:

```
proj* # Matches "project", "projects", "projection"
*report # Matches "annual report", "status report"
doc?ment # Matches "document" (? = exactly one character)
```
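
Applied to a single word, the two wildcards translate directly into a regular expression, as sketched below (helper names are hypothetical). The last function illustrates why the performance tips later in this guide prefer `report*` over `*report`: a single trailing wildcard corresponds to PostgreSQL's tsquery prefix syntax (`proj:*`), which full-text indexes can serve, whereas a leading wildcard has no tsquery equivalent. Whether Readur translates wildcards this way is an assumption.

```python
import re
from typing import Optional


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any run of word characters) and '?' (exactly one) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_matches(pattern: str, word: str) -> bool:
    return wildcard_to_regex(pattern).fullmatch(word) is not None


def to_prefix_tsquery(pattern: str) -> Optional[str]:
    """Only a single trailing '*' maps onto PostgreSQL's tsquery prefix syntax."""
    body = pattern[:-1]
    if pattern.endswith("*") and body and "*" not in body and "?" not in body:
        return f"{body.lower()}:*"
    return None  # leading or inner wildcards need a different strategy
```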

### Exclusion Operators

Exclude unwanted results:

```
invoice -draft # Invoices but not drafts
budget NOT personal # Budget documents excluding personal
-label:archive proposal # Proposals not in archive
```

## Advanced Filtering

### File Type Filters

Filter by specific file formats:

**Common File Types**:
- **Documents**: PDF, DOC, DOCX, TXT, RTF
- **Images**: PNG, JPG, JPEG, TIFF, BMP, GIF
- **Spreadsheets**: XLS, XLSX, CSV
- **Presentations**: PPT, PPTX

**Filter Interface**:
1. **Checkbox Filters**: Select multiple file types
2. **MIME Type Groups**: Filter by general categories
3. **Custom Extensions**: Add specific file extensions

**Search Syntax**:
```
type:pdf # Only PDF files
type:(pdf OR doc) # PDF or Word documents
-type:image # Exclude all images
```

### Date and Time Filters

**Predefined Ranges**:
- Today, Yesterday, This Week, Last Week
- This Month, Last Month, This Quarter, Last Quarter
- This Year, Last Year
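
Each predefined range resolves to concrete dates relative to today. A minimal sketch, assuming half-open ranges `[start, end)` and weeks that start on Monday (neither is specified in this guide); `predefined_range` and `add_months` are hypothetical helpers.

```python
from datetime import date, timedelta


def add_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = first_of_month.year * 12 + first_of_month.month - 1 + months
    return date(index // 12, index % 12 + 1, 1)


def predefined_range(name: str, today: date) -> tuple[date, date]:
    """Return a half-open [start, end) range; weeks start on Monday (assumption)."""
    name = name.lower()
    if name == "today":
        return today, today + timedelta(days=1)
    if name == "yesterday":
        return today - timedelta(days=1), today
    if name in ("this week", "last week"):
        start = today - timedelta(days=today.weekday())  # Monday of this week
        if name == "last week":
            start -= timedelta(days=7)
        return start, start + timedelta(days=7)
    if name in ("this month", "last month"):
        start = today.replace(day=1)
        if name == "last month":
            start = add_months(start, -1)
        return start, add_months(start, 1)
    if name in ("this quarter", "last quarter"):
        start = date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)
        if name == "last quarter":
            start = add_months(start, -3)
        return start, add_months(start, 3)
    if name in ("this year", "last year"):
        year = today.year if name == "this year" else today.year - 1
        return date(year, 1, 1), date(year + 1, 1, 1)
    raise ValueError(f"unknown preset: {name!r}")
```

Half-open ranges let a filter be written as `start <= uploaded_at < end` without having to represent the last instant of the final day.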

**Custom Date Ranges**:
- **Start Date**: Documents uploaded after a specific date
- **End Date**: Documents uploaded before a specific date
- **Date Range**: Documents uploaded within a specific period

**Advanced Date Syntax**:
```
created:today # Documents uploaded today
modified:>2024-01-01 # Modified after January 1st
accessed:last-week # Accessed in the last week
```

### Size Filters

**Size Categories**:
- **Small**: < 1MB
- **Medium**: 1MB - 10MB
- **Large**: 10MB - 50MB
- **Very Large**: > 50MB

**Custom Size Ranges**:
```
size:>10MB # Larger than 10MB
size:1MB..5MB # Between 1MB and 5MB
size:<100KB # Smaller than 100KB
```

### Label Filters

**Label Selection**:
- **Multiple Labels**: Select multiple labels with AND/OR logic
- **Label Hierarchy**: Navigate nested label structures
- **Label Suggestions**: Auto-complete based on existing labels

**Label Search Syntax**:
```
label:project # Documents with "project" label
label:"high priority" # Multi-word labels in quotes
label:(urgent OR critical) # Documents with either label
-label:archive # Exclude archived documents
```

### Source Filters

Filter by document source or origin:

**Source Types**:
- **Manual Upload**: Documents uploaded directly
- **WebDAV Sync**: Documents from WebDAV sources
- **Local Folder**: Documents from watched folders
- **S3 Sync**: Documents from S3 buckets

**Source-Specific Filters**:
```
source:webdav # WebDAV synchronized documents
source:manual # Manually uploaded documents
source:"My Nextcloud" # Specific named source
```

### OCR Status Filters

Filter by OCR processing status:

**Status Options**:
- **Completed**: OCR successfully completed
- **Pending**: Waiting for OCR processing
- **Failed**: OCR processing failed
- **Not Applicable**: Text documents that don't need OCR

**OCR Quality Filters**:
- **High Confidence**: OCR confidence > 90%
- **Medium Confidence**: OCR confidence 70-90%
- **Low Confidence**: OCR confidence < 70%

## Search Interface

### Global Search Bar

**Location**: Available in the header on all pages

**Features**:
- **Real-time suggestions**: Shows results as you type
- **Quick results**: Top 5 matches with snippets
- **Fast navigation**: Direct access to documents
- **Search history**: Recent searches for quick access

**Usage**:
1. Click the search bar in the header
2. Start typing your query
3. View instant suggestions and results
4. Click a result to navigate directly to the document

### Advanced Search Page

**Location**: Dedicated search page with the full interface

**Features**:
- **Multiple search modes**: Toggle between search types
- **Filter sidebar**: All filtering options in one place
- **Result options**: Sorting, pagination, view modes
- **Export capabilities**: Export search results

**Interface Sections**:

#### Search Input Area
- **Query builder**: Visual query construction
- **Mode selector**: Choose search type (simple, phrase, fuzzy, boolean)
- **Suggestions**: Auto-complete and query recommendations

#### Filter Sidebar
- **File type filters**: Checkboxes for different formats
- **Date range picker**: Calendar interface for date selection
- **Size sliders**: Visual size range selection
- **Label selector**: Hierarchical label browser
- **Source filters**: Filter by upload source

#### Results Area
- **Sort options**: Relevance, date, filename, size
- **View modes**: List view, grid view, detail view
- **Pagination**: Navigate through result pages
- **Export options**: Export results as CSV or JSON

### Search Results

#### Result Display Elements

**Document Cards**:
- **Filename**: Primary document identifier
- **Snippet**: Highlighted text excerpt showing search matches
- **Metadata**: File size, type, upload date, labels
- **Relevance Score**: Numerical relevance ranking
- **Quick Actions**: Download, view, edit labels

**Highlighting**:
- **Search terms**: Highlighted in yellow
- **Context**: Surrounding text shown around each match
- **Multiple matches**: All instances highlighted
- **Snippet length**: Configurable in user settings
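
Snippet generation with highlighting can be sketched in three steps: find the first match, cut a window of the configured length around it, then wrap every match in `<mark>` (the tag used in the API response example in this guide). HTML-escaping the surrounding text keeps document content from being rendered as markup. The function name and windowing rule are assumptions, not Readur's implementation.

```python
import html
import re


def make_snippet(text: str, terms: list[str], length: int = 200) -> str:
    """Return an HTML snippet of about `length` characters with terms in <mark>."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text[:length])
    # Longer terms first, so "budget report" wins over "budget" if both are given.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, alternatives)), re.IGNORECASE)

    # Center the window on the first match (or the start of the text if none).
    first = pattern.search(text)
    start = max(0, (first.start() if first else 0) - length // 2)
    window = text[start:start + length]

    # Escape document text so only our <mark> tags are treated as HTML.
    pieces, last = [], 0
    for m in pattern.finditer(window):
        pieces.append(html.escape(window[last:m.start()]))
        pieces.append(f"<mark>{html.escape(m.group(0))}</mark>")
        last = m.end()
    pieces.append(html.escape(window[last:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if start + length < len(text) else ""
    return prefix + "".join(pieces) + suffix
```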

#### Result Sorting

**Sort Options**:
- **Relevance**: Best matches first (default)
- **Date**: Newest or oldest first
- **Filename**: Alphabetical order
- **Size**: Largest or smallest first
- **Score**: Highest search score first

**Secondary Sorting**:
- Apply secondary criteria when primary sort values are equal
- Example: Sort by relevance, then by date
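
Because Python's sort is stable, "relevance, then date" can be implemented by sorting on the secondary key first and the primary key second. The `Result` fields below are hypothetical stand-ins for the document card metadata.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    filename: str
    score: float
    uploaded_at: datetime


def sort_results(results: list[Result]) -> list[Result]:
    """Relevance first, newest upload first among equal scores."""
    # Sorting is stable (also with reverse=True), so sort by the
    # secondary key first and the primary key second.
    by_date = sorted(results, key=lambda r: r.uploaded_at, reverse=True)
    return sorted(by_date, key=lambda r: r.score, reverse=True)
```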

### Search Configuration

#### User Preferences

**Search Settings** (accessible via Settings → Search):
- **Results per page**: 10, 25, 50, 100
- **Snippet length**: 100, 200, 300, 500 characters
- **Fuzzy threshold**: Sensitivity for approximate matching
- **Default sort**: Preferred default sorting option
- **Search history**: Enable/disable query history

#### Search Behavior
- **Auto-complete**: Enable search suggestions
- **Real-time search**: Search as you type
- **Search highlighting**: Highlight search terms in results
- **Context snippets**: Show surrounding text in results

## Search Optimization

### Query Optimization

#### Best Practices

1. **Use Specific Terms**: More specific queries yield better results
   ```
   Good: "quarterly sales report Q1"
   Poor: "document"
   ```

2. **Combine Search Modes**: Use the appropriate mode for your needs
   ```
   Exact phrases: "status update"
   Flexible terms: project~
   Complex logic: (budget OR financial) AND 2024
   ```

3. **Leverage Filters**: Combine text search with filters
   ```
   Query: budget
   Filters: Type = PDF, Date = This Quarter, Label = Finance
   ```

4. **Use Field Search**: Target specific document aspects
   ```
   filename:invoice date:2024
   content:"project milestone" label:important
   ```

### Performance Tips

#### Efficient Searching

1. **Start Broad, Then Narrow**: Begin with general terms, then add filters
2. **Use Filters Early**: Apply filters before complex text queries
3. **Avoid Wildcards at Start**: `*report` is slower than `report*`
4. **Combine Short Queries**: Use multiple short terms rather than long phrases

#### Search Index Optimization

The search system automatically optimizes for:
- **Frequent Terms**: Common words are indexed for fast retrieval
- **Document Updates**: New documents are indexed immediately
- **Language Support**: Multi-language stemming and analysis
- **Cache Management**: Frequent searches are cached

### OCR Search Optimization

#### Handling OCR Text

OCR-extracted text may contain errors that affect search:

**Strategies**:
1. **Use Fuzzy Search**: Handle OCR errors with approximate matching
2. **Try Variations**: Search for common OCR mistakes
3. **Use Context**: Include surrounding words for better matches
4. **Check Original**: Compare with the original document when possible

**Common OCR Issues**:
- **Character confusion**: "m" vs "rn", "cl" vs "d"
- **Word boundaries**: "some thing" vs "something"
- **Special characters**: Missing or incorrect punctuation

**Optimization Examples**:
```
# Original: "invoice"
# OCR might produce: "irwoice", "invoce", "mvoice"
# Solution: Use fuzzy search
invoice~

# Or search for context
"invoice number" OR "irwoice number" OR "invoce number"
```
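
The "Try Variations" strategy can be automated by expanding a word with the character confusions listed above and joining the results with OR. This sketch applies one substitution at a time, in both directions; the confusion table covers only the two pairs named in this guide, and `ocr_variants` and `or_query` are hypothetical helpers.

```python
# Confusion pairs named in this guide; a real table would be longer.
CONFUSIONS = [("m", "rn"), ("d", "cl")]


def ocr_variants(word: str, confusions=CONFUSIONS) -> list[str]:
    """Return the word plus every single-substitution OCR look-alike, sorted."""
    variants = {word}
    for a, b in confusions:
        for src, dst in ((a, b), (b, a)):  # try both directions
            start = word.find(src)
            while start != -1:
                variants.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    return sorted(variants)


def or_query(word: str) -> str:
    """Join the variants into a boolean OR query."""
    return " OR ".join(ocr_variants(word))
```

`or_query("modern")` produces `moclern OR modem OR modern OR rnodern`, which also catches the classic case of "modern" being read as "modem". Variants multiply quickly for longer words, so limiting expansion to one substitution keeps the query manageable.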

## Saved Searches

### Creating Saved Searches

1. **Build Your Query**: Create a search with the desired parameters
2. **Test Results**: Verify the search returns the expected documents
3. **Save Search**: Click the "Save Search" button
4. **Name Search**: Provide a descriptive name
5. **Configure Options**: Set update frequency and notifications

### Managing Saved Searches

**Saved Search Features**:
- **Quick Access**: Available in the sidebar or on the dashboard
- **Automatic Updates**: Results update as new documents are added
- **Shared Access**: Share searches with other users (future feature)
- **Export Options**: Export results automatically

**Search Organization**:
- **Categories**: Group related searches
- **Favorites**: Mark frequently used searches
- **Recent**: Quick access to recently used searches

### Smart Collections

Smart collections are saved searches that automatically include new documents as they are added:

**Examples**:
- **"This Month's Reports"**: `type:pdf AND content:report AND date:this-month`
- **"Pending Review"**: `label:"needs review" AND -label:completed`
- **"High Priority Items"**: `label:(urgent OR critical OR "high priority")`

## Search Analytics

### Search Performance Metrics

**Available Metrics**:
- **Query Performance**: Average search response times
- **Popular Searches**: Most frequently used search terms
- **Result Quality**: Click-through rates and user engagement
- **Search Patterns**: Common search behaviors and trends

### User Search History

**History Features**:
- **Recent Searches**: Quick access to previous queries
- **Search Suggestions**: Based on search history
- **Query Refinement**: Improve searches based on past patterns
- **Export History**: Download search history for analysis

## API Search

### Basic Search API

```http
GET /api/search?query=invoice&limit=20
Authorization: Bearer <jwt_token>
```

**Query Parameters**:
- `query`: Search query string
- `limit`: Number of results (default: 50, max: 100)
- `offset`: Pagination offset
- `sort`: Sort order (relevance, date, filename, size)

### Advanced Search API

```http
POST /api/search/advanced
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "query": "budget report",
  "mode": "phrase",
  "filters": {
    "file_types": ["pdf", "docx"],
    "labels": ["Q1 2024", "Finance"],
    "date_range": {
      "start": "2024-01-01",
      "end": "2024-03-31"
    },
    "size_range": {
      "min": 1048576,
      "max": 52428800
    }
  },
  "options": {
    "fuzzy_threshold": 0.8,
    "snippet_length": 200,
    "highlight": true
  }
}
```
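
The same request built with Python's standard library, as a sketch: the body reproduces the example above, and writing the size limits as multiples of 1024 × 1024 shows where `1048576` (1 MiB) and `52428800` (50 MiB) come from. The base URL and helper name are assumptions.

```python
import json
from urllib.request import Request


def build_advanced_search_request(base_url: str, token: str, body: dict) -> Request:
    """Build (but do not send) a POST /api/search/advanced request with a JSON body."""
    return Request(
        f"{base_url.rstrip('/')}/api/search/advanced",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


MiB = 1024 * 1024
body = {
    "query": "budget report",
    "mode": "phrase",
    "filters": {
        "file_types": ["pdf", "docx"],
        "labels": ["Q1 2024", "Finance"],
        "date_range": {"start": "2024-01-01", "end": "2024-03-31"},
        "size_range": {"min": 1 * MiB, "max": 50 * MiB},  # 1048576 and 52428800 bytes
    },
    "options": {"fuzzy_threshold": 0.8, "snippet_length": 200, "highlight": True},
}
request = build_advanced_search_request("http://localhost:8000", "<jwt_token>", body)
```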

### Search Response Format

```json
{
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "Q1_Budget_Report.pdf",
      "snippet": "The quarterly <mark>budget</mark> <mark>report</mark> shows a 10% increase in revenue...",
      "score": 0.95,
      "highlights": ["budget", "report"],
      "metadata": {
        "size": 2048576,
        "type": "application/pdf",
        "uploaded_at": "2024-01-15T10:30:00Z",
        "labels": ["Q1 2024", "Finance", "Budget"],
        "source": "WebDAV Sync"
      }
    }
  ],
  "total": 42,
  "limit": 20,
  "offset": 0,
  "query_time": 0.085
}
```
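
`total`, `limit` and `offset` are enough to walk through every page of results. The generator below is a sketch that takes any page-fetching function (for example, one wrapping `GET /api/search`) and stops once `offset` reaches `total` or a page comes back empty. `fetch_page` is a hypothetical callback, exercised here with an in-memory stand-in that returns the response shape shown above.

```python
from typing import Callable, Iterator


def iter_results(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every result across pages using the response's total field."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page["results"]
        yield from results
        offset += len(results)
        # Also stop on an empty page, so a shrinking total can't loop forever.
        if not results or offset >= page["total"]:
            break


# In-memory stand-in for the API, returning the documented response shape.
DOCS = [{"id": str(i), "filename": f"doc{i}.pdf"} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> dict:
    return {"results": DOCS[offset:offset + limit], "total": len(DOCS),
            "limit": limit, "offset": offset}
```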

## Troubleshooting

### Common Search Issues

#### No Results Found

**Possible Causes**:
1. **Typos**: The query contains a misspelling
2. **Too Specific**: The query might be too restrictive
3. **Wrong Mode**: Using exact search when fuzzy would be better
4. **Filters**: Active filters may be excluding relevant documents

**Solutions**:
1. **Simplify Query**: Start with broader terms
2. **Check Spelling**: Use fuzzy search for typo tolerance
3. **Remove Filters**: Test without date, type, or label filters
4. **Try Synonyms**: Use alternative terms for the same concept

#### Irrelevant Results

**Possible Causes**:
1. **Too Broad**: The query matches too many unrelated documents
2. **Common Terms**: Using very common words that appear everywhere
3. **Wrong Mode**: Using fuzzy search when an exact match is needed

**Solutions**:
1. **Add Specificity**: Include more specific terms or context
2. **Use Filters**: Add file type, date, or label filters
3. **Phrase Search**: Use quotes for exact phrases
4. **Boolean Logic**: Use AND/OR/NOT for better control

#### Slow Search Performance

**Possible Causes**:
1. **Complex Queries**: Very complex boolean queries
2. **Large Result Sets**: Queries matching many documents
3. **Wildcard Overuse**: Starting queries with wildcards

**Solutions**:
1. **Simplify Queries**: Break complex queries into simpler ones
2. **Add Filters**: Use filters to reduce result set size
3. **Avoid Leading Wildcards**: Use `term*` instead of `*term`
4. **Use Pagination**: Request smaller result sets

### OCR Search Issues

#### OCR Text Not Searchable

**Symptoms**: Text that is visible in document images cannot be found

**Solutions**:
1. **Check OCR Status**: Verify that OCR processing completed
2. **Retry OCR**: Manually retry OCR processing
3. **Use Fuzzy Search**: OCR might have character recognition errors
4. **Check Language Settings**: Ensure the correct OCR language is configured

#### Poor OCR Search Quality

**Symptoms**: Fuzzy search is required for most queries on scanned documents

**Solutions**:
1. **Improve Source Quality**: Use higher resolution scans (300+ DPI)
2. **OCR Language**: Verify the correct language setting for documents
3. **Image Enhancement**: Enable OCR preprocessing options
4. **Manual Correction**: Consider manual text correction for important documents

### Search Configuration Issues

#### Settings Not Applied

**Symptoms**: Changes to search settings don't take effect

**Solutions**:
1. **Reload Page**: Refresh the browser to apply settings
2. **Clear Cache**: Clear browser cache and cookies
3. **Check Permissions**: Ensure the user has permission to modify settings
4. **Database Issues**: Check whether settings are being saved to the database

#### Filter Problems

**Symptoms**: Filters not working as expected

**Solutions**:
1. **Clear All Filters**: Reset filters and apply them one at a time
2. **Check Filter Logic**: Ensure AND/OR logic is correct
3. **Label Validation**: Verify labels exist and are spelled correctly
4. **Date Format**: Ensure dates are in the correct format (e.g. `2024-01-01`)

## Next Steps

- Explore [labels and organization](labels-and-organization.md) for better search categorization
- Set up [sources](sources-guide.md) for automatic content ingestion
- Review the [user guide](user-guide.md) for general search tips
- Check the [API reference](api-reference.md) for programmatic search integration
- Configure [OCR optimization](dev/OCR_OPTIMIZATION_GUIDE.md) for better text extraction

@@ -0,0 +1,501 @@
# Labels and Organization Guide

Readur's labeling system lets you organize and categorize documents. This guide covers creating, managing, and using labels to organize your document collection.

## Table of Contents

- [Overview](#overview)
- [Label Types](#label-types)
- [Creating and Managing Labels](#creating-and-managing-labels)
- [Assigning Labels to Documents](#assigning-labels-to-documents)
- [Label-Based Search and Filtering](#label-based-search-and-filtering)
- [Label Organization Strategies](#label-organization-strategies)
- [Advanced Label Features](#advanced-label-features)
- [Best Practices](#best-practices)
- [API Integration](#api-integration)

## Overview

Labels in Readur provide a flexible tagging system that allows you to:

- **Categorize Documents**: Organize documents by type, project, department, or any custom criteria
- **Enhanced Search**: Filter search results by specific labels for precise document discovery
- **Visual Organization**: Color-coded labels provide instant visual categorization
- **Bulk Operations**: Apply or remove labels from multiple documents simultaneously
- **Project Management**: Track documents across projects, workflows, or time periods

### Key Features

- **Hierarchical Organization**: Create nested label structures for complex categorization
- **Color Coding**: Visual identification with customizable label colors
- **System Labels**: Automatic labels generated by Readur for administrative purposes
- **User Labels**: Custom labels created and managed by users
- **Smart Collections**: Save searches that automatically include documents with specific labels
- **Label Statistics**: Track document counts and usage analytics per label

## Label Types

### User Labels

**Custom labels** created and managed by users for personal or organizational categorization.

**Features:**
- **Full Control**: Create, edit, rename, and delete user-created labels
- **Color Customization**: Choose from a wide range of colors for visual organization
- **Flexible Naming**: Use any descriptive names that fit your workflow
- **Sharing**: Labels are visible to all users with access to labeled documents

**Common Use Cases:**
- Project names (e.g., "Project Alpha", "Q1 Budget")
- Document types (e.g., "Invoices", "Contracts", "Reports")
- Departments (e.g., "HR", "Engineering", "Marketing")
- Priority levels (e.g., "Urgent", "Review Needed", "Archive")
- Status indicators (e.g., "Draft", "Final", "Approved")

### System Labels

**Automatic labels** generated by Readur based on document properties and processing status.

**Examples:**
- **OCR Status**: "OCR Completed", "OCR Failed", "OCR Pending"
- **File Type**: "PDF", "Image", "Text Document"
- **Source Origin**: "WebDAV Upload", "Local Folder", "Manual Upload"
- **Processing Status**: "Recently Added", "High Confidence OCR", "Needs Review"
- **Size Categories**: "Large File", "Small File"
- **Date-based**: "This Week", "This Month", "This Year"

**Characteristics:**
- **Read-only**: Cannot be edited or deleted by users
- **Automatic Assignment**: Applied automatically based on document properties
- **System Managed**: Updated automatically when document properties change
- **Consistent Formatting**: Standardized naming and color scheme

## Creating and Managing Labels

### Creating New Labels

#### Via Label Management Page

1. **Navigate to Labels**: Go to Settings → Labels
2. **Click "Create Label"**
3. **Configure Label Properties**:
   ```
   Name: Project Documentation
   Color: Blue (#2196F3)
   Description: Documents related to current projects
   ```
4. **Save** to create the label

#### During Document Upload

1. **Upload Document(s)**: Use the upload interface
2. **Open the Labels Field**: Find the labels field in the upload form
3. **Create New Label**: Type a new label name
4. **Assign Color**: Choose a color for the new label
5. **Complete Upload**: The label is created and assigned automatically

#### Quick Label Creation

- **Search Interface**: Create labels while filtering search results
- **Document Details**: Add new labels directly from document pages
- **Bulk Operations**: Create labels during bulk document operations

### Editing Labels

#### Renaming Labels

1. **Access Label Management**: Settings → Labels
2. **Find Target Label**: Use search or browse the label list
3. **Click "Edit"** or double-click the label name
4. **Modify Name**: Change it to a new descriptive name
5. **Save Changes**: The new name applies to every document that uses this label

#### Changing Colors

1. **Edit Label**: Follow the renaming steps above
2. **Select New Color**: Choose from the color palette or enter a hex code
3. **Preview Changes**: See how the color looks in different contexts
4. **Apply**: The color updates immediately across all interfaces
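
If you create labels from scripts rather than the color picker, checking the hex code before sending it can save a failed request. This minimal sketch assumes the six-digit `#RRGGBB` form used in this guide's examples (such as `#2196F3`); whether Readur also accepts shorthand or named colors is not stated here, so the hypothetical `is_hex_color` helper rejects them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for six-digit hex colors like "#2196F3".
# Shorthand ("#FFF") and color names ("blue") are rejected.
is_hex_color() {
  local re='^#[0-9A-Fa-f]{6}$'
  [[ "$1" =~ $re ]]
}

# Example: report which candidate colors pass the check.
for c in "#2196F3" "#4caf50" "blue" "#12345"; do
  if is_hex_color "$c"; then
    echo "valid:   $c"
  else
    echo "invalid: $c"
  fi
done
```

Storing the regular expression in a variable and leaving `$re` unquoted inside `[[ ... =~ ... ]]` is the usual way to keep Bash from treating it as a literal string.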

#### Merging Labels

1. **Identify Similar Labels**: Find labels with overlapping purposes
2. **Select Target Label**: Choose the label to keep
3. **Merge Operation**: Use the "Merge with..." option
4. **Confirm Merge**: All documents are moved to the target label
5. **Source Label Deletion**: The original label is removed after the merge

### Deleting Labels

#### Individual Label Deletion

1. **Label Management Page**: Access via Settings → Labels
2. **Select Label**: Find the label to delete
3. **Delete Action**: Click the delete button or menu option
4. **Confirm Deletion**: Confirm the removal (this cannot be undone)
5. **Document Update**: The label is removed from all associated documents

#### Bulk Label Cleanup

- **Unused Labels**: Automatically identify and remove labels with no documents
- **Duplicate Labels**: Find and merge labels with similar names
- **Batch Deletion**: Select multiple labels for simultaneous removal

## Assigning Labels to Documents

### Single Document Labeling

#### Document Details Page

1. **Open Document**: Click any document to view its details
2. **Labels Section**: Find the labels area in the document metadata
3. **Add Labels**: Click the "+" or "Add Label" button
4. **Select or Create**: Choose existing labels or create new ones
5. **Apply Changes**: Labels are assigned immediately

#### Quick Label Assignment

- **Hover Actions**: Quick label buttons appear when hovering over documents
- **Right-Click Menu**: Context menu with common label operations
- **Keyboard Shortcuts**: Assign frequently used labels with key combinations

### Bulk Label Operations

#### Multi-Document Selection

1. **Document Browser**: Navigate to the documents page
2. **Select Documents**: Use checkboxes to select multiple documents
3. **Bulk Actions**: Click "Actions" or "Labels" in the toolbar
4. **Apply Labels**: Choose labels to add or remove
5. **Execute**: Apply changes to all selected documents

#### Search-Based Labeling

1. **Search for Documents**: Use search to find a specific set of documents
2. **Select All Results**: Select all documents matching the criteria
3. **Bulk Label Assignment**: Apply labels to the entire result set
4. **Confirmation**: Review and confirm the bulk changes

### Label Assignment During Upload

#### Upload Interface Labeling

1. **File Selection**: Choose files to upload
2. **Label Assignment**: Add labels before starting the upload
3. **Label Creation**: Create new labels during the upload process
4. **Automatic Application**: Labels are assigned to all uploaded files

#### Drag and Drop Labeling

- **Pre-configured Areas**: Drag files onto labeled drop zones
- **Automatic Tagging**: Labels are applied based on the drop location
- **Batch Processing**: Assign labels to multiple files simultaneously

## Label-Based Search and Filtering

### Label Filters in Search

#### Basic Label Filtering

1. **Search Interface**: Open the main search page
2. **Label Filter Section**: Find the label filters in the sidebar
3. **Select Labels**: Check the boxes for the desired labels
4. **Apply Filter**: Search results update automatically
5. **Multiple Labels**: Combine multiple labels with AND/OR logic

#### Advanced Label Queries

**Search Syntax Examples:**
```
label:urgent                    # Documents with "urgent" label
label:"project alpha"           # Documents with multi-word label
label:urgent AND label:review   # Documents with both labels
label:draft OR label:final      # Documents with either label
-label:archive                  # Exclude archived documents
```
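
When these queries are generated by a script, multi-word labels need quoting and exclusions need the `-` prefix. The helpers below are a hypothetical sketch (not part of Readur) that compose a query in the syntax shown above; they assume label names do not themselves contain double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that compose a query in the label syntax shown above.

# Render one label as a term, quoting names that contain spaces.
label_term() {
  if [[ "$1" == *" "* ]]; then
    printf 'label:"%s"' "$1"
  else
    printf 'label:%s' "$1"
  fi
}

# Usage: build_label_query AND|OR include... [-- exclude...]
# Included labels are joined with the given operator; excluded labels get a
# leading "-" and are always ANDed onto the result.
build_label_query() {
  local op="$1"; shift
  local includes=() excludes=() in_excludes=0 label term query=""
  for label in "$@"; do
    if [[ "$label" == "--" ]]; then
      in_excludes=1
    elif (( in_excludes )); then
      excludes+=("-$(label_term "$label")")
    else
      includes+=("$(label_term "$label")")
    fi
  done
  for term in "${includes[@]}"; do
    query+="${query:+ $op }$term"
  done
  # Parenthesize an OR group so the exclusions apply to the whole group.
  if [[ "$op" == "OR" && ${#includes[@]} -gt 1 && ${#excludes[@]} -gt 0 ]]; then
    query="($query)"
  fi
  for term in "${excludes[@]}"; do
    query+="${query:+ AND }$term"
  done
  printf '%s\n' "$query"
}
```

For example, `build_label_query AND "needs review" -- completed` prints `label:"needs review" AND -label:completed`, the same query as the "Pending Review" collection example under Smart Collections.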

### Smart Collections

#### Creating Smart Collections

1. **Build Search Query**: Create a search with label filters
2. **Save Search**: Use the "Save Search" option
3. **Name Collection**: Give it a descriptive name (e.g., "Active Projects")
4. **Automatic Updates**: The collection updates as documents are labeled
5. **Quick Access**: Open collections from the sidebar or dashboard

#### Collection Examples

**Project-Based Collections:**
- "Q1 Budget Documents": `label:"Q1 budget" OR label:"financial planning"`
- "Marketing Materials": `label:marketing AND (label:final OR label:approved)`
- "Pending Review": `label:"needs review" AND -label:completed`

**Status-Based Collections:**
- "Recent Uploads": `label:"this month" AND -label:processed`
- "High Priority": `label:urgent OR label:critical`
- "Archive Ready": `label:completed AND label:final`

### Label-Based Dashboard Views

#### Custom Dashboard Widgets

- **Label Statistics**: Show document counts per label
- **Recent Activity**: Display recently labeled documents
- **Label Trends**: Track labeling patterns over time
- **Quick Access**: Direct links to frequently used label filters

## Label Organization Strategies

### Hierarchical Labeling

#### Category-Based Organization

**Structure Example:**
```
Projects/
├── Project Alpha/
│   ├── Requirements
│   ├── Design
│   └── Implementation
├── Project Beta/
│   ├── Research
│   ├── Proposals
│   └── Contracts
└── Infrastructure/
    ├── Servers
    ├── Network
    └── Security
```

#### Implementation Approach

1. **Top-Level Categories**: Create broad organizational labels
2. **Subcategories**: Use descriptive names for specific areas
3. **Consistent Naming**: Establish naming conventions across categories
4. **Cross-References**: A document can belong to multiple hierarchies

### Functional Organization

#### Document Lifecycle Labels

**Workflow Stages:**
- **Creation**: "Draft", "In Progress", "Under Review"
- **Approval**: "Pending Approval", "Approved", "Rejected"
- **Distribution**: "Published", "Distributed", "Archived"
- **Maintenance**: "Current", "Outdated", "Superseded"

#### Department-Based Labeling

**Organizational Structure:**
- **Human Resources**: "HR Policy", "Employee Records", "Benefits"
- **Finance**: "Invoices", "Budget", "Audit", "Tax Documents"
- **Legal**: "Contracts", "Compliance", "IP Documents"
- **Operations**: "Procedures", "Manuals", "Incident Reports"

### Time-Based Organization

#### Date-Driven Labels

- **Fiscal Periods**: "Q1 2024", "FY2024", "H1 2024"
- **Project Phases**: "Phase 1", "Phase 2", "Final Phase"
- **Event-Based**: "Pre-Launch", "Launch", "Post-Launch"
- **Seasonal**: "Annual Review", "Budget Season", "Audit Period"
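
Fiscal-period labels such as "Q1 2024" can be derived from a document's date. The hypothetical `quarter_label` helper below is a sketch that assumes dates in `YYYY-MM-DD` form and calendar quarters (January through March is Q1); if your fiscal year starts in another month, shift the month before computing the quarter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a YYYY-MM-DD date to a calendar-quarter label
# such as "Q1 2024". Assumes the fiscal year matches the calendar year.
quarter_label() {
  local re='^([0-9]{4})-([0-9]{2})-[0-9]{2}$'
  if [[ ! "$1" =~ $re ]]; then
    echo "quarter_label: expected YYYY-MM-DD, got '$1'" >&2
    return 1
  fi
  local year="${BASH_REMATCH[1]}"
  local month=$((10#${BASH_REMATCH[2]}))  # base 10, so "08" is not read as octal
  if (( month < 1 || month > 12 )); then
    echo "quarter_label: invalid month in '$1'" >&2
    return 1
  fi
  printf 'Q%d %s\n' $(( (month - 1) / 3 + 1 )) "$year"
}
```

For example, `quarter_label 2024-08-01` prints `Q3 2024`.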

## Advanced Label Features

### Label Analytics

#### Usage Statistics

**Metrics Available:**
- **Document Count**: Number of documents per label
- **Recent Activity**: Labels used in recent uploads or assignments
- **Growth Trends**: How label usage changes over time
- **Popular Labels**: Most frequently used labels
- **Unused Labels**: Labels with no current document assignments

#### Label Performance

- **Search Frequency**: How often labels are used in searches
- **Click-Through Rates**: User engagement with labeled content
- **Organization Effectiveness**: How labels improve document discovery

### Label Automation

#### Auto-Labeling Rules

**OCR-Based Labeling:**
- **Content Detection**: Automatically label documents based on detected text
- **Template Recognition**: Recognize document types and apply appropriate labels
- **Entity Extraction**: Label documents based on detected entities (names, dates, amounts)

**Source-Based Labeling:**
- **Upload Location**: Apply labels based on upload source or folder
- **File Type**: Automatic labels based on file format and structure
- **Metadata**: Labels derived from file properties and EXIF data

#### Workflow Integration

- **Process Triggers**: Apply labels based on workflow stage completion
- **Approval Status**: Automatic labeling based on approval workflows
- **Time-Based Rules**: Apply labels based on document age or schedule

### Label Import/Export

#### Bulk Label Operations

**Import Scenarios:**
- **Migration**: Import existing label structures from other systems
- **Template Application**: Apply predefined label sets to document collections
- **Organizational Standards**: Implement company-wide labeling standards

**Export Capabilities:**
- **Backup**: Export label definitions for backup purposes
- **Reporting**: Generate reports of label usage and document organization
- **Integration**: Share label structures with other systems

## Best Practices

### Label Design

#### Naming Conventions

1. **Descriptive Names**: Use clear, self-explanatory label names
2. **Consistent Format**: Establish and follow naming patterns
3. **Avoid Ambiguity**: Choose names that won't be confused with similar concepts
4. **Length Consideration**: Keep names concise but informative
5. **Special Characters**: Avoid special characters that may cause issues

**Good Examples:**
- "Q1-2024-Budget" ✅
- "Legal-Contract-Template" ✅
- "Marketing-Campaign-Assets" ✅

**Poor Examples:**
- "Stuff" ❌ (too vague)
- "Q1 Budget Documents for 2024 Financial Planning" ❌ (too long)
- "Legal/Contract#Template@2024" ❌ (special characters)
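
When importing labels from another system or scripting label creation, a small normalizer can apply the naming conventions above consistently. This sketch turns any run of characters other than ASCII letters, digits, and hyphens into a single hyphen (so spaces and the special characters from the poor examples are replaced, and so are accented letters), then trims stray hyphens. The 40-character warning threshold is an arbitrary assumption, since this guide does not state a length limit; both functions are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalize a proposed name toward the "Q1-2024-Budget"
# style. Runs of anything other than A-Z, a-z, 0-9, or "-" become one hyphen;
# repeated hyphens are collapsed and leading/trailing hyphens removed.
normalize_label_name() {
  printf '%s' "$1" | sed -E 's/[^A-Za-z0-9-]+/-/g; s/-{2,}/-/g; s/^-+//; s/-+$//'
  printf '\n'
}

# Hypothetical length check. 40 characters is an assumed threshold.
MAX_LABEL_LENGTH=40
check_label_length() {
  if (( ${#1} > MAX_LABEL_LENGTH )); then
    echo "warning: '$1' is ${#1} characters; consider shortening it" >&2
    return 1
  fi
}
```

For example, `normalize_label_name 'Legal/Contract#Template@2024'` prints `Legal-Contract-Template-2024`, and `check_label_length` flags the 47-character "too long" example above.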

#### Color Strategy

1. **Consistent Color Families**: Use similar colors for related label categories
2. **High Contrast**: Ensure labels are readable against various backgrounds
3. **Color Meaning**: Establish color conventions (e.g., red for urgent, green for completed)
4. **Accessibility**: Consider color-blind users when choosing colors
5. **Limited Palette**: Don't use too many different colors

### Organization Strategy

#### Start Simple

1. **Basic Categories**: Begin with broad, obvious categories
2. **Organic Growth**: Add labels as needs become apparent
3. **User Feedback**: Incorporate user suggestions for new labels
4. **Regular Review**: Periodically assess and refine label structure

#### Maintain Consistency

1. **Documentation**: Document labeling standards and conventions
2. **Training**: Educate users on proper labeling practices
3. **Regular Cleanup**: Remove unused or redundant labels
4. **Standardization**: Ensure consistent application across teams

### Performance Optimization

#### Label Management

1. **Avoid Over-Labeling**: Don't create too many similar labels
2. **Regular Cleanup**: Remove unused labels to reduce clutter
3. **Search Optimization**: Focus on labels that improve searchability
4. **User Training**: Educate users on effective labeling practices

#### System Performance

- **Index Optimization**: Labels are indexed for fast search performance
- **Bulk Operations**: Use bulk assignment for better efficiency
- **Caching**: Frequently used labels are cached for quick access

## API Integration

### Label Management API

#### Creating Labels

```http
POST /api/labels
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "name": "Project Documentation",
  "color": "#2196F3"
}
```
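
From a shell, the same request can be sent with `curl`. The sketch below builds the JSON body and posts it. `READUR_URL` and `READUR_TOKEN` are placeholder environment variables for your server's base URL and JWT (not settings defined by Readur), and the hypothetical `json_escape` helper handles only backslashes and double quotes, which is enough for ordinary label names.

```bash
#!/usr/bin/env bash
# Escape backslashes and double quotes for use inside a JSON string.
# Control characters are not handled.
json_escape() {
  local s="$1" bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # \  becomes  \\
  s=${s//"$q"/"$bs$q"}     # "  becomes  \"
  printf '%s' "$s"
}

# Build the request body for POST /api/labels.
label_payload() {
  printf '{"name": "%s", "color": "%s"}\n' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request. READUR_URL and READUR_TOKEN are placeholder variables.
create_label() {
  curl --fail --silent --show-error \
    -X POST "${READUR_URL}/api/labels" \
    -H "Authorization: Bearer ${READUR_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(label_payload "$1" "$2")"
}
```

After exporting both variables, `create_label "Project Documentation" "#2196F3"` sends the request shown above.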

#### Listing Labels

```http
GET /api/labels
Authorization: Bearer <jwt_token>
```

Response:
```json
{
  "labels": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Project Documentation",
      "color": "#2196F3",
      "document_count": 42,
      "created_at": "2024-01-01T00:00:00Z"
    }
  ]
}
```

#### Assigning Labels to Documents

```http
PATCH /api/documents/{document_id}
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "labels": ["Project Documentation", "Q1 2024", "High Priority"]
}
```
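
Because the request body is a list of label names, a helper that turns shell arguments into that JSON array avoids hand-editing quotes. In this sketch, `labels_payload` and `set_document_labels` are hypothetical helpers, `READUR_URL`/`READUR_TOKEN` are placeholder environment variables, and the escaping covers only backslashes and double quotes. This guide does not say whether the PATCH replaces a document's existing labels or adds to them, so confirm that against your server before scripting bulk changes.

```bash
#!/usr/bin/env bash
# Build {"labels": [...]} from the given names, escaping backslashes and
# double quotes in each one.
labels_payload() {
  local out="" name bs='\' q='"'
  for name in "$@"; do
    name=${name//"$bs"/"$bs$bs"}
    name=${name//"$q"/"$bs$q"}
    out+="${out:+, }\"$name\""
  done
  printf '{"labels": [%s]}\n' "$out"
}

# PATCH a document's labels. The first argument is the document ID;
# READUR_URL and READUR_TOKEN are placeholder variables.
set_document_labels() {
  local document_id="$1"; shift
  curl --fail --silent --show-error \
    -X PATCH "${READUR_URL}/api/documents/${document_id}" \
    -H "Authorization: Bearer ${READUR_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(labels_payload "$@")"
}
```

`labels_payload "Project Documentation" "Q1 2024" "High Priority"` produces the same body as the request above.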

### Search Integration

#### Label-Based Search

```http
GET /api/search?query=invoice&labels=urgent,review
Authorization: Bearer <jwt_token>
```
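
The `query` and `labels` values must be URL-encoded when they contain spaces or other reserved characters, for example a label such as "Q1 2024". This sketch builds the URL with a pure-Bash percent-encoder. Based only on the example above, it assumes multiple labels are joined with commas; label names that themselves contain commas may not round-trip through this parameter. `search_url` and `urlencode` are hypothetical helpers and `READUR_URL` is a placeholder variable.

```bash
#!/usr/bin/env bash
# Percent-encode a string for a URL query, leaving RFC 3986 unreserved
# characters as-is. LC_ALL=C makes Bash iterate over bytes, not characters.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a label-filtered search URL: search_url QUERY [LABEL...]
search_url() {
  local query="$1"; shift
  local labels="" label
  for label in "$@"; do
    labels+="${labels:+,}$(urlencode "$label")"
  done
  local url="${READUR_URL:-}/api/search?query=$(urlencode "$query")"
  if [[ -n "$labels" ]]; then
    url+="&labels=$labels"
  fi
  printf '%s\n' "$url"
}
```

A request could then be sent with `curl -H "Authorization: Bearer $READUR_TOKEN" "$(search_url invoice urgent review)"`.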

#### Advanced Label Queries

```http
POST /api/search/advanced
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "query": "budget",
  "filters": {
    "labels": ["Q1 2024", "Finance"],
    "label_logic": "AND"
  }
}
```

## Next Steps

- Configure [advanced search](advanced-search.md) with label-based filtering
- Set up [sources](sources-guide.md) with automatic labeling rules
- Explore [user management](user-management-guide.md) for collaborative labeling
- Review [API reference](api-reference.md) for programmatic label management
- Check [best practices](user-guide.md#tips-for-best-results) for document organization

@@ -0,0 +1,498 @@
# Sources Guide

Readur's Sources feature automatically ingests documents from external storage systems. This guide covers each supported source type and how to configure it.

## Table of Contents

- [Overview](#overview)
- [Source Types](#source-types)
  - [WebDAV Sources](#webdav-sources)
  - [Local Folder Sources](#local-folder-sources)
  - [S3 Sources](#s3-sources)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [Sync Operations](#sync-operations)
- [Health Monitoring](#health-monitoring)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)

## Overview

Sources allow Readur to automatically discover, download, and process documents from external storage systems. Key features include:

- **Multi-Protocol Support**: WebDAV, Local Folders, and S3-compatible storage
- **Automated Syncing**: Scheduled synchronization with configurable intervals
- **Health Monitoring**: Proactive monitoring and validation of source connections
- **Intelligent Processing**: Duplicate detection, incremental syncs, and OCR integration
- **Real-time Status**: Live sync progress and detailed statistics

### How Sources Work

1. **Configuration**: Set up a source with connection details and preferences
2. **Discovery**: Readur scans the source for supported file types
3. **Synchronization**: New and changed files are downloaded and processed
4. **OCR Processing**: Documents are automatically queued for text extraction
5. **Search Integration**: Processed documents become searchable in your collection
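
Readur's own change-detection logic is not described in this guide. To illustrate the general idea behind incremental syncs, the sketch below compares SHA-256 checksums of a local directory against a manifest saved on the previous run and prints only new or modified files. The `changed_files` function and its manifest format are hypothetical; the script needs GNU coreutils and findutils, and it does not report deleted files.

```bash
#!/usr/bin/env bash
# Print files under DIR that are new or changed since the previous run, then
# save the current state to MANIFEST ("sha256  path" lines from sha256sum).
changed_files() {
  local dir="$1" manifest="$2" current
  current="$(mktemp)"
  # Hash every regular file, sorted by path for stable output.
  find "$dir" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$current"
  if [[ -f "$manifest" ]]; then
    # Lines absent from the old manifest belong to new or modified files.
    # Each line is 64 hex digits, two separator characters, then the path.
    grep -vxF -f "$manifest" "$current" | cut -c67-
  else
    cut -c67- "$current"   # first run: every file counts as new
  fi
  mv "$current" "$manifest"
}
```

Running it twice in a row prints every file the first time and nothing the second time, until a file is added or edited.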

## Source Types

### WebDAV Sources

WebDAV sources connect to cloud storage services and self-hosted servers that support the WebDAV protocol.

#### Supported WebDAV Servers

| Server Type | Status | Notes |
|-------------|--------|-------|
| **Nextcloud** | ✅ Fully Supported | Optimized discovery and authentication |
| **ownCloud** | ✅ Fully Supported | Native integration with server detection |
| **Apache WebDAV** | ✅ Supported | Generic WebDAV implementation |
| **nginx WebDAV** | ✅ Supported | Works with nginx dav module |
| **Box.com** | ⚠️ Limited | Basic WebDAV support |
| **Other WebDAV** | ✅ Supported | Generic WebDAV protocol compliance |

#### WebDAV Configuration

**Required Fields:**
- **Name**: Descriptive name for the source
- **Server URL**: Full WebDAV server URL (e.g., `https://cloud.example.com/remote.php/dav/files/username/`)
- **Username**: WebDAV authentication username
- **Password**: WebDAV authentication password or app password

**Optional Configuration:**
- **Watch Folders**: Specific directories to monitor (leave empty to sync the entire accessible space)
- **File Extensions**: Limit syncing to specific file types (default: all supported types)
- **Auto Sync**: Enable automatic scheduled synchronization
- **Sync Interval**: How often to check for changes (15 minutes to 24 hours)
- **Server Type**: Server type used for server-specific optimizations (auto-detected by default)

#### Setting Up WebDAV Sources

1. **Navigate to Sources**: Go to Settings → Sources in the Readur interface
2. **Add New Source**: Click "Add Source" and select "WebDAV"
3. **Configure Connection**:
   ```
   Name: My Nextcloud Documents
   Server URL: https://cloud.mycompany.com/remote.php/dav/files/john/
   Username: john
   Password: app-password-here
   ```
4. **Test Connection**: Use the "Test Connection" button to verify credentials
5. **Configure Folders**: Specify directories to monitor:
   ```
   Watch Folders:
     - Documents/
     - Projects/2024/
     - Invoices/
   ```
6. **Set Sync Schedule**: Choose an automatic sync interval (recommended: 30 minutes)
7. **Save and Sync**: Save the configuration and trigger the initial sync
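
You can also check the server URL and credentials from a shell before saving. WebDAV servers list a folder in response to a `PROPFIND` request with a `Depth: 1` header, and a `207` (Multi-Status) response means the path and credentials work. In this sketch, `webdav_join` and `webdav_check` are hypothetical helpers and `WEBDAV_USER`/`WEBDAV_PASS` are placeholder variables. It is a manual check, not a description of what the Test Connection button does internally, and it does not percent-encode folder names containing spaces.

```bash
#!/usr/bin/env bash
# Join a WebDAV base URL and a relative folder with exactly one slash between
# them and a trailing slash at the end.
webdav_join() {
  local base="${1%/}" folder="${2#/}"
  folder="${folder%/}"
  if [[ -n "$folder" ]]; then
    printf '%s/%s/\n' "$base" "$folder"
  else
    printf '%s/\n' "$base"
  fi
}

# List one folder with PROPFIND and print only the HTTP status code.
# 207 indicates success; 401 usually means the credentials were rejected.
webdav_check() {
  local url
  url="$(webdav_join "$1" "$2")"
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    -u "${WEBDAV_USER}:${WEBDAV_PASS}" \
    -X PROPFIND -H "Depth: 1" "$url"
}
```

For example, `webdav_check https://cloud.mycompany.com/remote.php/dav/files/john/ Projects/2024/` sends the request to `https://cloud.mycompany.com/remote.php/dav/files/john/Projects/2024/`.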

#### WebDAV Best Practices

- **Use App Passwords**: Create dedicated app passwords instead of using main account passwords
- **Limit Scope**: Specify watch folders to avoid syncing unnecessary files
- **Server Optimization**: Let Readur auto-detect the server type for optimal performance
- **Network Considerations**: Use longer sync intervals for slow connections

### Local Folder Sources

Local folder sources monitor directories on the Readur server's filesystem, including mounted network drives.

#### Use Cases

- **Watch Folders**: Monitor directories where documents are dropped
- **Network Mounts**: Sync from NFS, SMB/CIFS, or other mounted filesystems
- **Batch Processing**: Automatically process documents placed in specific folders
- **Archive Integration**: Monitor existing document archives

#### Local Folder Configuration

**Required Fields:**
- **Name**: Descriptive name for the source
- **Watch Folders**: Absolute paths of the directories to monitor

**Optional Configuration:**
- **File Extensions**: Filter by specific file types
- **Auto Sync**: Enable scheduled monitoring
- **Sync Interval**: Frequency of directory scans
- **Recursive**: Include subdirectories in scans
- **Follow Symlinks**: Follow symbolic links (use with caution)

#### Setting Up Local Folder Sources

1. **Prepare Directory**: Ensure the directory exists and is accessible
   ```bash
   # Create watch folder
   mkdir -p /mnt/documents/inbox

   # Set permissions (if needed)
   chmod 755 /mnt/documents/inbox
   ```

2. **Configure Source**:
   ```
   Name: Document Inbox
   Watch Folders: /mnt/documents/inbox
   File Extensions: pdf,jpg,png,txt,docx
   Auto Sync: Enabled
   Sync Interval: 5 minutes
   Recursive: Yes
   ```

3. **Test Setup**: Place a test document in the folder and verify that Readur detects it
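
If files dropped into the folder are never detected, a common cause is that the account running Readur cannot list or enter the directory. The hypothetical helper below is a sketch to run as that same account (which account that is depends on your deployment). It checks that the path is a directory the current user can read and enter, then counts the regular files it can see, optionally including subdirectories to mirror the **Recursive** option. These permission tests always pass when run as root.

```bash
#!/usr/bin/env bash
# Check that a watch folder exists and that the current user can list and
# enter it, then print how many regular files it holds. Pass "recursive" as
# the second argument to count files in subdirectories too.
check_watch_folder() {
  local dir="$1" mode="${2:-flat}"
  if [[ ! -d "$dir" ]]; then
    echo "error: $dir is not a directory" >&2
    return 1
  fi
  if [[ ! -r "$dir" || ! -x "$dir" ]]; then
    echo "error: $(id -un) cannot read and enter $dir" >&2
    return 1
  fi
  local depth=(-maxdepth 1)
  if [[ "$mode" == "recursive" ]]; then
    depth=()
  fi
  find "$dir" "${depth[@]}" -type f | wc -l | tr -d ' '
}
```

For example, `check_watch_folder /mnt/documents/inbox recursive` prints the number of files Readur should be able to see in the inbox and its subfolders.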

#### Network Mount Examples

**NFS Mount:**
```bash
# Mount NFS share
sudo mount -t nfs 192.168.1.100:/documents /mnt/nfs-docs

# Then configure in Readur:
#   Watch Folders: /mnt/nfs-docs/inbox
```

**SMB/CIFS Mount:**
```bash
# Mount SMB share
sudo mount -t cifs //server/documents /mnt/smb-docs -o username=user

# Then configure in Readur:
#   Watch Folders: /mnt/smb-docs/processing
```

### S3 Sources

S3 sources connect to Amazon S3 or S3-compatible storage services for document synchronization.

#### Supported S3 Services

| Service | Status | Configuration |
|---------|--------|---------------|
| **Amazon S3** | ✅ Fully Supported | Standard AWS configuration |
| **MinIO** | ✅ Fully Supported | Custom endpoint URL |
| **DigitalOcean Spaces** | ✅ Supported | S3-compatible API |
| **Wasabi** | ✅ Supported | Custom endpoint configuration |
| **Google Cloud Storage** | ⚠️ Limited | S3-compatible mode only |

#### S3 Configuration

**Required Fields:**
- **Name**: Descriptive name for the source
- **Bucket Name**: S3 bucket to monitor
- **Region**: AWS region (e.g., `us-east-1`)
- **Access Key ID**: AWS/S3 access key
- **Secret Access Key**: AWS/S3 secret key

**Optional Configuration:**
- **Endpoint URL**: Custom endpoint for S3-compatible services
- **Prefix**: Bucket path prefix to limit scope
- **Watch Folders**: Specific S3 "directories" to monitor
- **File Extensions**: Filter by file types
- **Auto Sync**: Enable scheduled synchronization
- **Sync Interval**: Frequency of bucket scans

#### Setting Up S3 Sources

1. **Prepare S3 Bucket**: Ensure the bucket exists and the credentials have access to it
2. **Configure Source**:
   ```
   Name: Company Documents S3
   Bucket Name: company-documents
   Region: us-west-2
   Access Key ID: AKIAIOSFODNN7EXAMPLE
   Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
   Prefix: documents/
   Watch Folders:
     - invoices/
     - contracts/
     - reports/
   ```

3. **Test Connection**: Verify credentials and bucket access
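
You can confirm that the credentials reach the bucket with the AWS CLI before entering them in Readur. The sketch below checks the bucket with `aws s3api head-bucket` and then lists each watch folder. It assumes, without confirmation from this guide, that watch folders are resolved relative to the prefix (so `invoices/` under `documents/` becomes `documents/invoices/`). `s3_folder_uri` and `s3_check` are hypothetical helpers, and `ENDPOINT_URL` is a placeholder variable for S3-compatible services such as MinIO.

```bash
#!/usr/bin/env bash
# Build an s3:// URI from bucket, prefix, and folder, with single slashes.
# Assumes folders are relative to the prefix (not confirmed by this guide).
s3_folder_uri() {
  local bucket="$1" prefix="${2#/}" folder="${3#/}" path
  path="${prefix%/}/${folder%/}"
  path="${path#/}"   # drop the leading slash left by an empty prefix
  path="${path%/}"   # drop the trailing slash left by an empty folder
  if [[ -n "$path" ]]; then
    printf 's3://%s/%s/\n' "$bucket" "$path"
  else
    printf 's3://%s/\n' "$bucket"
  fi
}

# Check bucket access, then list each watch folder: s3_check BUCKET PREFIX FOLDER...
s3_check() {
  local bucket="$1" prefix="$2"; shift 2
  local endpoint=() folder
  if [[ -n "${ENDPOINT_URL:-}" ]]; then
    endpoint=(--endpoint-url "$ENDPOINT_URL")
  fi
  aws "${endpoint[@]}" s3api head-bucket --bucket "$bucket" || return 1
  for folder in "$@"; do
    aws "${endpoint[@]}" s3 ls "$(s3_folder_uri "$bucket" "$prefix" "$folder")"
  done
}
```

With the example configuration, `s3_check company-documents documents/ invoices/ contracts/ reports/` would list `s3://company-documents/documents/invoices/` and the other two folders.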
|
||||||
|
|
||||||
|
#### S3-Compatible Services
|
||||||
|
|
||||||
|
**MinIO Configuration:**
|
||||||
|
```
|
||||||
|
Endpoint URL: https://minio.example.com:9000
|
||||||
|
Bucket Name: documents
|
||||||
|
Region: us-east-1 (can be any value for MinIO)
|
||||||
|
```
|
||||||
|
|
||||||
|
**DigitalOcean Spaces:**
|
||||||
|
```
|
||||||
|
Endpoint URL: https://nyc3.digitaloceanspaces.com
|
||||||
|
Bucket Name: my-documents
|
||||||
|
Region: nyc3
|
||||||
|
```

## Getting Started

### Adding Your First Source

1. **Access Sources Management**: Navigate to Settings → Sources
2. **Choose Source Type**: Select WebDAV, Local Folder, or S3 based on your needs
3. **Configure Connection**: Enter the required credentials and connection details
4. **Test Connection**: Verify connectivity before saving
5. **Configure Sync**: Choose the folders to monitor and set the sync schedule
6. **Initial Sync**: Trigger the first synchronization to import existing documents

### Quick Setup Examples

#### Nextcloud WebDAV
```
Name: Nextcloud Documents
Server URL: https://cloud.company.com/remote.php/dav/files/username/
Username: username
Password: app-password
Watch Folders: Documents/, Shared/
Auto Sync: Every 30 minutes
```

#### Local Network Drive
```
Name: Network Archive
Watch Folders: /mnt/network/documents
File Extensions: pdf,doc,docx,txt
Recursive: Yes
Auto Sync: Every 15 minutes
```
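Before enabling a Local Folder source, it can help to preview which files it will pick up. The `find` sketch below approximates the example configuration: a recursive scan with a case-insensitive match on `pdf,doc,docx,txt`. It is an illustration only. It does not reproduce how Readur itself scans directories, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print every file under <dir> (recursively) whose extension, compared
# case-insensitively, appears in a comma-separated list.
# Usage: preview_local_source <dir> <ext1,ext2,...>
preview_local_source() {
  local dir=$1 ext
  local -a exts=() tests=()
  IFS=',' read -r -a exts <<< "$2"
  for ext in "${exts[@]}"; do
    # Join the -iname tests with -o so that any listed extension matches.
    if [ "${#tests[@]}" -gt 0 ]; then
      tests+=(-o)
    fi
    tests+=(-iname "*.${ext}")
  done
  find "$dir" -type f \( "${tests[@]}" \) | sort
}

# Example:
#   preview_local_source /mnt/network/documents pdf,doc,docx,txt
```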

#### AWS S3 Bucket
```
Name: AWS Document Bucket
Bucket Name: company-docs-bucket
Region: us-east-1
Access Key ID: [AWS Access Key]
Secret Access Key: [AWS Secret Key]
Prefix: active-documents/
Auto Sync: Every 1 hour
```

## Configuration

### Sync Settings

**Sync Intervals:**
- **Real-time**: Immediate processing (local folders only)
- **5-15 minutes**: High-frequency monitoring
- **30-60 minutes**: Standard monitoring (recommended)
- **2-24 hours**: Low-frequency sync for large datasets

**File Filtering:**
- **File Extensions**: `pdf,jpg,jpeg,png,txt,doc,docx,rtf`
- **Size Limits**: Configurable maximum file size (default: 50MB)
- **Path Exclusions**: Skip specific directories or file patterns
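The extension and size filters combine into a simple per-file check. The function below is a hypothetical illustration of that logic. It uses the example extension list and reads the 50MB default as 50 × 1024 × 1024 bytes, which is an assumption. It does not read Readur's actual configuration.

```bash
#!/usr/bin/env bash
# Hypothetical per-file check mirroring the filters above: succeed (exit 0)
# only if the file's extension is allowed and its size is within the limit.
# Usage: passes_filters <file> [comma_separated_exts] [max_bytes]
passes_filters() {
  local file=$1
  local exts=${2:-pdf,jpg,jpeg,png,txt,doc,docx,rtf}
  # Assumption: "50MB" is read as 50 MiB (50 * 1024 * 1024 bytes).
  local max_bytes=${3:-$((50 * 1024 * 1024))}
  local name=${file##*/} ext size

  # Files without a dot in their name have no extension and are rejected.
  case $name in
    *.*) ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]') ;;
    *) return 1 ;;
  esac

  # Wrap both sides in commas so "doc" does not match inside "docx".
  case ",$exts," in
    *",$ext,"*) ;;
    *) return 1 ;;
  esac

  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -le "$max_bytes" ]
}
```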

### Advanced Configuration

**Concurrency Settings:**
- **Concurrent Files**: Number of files processed simultaneously (default: 5)
- **Network Timeout**: Connection timeout for network sources
- **Retry Logic**: Automatic retry for failed downloads
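A common way to implement automatic retry is a loop with exponential backoff. The sketch below shows that generic technique. It is not Readur's implementation, and the attempt count and base delay are arbitrary illustrative values.

```bash
#!/usr/bin/env bash
# Run a command up to <attempts> times, doubling the wait after each failure.
# Usage: retry_with_backoff <attempts> <base_delay_seconds> <command> [args...]
retry_with_backoff() {
  local attempts=$1 delay=$2
  shift 2
  local try=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$try" -ge "$attempts" ]; then
      echo "giving up after $try attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Example (hypothetical URL):
#   retry_with_backoff 3 2 curl -fsS -o invoice.pdf https://server.com/webdav/invoice.pdf
```

Doubling the delay gives a briefly overloaded or throttling server progressively more breathing room. A fixed delay would keep hitting it at the same rate.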

**Deduplication:**
- **Hash-based**: SHA-256 content hashing prevents duplicate storage
- **Cross-source**: Duplicates are detected across all sources
- **Metadata Preservation**: Tracks file origins while avoiding storage duplication
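Content hashing is what makes cross-source detection work. Two files with identical bytes have identical SHA-256 digests, regardless of name or location. The sketch below applies the same idea with `sha256sum` (GNU coreutils) to report duplicate files across several directories. It only illustrates the technique; it does not query Readur's database, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print "<hash>  <path>" for every file whose content duplicates another
# file in the given directories; within each group of identical hashes, the
# first path in sort order is treated as the original and is not printed.
# Usage: find_duplicates <dir> [dir...]
find_duplicates() {
  find "$@" -type f -exec sha256sum {} + |
    sort |
    awk '{ hash = $1; if (seen[hash]++) print }'
}
```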

## Sync Operations

### Manual Sync

**Trigger Immediate Sync:**
1. Navigate to the Sources page
2. Find the source to sync
3. Click the "Sync Now" button
4. Monitor progress in real time

**Deep Scan:**
- Forces a complete re-scan of the entire source
- Useful for detecting changes in large directories
- Also triggered automatically at regular intervals

### Sync Status

**Status Indicators:**
- 🟢 **Idle**: Source ready, no sync in progress
- 🟡 **Syncing**: Active synchronization in progress
- 🔴 **Error**: Sync failed, requires attention
- ⚪ **Disabled**: Source disabled, no automatic sync

**Progress Information:**
- Files discovered vs. processed
- Current operation (scanning, downloading, processing)
- Estimated completion time
- Transfer speeds and statistics

### Stopping Sync

**Graceful Cancellation:**
1. Click the "Stop Sync" button during an active sync
2. The file currently being processed finishes
3. The sync stops cleanly, without corrupting data
4. Partial progress is saved

## Health Monitoring

### Health Scores

Sources are continuously monitored and assigned health scores (0-100):

- **90-100**: ✅ Excellent - No issues detected
- **75-89**: ⚠️ Good - Minor issues or warnings
- **50-74**: ⚠️ Fair - Moderate issues requiring attention
- **25-49**: ❌ Poor - Significant problems
- **0-24**: ❌ Critical - Severe issues, manual intervention required
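If you script against health scores, for example in an external monitoring check, the bands above map directly to a lookup. This hypothetical function only mirrors the thresholds listed here. How to obtain a score from Readur is not covered.

```bash
#!/usr/bin/env bash
# Map a 0-100 health score to the band names used in this guide.
health_band() {
  local score=$1
  if [ "$score" -lt 0 ] || [ "$score" -gt 100 ]; then
    echo "invalid"
    return 1
  fi
  if [ "$score" -ge 90 ]; then echo "Excellent"
  elif [ "$score" -ge 75 ]; then echo "Good"
  elif [ "$score" -ge 50 ]; then echo "Fair"
  elif [ "$score" -ge 25 ]; then echo "Poor"
  else echo "Critical"
  fi
}
```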

### Health Checks

**Automatic Validation** (every 30 minutes):
- Connection testing
- Credential verification
- Configuration validation
- Sync pattern analysis
- Error rate monitoring

**Common Health Issues:**
- Authentication failures
- Network connectivity problems
- Permission or access issues
- Configuration errors
- Rate limiting or throttling

### Health Notifications

**Alert Types:**
- Connection failures
- Authentication expiry
- Sync errors
- Performance degradation
- Configuration warnings

## Troubleshooting

### Common Issues

#### WebDAV Connection Problems

**Symptom**: "Connection failed" or authentication errors

**Solutions**:
1. Verify the server URL format:
   - Nextcloud: `https://server.com/remote.php/dav/files/username/`
   - ownCloud: `https://server.com/remote.php/dav/files/username/`
   - Generic: `https://server.com/webdav/`

2. Check credentials:
   - Use app passwords instead of your main account password
   - Verify the username/password combination
   - Test the credentials in a web browser or WebDAV client

3. Network issues:
   - Verify the server is reachable from the Readur host
   - Check for firewall rules or SSL/TLS certificate problems
   - Test with curl: `curl -u username:password -X PROPFIND -H "Depth: 1" https://server.com/webdav/` (a working WebDAV endpoint answers `207 Multi-Status`)
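The HTTP status returned by a `PROPFIND` request usually points at the cause. The sketch below wraps the request and translates common status codes into the checks listed above. The messages and function names are our own, not Readur's. Passing `-u user:pass` exposes the password in the local process list; curl's `--netrc-file` option avoids that.

```bash
#!/usr/bin/env bash
# Translate an HTTP status from a WebDAV PROPFIND into a likely cause.
explain_webdav_status() {
  case $1 in
    207) echo "OK: server answered with a WebDAV directory listing" ;;
    401) echo "Authentication failed: check username and app password" ;;
    403) echo "Forbidden: account lacks access to this path" ;;
    404) echo "Not found: check the server URL and username in the path" ;;
    405) echo "Method not allowed: URL is probably not a WebDAV endpoint" ;;
    000) echo "No response: check DNS, firewall, or TLS certificate" ;;
    *)   echo "Unexpected status $1" ;;
  esac
}

# Send a depth-1 PROPFIND and explain the result. curl prints 000 as the
# status code when no HTTP response was received at all.
# Usage: check_webdav <url> <username> <password>
check_webdav() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -u "$2:$3" -X PROPFIND -H 'Depth: 1' "$1")
  explain_webdav_status "$code"
}
```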

#### Local Folder Issues

**Symptom**: "Permission denied" or "Directory not found"

**Solutions**:
1. Check directory permissions:
   ```bash
   ls -la /path/to/watch/folder
   chmod 755 /path/to/watch/folder # If needed
   ```

2. Verify the path exists:
   ```bash
   stat /path/to/watch/folder
   ```

3. For network mounts:
   ```bash
   mount | grep /path/to/mount # Verify mount
   ls -la /path/to/mount # Test access
   ```
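These checks can be combined into one hypothetical function that reports the first problem it finds. To test access the way Readur sees it, run it from a shell opened as the account the Readur service runs under. That account name depends on your installation.

```bash
#!/usr/bin/env bash
# Report whether a watch folder exists, is a directory, and can be listed
# (read) and entered (execute) by the current user.
# Usage: check_watch_folder <path>
check_watch_folder() {
  local dir=$1
  if [ ! -e "$dir" ]; then
    echo "missing: $dir does not exist (unmounted share or typo?)"
    return 1
  elif [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  elif [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "permission denied: $dir needs read (r) and execute (x) for $(id -un)"
    return 1
  fi
  echo "ok: $dir"
}
```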

#### S3 Access Problems

**Symptom**: "Access denied" or "Bucket not found"

**Solutions**:
1. Verify credentials and permissions:
   ```bash
   aws s3 ls s3://bucket-name --profile your-profile
   ```

2. Check the bucket policy and IAM permissions
3. Verify that the configured region matches the bucket's region
4. For S3-compatible services, ensure the endpoint URL is correct
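To confirm a bucket's region, run `aws s3api get-bucket-location`, which returns the bucket's location constraint. A few cases need normalizing before you compare the result with the region configured in Readur:

- Buckets in `us-east-1` report an empty (null) constraint, which the CLI's text output prints as `None`.
- Some older `eu-west-1` buckets may report the legacy value `EU`.

The hypothetical helpers below handle those cases. They assume the AWS CLI and valid credentials are available.

```bash
#!/usr/bin/env bash
# Normalize a LocationConstraint value from get-bucket-location to a region.
normalize_bucket_region() {
  case $1 in
    ""|None|null) echo "us-east-1" ;;
    EU) echo "eu-west-1" ;;
    *) echo "$1" ;;
  esac
}

# Compare a bucket's actual region with the region configured for a source.
# Usage: check_bucket_region <bucket> <configured_region>
check_bucket_region() {
  local raw actual
  raw=$(aws s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text) || return 1
  actual=$(normalize_bucket_region "$raw")
  if [ "$actual" = "$2" ]; then
    echo "ok: $1 is in $actual"
  else
    echo "mismatch: $1 is in $actual, source is configured for $2"
    return 1
  fi
}
```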

### Performance Issues

#### Slow Sync Performance

**Causes and Solutions**:
1. **Large file sizes**: Increase timeout values and consider setting file size limits
2. **Network latency**: Reduce concurrent connections and lengthen sync intervals
3. **Server throttling**: Allow longer delays between requests
4. **Large directories**: Use watch folders to limit the scope

#### High Resource Usage

**Optimization Strategies**:
1. **Reduce concurrency**: Lower the number of files processed concurrently
2. **Increase intervals**: Check for changes less frequently
3. **Filter files**: Limit syncing to specific file types and sizes
4. **Stagger syncs**: Avoid having multiple sources sync at the same time
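One simple way to stagger sources that share an interval is to spread their start times evenly across it. The arithmetic is sketched below. Readur's scheduler may not expose per-source start offsets, so treat this as a planning aid, for example when deciding when to enable each source, rather than as a configuration recipe.

```bash
#!/usr/bin/env bash
# Print evenly spaced start offsets (in minutes) for N sources that share
# the same sync interval, e.g. 3 sources every 60 minutes -> 0, 20, 40.
# Usage: stagger_offsets <interval_minutes> <source_count>
stagger_offsets() {
  local interval=$1 count=$2 i
  for ((i = 0; i < count; i++)); do
    echo $((i * interval / count))
  done
}
```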

### Error Recovery

**Automatic Recovery:**
- Failed files are automatically retried
- Temporary network issues are handled gracefully
- Sync resumes from the last successful point
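Resuming from the last successful point is commonly done with a checkpoint. Each completed item is recorded, and a restarted run skips anything already recorded. The sketch below shows that pattern with a plain-text state file. The state-file format and function name are hypothetical and do not reflect how Readur stores sync state.

```bash
#!/usr/bin/env bash
# Process each path read from stdin with <command>, recording successes in a
# state file so that a rerun skips paths that already completed.
# Usage: printf '%s\n' paths... | process_with_checkpoint <state_file> <command>
process_with_checkpoint() {
  local state=$1 cmd=$2 file
  touch "$state"
  while IFS= read -r file; do
    # -x: whole-line match, -F: treat the path literally, not as a regex.
    if grep -qxF -- "$file" "$state"; then
      continue
    fi
    # Redirect stdin so the command cannot consume the remaining path list.
    if "$cmd" "$file" < /dev/null; then
      printf '%s\n' "$file" >> "$state"
    else
      echo "failed: $file (will be retried on the next run)" >&2
    fi
  done
}
```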

**Manual Recovery:**
1. Check the source's health status
2. Review the error logs in the source details
3. Test the connection manually
4. Trigger a deep scan to reset the sync state

## Best Practices

### Security

1. **Use Dedicated Credentials**: Create app-specific passwords and access keys
2. **Limit Permissions**: Grant only the minimum access each source account needs
3. **Regular Rotation**: Periodically rotate passwords and access keys
4. **Network Security**: Use HTTPS/TLS for all connections

### Performance

1. **Strategic Scheduling**: Stagger sync times for multiple sources
2. **Scope Limitation**: Use watch folders to limit sync scope
3. **File Filtering**: Exclude unnecessary file types and large files
4. **Monitor Resources**: Watch CPU, memory, and network usage

### Organization

1. **Descriptive Names**: Use clear, descriptive source names
2. **Consistent Structure**: Maintain consistent folder organization
3. **Documentation**: Document source purposes and configurations
4. **Regular Maintenance**: Periodically review and clean up sources

### Reliability

1. **Health Monitoring**: Regularly check source health scores
2. **Backup Configuration**: Keep a copy of each source's configuration so it can be restored
3. **Test Scenarios**: Periodically test sync and recovery procedures
4. **Monitor Logs**: Review sync logs for patterns or issues

## Next Steps

- Configure [notifications](notifications.md) for sync events
- Set up [advanced search](advanced-search.md) to find synced documents
- Review [OCR optimization](dev/OCR_OPTIMIZATION_GUIDE.md) for processing improvements
- Explore [labels and organization](labels-and-organization.md) for document management

@@ -10,11 +10,12 @@ A comprehensive guide to using Readur's features for document management, OCR pr
- [Dashboard](#dashboard)
- [Document Management](#document-management)
- [Advanced Search](#advanced-search)
- [Sources and Synchronization](#sources-and-synchronization)
- [Document Upload](#document-upload)
- [OCR Processing](#ocr-processing)
- [Search Features](#search-features)
- [Labels and Organization](#labels-and-organization)
- [User Management](#user-management)
- [User Settings](#user-settings)
- [Tips for Best Results](#tips-for-best-results)

@@ -117,20 +118,30 @@ tag:important invoice # Search within tagged documents
type:pdf contract # Search only PDFs
```

### Sources and Synchronization

Readur's Sources feature provides automated document ingestion from multiple external storage systems:

1. **Multi-Protocol Support**: WebDAV, Local Folders, and S3-compatible storage
2. **Non-destructive**: Source files remain untouched in their original locations
3. **Automated Syncing**: Scheduled synchronization with configurable intervals
4. **Health Monitoring**: Proactive monitoring and validation of source connections
5. **Intelligent Processing**: Duplicate detection, incremental syncs, and OCR integration

#### Supported Source Types

- **WebDAV Sources**: Nextcloud, ownCloud, generic WebDAV servers
- **Local Folder Sources**: Local filesystem directories and network mounts
- **S3 Sources**: Amazon S3 and S3-compatible storage (MinIO, DigitalOcean Spaces)

#### Setting Up Sources

1. Navigate to Settings → Sources
2. Click "Add Source" and select the source type
3. Configure connection details and credentials
4. Test the connection
5. Choose the folders to monitor and configure sync settings such as the schedule

> 📖 **For comprehensive source configuration**, see the [Sources Guide](sources-guide.md)

## Document Upload

@@ -171,43 +182,147 @@ The folder watching feature automatically imports documents:

## Search Features

Readur provides powerful search capabilities with multiple modes and advanced filtering options.

### Search Modes

- **Simple Search**: General-purpose searching with automatic stemming and fuzzy matching
- **Phrase Search**: Find exact phrases using quotes (e.g., `"quarterly report"`)
- **Fuzzy Search**: Handle typos and OCR errors with approximate matching (e.g., `invoice~`)
- **Boolean Search**: Complex queries with AND, OR, NOT operators

### Search Interface

#### Quick Search
- Available in the header on all pages
- Instant results as you type
- Shows top 5 matches with snippets
- Real-time suggestions

#### Advanced Search Page
- Full search interface with all filters
- Search mode selector
- Comprehensive filtering options
- Export search results
- Save frequently used searches
- Search history and analytics

### Advanced Filtering

- **File Types**: Filter by PDF, images, documents, etc.
- **Date Ranges**: Search within specific time periods
- **Labels**: Filter by document tags and categories
- **Sources**: Search within specific sync sources
- **File Size**: Filter by document size ranges
- **OCR Status**: Filter by text extraction status

### Search Tips
1. Use quotes for exact phrases: `"project status"`
2. Combine text search with filters for precision
3. Use wildcards: `proj*` matches project, projects, projection
4. Search specific fields: `filename:report`, `label:urgent`
5. Use boolean logic: `(budget OR financial) AND 2024`

> 🔍 **For detailed search techniques**, see the [Advanced Search Guide](advanced-search.md)

## Labels and Organization

Readur's labeling system provides comprehensive document organization and categorization capabilities.

### Label Types

- **User Labels**: Custom labels created and managed by users with full control
- **System Labels**: Automatic labels generated by Readur (OCR status, file type, etc.)
- **Color Coding**: Visual identification with customizable label colors
- **Hierarchical Structure**: Organize labels in categories and subcategories

### Creating and Managing Labels

#### Creating Labels
1. **Via Settings**: Go to Settings → Labels and click "Create Label"
2. **During Upload**: Add labels while uploading documents
3. **Document Details**: Add labels directly from document pages
4. **Bulk Operations**: Create and assign labels to multiple documents

#### Label Operations
- **Rename**: Change label names (updates all documents)
- **Merge**: Combine similar labels into one
- **Color Management**: Customize label colors for visual organization
- **Bulk Assignment**: Apply labels to multiple documents at once

### Organization Strategies

#### Category-Based Organization
- **Projects**: "Project Alpha", "Q1 Budget", "Infrastructure"
- **Departments**: "HR", "Finance", "Legal", "Marketing"
- **Document Types**: "Invoices", "Contracts", "Reports", "Policies"
- **Status**: "Draft", "Final", "Approved", "Archived"

#### Time-Based Organization
- **Fiscal Periods**: "Q1 2024", "FY2024", "Annual Review"
- **Project Phases**: "Planning", "Implementation", "Review"
- **Event-Based**: "Pre-Launch", "Launch", "Post-Launch"

### Smart Collections
Create saved searches that automatically include documents with specific labels:
- **Active Projects**: Documents with current project labels
- **Pending Review**: Documents labeled for review
- **High Priority**: Documents with urgent or critical labels

> 🏷️ **For comprehensive labeling strategies**, see the [Labels and Organization Guide](labels-and-organization.md)

## User Management

Readur provides comprehensive user management with support for both local authentication and enterprise SSO integration.

### Authentication Methods

#### Local Authentication
- **Traditional Login**: Username and password authentication
- **Secure Storage**: Passwords are hashed with bcrypt
- **Self Registration**: Users can create their own accounts (if enabled)

#### OIDC/SSO Authentication
- **Enterprise Integration**: Single Sign-On with corporate identity providers
- **Supported Providers**: Microsoft Azure AD, Google Workspace, Okta, Auth0, Keycloak
- **Automatic Provisioning**: User accounts created automatically on first login
- **Seamless Experience**: Users authenticate with existing corporate credentials

### User Roles and Permissions

#### User Role
Standard users have access to core document management functionality:
- Upload and manage documents
- Search and view documents
- Configure personal settings
- Create and manage labels
- Set up personal sources

#### Admin Role
Administrators have full system access and user management capabilities:
- **User Management**: Create, modify, and delete user accounts
- **System Settings**: Configure global system parameters
- **Role Management**: Assign and modify user roles
- **System Monitoring**: View system health and performance metrics

### Administrative Features

Administrators can access user management via Settings → Users:
- **Create Users**: Add new user accounts with role assignment
- **Modify Users**: Update user information, roles, and passwords
- **User Overview**: View all users with creation dates and roles
- **Authentication Methods**: Manage both local and OIDC users
- **Bulk Operations**: Perform operations on multiple users (planned feature)

### Mixed Authentication Environments

Readur supports both local and OIDC users in the same installation:
- Local admin accounts for system management
- OIDC user accounts for regular enterprise users
- Flexible role assignment regardless of authentication method

> 👥 **For detailed user administration**, see the [User Management Guide](user-management-guide.md)

> 🔐 **For OIDC configuration**, see the [OIDC Setup Guide](oidc-setup.md)

## User Settings

@@ -276,7 +391,21 @@ Create saved searches based on:

## Next Steps

### Explore Advanced Features
- [🔗 Sources Guide](sources-guide.md) - Set up WebDAV, Local Folder, and S3 synchronization
- [🔎 Advanced Search](advanced-search.md) - Master search modes, syntax, and optimization
- [🏷️ Labels & Organization](labels-and-organization.md) - Implement effective document organization
- [👥 User Management](user-management-guide.md) - Configure authentication and user administration
- [🔐 OIDC Setup](oidc-setup.md) - Integrate with enterprise identity providers

### System Administration
- [📦 Installation Guide](installation.md) - Full installation and setup instructions
- [🔧 Configuration](configuration.md) - Environment variables and advanced configuration
- [🚀 Deployment Guide](deployment.md) - Production deployment with SSL and monitoring
- [📁 Watch Folder Guide](WATCH_FOLDER.md) - Legacy folder watching setup

### Development and Integration
- [🔌 API Reference](api-reference.md) - REST API for automation and integration
- [🏗️ Developer Documentation](dev/) - Architecture and development setup
- [🔍 OCR Optimization](dev/OCR_OPTIMIZATION_GUIDE.md) - Improve OCR performance
- [📊 Queue Architecture](dev/QUEUE_IMPROVEMENTS.md) - Background processing optimization

@@ -0,0 +1,440 @@
# User Management Guide

This comprehensive guide covers user administration, authentication, role-based access control, and user preferences in Readur.

## Table of Contents

- [Overview](#overview)
- [Authentication Methods](#authentication-methods)
- [User Roles and Permissions](#user-roles-and-permissions)
- [Admin User Management](#admin-user-management)
- [User Settings and Preferences](#user-settings-and-preferences)
- [OIDC/SSO Integration](#oidcsso-integration)
- [Security Best Practices](#security-best-practices)
- [Troubleshooting](#troubleshooting)

## Overview

Readur provides a comprehensive user management system with support for both local authentication and enterprise SSO integration. The system features:

- **Dual Authentication**: Local accounts and OIDC/SSO support
- **Role-Based Access Control**: Admin and User roles with distinct permissions
- **User Preferences**: Extensive per-user configuration options
- **Enterprise Integration**: OIDC support for corporate identity providers
- **Security Features**: JWT tokens, bcrypt password hashing, and session management

## Authentication Methods

### Local Authentication

Local authentication uses traditional username/password credentials; only bcrypt hashes of the passwords are stored in Readur's database.

#### Features:
- **Secure Storage**: Passwords hashed with bcrypt (cost factor 12)
- **JWT Tokens**: 24-hour token validity with secure signing
- **User Registration**: Self-service account creation (if enabled)
- **Password Requirements**: Configurable complexity requirements
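Tokens are valid for 24 hours, so an expired token is a likely cause of sudden authentication errors. A JWT's payload is base64url-encoded JSON, which means its `exp` claim (seconds since the Unix epoch) can be read locally. The sketch below only decodes the payload and does **not** verify the signature. It assumes the token is passed as an argument; how you obtain it from Readur is outside this sketch, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Decode the payload (second dot-separated segment) of a JWT to JSON.
# This does NOT verify the signature; use it only for inspection.
jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it to a multiple of 4 characters.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Print the "exp" claim from a JWT using sed (no jq dependency); assumes exp
# is written as a plain integer, which is typical but not guaranteed.
jwt_exp() {
  jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example with GNU date:
#   date -d "@$(jwt_exp "$TOKEN")"
```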

#### Creating Local Users:
1. **Admin Creation** (via Settings):
   - Navigate to Settings → Users (Admin only)
   - Click "Add User"
   - Enter a username, email, and initial password
   - Assign a role (Admin or User)

2. **Self Registration** (if enabled):
   - Visit the registration page
   - Provide a username, email, and password
   - The account is created with the default User role

### OIDC/SSO Authentication

OIDC (OpenID Connect) authentication integrates with enterprise identity providers for single sign-on.

#### Supported Features:
- **Standard OIDC Flow**: Authorization code flow with PKCE
- **Automatic Discovery**: Reads provider configuration from `.well-known/openid-configuration`
- **User Provisioning**: Automatic user creation on first login
- **Identity Linking**: Maps OIDC identities to local user accounts
- **Profile Sync**: Updates user information from the OIDC provider
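Under the OpenID Connect Discovery specification, the configuration document lives at the issuer URL (with any trailing slash removed) followed by `/.well-known/openid-configuration`. Fetching it by hand confirms that Readur's host can reach the provider and that the issuer URL is correct. The `issuer` value in the returned document should match the configured issuer exactly, because OIDC clients generally reject a mismatch. The helper names are hypothetical, and the fetch assumes `curl` and `jq` are installed.

```bash
#!/usr/bin/env bash
# Build the OIDC discovery URL for an issuer, dropping one trailing slash.
oidc_discovery_url() {
  local issuer=${1%/}
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Fetch the discovery document and show the fields an authorization code
# flow with PKCE relies on (missing fields are printed as null).
show_oidc_config() {
  curl -fsS "$(oidc_discovery_url "$1")" |
    jq '{issuer, authorization_endpoint, token_endpoint, jwks_uri,
         code_challenge_methods_supported}'
}

# Example (requires network access):
#   show_oidc_config https://accounts.google.com
```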

#### Supported Providers:
- **Microsoft Azure AD**: Enterprise identity management
- **Google Workspace**: Google's enterprise SSO
- **Okta**: Popular enterprise identity provider
- **Auth0**: Developer-friendly authentication platform
- **Keycloak**: Open-source identity management
- **Generic OIDC**: Any standards-compliant OIDC provider

See the [OIDC Setup Guide](oidc-setup.md) for detailed configuration instructions.

## User Roles and Permissions

### User Role

**Standard Users** have access to core document management functionality:

**Permissions:**
- ✅ Upload and manage their own documents
- ✅ Search all documents (based on sharing settings)
- ✅ Configure personal settings and preferences
- ✅ Create and manage personal labels
- ✅ Use OCR processing features
- ✅ Access personal sources (WebDAV, local folders, S3)
- ✅ View personal notifications
- ❌ User management (cannot create/modify other users)
- ❌ System-wide settings or configuration
- ❌ Access to other users' private documents

### Admin Role

**Administrators** have full system access and user management capabilities:

**Additional Permissions:**
- ✅ **User Management**: Create, modify, and delete user accounts
- ✅ **System Settings**: Configure global system parameters
- ✅ **User Impersonation**: Access other users' documents (if needed)
- ✅ **System Monitoring**: View system health and performance metrics
- ✅ **Advanced Configuration**: OCR settings, source configurations
- ✅ **Security Management**: Token management, authentication settings

**Default Admin Account:**
- Username: `admin`
- Default Password: `readur2024` ⚠️ **Change immediately in production!**

## Admin User Management

### Accessing User Management

1. Log in as an administrator
2. Navigate to **Settings** → **Users**
3. The user management interface displays all system users

### User Management Operations

#### Creating Users

1. **Click "Add User"** in the Users section
2. **Fill out user information**:
   ```
   Username: john.doe
   Email: john.doe@company.com
   Password: [secure-password]
   Role: User (or Admin)
   ```
3. **Save** to create the account
4. **Notify the user** of their credentials

#### Modifying Users

1. **Find the user** in the user list
2. **Click "Edit"** (or click the user's row)
3. **Update information**:
   - Change email address
   - Reset password
   - Modify role (User ↔ Admin)
   - Update username (if needed)
4. **Save changes**

#### Deleting Users

1. **Select the user** to delete
2. **Click "Delete"**
3. **Confirm deletion** (this action cannot be undone)

**Important Notes:**
- Users cannot delete their own accounts
- Deleting a user removes all their documents and settings
- Once account disabling is available (listed as a future feature below), consider disabling accounts instead of deleting them so their documents are retained

#### Bulk Operations

**Future Feature**: Bulk user operations are planned for enterprise deployments:
- Bulk user import from CSV
- Bulk role changes
- Bulk user deactivation

### User Information Display

The user management interface shows:
- **Username and Email**: Primary identification
- **Role**: Current role assignment
- **Created Date**: Account creation timestamp
- **Last Login**: Recent activity indicator
- **Auth Provider**: Local or OIDC authentication method
- **Status**: Active/disabled status (future feature)
## User Settings and Preferences
|
||||||
|
|
||||||
|
### Personal Settings Access
|
||||||
|
|
||||||
|
Users can configure their preferences via:
|
||||||
|
1. **User Menu** → **Settings** (top-right corner)
|
||||||
|
2. **Settings Page** → **Personal** tab
|
||||||
|
|
||||||
|
### Settings Categories

#### OCR Preferences

**Language Settings:**
- **OCR Language**: Primary language for text recognition (25+ languages available)
- **Fallback Languages**: Additional languages for documents that mix languages
- **Auto-Detection**: Automatic language detection (where supported)

**Processing Options:**
- **Image Enhancement**: Preprocess images to improve OCR results
- **Auto-Rotation**: Automatically rotate images to the correct orientation before text recognition
- **Confidence Threshold**: Minimum OCR confidence score required for a result to be accepted
- **Processing Priority**: The user's priority level in the OCR processing queue

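
Readur uses Tesseract for OCR. Tesseract accepts several languages at once joined with `+` (for example `-l eng+deu`) and reports a confidence from 0 to 100 for each recognized word, so one plausible way the language and threshold settings above could be applied is sketched below. This is an illustration under assumptions: the function names are hypothetical, and Readur may combine languages or score confidence differently (here, as the mean word confidence).

```python
def tesseract_language_arg(primary: str, fallbacks: list[str]) -> str:
    """Build a Tesseract -l value such as "eng+deu" from primary and fallback languages."""
    # dict.fromkeys drops duplicates while keeping the primary language first.
    return "+".join(dict.fromkeys([primary, *fallbacks]))


def meets_confidence_threshold(word_confidences: list[float], threshold: float) -> bool:
    """Accept an OCR result if the mean word confidence (0-100) reaches the threshold."""
    # Tesseract's TSV output uses -1 for rows that are not words (pages, blocks, lines).
    scores = [c for c in word_confidences if c >= 0]
    if not scores:
        return False  # nothing was recognized as a word
    return sum(scores) / len(scores) >= threshold
```

For example, `tesseract_language_arg("eng", ["deu", "eng"])` returns `"eng+deu"`. Listing fewer languages generally makes Tesseract faster and less prone to misreading characters, which is why a single primary language with optional fallbacks is a sensible default.
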
#### Search Preferences

**Display Settings:**
- **Results Per Page**: Number of search results shown per page (10-100)
- **Snippet Length**: Length of the text previews shown in search results
- **Fuzzy Search Threshold**: How similar a term must be to count as a fuzzy (approximate) match
- **Search History**: Enable or disable saving of past search queries

**Search Behavior:**
- **Default Sort Order**: Relevance, date, filename, or size
- **Auto-Complete**: Show search suggestions while typing
- **Real-time Search**: Update results as you type

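
Readur's search is built on PostgreSQL. If fuzzy matching is trigram-based, as with PostgreSQL's `pg_trgm` extension (an assumption; this guide does not say how fuzzy mode is implemented), the threshold is a similarity score between 0 and 1. The sketch below approximates `pg_trgm`'s `similarity()`: each word is lowercased, padded with two leading spaces and one trailing space, and split into three-character trigrams; the score is the number of shared trigrams divided by the number of distinct trigrams in either string. `pg_trgm`'s own default threshold is 0.3.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigrams in the style of pg_trgm: lowercase words padded with "  " before and " " after."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams / all distinct trigrams: 0.0 shares nothing, 1.0 is identical sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_fuzzy_match(query: str, term: str, threshold: float = 0.3) -> bool:
    return similarity(query, term) >= threshold
```

With this sketch, `similarity("invoice", "invoices")` is 0.7 (7 shared trigrams out of 10 distinct), so it passes a 0.3 threshold. Raising the threshold toward 1.0 makes fuzzy search stricter; lowering it returns more, looser matches.
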
#### File Processing

**Upload Settings:**
- **Default File Types**: Preferred file types for uploads
- **Auto-OCR**: Automatically queue uploaded files for OCR processing
- **Duplicate Handling**: How uploads of files that already exist are handled
- **File Size Limits**: Per-user file size limits

**Storage Preferences:**
- **Compression**: Compress stored files to save space
- **Retention Period**: How long documents are kept (if configured)
- **Archive Behavior**: Whether older documents are archived automatically

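
A common way to implement duplicate handling is to compare a cryptographic hash of each file's contents rather than its name. The sketch below assumes SHA-256 content hashing and two hypothetical policies (skip the duplicate, or keep both); it illustrates the idea and does not describe Readur's actual duplicate-handling options.

```python
import hashlib
from enum import Enum


class DuplicatePolicy(Enum):
    SKIP = "skip"       # ignore a file whose contents are already stored
    KEEP_BOTH = "keep"  # store it again anyway


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def should_store(data: bytes, known_hashes: set[str], policy: DuplicatePolicy) -> bool:
    """Decide whether to store an upload; records its hash when it is stored."""
    digest = content_hash(data)
    if digest in known_hashes and policy is DuplicatePolicy.SKIP:
        return False
    known_hashes.add(digest)
    return True
```

Because only the bytes are hashed, a renamed copy of an existing file counts as a duplicate, while two different files that share a name do not.
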
#### Interface Preferences

**Display Options:**
- **Theme**: Light or dark mode
- **Timezone**: Timezone used to display timestamps
- **Date Format**: Preferred date and time display format
- **Language**: Interface language (separate from the OCR language)

**Navigation:**
- **Default View**: List or grid view in the document browser
- **Sidebar Collapsed**: Whether the sidebar starts collapsed
- **Items Per Page**: Default pagination size

#### Notification Settings

**Notification Types:**
- **OCR Completion**: Notify when document processing completes
- **Source Sync**: Notify about source synchronization events
- **System Alerts**: Important system messages and warnings
- **Storage Warnings**: Alerts about storage space or quota issues

**Delivery Methods:**
- **In-App Notifications**: Notifications shown inside the Readur web interface
- **Email Notifications**: Email delivery for important events (future feature)
- **Desktop Notifications**: Browser push notifications (future feature)

### Source-Specific Settings

**WebDAV Preferences:**
- **Connection Timeout**: How long to wait for a WebDAV server response
- **Retry Attempts**: Number of times a failed download is retried
- **Sync Schedule**: Preferred automatic sync frequency

**Local Folder Settings:**
- **Watch Interval**: How often local directories are scanned
- **File Permissions**: How file permissions are handled for processed files
- **Symlink Handling**: Whether symbolic links are followed during scans

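
The WebDAV timeout and retry settings typically work together: each attempt is bounded by the timeout, and a failed attempt is retried a limited number of times. The sketch below shows retries with exponential backoff. The function and parameter names are hypothetical, and whether Readur waits between retries (and for how long) is an assumption; `operation` stands for any callable that performs one download and raises `TimeoutError` or `ConnectionError` on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_attempts: int,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying up to `retry_attempts` more times on network errors.

    The wait doubles after each failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    attempt = 0
    while True:
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt >= retry_attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Backing off gives an overloaded or briefly unreachable server time to recover instead of hitting it again immediately; the `sleep` parameter exists only so the waits can be observed or skipped in tests.
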
### Saving and Applying Settings

1. **Modify your preferences** in the settings interface.
2. **Click "Save Settings"** to apply the changes.
3. **Most settings take effect immediately.**
4. **Some settings** may require you to log out and log back in before they fully apply.

## OIDC/SSO Integration

### Overview

OIDC (OpenID Connect) integration lets users sign in to Readur with their existing corporate credentials instead of creating a separate Readur password.

### User Experience with OIDC

#### First-Time Login

1. **The user clicks "Login with SSO"** on the login page.
2. **The user is redirected to the corporate identity provider** (e.g., Azure AD, Okta).
3. **The user authenticates** with their corporate credentials.
4. **Readur automatically creates a user account** using information from the OIDC provider.
5. **The user is logged in** and can start using Readur immediately.

#### Subsequent Logins

1. **Click "Login with SSO"**.
2. **You are automatically redirected** to the identity provider.
3. **Single sign-on**: if you already have an active session with the identity provider, you may not need to authenticate again.
4. **You get immediate access** to Readur.

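
The redirect in step 2 is an OpenID Connect authorization request. Assuming Readur uses the standard authorization code flow (this guide does not say which flow it uses), the redirect URL carries query parameters like those built below. The endpoint, client ID, and callback URL are placeholders. The `email` and `profile` scopes are included because the account details described next rely on the `email` and `preferred_username` claims.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(
    authorization_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scopes: tuple[str, ...] = ("openid", "email", "profile"),
) -> tuple[str, str, str]:
    """Return (url, state, nonce) for an authorization code flow login redirect."""
    state = secrets.token_urlsafe(16)  # ties the callback to this browser session (CSRF protection)
    nonce = secrets.token_urlsafe(16)  # echoed back inside the ID token to prevent replay
    query = urlencode({
        "response_type": "code",    # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # "openid" is mandatory; request no more than needed
        "state": state,
        "nonce": nonce,
    })
    return f"{authorization_endpoint}?{query}", state, nonce
```

The `state` and `nonce` values must be remembered for this login and checked again when the provider redirects back; a mismatch means the response should be rejected.
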
### OIDC User Account Details

**Automatic Account Creation:**
- **Username**: Derived from the OIDC `preferred_username` claim, or from the `sub` claim if that is not available
- **Email**: Taken from the OIDC `email` claim
- **Role**: The default "User" role (admins can promote the user later)
- **Auth Provider**: Shown as "OIDC" in user management

**Identity Mapping:**
- **OIDC Subject**: Unique identifier assigned to the user by the identity provider
- **OIDC Issuer**: URL of the identity provider
- **Linked Accounts**: Maps the OIDC identity (subject and issuer) to a Readur user

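
The mapping above can be sketched as a function from ID token claims to a new account record. This is a hypothetical Python illustration, not Readur's code: the field and function names are assumptions, it treats `sub` as the fallback when `preferred_username` is missing, and it keys the identity on the (issuer, subject) pair, because OpenID Connect only guarantees that `sub` is unique within a single issuer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OidcAccount:
    username: str
    email: Optional[str]
    role: str           # new OIDC users start with the regular "user" role
    auth_provider: str  # shown as "OIDC" in user management
    oidc_issuer: str
    oidc_subject: str

    @property
    def identity_key(self) -> tuple[str, str]:
        # `sub` is only unique per issuer, so the pair identifies the person.
        return (self.oidc_issuer, self.oidc_subject)


def account_from_claims(claims: dict) -> OidcAccount:
    subject = claims["sub"]
    return OidcAccount(
        username=claims.get("preferred_username") or subject,
        email=claims.get("email"),
        role="user",
        auth_provider="oidc",
        oidc_issuer=claims["iss"],
        oidc_subject=subject,
    )
```

Matching returning users on `identity_key` rather than on username or email keeps the link stable even if the user's display name or email address changes at the identity provider.
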
### Mixed Authentication Environments

Readur supports both local and OIDC users in the same installation:

- **Local Admin Accounts**: For initial setup and emergency access
- **OIDC User Accounts**: For regular enterprise users
- **Role Management**: Admins can promote OIDC users to the Admin role
- **Account Linking**: Linking local and OIDC accounts (future feature)

### OIDC Configuration

See the detailed [OIDC Setup Guide](oidc-setup.md) for complete configuration instructions.

## Security Best Practices

### Password Security

**For Local Accounts:**
1. **Use Strong Passwords**: At least 12 characters, mixing uppercase and lowercase letters, numbers, and symbols
2. **Regular Rotation**: Change passwords periodically
3. **Unique Passwords**: Don't reuse passwords from other systems
4. **Admin Passwords**: Use especially strong passwords for administrator accounts

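
The strength guideline in item 1 can be checked mechanically. This sketch encodes only the rule stated above (at least 12 characters plus all four character classes); `password_problems` is a hypothetical helper, not a policy Readur is documented to enforce, and passing it does not by itself guarantee a password is hard to guess.

```python
def password_problems(password: str, min_length: int = 12) -> list[str]:
    """List the ways `password` misses the guideline above (empty list = meets it)."""
    checks = [
        (len(password) >= min_length, f"shorter than {min_length} characters"),
        (any(c.islower() for c in password), "no lowercase letter"),
        (any(c.isupper() for c in password), "no uppercase letter"),
        (any(c.isdigit() for c in password), "no number"),
        # Count any character that is not a letter, digit, or whitespace as a symbol.
        (any(not c.isalnum() and not c.isspace() for c in password), "no symbol"),
    ]
    return [message for passed, message in checks if not passed]
```

Returning every failed rule at once, rather than stopping at the first, lets a user fix the password in a single attempt.
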
### JWT Token Security

**Token Management:**
- **Token Storage**: Tokens are stored in the browser's localStorage, which any script running on the page can read, so keep untrusted scripts out of the frontend
- **Automatic Expiration**: Tokens expire 24 hours after they are issued
- **Secure Transmission**: HTTPS is required for production deployments
- **Token Rotation**: Regular token refresh (future feature)

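
To see how much of a token's 24-hour lifetime remains, you can decode its payload, which is base64url-encoded JSON, and read the standard `exp` claim (expiry time in Unix seconds), assuming Readur's tokens carry it. The sketch below deliberately does **not** verify the signature, so use it only for troubleshooting, never for access decisions. The helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature (troubleshooting only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip off
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds left before the token's `exp` claim; negative means already expired."""
    current = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] - current
```

A freshly issued token should report roughly 86,400 seconds (24 hours); a value far from that right after login points to a clock problem on the server or client.
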
### Access Control

**Role Management:**
1. **Principle of Least Privilege**: Grant only the permissions each user needs
2. **Regular Review**: Periodically audit user roles and permissions
3. **Admin Accounts**: Keep the number of administrator accounts to a minimum
4. **Account Deactivation**: Disable the accounts of users who have left the organization

### OIDC Security

**Provider Configuration:**
1. **Use HTTPS**: Ensure all OIDC endpoints use HTTPS
2. **Client Secret Protection**: Store the OIDC client secret securely
3. **Scope Limitation**: Request only the OIDC scopes that are needed
4. **Token Validation**: Properly verify OIDC tokens

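
OpenID Connect Core lists the checks a client must make on an ID token. Besides verifying the signature against the provider's published keys (which needs a JOSE/JWT library and is omitted here), the client must confirm the issuer, the audience, the expiry, and the nonce it sent. The sketch below covers only those claim checks; the function name, error type, and 60-second clock leeway are assumptions, and it does not describe Readur's internal implementation.

```python
import time
from typing import Optional


class IdTokenError(ValueError):
    """Raised when an ID token's claims fail validation."""


def validate_id_token_claims(
    claims: dict,
    expected_issuer: str,
    client_id: str,
    expected_nonce: str,
    now: Optional[float] = None,
    leeway_seconds: int = 60,
) -> None:
    """Check iss, aud, exp, and nonce. The signature must be verified separately."""
    current = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        raise IdTokenError("issuer does not match the configured issuer URL")
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if client_id not in audiences:
        raise IdTokenError("token was issued for a different client")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp + leeway_seconds < current:
        raise IdTokenError("token is missing an expiry or has expired")
    if claims.get("nonce") != expected_nonce:
        raise IdTokenError("nonce mismatch (possible replay)")
```

The small leeway tolerates minor clock differences between Readur and the identity provider without accepting tokens that are meaningfully expired.
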
### Monitoring and Auditing

**Access Monitoring:**
- **Login Tracking**: Monitor successful and failed login attempts
- **Role Changes**: Audit administrator role assignments
- **Account Activity**: Track user document access patterns
- **Security Events**: Log authentication and authorization events

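
Failed-login tracking is most useful when repeated failures are flagged automatically. The sketch below assumes you already have login events as (timestamp, username, success) records, for example parsed from your own logs; the `LoginEvent` type, the five-failures-in-15-minutes rule, and the function name are all hypothetical, and this guide does not describe Readur's log format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LoginEvent:
    timestamp: float  # Unix seconds
    username: str
    success: bool


def suspicious_users(
    events: Iterable[LoginEvent], max_failures: int = 5, window_seconds: float = 900
) -> set[str]:
    """Usernames with at least `max_failures` failed logins inside any `window_seconds` span."""
    failures: dict[str, list[float]] = defaultdict(list)
    for event in events:
        if not event.success:
            failures[event.username].append(event.timestamp)

    flagged = set()
    for username, times in failures.items():
        times.sort()
        start = 0
        # Slide a window over the sorted failure times.
        for end in range(len(times)):
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= max_failures:
                flagged.add(username)
                break
    return flagged
```

Using a sliding window rather than a lifetime count avoids flagging users who simply mistype their password now and then over several weeks.
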
## Troubleshooting

### Common Authentication Issues

#### Local Login Problems

**Symptom**: "Invalid username or password"

**Solutions**:
1. **Verify credentials**: Check the username and password carefully
2. **Account existence**: Confirm the account exists in user management
3. **Password reset**: An admin can reset the user's password
4. **Account status**: Ensure the account is active

#### OIDC Login Problems

**Symptom**: OIDC login fails or redirects to the wrong place

**Solutions**:
1. **Check the OIDC configuration**: Verify the client ID, client secret, and issuer URL
2. **Redirect URI**: Ensure Readur's redirect URI is registered with the OIDC provider
3. **Provider status**: Confirm the OIDC provider is operational
4. **Network connectivity**: Verify that Readur can reach the OIDC endpoints

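
Items 1 and 4 can be checked together by fetching the provider's discovery document, which OpenID Connect Discovery places at `<issuer>/.well-known/openid-configuration` (with any trailing `/` removed from the issuer first). The `issuer` value in that document must exactly match the issuer URL configured in Readur. The sketch below is a hypothetical diagnostic helper; run it from the Readur host so it also exercises that host's network path to the provider.

```python
import json
import urllib.request

REQUIRED_ENDPOINTS = ("authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(issuer: str) -> str:
    # OpenID Connect Discovery: strip a trailing "/" before appending the well-known path.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def check_discovery_document(issuer: str, doc: dict) -> list[str]:
    """Return configuration problems found in a parsed discovery document."""
    problems = []
    if doc.get("issuer") != issuer:
        problems.append(f"issuer mismatch: configured {issuer!r}, provider reports {doc.get('issuer')!r}")
    for key in REQUIRED_ENDPOINTS:
        url = doc.get(key)
        if not url:
            problems.append(f"missing {key}")
        elif not url.startswith("https://"):
            problems.append(f"{key} does not use HTTPS: {url}")
    return problems


def fetch_discovery_document(issuer: str, timeout: float = 10.0) -> dict:
    # Network errors raised here point to connectivity problems (item 4).
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as response:
        return json.load(response)
```

Typical use is `check_discovery_document(issuer, fetch_discovery_document(issuer))` with the issuer exactly as configured in Readur; an empty list means the issuer and endpoints look consistent. A trailing-slash difference alone is enough to produce an issuer mismatch.
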
#### JWT Token Issues

**Symptom**: "Invalid token" errors or frequent logouts

**Solutions**:
1. **Check system time**: Ensure the server clock is accurate, since token expiry is checked against it
2. **JWT secret**: Verify that the `JWT_SECRET` environment variable is set and has not changed; tokens signed with a previous secret are no longer accepted
3. **Token expiration**: Tokens expire after 24 hours; log in again to obtain a new one
4. **Browser storage**: Clear the browser's localStorage and log in again

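
One way to check item 1 is to compare the `Date` header of any HTTP response from Readur with a clock you trust, such as an NTP-synchronized workstation. The sketch below computes that offset; the function name is hypothetical, and if Readur sits behind a reverse proxy, the header may reflect the proxy's clock rather than Readur's.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def clock_skew_seconds(date_header: str, local_now: Optional[datetime] = None) -> float:
    """Server time minus local time, in seconds (positive = server clock is ahead)."""
    server_time = parsedate_to_datetime(date_header)  # HTTP dates are always given in GMT
    now = local_now if local_now is not None else datetime.now(timezone.utc)
    return (server_time - now).total_seconds()
```

Pass it the header from a response, for example `response.headers["Date"]` after opening your Readur URL with `urllib.request.urlopen`. The header has one-second resolution, so a skew of a second or two is noise, while minutes of difference point to a clock that needs synchronizing.
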
### User Management Issues

#### Cannot Create Users

**Symptom**: User creation fails

**Solutions**:
1. **Admin permissions**: Ensure you are logged in as an administrator
2. **Duplicate usernames**: Check whether the username or email is already in use
3. **Database connectivity**: Verify the database connection
4. **Input validation**: Ensure all required fields are provided

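
Items 2 and 4 are the checks most often behind a failed creation. The sketch below runs them before a new user is submitted; the function name, the set of required fields, and the case-insensitive comparison are assumptions for illustration, not Readur's documented validation rules.

```python
def new_user_problems(
    username: str,
    email: str,
    password: str,
    existing_usernames: set[str],
    existing_emails: set[str],
) -> list[str]:
    """Return reasons a new user could be rejected (empty list = looks valid)."""
    problems = []
    for field, value in (("username", username), ("email", email), ("password", password)):
        if not value.strip():
            problems.append(f"{field} is required")
    # Assumption: usernames and emails are compared case-insensitively.
    if username.strip().casefold() in {u.casefold() for u in existing_usernames}:
        problems.append(f"username {username!r} is already taken")
    if email.strip().casefold() in {e.casefold() for e in existing_emails}:
        problems.append(f"email {email!r} is already registered")
    return problems
```

Reporting all problems together mirrors how the troubleshooting list above is meant to be used: rule out every input cause before investigating permissions or the database.
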
#### User Settings Not Saving

**Symptom**: Settings changes don't persist

**Solutions**:
1. **Check permissions**: Ensure the user has permission to modify the settings
2. **Database issues**: Verify that the database user has write permissions
3. **Browser issues**: Try clearing the browser cache
4. **Network connectivity**: Ensure a stable connection while saving

### Role and Permission Issues

#### Users Cannot Access Features

**Symptom**: A user reports missing functionality

**Solutions**:
1. **Check user role**: Verify the user has the appropriate role
2. **Permission scope**: Confirm the feature is available to that role
3. **Session refresh**: The user may need to log out and back in after a role change
4. **Feature availability**: Ensure the feature is enabled in the system configuration

#### Admin Access Problems

**Symptom**: An admin cannot access management features

**Solutions**:
1. **Role verification**: Confirm the user has the Admin role
2. **Token validity**: Ensure the JWT token contains the correct role information
3. **Database consistency**: Verify the role is stored correctly in the database
4. **Login refresh**: Log out and log back in to obtain a token that reflects the current role

### Performance Issues

#### Slow User Operations

**Symptom**: User management operations are slow

**Solutions**:
1. **Database performance**: Check database query performance
2. **User count**: Very large user lists may need pagination
3. **Network latency**: OIDC operations can be slowed by identity provider latency
4. **System resources**: Monitor CPU and memory usage

## Next Steps

- Configure [OIDC integration](oidc-setup.md) for enterprise authentication
- Set up [sources](sources-guide.md) for document synchronization
- Review [security best practices](deployment.md#security-considerations)
- Explore [advanced search](advanced-search.md) capabilities
- Configure [labels and organization](labels-and-organization.md) for document management