# Per-User Watch Directories Documentation

## Table of Contents

1. [Overview](#overview)
2. [Architecture and Components](#architecture-and-components)
3. [Prerequisites and Requirements](#prerequisites-and-requirements)
4. [Administrator Setup Guide](#administrator-setup-guide)
5. [User Guide](#user-guide)
6. [API Reference](#api-reference)
7. [Configuration Reference](#configuration-reference)
8. [Security Considerations](#security-considerations)
9. [Troubleshooting](#troubleshooting)
10. [Examples and Best Practices](#examples-and-best-practices)

## Overview

The Per-User Watch Directories feature in Readur allows each user to have their own dedicated folder for automatic document ingestion. When enabled, documents placed in a user's watch directory are automatically processed, OCR'd, and associated with that specific user's account.

### Key Benefits

- **User Isolation**: Each user's documents remain private and separate
- **Automatic Attribution**: Documents are automatically assigned to the correct user
- **Simplified Workflow**: Users can drop files into their folder without manual upload
- **Batch Processing**: Process multiple documents simultaneously
- **Integration Support**: Works with network shares, sync tools, and automated workflows

### How It Works

1. Administrator enables per-user watch directories in configuration
2. System creates a dedicated folder for each user (e.g., `/data/user_watch/username/`)
3. Users place documents in their watch folder
4. Readur's file watcher detects new files
5. Documents are automatically ingested and associated with the user
6. OCR processing extracts text for searching
7. Documents appear in the user's library

## Architecture and Components

### System Components

1. **UserWatchService** (`src/services/user_watch_service.rs`)
   - Manages user-specific watch directories
   - Handles directory creation, validation, and cleanup
   - Provides secure path operations

2. **UserWatchManager** (`src/scheduling/user_watch_manager.rs`)
   - Coordinates between file watcher and user management
   - Maps file paths to users
   - Manages user cache for performance

3. **File Watcher** (`src/scheduling/watcher.rs`)
   - Monitors both global and per-user directories
   - Determines file ownership based on directory location
   - Triggers document ingestion pipeline

4. **API Endpoints** (`src/routes/users.rs`)
   - REST API for managing user watch directories
   - Provides status, creation, and deletion operations

### Directory Structure

```
user_watch_base_dir/          # Base directory (configurable)
├── alice/                    # User alice's watch directory
│   ├── document1.pdf
│   └── report.docx
├── bob/                      # User bob's watch directory
│   └── invoice.pdf
└── charlie/                  # User charlie's watch directory
    ├── presentation.pptx
    └── notes.txt
```

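The File Watcher attributes a file to a user by the directory it sits in. The Python sketch below illustrates that rule for the layout above; it is not Readur's Rust implementation, and the function name `owner_for_path` is hypothetical. It treats the first directory below the base as the username, allows subfolders beneath it, and rejects anything outside the base.

```python
from pathlib import PurePosixPath
from typing import Optional


def owner_for_path(base_dir: str, file_path: str) -> Optional[str]:
    """Return the username owning file_path, or None if no user owns it.

    Illustrative rule: the first directory below base_dir is the username,
    i.e. base_dir/<username>/<any subfolders>/<file>.
    """
    try:
        parts = PurePosixPath(file_path).relative_to(PurePosixPath(base_dir)).parts
    except ValueError:
        # Not under the per-user base directory (e.g. the global watch folder)
        return None
    if ".." in parts:
        # Unresolved traversal segment; real code should canonicalize first
        return None
    if len(parts) < 2:
        # A file directly in base_dir belongs to no user
        return None
    return parts[0]


print(owner_for_path("/data/user_watch", "/data/user_watch/alice/report.docx"))  # alice
print(owner_for_path("/data/user_watch", "/data/watch/invoice.pdf"))  # None
```

Because `PurePosixPath` is purely lexical, this sketch only rejects `..` segments; the canonicalization described under Security Considerations is what protects against symlinks.
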
## Prerequisites and Requirements

### System Requirements

- **Operating System**: Linux, macOS, or Windows with proper file permissions
- **Storage**: Sufficient disk space for user directories and documents
- **File System**: Support for directory permissions (recommended: ext4, NTFS, APFS)
- **Readur Version**: 2.5.4 or later

### Software Requirements

- PostgreSQL database
- Readur server with file watching enabled
- File system permissions that allow the Readur process to read and write the watch directories

### Network Requirements (Optional)

- Network file system support (NFS, SMB/CIFS) for remote directories
- Stable network connection for remote file access

## Administrator Setup Guide

### Step 1: Enable Per-User Watch Directories

Edit your `.env` file or set environment variables:

```bash
# Enable the feature
ENABLE_PER_USER_WATCH=true

# Set the base directory for user watch folders
USER_WATCH_BASE_DIR=/data/user_watch

# Configure watch interval (optional, default: 60 seconds)
WATCH_INTERVAL_SECONDS=30

# Set file stability check (optional, default: 2000ms)
FILE_STABILITY_CHECK_MS=3000

# Set maximum file age to process (optional, default: 24 hours)
MAX_FILE_AGE_HOURS=48
```

### Step 2: Create Base Directory

Ensure the base directory exists with proper permissions:

```bash
# Create the base directory
sudo mkdir -p /data/user_watch

# Set ownership to the user running Readur
sudo chown readur:readur /data/user_watch

# Set permissions (owner: read/write/execute; group and others: read/execute)
sudo chmod 755 /data/user_watch
```

### Step 3: Configure Directory Permissions

For production environments, configure appropriate permissions:

```bash
# Option 1: Shared group access
sudo groupadd readur-users
sudo usermod -a -G readur-users readur
sudo chgrp -R readur-users /data/user_watch
# Apply 2775 to directories only; the SGID bit makes new files inherit the group
sudo find /data/user_watch -type d -exec chmod 2775 {} +

# Option 2: ACL-based permissions (more granular)
# Capital X grants execute on directories without making files executable
sudo setfacl -R -m u:readur:rwX /data/user_watch
sudo setfacl -R -d -m u:readur:rwx /data/user_watch
```

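To confirm that Option 1 took effect, the mode bits can be read back. This Python sketch uses only the standard `os` and `stat` modules; the helper name `check_shared_dir` and its messages are illustrative and not part of Readur.

```python
import os
import stat


def check_shared_dir(path: str) -> list[str]:
    """Return problems with a directory intended for Option 1 (mode 2775)."""
    mode = os.stat(path).st_mode
    if not stat.S_ISDIR(mode):
        return [f"{path} is not a directory"]
    problems = []
    if not mode & stat.S_ISGID:
        problems.append("SGID bit not set: new files will not inherit the group")
    if not mode & stat.S_IWGRP:
        problems.append("group lacks write permission")
    if not mode & stat.S_IXGRP:
        problems.append("group lacks execute (traverse) permission")
    return problems


# Example: print a warning for each problem on the base directory
# for issue in check_shared_dir("/data/user_watch"):
#     print("WARNING:", issue)
```

An empty list means the directory has the SGID bit plus group read, write, and traverse access.
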
### Step 4: Network Share Setup (Optional)

To allow users to access their watch directories via network shares:

#### SMB/CIFS Share Configuration

```ini
# /etc/samba/smb.conf
[readur-watch]
    path = /data/user_watch
    valid users = @readur-users
    writable = yes
    browseable = yes
    create mask = 0660
    directory mask = 0770
    force group = readur-users
```

#### NFS Export Configuration

```bash
# /etc/exports
# Restrict the export to your client network (example subnet shown) and keep
# the default root_squash so a remote root user cannot act as root on the server
/data/user_watch 192.168.1.0/24(rw,sync,no_subtree_check)
```

### Step 5: Restart Readur

After configuration, restart the Readur service:

```bash
# Systemd
sudo systemctl restart readur

# Docker Compose: "restart" does not pick up changed environment settings,
# so recreate the container to apply them
docker-compose up -d readur

# Direct execution
# Stop the current process and start it again with the new configuration
```

### Step 6: Verify Configuration

Check the Readur logs to confirm per-user watch is enabled:

```bash
# Check logs for both confirmation lines
grep -E "Per-user watch enabled|User watch base directory" /var/log/readur/readur.log

# Expected output:
# ✅ Per-user watch enabled: true
# 📂 User watch base directory: /data/user_watch
```

## User Guide

### Accessing Your Watch Directory

#### Method 1: Direct File System Access

If you have direct access to the server:

```bash
# Navigate to your watch directory
cd /data/user_watch/your-username/

# Copy files
cp ~/Documents/*.pdf /data/user_watch/your-username/

# Move files
mv ~/Downloads/report.docx /data/user_watch/your-username/
```

#### Method 2: Network Share Access

Access via SMB/CIFS on Windows:

1. Open File Explorer
2. Type in address bar: `\\server-name\readur-watch\your-username`
3. Drag and drop files into your folder

Access via SMB/CIFS on macOS:

1. Open Finder
2. Press Cmd+K
3. Enter: `smb://server-name/readur-watch/your-username`
4. Drag and drop files into your folder

#### Method 3: Sync Tools

Use synchronization tools for automatic uploads:

```bash
# Using rsync
rsync -avz ~/Documents/*.pdf server:/data/user_watch/your-username/

# Using rclone (assumes an rclone remote named "server" is configured)
rclone copy ~/Documents server:/data/user_watch/your-username/

# Using Syncthing (configure folder sync)
# Add /data/user_watch/your-username as a sync folder
```

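Tools that write files in place can expose half-written files to the watcher. The stability check mitigates this, and rsync already writes to a temporary name and renames by default, but a custom upload script can avoid the problem entirely: write the copy to a staging directory on the same filesystem, then rename it into the watch folder, since `os.replace` is atomic within one filesystem on POSIX systems. This Python sketch is illustrative; `atomic_drop` and the staging path are hypothetical, and the staging directory must be outside the watch directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_drop(src: str, watch_dir: str, staging_dir: str) -> Path:
    """Copy src into watch_dir so the watcher never sees a partial file.

    staging_dir must be on the same filesystem as watch_dir (and outside it);
    os.replace() is only atomic within a single filesystem.
    """
    # Write the full copy under a unique temporary name in the staging area
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_path)
        dest = Path(watch_dir) / Path(src).name
        # Atomically move the completed file into the watch directory
        os.replace(tmp_path, dest)
    except BaseException:
        # Remove the temporary file if the copy or rename failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest


# Example (hypothetical paths):
# atomic_drop("/home/alice/scan.pdf", "/data/user_watch/alice", "/data/staging/alice")
```

Note that `os.replace` silently overwrites an existing file with the same name in the watch directory.
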
### Managing Your Watch Directory via Web Interface

1. **Check Directory Status**
   - Navigate to Settings → Watch Folder
   - View your watch directory path and status
   - See if directory exists and is enabled

2. **Create Your Directory**
   - Click "Create Watch Directory" button
   - System will create your personal folder
   - Confirmation message will appear

3. **View Directory Path**
   - Your directory path is displayed
   - Copy path for reference
   - Share with IT for network access setup

### Supported File Types

Place any of these file types in your watch directory:

- **Documents**: PDF, TXT, DOC, DOCX, ODT, RTF
- **Images**: PNG, JPG, JPEG, TIFF, BMP
- **Presentations**: PPT, PPTX, ODP
- **Spreadsheets**: XLS, XLSX, ODS

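Scripts that feed a watch directory can skip unsupported files before copying them. This Python sketch hard-codes the extensions from the list above; it is an illustration only, and the server's actual filter (see `ALLOWED_FILE_TYPES` under Troubleshooting) may differ.

```python
from pathlib import Path

# Extensions from the Supported File Types list (lowercase, without the dot)
SUPPORTED_EXTENSIONS = {
    # Documents
    "pdf", "txt", "doc", "docx", "odt", "rtf",
    # Images
    "png", "jpg", "jpeg", "tiff", "bmp",
    # Presentations
    "ppt", "pptx", "odp",
    # Spreadsheets
    "xls", "xlsx", "ods",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


print(is_supported("Report.DOCX"))  # True
print(is_supported("archive.zip"))  # False
```
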
### File Processing Workflow

1. **File Detection**: The watcher scans for new files every `WATCH_INTERVAL_SECONDS` (default: 60 seconds)
2. **Stability Check**: Waits until the file size has stopped changing for `FILE_STABILITY_CHECK_MS` (default: 2000 ms)
3. **Validation**: Verifies file type and size
4. **Ingestion**: Creates a document record in the database
5. **OCR Queue**: Adds the document to the processing queue
6. **Text Extraction**: OCR processes the document
7. **Search Index**: Document becomes searchable

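The stability check in step 2 can be pictured as polling a file until it stops changing. The Python sketch below is a simplified model of that idea, not Readur's implementation: `wait_until_stable`, its polling scheme, the use of modification time alongside size, and the `timeout_s` parameter are assumptions, while `stability_ms` plays the role of `FILE_STABILITY_CHECK_MS`.

```python
import os
import time


def wait_until_stable(path: str, stability_ms: int = 2000,
                      poll_ms: int = 200, timeout_s: float = 60.0) -> bool:
    """Return True once path's size and mtime stay unchanged for stability_ms.

    Returns False if timeout_s expires first (e.g. the file is still being written).
    """
    deadline = time.monotonic() + timeout_s
    last = None
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime_ns)
        if current != last:
            # File changed (or first observation): restart the stability window
            last = current
            stable_since = time.monotonic()
        elif (time.monotonic() - stable_since) * 1000 >= stability_ms:
            return True
        time.sleep(poll_ms / 1000)
    return False
```

A longer window trades ingestion latency for fewer partially written files, which is why slower network storage usually warrants a higher value.
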
### Best Practices for Users

1. **File Naming**: Use descriptive names for easier identification
2. **File Size**: Keep files under 50MB for optimal processing
3. **Batch Upload**: You can add multiple files at once
4. **Organization**: Create subfolders within your watch directory
5. **Patience**: Allow 1-5 minutes for processing, depending on file size

## API Reference

### Get User Watch Directory Information

Retrieve information about a user's watch directory.

**Endpoint**: `GET /api/users/{user_id}/watch-directory`

**Headers**:

```http
Authorization: Bearer {jwt_token}
```

**Response** (200 OK):

```json
{
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "username": "alice",
  "watch_directory_path": "/data/user_watch/alice",
  "exists": true,
  "enabled": true
}
```

**Error Responses**:

- `401 Unauthorized`: Missing or invalid authentication
- `403 Forbidden`: Insufficient permissions
- `404 Not Found`: User not found
- `500 Internal Server Error`: Per-user watch disabled

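Client code often wants a typed view of this response. The Python sketch below maps the documented JSON fields onto a dataclass; the class name `WatchDirectoryInfo` and the `needs_creation` helper are illustrative, and the sketch assumes the response shape shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchDirectoryInfo:
    """Typed view of the GET /api/users/{user_id}/watch-directory response."""
    user_id: str
    username: str
    watch_directory_path: str
    exists: bool
    enabled: bool

    @classmethod
    def from_json(cls, payload: str) -> "WatchDirectoryInfo":
        data = json.loads(payload)
        return cls(
            user_id=data["user_id"],
            username=data["username"],
            watch_directory_path=data["watch_directory_path"],
            exists=bool(data["exists"]),
            enabled=bool(data["enabled"]),
        )

    def needs_creation(self) -> bool:
        # Enabled for this user, but the folder is not on disk yet
        return self.enabled and not self.exists


sample = """{
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "username": "alice",
  "watch_directory_path": "/data/user_watch/alice",
  "exists": true,
  "enabled": true
}"""
info = WatchDirectoryInfo.from_json(sample)
print(info.username, info.needs_creation())  # alice False
```

A client might use `needs_creation()` to decide whether to call the create endpoint described next.
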
### Create User Watch Directory

Create a user's watch directory, or confirm that it already exists.

**Endpoint**: `POST /api/users/{user_id}/watch-directory`

**Headers**:

```http
Authorization: Bearer {jwt_token}
Content-Type: application/json
```

**Request Body**:

```json
{
  "ensure_created": true
}
```

**Response** (200 OK):

```json
{
  "success": true,
  "message": "Watch directory ready for user 'alice'",
  "watch_directory_path": "/data/user_watch/alice"
}
```

**Error Responses**:

- `401 Unauthorized`: Missing or invalid authentication
- `403 Forbidden`: Insufficient permissions
- `404 Not Found`: User not found
- `500 Internal Server Error`: Creation failed or feature disabled

### Delete User Watch Directory

Remove a user's watch directory and its contents.

**Endpoint**: `DELETE /api/users/{user_id}/watch-directory`

**Headers**:

```http
Authorization: Bearer {jwt_token}
```

**Note**: Only administrators can delete watch directories.

**Response** (200 OK):

```json
{
  "success": true,
  "message": "Watch directory removed for user 'alice'",
  "watch_directory_path": null
}
```

**Error Responses**:

- `401 Unauthorized`: Missing or invalid authentication
- `403 Forbidden`: Admin access required
- `404 Not Found`: User not found
- `500 Internal Server Error`: Deletion failed

### API Usage Examples

#### Python Example

```python
import requests

# Configuration
base_url = "https://readur.example.com/api"
token = "your-jwt-token"
user_id = "550e8400-e29b-41d4-a716-446655440000"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Get watch directory info
response = requests.get(
    f"{base_url}/users/{user_id}/watch-directory",
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
info = response.json()
print(f"Watch directory: {info['watch_directory_path']}")
print(f"Exists: {info['exists']}")

# Create watch directory
response = requests.post(
    f"{base_url}/users/{user_id}/watch-directory",
    headers=headers,
    json={"ensure_created": True},
    timeout=30,
)
response.raise_for_status()
result = response.json()
if result['success']:
    print(f"Created: {result['watch_directory_path']}")
```

#### JavaScript/TypeScript Example

```typescript
// Using the provided API service
import { userWatchService } from './services/api';

// Get watch directory information
const getWatchInfo = async (userId: string) => {
  try {
    const response = await userWatchService.getUserWatchDirectory(userId);
    console.log('Watch directory:', response.data.watch_directory_path);
    console.log('Exists:', response.data.exists);
    return response.data;
  } catch (error) {
    console.error('Failed to get watch directory info:', error);
  }
};

// Create watch directory
const createWatchDirectory = async (userId: string) => {
  try {
    const response = await userWatchService.createUserWatchDirectory(userId);
    if (response.data.success) {
      console.log('Created:', response.data.watch_directory_path);
    }
    return response.data;
  } catch (error) {
    console.error('Failed to create watch directory:', error);
  }
};
```

#### cURL Examples

```bash
# Get watch directory information
curl -X GET "https://readur.example.com/api/users/${USER_ID}/watch-directory" \
  -H "Authorization: Bearer ${TOKEN}"

# Create watch directory
curl -X POST "https://readur.example.com/api/users/${USER_ID}/watch-directory" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"ensure_created": true}'

# Delete watch directory (admin only)
curl -X DELETE "https://readur.example.com/api/users/${USER_ID}/watch-directory" \
  -H "Authorization: Bearer ${TOKEN}"
```

## Configuration Reference

### Environment Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `ENABLE_PER_USER_WATCH` | Boolean | `false` | Enable/disable per-user watch directories |
| `USER_WATCH_BASE_DIR` | String | `./user_watch` | Base directory for all user watch folders |
| `WATCH_INTERVAL_SECONDS` | Integer | `60` | How often to scan for new files (seconds) |
| `FILE_STABILITY_CHECK_MS` | Integer | `2000` | Time to wait for file size stability (milliseconds) |
| `MAX_FILE_AGE_HOURS` | Integer | `24` | Maximum age of files to process (hours) |

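Deployment scripts that need to reason about these settings can load them with the documented defaults. This Python sketch is illustrative only; how Readur itself parses booleans and rejects invalid values is not specified here, so the accepted truthy strings are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UserWatchConfig:
    enabled: bool
    base_dir: str
    watch_interval_seconds: int
    file_stability_check_ms: int
    max_file_age_hours: int


def load_config(env: Mapping[str, str] = os.environ) -> UserWatchConfig:
    """Read the per-user watch settings, falling back to the documented defaults."""
    # Assumption: "true", "1", and "yes" (any case) count as enabled
    flag = env.get("ENABLE_PER_USER_WATCH", "false").strip().lower()
    return UserWatchConfig(
        enabled=flag in {"true", "1", "yes"},
        base_dir=env.get("USER_WATCH_BASE_DIR", "./user_watch"),
        watch_interval_seconds=int(env.get("WATCH_INTERVAL_SECONDS", "60")),
        file_stability_check_ms=int(env.get("FILE_STABILITY_CHECK_MS", "2000")),
        max_file_age_hours=int(env.get("MAX_FILE_AGE_HOURS", "24")),
    )


print(load_config({}))  # every field at its documented default
print(load_config({"ENABLE_PER_USER_WATCH": "true", "WATCH_INTERVAL_SECONDS": "30"}))
```
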
### Configuration Validation

The system performs several validation checks:

1. **Path Validation**: Ensures paths are distinct and non-overlapping
2. **Directory Conflicts**: Prevents `USER_WATCH_BASE_DIR` from being:
   - The same as `UPLOAD_PATH`
   - The same as `WATCH_FOLDER`
   - Inside `UPLOAD_PATH`
   - Containing `UPLOAD_PATH`

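The conflict rules above can be expressed as a small pre-flight check for deployment scripts. This Python sketch approximates them; the function name and messages are hypothetical, and it normalizes paths with `os.path.realpath`, which may differ from how Readur compares paths. `os.path.commonpath` compares whole path components, so `/data/uploads2` is not treated as inside `/data/uploads`.

```python
import os


def _contains(parent: str, child: str) -> bool:
    """True if child is strictly inside parent (both already normalized)."""
    return child != parent and os.path.commonpath([parent, child]) == parent


def validate_user_watch_base(user_watch_base_dir: str, upload_path: str,
                             watch_folder: str) -> list[str]:
    """Return the rule violations for USER_WATCH_BASE_DIR (empty if valid)."""
    base = os.path.realpath(user_watch_base_dir)
    upload = os.path.realpath(upload_path)
    watch = os.path.realpath(watch_folder)
    errors = []
    if base == upload:
        errors.append("USER_WATCH_BASE_DIR is the same as UPLOAD_PATH")
    if base == watch:
        errors.append("USER_WATCH_BASE_DIR is the same as WATCH_FOLDER")
    if _contains(upload, base):
        errors.append("USER_WATCH_BASE_DIR is inside UPLOAD_PATH")
    if _contains(base, upload):
        errors.append("USER_WATCH_BASE_DIR contains UPLOAD_PATH")
    return errors
```

For example, `validate_user_watch_base("/data/uploads/users", "/data/uploads", "/data/watch")` reports that the base directory is inside `UPLOAD_PATH`.
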
### Docker Configuration

When using Docker, mount the user watch directory:

```yaml
version: '3.8'

services:
  readur:
    image: ghcr.io/readur/readur:main
    environment:
      - ENABLE_PER_USER_WATCH=true
      - USER_WATCH_BASE_DIR=/app/user_watch
      - WATCH_INTERVAL_SECONDS=30
    volumes:
      - ./user_watch:/app/user_watch
      - ./uploads:/app/uploads
      - ./watch:/app/watch
    ports:
      - "8000:8000"
```

### Kubernetes Configuration

For Kubernetes deployments:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: readur-config
data:
  ENABLE_PER_USER_WATCH: "true"
  USER_WATCH_BASE_DIR: "/data/user_watch"
  WATCH_INTERVAL_SECONDS: "30"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readur
spec:
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: readur
  template:
    metadata:
      labels:
        app: readur
    spec:
      containers:
        - name: readur
          image: ghcr.io/readur/readur:main
          envFrom:
            - configMapRef:
                name: readur-config
          volumeMounts:
            - name: user-watch
              mountPath: /data/user_watch
      volumes:
        - name: user-watch
          persistentVolumeClaim:
            claimName: readur-user-watch-pvc
```

## Security Considerations

### Username Validation

The system enforces strict username validation to prevent security issues:

- **Length**: 1-64 characters
- **Allowed Characters**: Alphanumeric, underscore (`_`), dash (`-`)
- **Prohibited Patterns**:
  - Path traversal attempts (`..`, `/`)
  - Hidden directories (starting with `.`)
  - Null bytes or special characters

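A client-side pre-check that mirrors these rules can catch bad names before they reach the server. This Python sketch is an approximation based only on the rules listed; Readur's own validator is authoritative, and because the rules do not say whether "alphanumeric" includes non-ASCII letters, the sketch assumes ASCII only.

```python
import re

# ASCII letters, digits, underscore, and dash; 1-64 characters (ASCII-only is an assumption)
_USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_watch_username(username: str) -> bool:
    """Return True if username is safe to use as a watch directory name."""
    # fullmatch rejects '.', '/', null bytes, and every other character outside
    # the allowed set, which also rules out '..' and dot-prefixed hidden names
    return _USERNAME_RE.fullmatch(username) is not None
```

Using `fullmatch` rather than `match` with a `$` anchor matters here: `$` also matches before a trailing newline, so `"alice\n"` would slip through.
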
### Directory Permissions

1. **User Isolation**: Each user's directory is separate
2. **Permission Model**: 755 (owner: rwx, group: r-x, others: r-x)
3. **Ownership**: Readur process owns all directories
4. **SGID Bit**: Optional for group inheritance

### Path Security

- **Canonicalization**: All paths are canonicalized to prevent traversal
- **Boundary Checking**: Files must be within designated directories
- **Validation**: Extracted usernames are validated before use

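The canonicalize-then-check pattern looks like this in Python. The sketch resolves symlinks and `..` segments with `os.path.realpath` before comparing against the base directory; it illustrates the general technique rather than Readur's code, and the name `resolve_within` is hypothetical.

```python
import os
from typing import Optional


def resolve_within(base_dir: str, candidate: str) -> Optional[str]:
    """Canonicalize candidate and return it only if it lies inside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' and follows symlinks, so a link pointing outside
    # base_dir resolves to its real target and fails the boundary check below.
    # An absolute candidate replaces base in os.path.join and is checked the same way.
    resolved = os.path.realpath(os.path.join(base, candidate))
    if resolved == base or os.path.commonpath([base, resolved]) != base:
        return None
    return resolved
```

Checking after canonicalization is the important ordering: a lexical check on the raw path would accept a symlink such as `alice/escape` that points outside the base directory.
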
### Access Control

- **API Protection**: JWT authentication required
- **Permission Levels**:
  - Users: Can only access their own directory
  - Admins: Can manage all directories
- **Directory Creation**: Users can create their own, admins can create any
- **Directory Deletion**: Admin-only operation

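These rules reduce to a small decision function. The Python sketch below encodes them as listed; the `Requester` type, the operation names, and `is_allowed` are hypothetical, and it assumes the JWT has already been validated so only authorization remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requester:
    user_id: str
    is_admin: bool


def is_allowed(requester: Requester, target_user_id: str, operation: str) -> bool:
    """Apply the documented access rules to 'get', 'create', and 'delete'."""
    if operation == "delete":
        # Deletion is admin-only, even for a user's own directory
        return requester.is_admin
    if operation in ("get", "create"):
        # Admins may act on any user; regular users only on themselves
        return requester.is_admin or requester.user_id == target_user_id
    raise ValueError(f"unknown operation: {operation}")
```
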
### Audit Considerations

1. **Logging**: All directory operations are logged
2. **File Attribution**: Documents tracked to source user
3. **Access Tracking**: API access logged with user context

## Troubleshooting

### Common Issues and Solutions

#### Issue: Per-user watch directories not working

**Symptoms**: Files in user directories are not processed

**Solutions**:

1. Verify the feature is enabled:

   ```bash
   grep ENABLE_PER_USER_WATCH .env
   # Should show: ENABLE_PER_USER_WATCH=true
   ```

2. Check that the base directory exists and has the correct ownership and permissions:

   ```bash
   ls -ld /data/user_watch
   # Should show readur as owner with 755 (drwxr-xr-x) permissions,
   # or 2775 (drwxrwsr-x) if you used the shared-group option
   ```

3. Review the logs for watch-directory errors:

   ```bash
   grep -i "user watch" /var/log/readur/readur.log
   ```

#### Issue: "User watch service not initialized" error

**Symptoms**: API returns 500 error when accessing watch directories

**Solutions**:

1. Ensure `ENABLE_PER_USER_WATCH=true` in configuration
2. Restart Readur service
3. Check initialization logs for errors

#### Issue: Files not being detected

**Symptoms**: Files placed in a watch directory are not processed

**Solutions**:

1. Check file permissions:

   ```bash
   ls -la /data/user_watch/username/
   # Files should be readable by the readur user
   ```

2. Verify the file type is supported:

   ```bash
   # Check the server's configuration rather than your own shell environment
   grep ALLOWED_FILE_TYPES .env
   # Ensure your file extension is included
   ```

3. Check the file age restriction:

   ```bash
   # Files older than MAX_FILE_AGE_HOURS (default: 24) are ignored;
   # this lists files older than that limit, converted to minutes
   find /data/user_watch -type f -mmin +$(( ${MAX_FILE_AGE_HOURS:-24} * 60 ))
   ```

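The same age check can be scripted, for example to report which files a user dropped too late to be picked up. This Python sketch compares each file's modification time against `MAX_FILE_AGE_HOURS`; using modification time as the age reference is an assumption about how Readur measures file age.

```python
import os
import time
from pathlib import Path
from typing import Optional


def stale_files(root: str, max_age_hours: float = 24,
                now: Optional[float] = None) -> list[Path]:
    """List files under root whose modification time is older than max_age_hours."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


# Example: report files the watcher would skip under the configured limit
# max_age = float(os.environ.get("MAX_FILE_AGE_HOURS", "24"))
# for path in stale_files("/data/user_watch", max_age):
#     print(path)
```
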
#### Issue: Permission denied errors

**Symptoms**: Users cannot write to their watch directories

**Solutions**:

1. Fix directory ownership:

   ```bash
   sudo chown -R readur:readur /data/user_watch
   ```

2. Set correct permissions (capital `X` adds execute to directories only, not to regular files):

   ```bash
   sudo chmod -R u=rwX,go=rX /data/user_watch
   ```

3. Because mode 755 only lets the owner write, grant other users write access through a shared group:

   ```bash
   sudo chgrp -R readur-users /data/user_watch
   sudo chmod -R u=rwX,g=rwX,o=rX /data/user_watch
   ```

#### Issue: Duplicate documents created

**Symptoms**: Same file creates multiple documents

**Solutions**:

1. Ensure file stability check is adequate:

   ```bash
   # Increase if files are still being written
   FILE_STABILITY_CHECK_MS=5000
   ```

2. Check for file system issues (timestamps, inode changes)
3. Review deduplication settings in configuration

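When diagnosing duplicates, it helps to know whether two files really have identical bytes. This Python sketch groups files by SHA-256 digest as a diagnostic aid; content hashing is shown here as a general deduplication technique, not as a description of how Readur's deduplication settings work.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Map each digest shared by two or more files under root to those files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

If two documents in Readur come from files with different digests, the duplication is in the content itself rather than in the ingestion pipeline.
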
### Diagnostic Commands

```bash
# Check if user watch is enabled
curl -H "Authorization: Bearer $TOKEN" \
  "https://readur.example.com/api/users/$USER_ID/watch-directory"

# List all user directories
ls -la /data/user_watch/

# Check file watcher logs
journalctl -u readur | grep -i "watch"

# Monitor file processing in real-time
tail -f /var/log/readur/readur.log | grep -E "(Processing new file|watch)"

# Check directory permissions
namei -l /data/user_watch/username/

# Find recently modified files
find /data/user_watch -type f -mmin -60

# Check disk space
df -h /data/user_watch
```

## Examples and Best Practices

### Example 1: Small Team Setup

For a team of 5-10 users with local file access:

```bash
# .env configuration
ENABLE_PER_USER_WATCH=true
USER_WATCH_BASE_DIR=/srv/readur/user_watches
WATCH_INTERVAL_SECONDS=60
FILE_STABILITY_CHECK_MS=2000
MAX_FILE_AGE_HOURS=72
```

Resulting directory structure:

```
/srv/readur/user_watches/
├── alice/
├── bob/
├── charlie/
├── diana/
└── edward/
```

### Example 2: Enterprise Network Share Integration

For larger organizations with network shares:

```bash
# Mount network share
sudo mount -t cifs //fileserver/readur /mnt/readur \
  -o username=readur,domain=COMPANY

# .env configuration
# (comments are kept on their own lines; not every .env parser strips inline comments)
ENABLE_PER_USER_WATCH=true
USER_WATCH_BASE_DIR=/mnt/readur/user_watches
# Slower scan interval for network storage
WATCH_INTERVAL_SECONDS=120
# Longer stability window to absorb network write delays
FILE_STABILITY_CHECK_MS=5000
```

### Example 3: Automated Document Workflow

Script for automatic document routing:

```python
#!/usr/bin/env python3
"""
Route documents from an incoming folder to user watch directories
based on keywords in the filename.
"""
import shutil
from pathlib import Path

USER_WATCH_BASE = Path("/data/user_watch")

# Keyword (lowercase) -> target username; the first matching keyword wins
USER_MAPPING = {
    "invoice": "accounting",
    "report": "management",
}
DEFAULT_USER = "general"


def route_document(file_path: Path, user_mapping: dict[str, str]) -> None:
    """Move a document into the watch directory of the user it maps to."""
    filename = file_path.name

    # Determine the target user from the filename (replace with your own logic)
    target_user = next(
        (user for keyword, user in user_mapping.items() if keyword in filename.lower()),
        DEFAULT_USER,
    )

    user_watch_dir = USER_WATCH_BASE / target_user
    if not user_watch_dir.is_dir():
        print(f"User {target_user} watch directory does not exist")
        return

    dest = user_watch_dir / filename
    if dest.exists():
        # Avoid silently overwriting a file that is still waiting to be processed
        print(f"Skipping {filename}: already in {target_user}'s watch directory")
        return

    shutil.move(str(file_path), str(dest))
    print(f"Moved {filename} to {target_user}'s watch directory")


if __name__ == "__main__":
    # Single pass over the incoming directory; schedule it (e.g. with cron) to repeat
    incoming_dir = Path("/srv/incoming")
    for pdf in sorted(incoming_dir.glob("*.pdf")):
        route_document(pdf, USER_MAPPING)
```

### Example 4: Bulk User Setup

PowerShell script for creating multiple user directories:

```powershell
# bulk-create-watch-dirs.ps1
$baseUrl = "https://readur.example.com/api"
$adminToken = "your-admin-token"
$headers = @{ Authorization = "Bearer $adminToken" }

$users = @("alice", "bob", "charlie", "diana", "edward")

# Fetch the user list once instead of on every loop iteration
$allUsers = Invoke-RestMethod -Uri "$baseUrl/users" -Headers $headers

foreach ($username in $users) {
    $user = $allUsers | Where-Object { $_.username -eq $username }

    if ($user) {
        # Create watch directory
        $body = @{ ensure_created = $true } | ConvertTo-Json

        # Use -ContentType: Windows PowerShell 5.1 rejects Content-Type in -Headers
        $result = Invoke-RestMethod `
            -Method Post `
            -Uri "$baseUrl/users/$($user.id)/watch-directory" `
            -Headers $headers `
            -ContentType "application/json" `
            -Body $body

        Write-Host "Created watch directory for $username at $($result.watch_directory_path)"
    }
    else {
        Write-Warning "User '$username' not found"
    }
}
```

### Best Practices Summary

#### For Administrators

1. **Capacity Planning**: Allocate 1-5GB per user for watch directories
2. **Backup Strategy**: Include user watch directories in backup plans
3. **Monitoring**: Set up alerts for disk space and processing failures
4. **Documentation**: Maintain user guide with network paths
5. **Testing**: Test with various file types and sizes before deployment

#### For Users

1. **File Organization**: Use meaningful filenames and folder structure
2. **File Formats**: Prefer PDF for best OCR results
3. **Batch Processing**: Group related documents for upload
4. **Size Limits**: Split large documents if over 50MB
5. **Patience**: Allow processing time before expecting search results

#### For Developers

1. **API Integration**: Use provided client libraries when available
2. **Error Handling**: Implement retry logic for transient failures
3. **Validation**: Validate file types before placing in watch directories
4. **Monitoring**: Track processing status via WebSocket updates
5. **Caching**: Cache user directory paths to reduce API calls

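For the retry logic in item 2, a generic wrapper with exponential backoff is one common approach. This Python sketch is not tied to any Readur client; the `retry` helper, its defaults, and which exceptions count as transient are assumptions to adapt. With `requests`, for example, you might pass `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`, and avoid retrying 401 or 403 responses.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], *, attempts: int = 4, base_delay: float = 0.5,
          max_delay: float = 8.0,
          transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call fn, retrying transient failures with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ... before jitter) up to max_delay;
            # random jitter keeps many clients from retrying in lockstep
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

Non-transient errors, such as a `ValueError` from bad input, propagate immediately without retrying.
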
### Performance Optimization

1. **File System**: Use SSD storage for watch directories
2. **Network**: Minimize latency for network-mounted directories
3. **Scheduling**: Adjust watch interval based on usage patterns
4. **Concurrency**: Configure OCR workers based on CPU cores
5. **Cleanup**: Implement retention policies for processed files

### Migration from Global Watch Directory

To migrate from a single global watch directory to per-user directories:

1. **Preparation**:

   ```bash
   # Back up the existing watch directory
   tar -czf watch_backup.tar.gz /data/watch/
   ```

2. **Enable Feature**:

   ```bash
   # Update configuration
   ENABLE_PER_USER_WATCH=true
   USER_WATCH_BASE_DIR=/data/user_watch
   ```

3. **Create User Directories**:

   ```bash
   # Create directories for existing users
   # (-A: unaligned output, -t: tuples only, so each line is a bare username)
   psql -d readur -At -c "SELECT username FROM users" |
   while read -r user; do
     mkdir -p "/data/user_watch/$user"
     chown readur:readur "/data/user_watch/$user"
   done
   ```

4. **Migrate Documents** (optional):
   - Keep existing documents in place
   - Or reassign to appropriate users through the UI

5. **Update Documentation**:
   - Notify users of new directory locations
   - Update any automation scripts
   - Revise backup procedures

This completes the comprehensive documentation for the Per-User Watch Directories feature in Readur.