Readur/SUBDIRECTORY_BUG_FIX.md

3.8 KiB

Subdirectory Discovery Bug Fix

Issue Summary

The ETag optimization feature was failing to discover subdirectories during first-time scans, causing it to report "No known subdirectories" and miss thousands of files.

Root Cause

In src/services/webdav_service.rs:933-936, the check_subdirectories_for_changes function was returning empty results when no subdirectories were previously tracked in the database:

// BUGGY CODE (before fix)
if subdirectories.is_empty() {
    info!("📁 No known subdirectories for {}, no changes to process", parent_path);
    return Ok(Vec::new()); // ❌ This was the bug!
}

This logic assumed that if a directory's ETag was unchanged, we only needed to check previously-known subdirectories. However, this failed for directories that hadn't been fully scanned before.

The Fix

Changed the function to fall back to a full recursive scan when no subdirectories are known:

// FIXED CODE (after fix)
if subdirectories.is_empty() {
    info!("📁 No known subdirectories for {}, performing initial scan to discover structure", parent_path);
    return self.discover_files_in_folder_impl(parent_path).await; // ✅ Fixed!
}

Files Changed

  1. src/services/webdav_service.rs:933-936 - Fixed the core logic
  2. tests/integration_webdav_first_time_scan_tests.rs - New integration tests
  3. tests/unit_webdav_subdirectory_edge_cases_tests.rs - New unit tests

Test Coverage Added

Integration Tests (integration_webdav_first_time_scan_tests.rs)

  • test_first_time_directory_scan_with_subdirectories() - Tests the exact bug scenario
  • test_subdirectory_tracking_after_full_scan() - Verifies proper tracking after discovery
  • test_direct_child_identification_edge_cases() - Tests path logic with realistic paths
  • test_file_count_accuracy_per_directory() - Verifies correct file counting
  • test_size_calculation_accuracy() - Verifies size calculations

Unit Tests (unit_webdav_subdirectory_edge_cases_tests.rs)

  • test_comprehensive_directory_extraction() - Tests directory structure extraction
  • test_first_time_scan_scenario_logic() - Tests the exact bug logic
  • test_directory_etag_mapping_accuracy() - Tests ETag handling
  • test_direct_file_counting_precision() - Tests file counting logic
  • test_total_size_calculation_per_directory() - Tests size calculations
  • test_path_edge_cases_and_normalization() - Tests path handling edge cases
  • test_bug_scenario_file_count_verification() - Specifically tests the 7046 files scenario

Expected Behavior Now

Before Fix:

📁 No known subdirectories for /FullerDocuments/JonDocuments, no changes to process
Found 0 files in folder /FullerDocuments/JonDocuments

After Fix:

📁 No known subdirectories for /FullerDocuments/JonDocuments, performing initial scan to discover structure
[Performs full recursive scan]
Found 7046 files in folder /FullerDocuments/JonDocuments

Why This Fix is Safe

  1. Performance: Only affects first-time scans or completely empty directories
  2. Correctness: Ensures all files are discovered even when ETag optimization is enabled
  3. Backward Compatibility: Doesn't change behavior for directories with known subdirectories
  4. Robustness: Falls back to the tried-and-tested full scan method

Test Verification

The fix has been verified with comprehensive test coverage that includes:

  • Real-world directory structures similar to the user's environment
  • Edge cases for path handling and file counting
  • Integration scenarios that test the full optimization workflow
  • Unit tests that isolate the specific logic that was failing

Summary

This fix ensures that the ETag optimization feature works correctly for first-time directory scans while maintaining all the performance benefits for subsequent scans where subdirectories are already known.