Partition Statistics

Phase 1 — Traversal & Raw Metrics

  • Parse CLI: analize_partition.py <drive_letter> (e.g., D)
  • Recursively walk through all directories and files on the specified partition
  • Count total directories and total files
  • Record file extensions (case-insensitive), using <no-ext> for files without extensions
  • Accumulate totals per extension: number of files and combined size in bytes
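The Phase 1 steps above could be sketched roughly as follows. This assumes Python's os.walk and a root path built from the drive letter (for example "D:\\"). The names scan and extension_key are hypothetical, and error handling is deliberately left out because Phase 4 covers it.

```python
import os
from collections import defaultdict


def extension_key(filename):
    """Lower-cased extension such as '.txt', or '<no-ext>' if the name has none."""
    ext = os.path.splitext(filename)[1]
    return ext.lower() if ext else "<no-ext>"


def scan(root):
    """Walk `root` recursively and tally directories, files and per-extension totals.

    Returns (total_dirs, total_files, stats) where stats maps an extension to
    {"count": number_of_files, "size": combined_size_in_bytes}.
    The root directory itself is not included in total_dirs.
    """
    total_dirs = 0
    total_files = 0
    stats = defaultdict(lambda: {"count": 0, "size": 0})
    for dirpath, dirnames, filenames in os.walk(root):
        total_dirs += len(dirnames)  # each subdirectory is listed once, by its parent
        for name in filenames:
            total_files += 1
            entry = stats[extension_key(name)]
            entry["count"] += 1
            entry["size"] += os.path.getsize(os.path.join(dirpath, name))
    return total_dirs, total_files, dict(stats)
```

A call such as scan("D:\\") would return the two totals plus the per-extension dictionary. Note that os.path.splitext treats a leading dot as part of the name, so a file like ".gitignore" is tallied under <no-ext>.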

Functional result: Running the script on a test folder prints total directories/files and a raw list of file extensions with their respective counts and sizes.
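The raw listing in this functional check could come from a small formatter like the one below. format_raw_report is a hypothetical helper, and it assumes stats maps each extension to a dict with "count" and "size" keys.

```python
def format_raw_report(total_dirs, total_files, stats):
    """Return the totals followed by one aligned row per extension (alphabetical)."""
    lines = [
        f"Directories: {total_dirs}",
        f"Files: {total_files}",
        f"{'Extension':<12}{'Files':>10}{'Bytes':>16}",
    ]
    for ext in sorted(stats):
        entry = stats[ext]
        lines.append(f"{ext:<12}{entry['count']:>10}{entry['size']:>16}")
    return "\n".join(lines)
```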


Phase 2 — Data Aggregation & Proportions

  • Convert tallies into relative proportions for both file count and total size
  • Sort results by descending frequency and by descending total size
  • Merge minor extensions (those below a chosen share threshold) into a single Other bucket for readability
  • Format output with percentages rounded to two decimal places
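The aggregation steps above might be sketched like this, assuming stats maps each extension to {"count": ..., "size": ...}. The names summarize and format_rows are hypothetical, and the 1 percent default for other_threshold is an arbitrary illustrative value, not one taken from the plan. Calling summarize once with "count" and once with "size" gives the two sorted views.

```python
def summarize(stats, key, other_threshold=1.0):
    """Rank extensions by `key` ('count' or 'size') and express each as a percentage.

    Extensions whose share is below `other_threshold` percent are merged into a
    single 'Other' row at the end. Returns [(extension, value, percent), ...].
    """
    total = sum(entry[key] for entry in stats.values())
    if total == 0:
        return []
    ranked = sorted(((ext, entry[key]) for ext, entry in stats.items()),
                    key=lambda row: row[1], reverse=True)
    rows, other_value, merged = [], 0, 0
    for ext, value in ranked:
        percent = 100.0 * value / total
        if percent < other_threshold:
            other_value += value
            merged += 1
        else:
            rows.append((ext, value, round(percent, 2)))
    if merged:
        rows.append(("Other", other_value, round(100.0 * other_value / total, 2)))
    return rows


def format_rows(rows):
    """One line per row: extension, raw value, and the percentage with two decimals."""
    return "\n".join(f"{ext:<12}{value:>14}{percent:>8.2f}%" for ext, value, percent in rows)
```

Because each percentage is rounded independently, the printed column may add up to slightly more or less than 100.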

Functional result: Console output clearly shows each file type’s percentage of total count and total size, with an additional Other category if many small file types exist.


Phase 3 — Chart Generation

  • Generate visual charts (using Matplotlib or a similar library) for easier interpretation:

    • Pie chart for file type distribution by count
    • Pie chart for file type distribution by total size
    • Bar chart for top-N file types by count
    • Bar chart for top-N file types by size
  • Include titles, legends, and percentage/value labels

  • Display the charts interactively and save each one as a PNG in a charts/ folder
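A possible Matplotlib sketch for the four charts, assuming stats maps each extension to {"count": ..., "size": ...}. The function names, the top_n default of 10 and the PNG file names are hypothetical choices. The pie charts show the top N plus an Other slice so small wedges do not pile up; the bar charts show only the top N. Axes.bar_label needs Matplotlib 3.4 or newer.

```python
import os


def top_n_with_other(pairs, n=10):
    """Keep the n largest (label, value) pairs and merge the remainder into 'Other'."""
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append(("Other", sum(value for _, value in tail)))
    return head


def make_charts(stats, top_n=10, out_dir="charts", show=True):
    """Save pie and bar charts by file count and by total size as PNGs in out_dir."""
    # Imported here so top_n_with_other can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    os.makedirs(out_dir, exist_ok=True)
    for key, unit in (("count", "files"), ("size", "bytes")):
        pairs = [(ext, entry[key]) for ext, entry in stats.items() if entry[key] > 0]
        if not pairs:
            continue  # nothing to draw, e.g. only empty files for the size charts
        rows = top_n_with_other(pairs, top_n)
        labels = [label for label, _ in rows]
        values = [value for _, value in rows]

        # Pie chart: top N plus an 'Other' slice, percentages with two decimals.
        fig, ax = plt.subplots(figsize=(9, 6))
        wedges = ax.pie(values, autopct="%1.2f%%", startangle=90)[0]
        ax.legend(wedges, labels, title="Extension", loc="center left", bbox_to_anchor=(1.0, 0.5))
        ax.set_title(f"File type distribution by {key}")
        fig.savefig(os.path.join(out_dir, f"pie_by_{key}.png"), bbox_inches="tight")

        # Bar chart: only the top N; a trailing 'Other' row, if present, is dropped.
        fig, ax = plt.subplots(figsize=(10, 6))
        bars = ax.bar(labels[:top_n], values[:top_n])
        ax.bar_label(bars)  # value label above each bar
        ax.tick_params(axis="x", labelrotation=45)
        ax.set_ylabel(unit)
        ax.set_title(f"Top {top_n} file types by {key}")
        fig.savefig(os.path.join(out_dir, f"bar_by_{key}.png"), bbox_inches="tight")

    if show:
        plt.show()  # display every open figure interactively
    else:
        plt.close("all")
```

Saving before plt.show() matters: once the interactive windows are closed, the figures may no longer be available to save.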

Functional result: The program produces four correctly labeled charts (and PNG files) showing file-type distribution and relative proportions based on the analyzed data.


Phase 4 — Error Handling & Validation

  • Validate command-line arguments and drive existence before scanning
  • Gracefully handle permission errors (skip restricted paths)
  • Handle long paths (beyond the 260-character Windows MAX_PATH limit) and unreadable directories without aborting the scan
  • Handle non-UTF-8 filenames by falling back to safe encodings
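One way to cover these checks, sketched under several assumptions: the script targets Windows drive letters; os.walk's onerror callback records directories that cannot be listed; each os.stat call is wrapped in try/except OSError; on Windows the extended-length prefix \\?\ is added so long paths are accepted; and filenames are made printable with the backslashreplace error handler. On POSIX systems Python already decodes undecodable filename bytes with surrogateescape, so the walk itself does not fail and only printing such names needs care. resolve_root, scan_safely and safe_display are hypothetical names.

```python
import os
import re


def resolve_root(argv):
    """Validate the `<drive_letter>` argument and return the drive root, e.g. 'D:\\'.

    Raises SystemExit with a message on bad usage or a missing drive.
    """
    if len(argv) != 2 or not re.fullmatch(r"[A-Za-z]:?", argv[1]):
        raise SystemExit("usage: analize_partition.py <drive_letter>  (e.g. D)")
    root = argv[1][0].upper() + ":\\"
    if not os.path.isdir(root):
        raise SystemExit(f"error: drive {root} not found or not accessible")
    if os.name == "nt":
        # Extended-length prefix lets Windows APIs accept paths longer than MAX_PATH.
        root = "\\\\?\\" + root
    return root


def safe_display(name):
    """Printable form of a filename that may contain undecodable bytes or lone surrogates."""
    return name.encode("utf-8", "backslashreplace").decode("utf-8")


def scan_safely(root):
    """Tally like a plain os.walk scan, but skip anything that raises OSError.

    Returns (total_dirs, total_files, stats, skipped_paths).
    """
    skipped = []
    stats = {}
    total_dirs = total_files = 0

    def on_error(err):
        # os.walk calls this when a directory cannot be listed (e.g. PermissionError).
        skipped.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        total_dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.stat(path).st_size
            except OSError:  # access denied, broken link, file deleted mid-scan
                skipped.append(path)
                continue
            total_files += 1
            ext = os.path.splitext(name)[1].lower() or "<no-ext>"
            entry = stats.setdefault(ext, {"count": 0, "size": 0})
            entry["count"] += 1
            entry["size"] += size
    return total_dirs, total_files, stats, skipped
```

A main routine could call scan_safely(resolve_root(sys.argv)) and print each entry of skipped_paths through safe_display so that an odd filename never crashes the report.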

Functional result: The program completes even when it encounters missing permissions, long paths, or encoding issues: it skips problematic files, finishes the analysis, and still generates valid output.