Partition Statistics
Phase 1 — Traversal & Raw Metrics
- Parse CLI: `analize_partition.py <drive_letter>` (e.g., `D`)
- Recursively walk through all directories and files on the specified partition
- Count total directories and total files
- Record file extensions (case-insensitive), using `<no-ext>` for files without extensions
- Accumulate totals per extension: number of files and combined size in bytes
Functional result: Running the script on a test folder prints total directories/files and a raw list of file extensions with their respective counts and sizes.
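The traversal steps in this phase could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `scan_tree` and the shape of the returned data are assumptions, and unreadable files are simply skipped here (Phase 4 covers error handling in more detail).

```python
import os
from collections import defaultdict

NO_EXT = "<no-ext>"  # label from the spec for files without an extension


def scan_tree(root):
    """Walk root recursively; tally directories, files, and per-extension totals.

    Hypothetical helper: returns (total_dirs, total_files, per_ext), where
    per_ext maps a lowercased extension to [file count, combined size in bytes].
    """
    total_dirs = 0
    total_files = 0
    per_ext = defaultdict(lambda: [0, 0])

    for dirpath, dirnames, filenames in os.walk(root):
        total_dirs += len(dirnames)
        for name in filenames:
            # splitext treats names like ".bashrc" as having no extension
            ext = os.path.splitext(name)[1].lower() or NO_EXT
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable file: skip it and keep scanning
            total_files += 1
            per_ext[ext][0] += 1
            per_ext[ext][1] += size

    return total_dirs, total_files, dict(per_ext)
```

On Windows, a call such as `scan_tree("D:\\")` would scan the whole `D` partition; pointing it at a small test folder first is a quicker way to check the raw tallies.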
Phase 2 — Data Aggregation & Proportions
- Convert tallies into relative proportions for both file count and total size
- Sort results by descending frequency and by descending total size
- Combine minor categories into an `Other` bucket for readability
- Format output with rounded percentages (two decimals)
Functional result: Console output clearly shows each file type’s percentage of total count and total size, with an additional Other category if many small file types exist.
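One possible way to turn the raw tallies into sorted percentages is sketched below. The function name `summarize`, the `(count, size)` value layout, and the `min_share` cutoff (the percentage below which a type is folded into `Other`) are illustrative assumptions; the spec does not say what counts as a "minor" category.

```python
def summarize(per_ext, index, min_share=1.0):
    """Return [(ext, percent)] sorted by descending share, with an 'Other' bucket.

    per_ext maps extension -> (count, size); index selects 0 (count) or 1 (size).
    min_share is a hypothetical cutoff in percent, not a value from the spec.
    """
    total = sum(values[index] for values in per_ext.values())
    if total == 0:
        return []  # nothing scanned, so there are no proportions to report

    rows, other = [], 0.0
    for ext, values in per_ext.items():
        pct = 100.0 * values[index] / total
        if pct < min_share:
            other += pct  # fold minor categories together
        else:
            rows.append((ext, pct))

    rows.sort(key=lambda row: row[1], reverse=True)
    if other > 0:
        rows.append(("Other", other))  # always listed last
    return [(ext, round(pct, 2)) for ext, pct in rows]


def print_table(rows):
    """Print one line per file type with a two-decimal percentage."""
    for ext, pct in rows:
        print(f"{ext:<12}{pct:>7.2f}%")
```

Calling `summarize(per_ext, 0)` gives the ranking by file count and `summarize(per_ext, 1)` the ranking by total size. Because each value is rounded independently, the printed percentages may not add up to exactly 100.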
Phase 3 — Chart Generation
- Generate visual charts (using Matplotlib or a similar library) for easier interpretation:
  - Pie chart for file type distribution by count
  - Pie chart for file type distribution by total size
  - Bar chart for top-N file types by count
  - Bar chart for top-N file types by size
- Include titles, legends, and percentage/value labels
- Display charts interactively and save them as PNGs in a `charts/` folder
Functional result: The program produces four correctly labeled charts (and PNG files) showing file-type distribution and relative proportions based on the analyzed data.
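A rough sketch of how the four charts might be produced with Matplotlib is shown below. The helper names (`top_n`, `with_other`, `save_charts`), the output file names, and the default of ten types are assumptions made for illustration. The pie charts here show the top-N types plus an `Other` slice for the remainder, which is one reasonable reading of the spec rather than a requirement it states. Empty input is not handled.

```python
import os


def top_n(per_ext, index, n=10):
    """Return the n largest (ext, value) pairs by count (index 0) or size (index 1)."""
    pairs = [(ext, values[index]) for ext, values in per_ext.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


def with_other(per_ext, index, n=10):
    """Top-n pairs plus one 'Other' slice holding the remainder, if any."""
    top = top_n(per_ext, index, n)
    rest = sum(v[index] for v in per_ext.values()) - sum(value for _, value in top)
    return top + [("Other", rest)] if rest > 0 else top


def save_charts(per_ext, out_dir="charts", n=10, show=False):
    """Save two pie and two bar charts as PNGs in out_dir; return the file paths."""
    import matplotlib.pyplot as plt  # local import: only this phase needs Matplotlib

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, metric in ((0, "count"), (1, "size")):
        pie_data = with_other(per_ext, index, n)
        fig, ax = plt.subplots()
        ax.pie([v for _, v in pie_data], labels=[e for e, _ in pie_data],
               autopct="%1.1f%%")  # percentage label on each slice
        ax.set_title(f"File type distribution by {metric}")
        ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))
        paths.append(os.path.join(out_dir, f"pie_{metric}.png"))
        fig.savefig(paths[-1], bbox_inches="tight")

        bar_data = top_n(per_ext, index, n)
        fig, ax = plt.subplots()
        bars = ax.bar([e for e, _ in bar_data], [v for _, v in bar_data])
        ax.bar_label(bars)  # value labels; needs Matplotlib 3.4 or newer
        ax.set_title(f"Top {n} file types by {metric}")
        ax.set_ylabel("files" if index == 0 else "bytes")
        paths.append(os.path.join(out_dir, f"bar_{metric}.png"))
        fig.savefig(paths[-1], bbox_inches="tight")

    if show:
        plt.show()  # interactive display of all four figures
    plt.close("all")
    return paths
```

Saving each figure before calling `plt.show()` keeps the PNG output independent of whether an interactive backend is available, for example when the script runs on a machine without a display.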
Phase 4 — Error Handling & Validation
- Validate command-line arguments and drive existence before scanning
- Gracefully handle permission errors (skip restricted paths)
- Manage long paths and unreadable directories without interruption
- Handle non-UTF8 filenames by falling back to safe encodings
Functional result: The program runs smoothly even with missing permissions, long paths, or encoding issues, skipping problematic files while completing the analysis and generating valid output.
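The validation and skip-on-error behaviour described in this phase might look something like the sketch below. All function names (`drive_root`, `printable`, `safe_walk`, `main`) and the exit codes are hypothetical. The sketch assumes that recording skipped paths in a list is an acceptable way to "skip restricted paths", and that replacing unencodable characters with `?` is an acceptable "safe encoding" fallback for display.

```python
import os
import sys


def drive_root(arg):
    """Turn a drive letter such as 'D' or 'd:' into 'D:\\', or raise ValueError."""
    letter = arg.strip().rstrip(":\\/").upper()
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"expected a single drive letter, got {arg!r}")
    return letter + ":\\"


def printable(path):
    """Make a path safe to print by replacing characters UTF-8 cannot encode."""
    return path.encode("utf-8", errors="replace").decode("utf-8")


def safe_walk(root, skipped):
    """Yield (path, size) for readable files; append unreadable paths to skipped."""
    def on_error(err):
        # os.walk calls this with the OSError raised while listing a directory
        skipped.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:  # covers PermissionError and over-long paths
                skipped.append(path)
                continue
            yield path, size


def main(argv):
    """Validate arguments and the drive, then scan; returns a process exit code."""
    if len(argv) != 2:
        print("usage: analize_partition.py <drive_letter>", file=sys.stderr)
        return 2
    try:
        root = drive_root(argv[1])
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    if not os.path.isdir(root):
        print(f"error: drive not found: {root}", file=sys.stderr)
        return 1

    skipped = []
    total = sum(size for _path, size in safe_walk(root, skipped))
    print(f"{total} bytes scanned, {len(skipped)} paths skipped")
    for path in skipped[:10]:
        print("  skipped:", printable(path))
    return 0
```

A script entry point would typically end with `sys.exit(main(sys.argv))`. For paths beyond the classic Windows length limit, prefixing the root with `\\?\` is a commonly used workaround, though whether it is needed depends on the Windows and Python configuration.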