FindMissingBytes
This project consists of developing a high-performance parallel tool that reconstructs a file from a truncated archive, using an expected hash to detect the correct reconstruction. The tool must brute-repair missing tail bytes, unpack the archive, and validate the recovered file.
Phase 1: Archive Format Study & CLI Validation
Define assumptions about the archive structure and implement strict CLI parsing.
- Validate truncated archive path, filename, expected hash
- Optional command to truncate an archive for testing
- Provide clear errors/logs
Functional Output: Inputs are validated and archive structure assumptions are documented.
Phase 2: Truncation Simulation (Optional Input Generator)
Implement controlled archive truncation to produce test samples.
- Remove last x bytes from a valid archive
- Save truncated version + metadata (x, original hash)
- Log generation details
Functional Output: Tool can generate truncated archives for testing.
Phase 3: Parallel Tail Reconstruction Engine
Implement the core parallelized algorithm attempting to rebuild missing bytes.
- Estimate missing byte count
- Spawn threads/processes to brute-fill missing tail ranges
- Stream candidate completions to decompressor/unpacker
Functional Output: Multiple candidates are tested in parallel for archive validity.
Phase 4: Partial Unpack & Integrity Checking
Unpack only the target file from each candidate archive.
- Perform quick structural validation
- Attempt file extraction
- Compute hash incrementally
- Abort early on mismatches
Functional Output: Incorrect candidates are rejected fast; valid data reaches hash check.
Phase 5: Correct Match Detection Using Hash
Confirm the reconstruction using the expected hash.
- Compare extracted file hash with expected one
- On match, stop all workers
- Save successful reconstruction
Functional Output: The exact file is recovered and verified by hash.
Phase 6: Output Delivery, Logging & Cleanup
Deliver the recovered file and finalize UX.
- Write recovered file to output
- Terminate workers cleanly
- Log timings, attempts, failures
- Provide summary of parallel search
Functional Output: A cleanly extracted, hash-verified file is returned with complete logs.