FindMissingBytes

This project consists of developing a high-performance parallel tool that reconstructs a file from a truncated archive, using an expected hash to detect the correct reconstruction. The tool must brute-repair missing tail bytes, unpack the archive, and validate the recovered file.

Phase 1: Archive Format Study & CLI Validation

Define assumptions about the archive structure and implement strict CLI parsing.

Validate truncated archive path, filename, expected hash
Optional command to truncate an archive for testing
Provide clear errors/logs

Functional Output: Inputs are validated and archive structure assumptions are documented.

Phase 2: Truncation Simulation (Optional Input Generator)

Implement controlled archive truncation to produce test samples.

Remove last x bytes from a valid archive
Save truncated version + metadata (x, original hash)
Log generation details

Functional Output: Tool can generate truncated archives for testing.

Phase 3: Parallel Tail Reconstruction Engine

Implement the core parallelized algorithm attempting to rebuild missing bytes.

Estimate missing byte count
Spawn threads/processes to brute-fill missing tail ranges
Stream candidate completions to decompressor/unpacker

Functional Output: Multiple candidates are tested in parallel for archive validity.

Phase 4: Partial Unpack & Integrity Checking

Unpack only the target file from each candidate archive.

Perform quick structural validation
Attempt file extraction
Compute hash incrementally
Abort early on mismatches

Functional Output: Incorrect candidates are rejected fast; valid data reaches hash check.

Phase 5: Correct Match Detection Using Hash

Confirm the reconstruction using the expected hash.

Compare extracted file hash with expected one
On match, stop all workers
Save successful reconstruction

Functional Output: The exact file is recovered and verified by hash.

Phase 6: Output Delivery, Logging & Cleanup

Deliver the recovered file and finalize UX.

Write recovered file to output
Terminate workers cleanly
Log timings, attempts, failures
Provide summary of parallel search

Functional Output: A cleanly extracted, hash-verified file is returned with complete logs.