FindMissingBytes

This project develops a high-performance parallel tool that recovers a file from a truncated archive, using the file's expected hash to identify the correct reconstruction. The tool must brute-force the missing tail bytes, unpack each candidate archive, and validate the recovered file against the expected hash.

Phase 1: Archive Format Study & CLI Validation

Define assumptions about the archive structure and implement strict CLI parsing.

  • Validate the truncated archive path, the target filename, and the expected hash
  • Provide an optional command to truncate an archive for testing
  • Provide clear error messages and logs

Functional Output: Inputs are validated and archive structure assumptions are documented.
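
The validation step could look roughly like the sketch below. It is only an illustration under stated assumptions: the program name findmissingbytes, the argument names, and the choice of SHA-256 (64 hex characters) as the hash are hypothetical, since the plan does not fix them.

```python
import argparse
import os
import re

# Assumption: the expected hash is a SHA-256 hex digest (64 hex characters).
HEX_SHA256 = re.compile(r"^[0-9a-fA-F]{64}$")


def build_parser():
    # Hypothetical CLI layout: three positional arguments.
    parser = argparse.ArgumentParser(prog="findmissingbytes")
    parser.add_argument("archive", help="path to the truncated archive")
    parser.add_argument("filename", help="name of the file to recover")
    parser.add_argument("expected_hash", help="expected SHA-256 (hex)")
    return parser


def validate_args(argv):
    """Parse argv strictly; exit with status 2 and a clear message on bad input."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not os.path.isfile(args.archive):
        parser.error(f"archive not found: {args.archive}")
    if not args.filename.strip():
        parser.error("filename must not be empty")
    if not HEX_SHA256.match(args.expected_hash):
        parser.error("expected_hash must be 64 hexadecimal characters")
    # Normalize case so later comparisons against hexdigest() are exact.
    args.expected_hash = args.expected_hash.lower()
    return args
```

Rejecting malformed input before any worker starts keeps a costly search from running against a mistyped hash.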


Phase 2: Truncation Simulation (Optional Input Generator)

Implement controlled archive truncation to produce test samples.

  • Remove the last x bytes from a valid archive
  • Save the truncated version plus metadata (x, original hash)
  • Log generation details

Functional Output: Tool can generate truncated archives for testing.
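
A minimal sketch of such a generator follows. The function name truncate_archive, the ".meta.json" sidecar file, and the metadata keys are assumptions made for illustration; SHA-256 is likewise assumed as the hash.

```python
import hashlib
import json
from pathlib import Path


def truncate_archive(src, dst, x):
    """Write src without its last x bytes to dst, plus a JSON metadata sidecar."""
    data = Path(src).read_bytes()
    if not 0 < x <= len(data):
        raise ValueError(f"x must be between 1 and {len(data)}, got {x}")
    Path(dst).write_bytes(data[:-x])
    meta = {
        "removed_bytes": x,
        "original_size": len(data),
        # Hash of the complete, untruncated archive (assumed SHA-256).
        "original_sha256": hashlib.sha256(data).hexdigest(),
    }
    # Hypothetical convention: metadata is stored next to the truncated file.
    Path(str(dst) + ".meta.json").write_text(json.dumps(meta, indent=2))
    return meta
```

Recording x and the original hash alongside each sample lets later phases be tested against a known answer.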


Phase 3: Parallel Tail Reconstruction Engine

Implement the core parallel algorithm that attempts to rebuild the missing tail bytes.

  • Estimate the missing byte count
  • Spawn threads or processes that brute-force disjoint ranges of candidate tail values
  • Stream each candidate completion to the decompressor/unpacker

Functional Output: Multiple candidates are tested in parallel for archive validity.
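
The search can be sketched as below. Because the plan does not name an archive format, this example uses a raw zlib stream as a stand-in for the archive, and a thread pool as a stand-in for the worker model; both are assumptions, as are all function names. For CPU-bound work on real archives, separate processes may be the better fit.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total, workers):
    """Split range(total) into at most `workers` contiguous (start, stop) chunks."""
    step = -(-total // workers)  # ceiling division
    return [(s, min(s + step, total)) for s in range(0, total, step)]


def is_valid_stream(blob):
    # Stand-in validity check: a zlib stream decompresses only if its
    # trailing Adler-32 checksum matches the decompressed data.
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        return False


def search_chunk(prefix, n_missing, start, stop):
    """Try every tail whose big-endian integer value lies in [start, stop)."""
    hits = []
    for value in range(start, stop):
        tail = value.to_bytes(n_missing, "big")
        if is_valid_stream(prefix + tail):
            hits.append(tail)
    return hits


def find_valid_tails(prefix, n_missing, workers=4):
    """Search all 256**n_missing tails across disjoint ranges in parallel."""
    total = 256 ** n_missing
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_chunk, prefix, n_missing, start, stop)
            for start, stop in split_ranges(total, workers)
        ]
        return [tail for future in futures for tail in future.result()]
```

The search space grows as 256 to the power of the missing byte count, so this exhaustive approach is only practical for a small number of missing bytes.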


Phase 4: Partial Unpack & Integrity Checking

Unpack only the target file from each candidate archive.

  • Perform quick structural validation
  • Attempt file extraction
  • Compute hash incrementally
  • Abort early on mismatches

Functional Output: Incorrect candidates are rejected quickly; only structurally valid data reaches the hash check.
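
As an illustration of these steps, the sketch below assumes the archive is a ZIP file and the hash is SHA-256; neither is stated in the plan, and hash_member is a hypothetical helper name. It opens the candidate in memory, extracts only the target member, hashes it chunk by chunk, and gives up at the first structural or decoding error.

```python
import hashlib
import io
import zipfile
import zlib


def hash_member(candidate, filename, chunk_size=65536):
    """Return the SHA-256 hex digest of one member, or None if the candidate is bad."""
    try:
        # Opening the archive parses the central directory, which serves
        # as the quick structural validation.
        with zipfile.ZipFile(io.BytesIO(candidate)) as archive:
            digest = hashlib.sha256()
            with archive.open(filename) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    digest.update(chunk)  # incremental hashing
            return digest.hexdigest()
    except (zipfile.BadZipFile, KeyError, EOFError, zlib.error):
        # Damaged structure, missing member, truncated data, or corrupt
        # compressed data: reject the candidate without finishing the read.
        return None
```

Reading in chunks means a corrupt candidate is abandoned as soon as decoding fails, without buffering the whole member in memory.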


Phase 5: Correct Match Detection Using Hash

Confirm the reconstruction using the expected hash.

  • Compare extracted file hash with expected one
  • On match, stop all workers
  • Save successful reconstruction

Functional Output: The exact file is recovered and verified by hash.
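
One way to stop all workers on the first match is a shared stop flag, sketched below. The function name find_match, the use of threads, and SHA-256 are assumptions for illustration; each inner list stands for the candidates assigned to one worker.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


def find_match(candidates_per_worker, expected_hash):
    """Return the first candidate whose SHA-256 equals expected_hash, else None."""
    stop = threading.Event()
    lock = threading.Lock()
    found = []

    def worker(candidates):
        for data in candidates:
            if stop.is_set():
                return  # another worker already found the match
            if hashlib.sha256(data).hexdigest() == expected_hash:
                with lock:
                    if not found:
                        found.append(data)
                stop.set()  # signal every other worker to stop
                return

    workers = max(1, len(candidates_per_worker))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, chunk) for chunk in candidates_per_worker]
        for future in futures:
            future.result()  # re-raise any worker exception
    return found[0] if found else None
```

Workers check the flag between candidates, so they stop after finishing at most one more attempt once a match is found.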


Phase 6: Output Delivery, Logging & Cleanup

Deliver the recovered file and finalize the user experience.

  • Write recovered file to output
  • Terminate workers cleanly
  • Log timings, attempts, and failures
  • Provide a summary of the parallel search

Functional Output: A cleanly extracted, hash-verified file is returned with complete logs.
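
A possible shape for the final step is sketched below. The helper names write_output and summarize, and the summary fields, are hypothetical. The file is written to a temporary path first and then renamed, so an interrupted run does not leave a partial output file.

```python
import os
import tempfile


def write_output(path, data):
    """Write data to path via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # replaces the destination in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def summarize(attempts, failures, elapsed_seconds, workers):
    """Build a one-line summary of the parallel search for the log."""
    rate = attempts / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return (
        f"workers={workers} attempts={attempts} failures={failures} "
        f"elapsed={elapsed_seconds:.2f}s rate={rate:.0f}/s"
    )
```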