Common Phrases

Two phrases are considered the same if they contain the same words in the same order. Phrases are separated by ., ;, ? and !.

E.g.: "Hello, this is Maria." is considered the same as "Hello ... this is Maria", but not the same as "Hello. This is Maria?", which splits into two phrases ("Hello" and "This is Maria").
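
The matching rule can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation, and it rests on two assumptions the rule does not spell out: a run of two or more dots is treated as a pause inside a phrase rather than as a separator (which the example suggests), and matching ignores letter case. The name extract_phrases is hypothetical.

```python
import re

# Assumption: an ellipsis ("..." or "…") is a pause inside a phrase, not a separator.
ELLIPSIS = re.compile(r"\.{2,}|…")
# The four separators named in the rule: . ; ? !
SEPARATORS = re.compile(r"[.;?!]")
# A word is a run of letters/digits, optionally with inner apostrophes (e.g. "don't").
WORD = re.compile(r"\w+(?:'\w+)*")


def extract_phrases(text):
    """Split text into phrases; each phrase is a tuple of lowercased words."""
    text = ELLIPSIS.sub(" ", text)
    phrases = []
    for chunk in SEPARATORS.split(text):
        # Assumption: case is ignored; commas and other punctuation are dropped.
        words = tuple(word.lower() for word in WORD.findall(chunk))
        if words:  # skip empty chunks, e.g. after a trailing separator
            phrases.append(words)
    return phrases


if __name__ == "__main__":
    print(extract_phrases("Hello, this is Maria."))
    print(extract_phrases("Hello ... this is Maria"))
    print(extract_phrases("Hello. This is Maria?"))
```

Two phrases then count as the same when their word tuples are equal, so the first two inputs yield one identical phrase and the third yields two shorter ones.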

Phase 1 - Basic common phrases comparison

Create a script that reads multiple text files and identifies phrases that appear in more than one file.

  • Input multiple text files
  • Extract phrases or n-grams
  • Compare and list common phrases

Functional result: Script outputs a correct list of repeated phrases across files.
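
One possible shape for the Phase 1 script is sketched below. It is a sketch under stated assumptions: files are UTF-8 text, an ellipsis does not separate phrases, and case is ignored. The function names (extract_phrases, common_phrases, common_phrases_in_files) are hypothetical, and whole phrases are compared rather than n-grams.

```python
import re
from collections import defaultdict
from pathlib import Path


def extract_phrases(text):
    """Return the phrases in text as normalized strings."""
    # Assumption: "..." is a pause inside a phrase, not a separator.
    text = re.sub(r"\.{2,}|…", " ", text)
    phrases = []
    for chunk in re.split(r"[.;?!]", text):
        # Assumption: case and non-separator punctuation are ignored.
        words = re.findall(r"\w+", chunk.lower())
        if words:
            phrases.append(" ".join(words))
    return phrases


def common_phrases(texts):
    """Map each phrase found in two or more texts to the names containing it.

    texts is a dict of {name: text}; the result is {phrase: sorted names}.
    """
    seen = defaultdict(set)
    for name, text in texts.items():
        for phrase in extract_phrases(text):
            seen[phrase].add(name)
    return {p: sorted(names) for p, names in seen.items() if len(names) > 1}


def common_phrases_in_files(paths):
    """Read the given files (assumed UTF-8) and compare their phrases."""
    texts = {str(p): Path(p).read_text(encoding="utf-8") for p in paths}
    return common_phrases(texts)
```

A command-line wrapper could call common_phrases_in_files(sys.argv[1:]) and print one phrase per line together with the files that contain it.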

Phase 2 - GUI and enhanced analysis

Add a simple GUI to select files, display common phrases, and optionally highlight frequency or export results.

  • File selection via GUI
  • Display common phrases with counts
  • Export results to CSV or text

Functional result: GUI shows common phrases interactively and allows saving the analysis.
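
Phase 2 could be sketched with Python's standard tkinter and csv modules, as below. This is an illustrative sketch, not a prescribed design: the function names, the column layout (phrase, number of files, total occurrences) and the choice of tkinter are all assumptions, as are UTF-8 input files and case-insensitive matching.

```python
import csv
import re
from collections import Counter, defaultdict
from pathlib import Path


def extract_phrases(text):
    # Assumption: an ellipsis is a pause inside a phrase; matching ignores case.
    text = re.sub(r"\.{2,}|…", " ", text)
    chunks = (re.findall(r"\w+", c.lower()) for c in re.split(r"[.;?!]", text))
    return [" ".join(words) for words in chunks if words]


def phrase_rows(texts):
    """Return (phrase, file_count, total_count) rows for phrases in 2+ texts."""
    totals = Counter()
    files = defaultdict(set)
    for name, text in texts.items():
        for phrase in extract_phrases(text):
            totals[phrase] += 1
            files[phrase].add(name)
    rows = [(p, len(files[p]), totals[p]) for p in totals if len(files[p]) > 1]
    # Most frequent first, ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


def export_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["phrase", "files", "occurrences"])
        writer.writerows(rows)


def run_gui():
    # tkinter is imported here so the functions above work without a display.
    import tkinter as tk
    from tkinter import filedialog, ttk

    root = tk.Tk()
    root.title("Common Phrases")
    columns = ("phrase", "files", "occurrences")
    tree = ttk.Treeview(root, columns=columns, show="headings")
    for column in columns:
        tree.heading(column, text=column.capitalize())
    tree.pack(fill="both", expand=True)
    rows = []  # results of the most recent analysis

    def choose_files():
        paths = filedialog.askopenfilenames(
            filetypes=[("Text files", "*.txt"), ("All files", "*.*")]
        )
        if not paths:
            return
        texts = {p: Path(p).read_text(encoding="utf-8") for p in paths}
        rows[:] = phrase_rows(texts)
        tree.delete(*tree.get_children())  # clear the previous results
        for row in rows:
            tree.insert("", "end", values=row)

    def save_results():
        path = filedialog.asksaveasfilename(defaultextension=".csv")
        if path:
            export_csv(rows, path)

    tk.Button(root, text="Select files...", command=choose_files).pack(side="left")
    tk.Button(root, text="Export CSV...", command=save_results).pack(side="left")
    root.mainloop()
```

Calling run_gui() opens the window. Keeping the analysis and export in plain functions means they can be checked without a display, and a plain-text export could be added alongside export_csv in the same way.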