Common Phrases
Two phrases are considered the same if they contain the same words in the same order. Within a text, phrases are separated by the characters . ; ? and !
E.g.: "Hello, this is Maria." is considered the same as "Hello ... this is Maria" (so commas and the ellipsis do not split a phrase), but not the same as "Hello. This is Maria?", which contains two phrases.
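The rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function name split_phrases is hypothetical, and two choices are assumptions that the definition does not state: a run of two or more dots is treated as an ellipsis rather than a separator (inferred from the example), and words are compared case-insensitively.

```python
import re

# Assumption: "..." (two or more dots) is an ellipsis, not a phrase separator.
_ELLIPSIS = re.compile(r"\.{2,}")
# The four separators named in the definition: . ; ? !
_SEPARATORS = re.compile(r"[.;?!]")
# A word is a run of letters, digits or underscores; other punctuation is ignored.
_WORD = re.compile(r"\w+")


def split_phrases(text):
    """Return the phrases in text, each as a tuple of lowercased words."""
    text = _ELLIPSIS.sub(" ", text)
    phrases = []
    for chunk in _SEPARATORS.split(text):
        # Assumption: comparison is case-insensitive, so words are lowercased.
        words = tuple(word.lower() for word in _WORD.findall(chunk))
        if words:  # skip empty chunks, e.g. after a trailing separator
            phrases.append(words)
    return phrases
```

With this sketch, "Hello, this is Maria." and "Hello ... this is Maria" both yield the single phrase ("hello", "this", "is", "maria"), while "Hello. This is Maria?" yields two phrases.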
Phase 1 - Basic common phrases comparison
Create a script that reads multiple text files and identifies phrases that appear in more than one file.
- Input multiple text files
- Extract phrases or n-grams
- Compare and list common phrases
Functional result: Script outputs a correct list of repeated phrases across files.
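One possible shape for the Phase 1 script is sketched below. The names extract_phrases, common_phrases and main are hypothetical. The sketch assumes UTF-8 input files, case-insensitive comparison, and that a run of two or more dots is an ellipsis rather than a separator. It covers whole phrases only, not n-grams.

```python
import re
from collections import defaultdict
from pathlib import Path


def extract_phrases(text):
    """Return the set of normalized phrases found in text."""
    # Assumption: "..." is an ellipsis, not a separator.
    text = re.sub(r"\.{2,}", " ", text)
    phrases = set()
    for chunk in re.split(r"[.;?!]", text):
        # Assumption: case-insensitive; punctuation such as commas is ignored.
        words = re.findall(r"\w+", chunk.lower())
        if words:
            phrases.add(" ".join(words))
    return phrases


def common_phrases(texts):
    """Map each phrase found in more than one text to the sorted names of those texts.

    texts is a dict of {name: content}.
    """
    seen_in = defaultdict(set)
    for name, content in texts.items():
        for phrase in extract_phrases(content):
            seen_in[phrase].add(name)
    return {p: sorted(names) for p, names in seen_in.items() if len(names) > 1}


def main(paths):
    """Read the given files and print one common phrase per line with its files."""
    texts = {str(p): Path(p).read_text(encoding="utf-8") for p in paths}
    for phrase, names in sorted(common_phrases(texts).items()):
        print(f"{phrase}\t{', '.join(names)}")
```

Calling main(sys.argv[1:]) under an if __name__ == "__main__": guard would turn this into a command-line script that takes the file paths as arguments.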
Phase 2 - GUI and enhanced analysis
Add a simple GUI to select files, display common phrases, and optionally highlight frequency or export results.
- File selection via GUI
- Display common phrases with counts
- Export results to CSV or text
Functional result: GUI shows common phrases interactively and allows saving the analysis.
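The counting and export steps of Phase 2 can be kept separate from the GUI, which makes them easier to check. The sketch below shows only that part, and the names count_phrases, common_phrase_rows and write_csv are hypothetical. The column layout (phrase, number of files, total occurrences) and the sort order are assumptions. A GUI built with Python's standard tkinter module could, for example, collect paths with tkinter.filedialog.askopenfilenames and pass the file contents to these functions.

```python
import csv
import re
from collections import Counter


def count_phrases(text):
    """Count how often each normalized phrase occurs in text."""
    # Assumption: "..." is an ellipsis, not a separator.
    text = re.sub(r"\.{2,}", " ", text)
    counts = Counter()
    for chunk in re.split(r"[.;?!]", text):
        words = re.findall(r"\w+", chunk.lower())  # assumption: case-insensitive
        if words:
            counts[" ".join(words)] += 1
    return counts


def common_phrase_rows(texts):
    """Return (phrase, file_count, total_count) for phrases found in 2+ texts.

    texts is a dict of {name: content}. Rows are sorted by descending total
    count, then alphabetically by phrase.
    """
    totals = Counter()  # occurrences summed over all texts
    files = Counter()   # number of texts containing the phrase
    for content in texts.values():
        counts = count_phrases(content)
        totals.update(counts)
        files.update(counts.keys())
    rows = [(p, files[p], totals[p]) for p in files if files[p] > 1]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


def write_csv(rows, stream):
    """Write the rows with a header line to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["phrase", "files", "occurrences"])
    writer.writerows(rows)
```

To save to disk, open the target with open(path, "w", newline="", encoding="utf-8") and pass the resulting file object as stream. The csv module documentation recommends newline="" for files handed to csv.writer.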