States of the World
Phase 1: Web crawling and data extraction
Build a crawler to scrape country information from Wikipedia and normalize data fields.
- Access country pages
- Extract name, capital, population, density, area, neighbors, language, timezone, political system
- Clean and validate data
Functional Output: Structured dataset containing all required country information with consistent formatting.
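The cleaning step above can be sketched as a small normalization layer. This is a minimal illustration, not the actual crawler: the field names (`name`, `capital`, `population`, `area`) and the footnote/thousands-separator conventions are assumptions about typical Wikipedia infobox text.

```python
import re

def parse_number(text):
    """Turn a scraped figure such as '68,042,591[1]' or '643,801 km2'
    into a plain number; returns None when nothing parseable is found."""
    if text is None:
        return None
    # Assumption: strip Wikipedia-style footnote markers and thousands separators
    cleaned = re.sub(r"\[\d+\]", "", text).replace(",", "").strip()
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None

def normalize_record(raw):
    """Map raw scraped strings onto consistently typed fields."""
    population = parse_number(raw.get("population"))
    area = parse_number(raw.get("area"))
    return {
        "name": raw.get("name", "").strip(),
        "capital": raw.get("capital", "").strip(),
        "population": int(population) if population else None,
        "area_km2": area,
        # Derive density only when both source figures are present
        "density": round(population / area, 2) if population and area else None,
    }
```

Deriving density from the two parsed figures, rather than scraping it separately, keeps the three fields mutually consistent.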
Phase 2: Database schema design and population
Design relational schema and populate database with extracted country data.
- Create tables with constraints
- Map extracted fields
- Insert all country records
Functional Output: Fully populated database reflecting all countries and their attributes.
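One possible shape for the schema and population step, sketched with SQLite for brevity (the real project may use any relational engine). Table and column names are assumptions; the self-referencing `neighbor` table models the country adjacency list, though its population is omitted here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE country (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL UNIQUE,
    capital          TEXT,
    population       INTEGER CHECK (population >= 0),
    area_km2         REAL CHECK (area_km2 > 0),
    density          REAL,
    language         TEXT,
    timezone         TEXT,
    political_system TEXT
);
CREATE TABLE neighbor (
    country_id  INTEGER NOT NULL REFERENCES country(id),
    neighbor_id INTEGER NOT NULL REFERENCES country(id),
    PRIMARY KEY (country_id, neighbor_id)
);
"""

def build_db(records, path=":memory:"):
    """Create the schema and bulk-insert normalized country records."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO country (name, capital, population, area_km2, density,"
        " language, timezone, political_system)"
        " VALUES (:name, :capital, :population, :area_km2, :density,"
        " :language, :timezone, :political_system)",
        records,
    )
    conn.commit()
    return conn
```

The `UNIQUE` and `CHECK` constraints push part of the Phase 1 validation into the database, so bad records fail loudly at insert time.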
Phase 3: Basic API setup and routing
Implement REST API with GET routes for core queries.
- Set up framework (e.g., Flask or FastAPI)
- Implement /top-10-population and /top-10-density
- Return JSON responses
Functional Output: API responds with accurate top 10 lists for population and density.
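The handler logic behind both top-10 routes can be sketched framework-agnostically; in Flask or FastAPI this function would simply be called from the two route handlers. The `country` table layout is the assumed Phase 2 schema, and the column whitelist is there because the ranking column cannot be passed as a bound SQL parameter.

```python
import sqlite3

def top10(conn, column):
    """Return the ten highest-ranked countries by the given column
    as JSON-ready dicts. `column` is whitelisted, never interpolated
    from raw user input, to avoid SQL injection."""
    if column not in ("population", "density"):
        raise ValueError("unsupported ranking column")
    rows = conn.execute(
        f"SELECT name, {column} FROM country"
        f" WHERE {column} IS NOT NULL ORDER BY {column} DESC LIMIT 10"
    ).fetchall()
    return [{"name": name, column: value} for name, value in rows]
```

A list of dicts serializes directly with `json.dumps` (or Flask's `jsonify`), which keeps the route bodies one line long.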
Phase 4: Advanced query routes and filtering
Add routes for flexible queries (by timezone, language, political system, neighbors).
- Parse query parameters
- Filter database records
- Return structured JSON
Functional Output: API supports multiple query filters and returns correct country lists.
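Parameter parsing and filtering for these routes can be reduced to one helper that turns the query string into a parameterized WHERE clause. The accepted parameter names are assumptions matching the earlier schema sketch; a neighbors filter would additionally join the `neighbor` table and is left out here.

```python
import sqlite3

# Whitelist of query parameters the filter routes accept (assumed field names)
ALLOWED_FILTERS = {"timezone", "language", "political_system"}

def filter_countries(conn, params):
    """Build a parameterized query from query-string params and
    return the matching country names."""
    clauses, values = [], []
    for key, value in params.items():
        if key not in ALLOWED_FILTERS:
            raise ValueError(f"unknown filter: {key}")
        clauses.append(f"{key} = ?")   # key is whitelisted above
        values.append(value)
    sql = "SELECT name FROM country"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return [row[0] for row in conn.execute(sql, values)]
```

Because filters combine with AND, a request like `?timezone=UTC%2B1&language=French` narrows the result exactly as the route description requires, and unknown parameters are rejected instead of silently ignored.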
Phase 5: API optimization and error handling
Ensure efficient queries, proper indexing, and robust error responses.
- Add database indexes
- Handle missing or invalid routes
- Provide clear error messages
Functional Output: Fast, reliable API with stable performance and informative error handling.
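The indexing and error-handling items can be sketched as two small helpers. The index names and the error envelope shape are illustrative choices, not a fixed API contract; in Flask the tuple returned by `error_response` maps directly onto a `(body, status)` response.

```python
import sqlite3

def add_indexes(conn):
    """Index the columns used by the ranking and filter routes so those
    queries avoid full table scans."""
    conn.executescript("""
        CREATE INDEX IF NOT EXISTS idx_country_population ON country(population);
        CREATE INDEX IF NOT EXISTS idx_country_density    ON country(density);
        CREATE INDEX IF NOT EXISTS idx_country_timezone   ON country(timezone);
        CREATE INDEX IF NOT EXISTS idx_country_language   ON country(language);
    """)

def error_response(status, message):
    """Uniform JSON error body for invalid routes or parameters,
    e.g. error_response(404, 'route not found')."""
    return {"error": {"status": status, "message": message}}, status
```

Keeping every error in the same envelope means clients can handle 400s and 404s with one code path instead of parsing ad-hoc strings.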
Phase 6: Testing, logging, and documentation
Implement unit/integration tests, request logging, and API documentation.
- Add automated tests for crawler, database, and API
- Log all requests and errors
- Provide usage documentation
Functional Output: Fully tested, documented API with logs and reliable, predictable behavior.
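One way the testing and logging items can look in practice, using only the standard library. The `states_api` logger name and the `log_request` hook are assumptions about how the route handlers would be wired up; the test case runs against a throwaway in-memory database rather than the real one.

```python
import logging
import sqlite3
import unittest

# Shared request logger (assumption: each route handler calls log_request)
logger = logging.getLogger("states_api")

def log_request(method, path, status):
    """Record one line per request: method, path, and response status."""
    logger.info("%s %s -> %d", method, path, status)

class TestTop10Route(unittest.TestCase):
    """Integration-style check of the top-10 query against a fresh database."""

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE country (name TEXT, population INTEGER)")
        self.conn.executemany("INSERT INTO country VALUES (?, ?)",
                              [("A", 10), ("B", 30), ("C", 20)])

    def test_orders_by_population_desc(self):
        rows = self.conn.execute(
            "SELECT name FROM country ORDER BY population DESC LIMIT 10"
        ).fetchall()
        self.assertEqual([r[0] for r in rows], ["B", "C", "A"])
```

Running `python -m unittest` discovers such cases automatically, and because each test builds its own in-memory database, the suite stays deterministic and side-effect free.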