States of the World

Phase 1: Web crawling and data extraction

Build a crawler that scrapes country information from Wikipedia and normalizes the extracted fields (crawler sketch below).

  • Access country pages
  • Extract name, capital, population, density, area, neighbors, language, timezone, political system
  • Clean and validate data

Functional Output: Structured dataset containing all required country information with consistent formatting.
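
A minimal sketch of the Phase 1 crawler, assuming requests and BeautifulSoup; the field labels and infobox layout below are approximations of Wikipedia's markup, so the real parser will need per-field handling:

  import requests
  from bs4 import BeautifulSoup

  # Infobox labels to capture, mapped to dataset field names (illustrative subset).
  FIELDS = {"Capital": "capital", "Population": "population", "Area": "area"}

  def scrape_country(url: str) -> dict:
      """Fetch one country page and pull selected rows from its infobox."""
      resp = requests.get(url, headers={"User-Agent": "states-crawler"}, timeout=10)
      resp.raise_for_status()
      soup = BeautifulSoup(resp.text, "html.parser")
      record = {"name": soup.find("h1").get_text(strip=True)}
      infobox = soup.find("table", class_="infobox")
      if infobox is None:
          return record
      for row in infobox.find_all("tr"):
          header, cell = row.find("th"), row.find("td")
          if not (header and cell):
              continue
          label = header.get_text(" ", strip=True)
          for prefix, field in FIELDS.items():
              if label.startswith(prefix):
                  # Keep raw text here; numeric cleaning and validation come later.
                  record[field] = cell.get_text(" ", strip=True)
      return record

  # Example: scrape_country("https://en.wikipedia.org/wiki/France")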


Phase 2: Database schema design and population

Design a relational schema and populate the database with the extracted country data (schema sketch below).

  • Create tables with constraints
  • Map extracted fields
  • Insert all country records

Functional Output: Fully populated database reflecting all countries and their attributes.
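
One possible shape for the schema, sketched with SQLite through Python's sqlite3 for brevity; the engine, column names, and constraints are open design choices:

  import sqlite3

  SCHEMA = """
  CREATE TABLE IF NOT EXISTS countries (
      id               INTEGER PRIMARY KEY,
      name             TEXT NOT NULL UNIQUE,
      capital          TEXT,
      population       INTEGER CHECK (population >= 0),
      area_km2         REAL,
      density          REAL,
      language         TEXT,
      timezone         TEXT,
      political_system TEXT
  );
  -- Self-referencing join table for land borders.
  CREATE TABLE IF NOT EXISTS neighbors (
      country_id  INTEGER NOT NULL REFERENCES countries(id),
      neighbor_id INTEGER NOT NULL REFERENCES countries(id),
      PRIMARY KEY (country_id, neighbor_id)
  );
  """

  def populate(db_path: str, records: list[dict]) -> None:
      """Create the tables (if needed) and bulk-insert the crawled records."""
      con = sqlite3.connect(db_path)
      con.executescript(SCHEMA)
      con.executemany(
          "INSERT OR IGNORE INTO countries"
          " (name, capital, population, area_km2, density, language, timezone, political_system)"
          " VALUES (:name, :capital, :population, :area_km2, :density, :language, :timezone, :political_system)",
          records,
      )
      con.commit()
      con.close()

Keeping borders in a separate join table avoids storing neighbor lists as free text and makes the Phase 4 neighbor queries a simple join.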


Phase 3: Basic API setup and routing

Implement a REST API with GET routes for the core queries (route sketch below).

  • Set up the framework (e.g., Flask/FastAPI)
  • Implement /top-10-population and /top-10-density
  • Return JSON responses

Functional Output: API responds with accurate top 10 lists for population and density.
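
A minimal sketch of the two routes using FastAPI (one of the frameworks suggested above) over the Phase 2 database; the file name countries.db is an assumption:

  import sqlite3
  from fastapi import FastAPI

  app = FastAPI(title="States of the World API")
  DB_PATH = "countries.db"  # assumed output of Phase 2

  def top_10(column: str) -> list[dict]:
      """Return the ten largest rows by the given column (internal, fixed values only)."""
      con = sqlite3.connect(DB_PATH)
      con.row_factory = sqlite3.Row
      rows = con.execute(
          f"SELECT name, {column} FROM countries ORDER BY {column} DESC LIMIT 10"
      ).fetchall()
      con.close()
      return [dict(row) for row in rows]

  @app.get("/top-10-population")
  def top_population():
      return top_10("population")  # FastAPI serializes the list to JSON

  @app.get("/top-10-density")
  def top_density():
      return top_10("density")

  # Run with: uvicorn main:app --reload  (assuming this file is main.py)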


Phase 4: Advanced query routes and filtering

Add routes for flexible queries by timezone, language, political system, and neighbors (filtering sketch below).

  • Parse query parameters
  • Filter database records
  • Return structured JSON

Functional Output: API supports multiple query filters and returns correct country lists.
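
A sketch of the filter parsing as a single route with parameterized SQL; the /countries path and query-parameter names are placeholders:

  import sqlite3
  from typing import Optional
  from fastapi import FastAPI

  app = FastAPI()
  DB_PATH = "countries.db"

  @app.get("/countries")
  def list_countries(timezone: Optional[str] = None,
                     language: Optional[str] = None,
                     political_system: Optional[str] = None):
      """Combine whichever query parameters were supplied into one WHERE clause."""
      clauses, params = [], []
      for column, value in (("timezone", timezone),
                            ("language", language),
                            ("political_system", political_system)):
          if value is not None:
              clauses.append(f"{column} = ?")  # values stay parameterized
              params.append(value)
      sql = "SELECT name FROM countries"
      if clauses:
          sql += " WHERE " + " AND ".join(clauses)
      con = sqlite3.connect(DB_PATH)
      names = [row[0] for row in con.execute(sql, params)]
      con.close()
      return {"count": len(names), "countries": names}

  # Example: GET /countries?language=French&political_system=republic

Neighbor filtering would additionally join the neighbors table from Phase 2; the pattern of building the WHERE clause from whichever parameters are present stays the same.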


Phase 5: API optimization and error handling

Ensure efficient queries, proper indexing, and robust error responses (indexing and error-handling sketch below).

  • Add database indexes
  • Handle missing or invalid routes
  • Provide clear error messages

Functional Output: Fast, reliable API with stable performance and informative error handling.
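
A sketch covering both concerns, assuming the FastAPI app and SQLite file from the earlier phases; apply_indexes is assumed to run once at startup, and the /country/{name} route is only an illustration:

  import sqlite3
  from fastapi import FastAPI, HTTPException, Request
  from fastapi.responses import JSONResponse

  app = FastAPI()

  # Indexes on the columns the API sorts and filters by.
  INDEXES = """
  CREATE INDEX IF NOT EXISTS idx_population ON countries(population);
  CREATE INDEX IF NOT EXISTS idx_density    ON countries(density);
  CREATE INDEX IF NOT EXISTS idx_timezone   ON countries(timezone);
  """

  def apply_indexes(db_path: str = "countries.db") -> None:
      with sqlite3.connect(db_path) as con:
          con.executescript(INDEXES)

  @app.get("/country/{name}")
  def get_country(name: str):
      con = sqlite3.connect("countries.db")
      row = con.execute("SELECT name, capital FROM countries WHERE name = ?", (name,)).fetchone()
      con.close()
      if row is None:
          # Clear, structured 404 instead of an empty body or a 500.
          raise HTTPException(status_code=404, detail=f"country '{name}' not found")
      return {"name": row[0], "capital": row[1]}

  @app.exception_handler(Exception)
  async def unhandled_error(request: Request, exc: Exception):
      # Catch-all so unexpected failures still return a clean JSON error.
      return JSONResponse(status_code=500, content={"error": "internal server error"})

Requests to routes that were never defined already get FastAPI's default 404 response; the catch-all handler above is for unexpected server-side failures.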


Phase 6: Testing, logging, and documentation

Implement unit/integration tests, request logging, and API documentation (test and logging sketch below).

  • Add automated tests for crawler, database, and API
  • Log all requests and errors
  • Provide usage documentation

Functional Output: Fully tested, documented API with logs and reliable, predictable behavior.
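
A sketch of the request log and two example tests, assuming pytest, FastAPI's TestClient, and that the Phase 3 app lives in a module named main (an assumption):

  # --- main.py (the Phase 3 API module): log every request and its outcome ---
  import logging

  logging.basicConfig(filename="api.log", level=logging.INFO)

  @app.middleware("http")
  async def log_requests(request, call_next):
      response = await call_next(request)
      logging.info("%s %s -> %s", request.method, request.url.path, response.status_code)
      return response

  # --- tests/test_api.py: run with `pytest` ---
  from fastapi.testclient import TestClient
  from main import app  # assumed module name

  client = TestClient(app)

  def test_top_10_population_returns_ten_countries():
      response = client.get("/top-10-population")
      assert response.status_code == 200
      assert len(response.json()) == 10

  def test_unknown_route_returns_404():
      assert client.get("/does-not-exist").status_code == 404

FastAPI also serves interactive API documentation at /docs by default, which covers much of the documentation goal; setup steps and example queries belong in a separate usage guide.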