This end-to-end data engineering project uses Azure services to extract, clean, and analyze Olympic data, providing insights into athlete performance, country rankings, and event trends.
It integrates Azure Data Factory for orchestrating data pipelines, Azure Data Lake Storage Gen2 as the central data store, and Azure Databricks for scalable data transformation. The transformed data is then stored and modeled in a lake database within Azure Synapse Analytics for optimized querying. Finally, Power BI delivers interactive visualizations of the transformed data.
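The project's actual Databricks notebooks are not shown here, so the following is only a minimal sketch of the kind of cleaning step the transformation stage might perform. It uses plain Python rather than Spark so that it runs anywhere. The column names, the use of "NA" as the missing-value marker, and the sample rows are all assumptions made for illustration; `clean_row` and `load_clean` are hypothetical helper names.

```python
import csv
import io

# Hypothetical sample in an assumed subset of the athlete_events.csv layout.
# The values are made up for illustration and are not taken from the dataset.
SAMPLE_CSV = """ID,Name,Sex,Age,Team,NOC,Year,Season,Medal
1,Athlete A,M,24,Team X,XXX,1992,Summer,NA
2,Athlete B,F,NA,Team Y,YYY,1994,Winter,Gold
"""


def clean_row(row):
    """Map the assumed 'NA' marker to None and cast numeric fields to int."""
    cleaned = {key: (None if value == "NA" else value) for key, value in row.items()}
    for field in ("Age", "Year"):
        if cleaned.get(field) is not None:
            cleaned[field] = int(cleaned[field])
    return cleaned


def load_clean(text):
    """Parse CSV text and return a list of cleaned row dictionaries."""
    return [clean_row(row) for row in csv.DictReader(io.StringIO(text))]


rows = load_clean(SAMPLE_CSV)
for row in rows:
    print(row["Name"], row["Age"], row["Year"], row["Medal"])
```

In a Databricks notebook the same idea would more likely be expressed with Spark DataFrame operations; the plain-Python version is used here only to keep the sketch self-contained.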
The dataset, "120 years of Olympic history: athletes and results", is a historical record of the modern Olympic Games, covering all the Games from Athens 1896 to Rio 2016. Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
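One way to avoid that mistake is to group by both year and season instead of by year alone. The sketch below illustrates the idea on a handful of (year, season) pairs typed in by hand rather than read from the dataset; `seasons_by_year` and `shared_years` are hypothetical helper names, and the "Summer"/"Winter" labels are assumed to match the dataset's season values.

```python
from collections import defaultdict

# Hand-entered (year, season) pairs around the 1992 changeover,
# standing in for the distinct year and season values of the dataset.
GAMES = [
    (1988, "Summer"), (1988, "Winter"),
    (1992, "Summer"), (1992, "Winter"),
    (1994, "Winter"),
    (1996, "Summer"),
    (1998, "Winter"),
]


def seasons_by_year(games):
    """Collect the set of seasons held in each year."""
    by_year = defaultdict(set)
    for year, season in games:
        by_year[year].add(season)
    return dict(by_year)


def shared_years(games):
    """Return the sorted years in which both Summer and Winter Games took place."""
    return sorted(
        year
        for year, seasons in seasons_by_year(games).items()
        if {"Summer", "Winter"} <= seasons
    )


shared = shared_years(GAMES)
print(shared)  # [1988, 1992]
```

Any per-year aggregate, such as a medal count, would mix both Games for the years returned here, which is why the season should be part of the grouping key.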
Links to the dataset: https://github.com/Nizra/olympics/blob/47bbd491ed276063ed9ff15a8019ff93c3684e6e/noc_regions.csv https://github.com/Nizra/olympics/blob/47bbd491ed276063ed9ff15a8019ff93c3684e6e/athlete_events.csv.zip
Original data source: https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results