Premier League 2024-25: A Data-Driven Analysis
An End-to-End Data Analysis & Data Science Project
Author: Shorya Raj
Project Overview
This project presents a comprehensive exploratory data analysis (EDA) of the English Premier League 2024-25 season. It moves from a foundational data analysis approach, describing what happened through team performance metrics and player statistics, into a data science exploration: testing hypotheses, building explanatory models, and discovering hidden patterns in the data through unsupervised learning.
The notebook provides a complete, end-to-end workflow, from secure data acquisition via the Kaggle API to final interactive visualizations and a detailed summary of key insights.
Project Highlights
- Dual Approach: Combines descriptive data analysis with inferential data science techniques.
- Holistic Coverage: In-depth analysis of both team dynamics and individual player performance.
- Advanced Techniques: Features hypothesis testing, explanatory regression modeling, and K-Means clustering.
- Interactive Visualizations: Employs Plotly, Matplotlib, and Seaborn to create publication-quality charts and dashboards.
- Professional Workflow: Demonstrates best practices in data cleaning, feature engineering, and reporting.
Technologies & Libraries
- Data Manipulation & Analysis: Pandas, NumPy
- Data Visualization: Matplotlib, Seaborn, Plotly
- Data Acquisition: Kaggle API
- Statistical Modeling & Machine Learning: SciPy, StatsModels, Scikit-learn
Section 1: Project Overview and Environment Setup
This initial section prepares the notebook environment. It involves installing all necessary Python packages for data analysis, visualization, and statistical modeling, followed by importing the required libraries. A consistent plotting style is also set for all visualizations.
Section 1.1: Package Installation
✅ Packages installed successfully!
Section 1.2: Import Libraries
✅ Environment setup complete! 📦 Libraries imported successfully!
Section 2: Secure Data Acquisition
Here, we automate downloading the Premier League datasets directly from Kaggle via the Kaggle API. The API credentials are read from Colab secrets instead of being hard-coded in the notebook.
✅ Kaggle credentials loaded securely from Colab secrets! 📥 Downloading datasets... ✅ Datasets downloaded successfully! ✅ Extracted premier-league-2024-2025-team-statistics.zip ✅ Extracted football-players-stats-2024-2025.zip 🎉 All datasets ready for analysis!
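The code cell for this step is not shown. As a minimal sketch, assuming the secrets are stored in Colab under the names `KAGGLE_USERNAME` and `KAGGLE_KEY`, the credential step can export them as environment variables (the Kaggle client reads these in place of a `kaggle.json` file), and a small helper can extract every downloaded archive. Both function names are illustrative, not the notebook's own.

```python
import os
import zipfile
from pathlib import Path


def configure_kaggle_credentials(get_secret):
    """Export Kaggle credentials as environment variables.

    `get_secret` is any callable mapping a secret name to its value. In Colab,
    `google.colab.userdata.get` fits (assumption: secrets named as below).
    """
    for name in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
        value = get_secret(name)
        if not value:
            raise ValueError(f"Missing secret: {name}")
        os.environ[name] = value  # read by the Kaggle client at authentication


def extract_all_zips(download_dir, target_dir):
    """Extract every .zip archive found in download_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
        print(f"✅ Extracted {archive.name}")
    return extracted
```

In Colab this would run as `configure_kaggle_credentials(userdata.get)` after `from google.colab import userdata`, before importing the `kaggle` package (which authenticates on import), followed by a download such as `kaggle datasets download -d <owner>/<dataset> -p downloads` (the dataset slugs are not given above) and `extract_all_zips("downloads", "data")`.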
Section 3: Data Loading and Initial Exploration
Once downloaded, the datasets are loaded into Pandas DataFrames. This section includes an initial exploration to understand the structure, shape, and quality of both the team and player statistics data, forming the basis for our subsequent cleaning and analysis.
📂 Extracting and loading datasets... ✅ Data loaded successfully!
Section 3.1: Dataset Overview
======================================================================
📊 DATASET OVERVIEW
======================================================================
🏟️ Team Statistics:
• Shape: (20, 19)
• Teams: 20
• Metrics: 19
⚽ Player Statistics:
• Shape: (2854, 165)
• Total Players: 2854
• Metrics: 165
📋 Team Statistics Columns:
['Rk', 'Squad', 'MP', 'W', 'D', 'L', 'GF', 'GA', 'GD', 'Pts', 'Pts/MP', 'xG', 'xGA', 'xGD', 'xGD/90', 'Attendance', 'Top Team Scorer', 'Goalkeeper', 'Notes']
📋 Player Statistics Columns:
['Rk', 'Player', 'Nation', 'Pos', 'Squad', 'Comp', 'Age', 'Born', 'MP', 'Starts', 'Min', '90s', 'Gls', 'Ast', 'G+A', 'G-PK', 'PK', 'PKatt', 'CrdY', 'CrdR', 'xG', 'npxG', 'xAG', 'npxG+xAG', 'G+A-PK', 'xG+xAG', 'PrgC', 'PrgP', 'PrgR', 'Sh', 'SoT', 'SoT%', 'Sh/90', 'SoT/90', 'G/Sh', 'G/SoT', 'Dist', 'FK', 'PK_stats_shooting', 'PKatt_stats_shooting', 'xG_stats_shooting', 'npxG_stats_shooting', 'npxG/Sh', 'G-xG', 'np:G-xG', 'Cmp', 'Att', 'Cmp%', 'TotDist', 'PrgDist', 'Ast_stats_passing', 'xAG_stats_passing', 'xA', 'A-xAG', 'KP', '1/3', 'PPA', 'CrsPA', 'PrgP_stats_passing', 'Live', 'Dead', 'FK_stats_passing_types', 'TB', 'Sw', 'Crs', 'TI', 'CK', 'In', 'Out', 'Str', 'Cmp_stats_passing_types', 'Tkl', 'TklW', 'Def 3rd', 'Mid 3rd', 'Att 3rd', 'Att_stats_defense', 'Tkl%', 'Lost', 'Blocks_stats_defense', 'Sh_stats_defense', 'Pass', 'Int', 'Tkl+Int', 'Clr', 'Err', 'SCA', 'SCA90', 'PassLive', 'PassDead', 'TO', 'Sh_stats_gca', 'Fld', 'Def', 'GCA', 'GCA90', 'Touches', 'Def Pen', 'Def 3rd_stats_possession', 'Mid 3rd_stats_possession', 'Att 3rd_stats_possession', 'Att Pen', 'Live_stats_possession', 'Att_stats_possession', 'Succ', 'Succ%', 'Tkld', 'Tkld%', 'Carries', 'TotDist_stats_possession', 'PrgDist_stats_possession', 'PrgC_stats_possession', '1/3_stats_possession', 'CPA', 'Mis', 'Dis', 'Rec', 'PrgR_stats_possession', 'CrdY_stats_misc', 'CrdR_stats_misc', '2CrdY', 'Fls', 'Fld_stats_misc', 'Off_stats_misc', 'Crs_stats_misc', 'Int_stats_misc', 'TklW_stats_misc', 'PKwon', 'PKcon', 'OG', 'Recov', 'Won', 'Lost_stats_misc', 'Won%', 'GA', 'GA90', 'SoTA', 'Saves', 'Save%', 'W', 'D', 'L', 'CS', 'CS%', 'PKatt_stats_keeper', 'PKA', 'PKsv', 'PKm', 'PSxG', 'PSxG/SoT', 'PSxG+/-', '/90', 'Cmp_stats_keeper_adv', 'Att_stats_keeper_adv', 'Cmp%_stats_keeper_adv', 'Att (GK)', 'Thr', 'Launch%', 'AvgLen', 'Opp', 'Stp', 'Stp%', '#OPA', '#OPA/90', 'AvgDist']
Section 3.2: Data Quality Assessment
======================================================================
🔍 DATA QUALITY ASSESSMENT
======================================================================
🏟️ Team Statistics - Missing Values:
Notes 10
dtype: int64
📊 Team Statistics Summary:
Rk MP W D L GF GA GD Pts Pts/MP \
count 20.00 20.0 20.00 20.00 20.00 20.00 20.00 20.00 20.00 20.00
mean 10.50 38.0 14.35 9.30 14.35 55.75 55.75 0.00 52.35 1.38
std 5.92 0.0 6.00 2.87 6.96 14.71 14.42 27.04 18.58 0.49
min 1.00 38.0 2.00 5.00 4.00 26.00 34.00 -60.00 12.00 0.32
25% 5.75 38.0 11.00 7.75 9.75 45.50 45.50 -11.25 42.00 1.11
50% 10.50 38.0 15.00 9.00 12.00 58.00 52.50 3.50 55.00 1.44
75% 15.25 38.0 19.25 10.25 18.50 66.00 62.75 14.25 66.00 1.74
max 20.00 38.0 25.00 15.00 30.00 86.00 86.00 45.00 84.00 2.21
xG xGA xGD xGD/90 Attendance
count 20.00 20.00 20.00 20.00 20.00
mean 53.90 53.89 0.00 0.00 40475.55
std 13.04 12.00 23.30 0.61 16886.82
min 32.60 34.40 -52.10 -1.37 11210.00
25% 45.05 47.28 -6.55 -0.17 29979.75
50% 57.40 49.60 2.70 0.07 35118.50
75% 61.25 58.50 16.20 0.43 54629.75
max 82.20 84.80 43.60 1.15 73747.00
📊 Player Statistics Summary:
Rk Age Born MP Starts Min 90s Gls \
count 2854.00 2846.00 2846.00 2854.00 2854.00 2854.00 2854.00 2854.00
mean 1427.50 25.02 1998.64 19.01 13.50 1211.53 13.46 1.68
std 824.02 4.49 4.50 11.50 11.32 965.19 10.72 3.15
min 1.00 15.00 1982.00 1.00 0.00 1.00 0.00 0.00
25% 714.25 22.00 1996.00 9.00 3.00 317.25 3.50 0.00
50% 1427.50 25.00 1999.00 20.00 11.00 1052.50 11.70 0.00
75% 2140.75 28.00 2002.00 30.00 23.00 1996.75 22.20 2.00
max 2854.00 41.00 2008.00 38.00 38.00 3420.00 38.00 31.00
Ast G+A ... Att (GK) Thr Launch% AvgLen Opp \
count 2854.00 2854.00 ... 212.00 212.00 212.00 212.00 212.00
mean 1.20 2.88 ... 491.60 69.45 34.14 33.04 226.56
std 1.95 4.53 ... 410.27 57.99 14.24 6.07 187.82
min 0.00 0.00 ... 1.00 0.00 0.00 6.00 0.00
25% 0.00 0.00 ... 112.25 15.75 25.45 29.48 55.75
50% 0.00 1.00 ... 397.50 55.00 33.20 32.45 175.50
75% 2.00 4.00 ... 847.25 120.25 41.02 35.90 408.00
max 18.00 47.00 ... 1498.00 197.00 92.30 56.30 710.00
Stp Stp% #OPA #OPA/90 AvgDist
count 212.00 211.00 212.00 212.00 208.00
mean 14.38 6.16 18.77 1.16 13.91
std 13.87 4.07 18.28 1.01 3.73
min 0.00 0.00 0.00 0.00 2.00
25% 2.00 4.00 3.00 0.67 11.98
50% 10.50 5.60 14.00 1.00 13.70
75% 22.00 7.90 30.25 1.47 15.52
max 64.00 33.30 89.00 10.00 28.00
[8 rows x 160 columns]
Section 3.3: Player Data Filtering and Assessment
======================================================================
⚽ PREMIER LEAGUE PLAYER DATA
======================================================================
✅ Premier League Players Found: 574
🏟️ Teams Represented: 20
📍 Position Distribution:
• DF: 186 players
• MF: 112 players
• FW: 85 players
• FW,MF: 60 players
• GK: 44 players
• MF,FW: 44 players
• DF,MF: 16 players
• MF,DF: 13 players
• FW,DF: 7 players
• DF,FW: 7 players
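The filtering step above can be sketched as a boolean mask on the `Comp` column followed by a few counts. The competition label `"eng Premier League"` is an assumption based on the FBref-style naming of this dataset; `players["Comp"].unique()` on the real data would confirm it. The function name is illustrative.

```python
import pandas as pd


def filter_premier_league(players, comp_label="eng Premier League"):
    """Keep only rows whose competition equals `comp_label` (assumed label)."""
    pl = players[players["Comp"] == comp_label].copy()
    print(f"✅ Premier League Players Found: {len(pl)}")
    print(f"🏟️ Teams Represented: {pl['Squad'].nunique()}")
    print("📍 Position Distribution:")
    # value_counts() lists position labels from most to least common
    for pos, n in pl["Pos"].value_counts().items():
        print(f"• {pos}: {n} players")
    return pl
```

Hybrid labels such as `FW,MF` and `MF,FW` are counted separately because the mask keeps the raw `Pos` strings.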
Section 4: Data Cleaning and Feature Engineering
Data preprocessing is a critical step for ensuring the accuracy and reliability of any analysis. In this section, we clean the datasets by handling missing values and create new, insightful features that will enable a deeper analysis of team and player performance.
🧹 Cleaning team statistics...
✅ Team data cleaned and enhanced!
🧹 Cleaning player statistics...
✅ Player data cleaned! 506 players with 90+ minutes
Section 4.1: Feature Engineering Summary
======================================================================
⚙️ FEATURE ENGINEERING SUMMARY
======================================================================
🏟️ Team Metrics Added:
• Goals_Per_Game
• Goals_Against_Per_Game
• Goal_Difference_Per_Game
• Win_Rate (%)
• Points_Per_Game
⚽ Player Metrics Added:
• Goals_Per_90
• Assists_Per_90
• Goal_Contributions_Per_90
📊 Clean Data Summary:
• Teams: 20
• Players: 506
✅ Data preparation complete!
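A minimal sketch of the features listed above, assuming the column names shown in Section 3.1 (`GF`, `GA`, `GD`, `W`, `Pts`, `MP` for teams; `Min`, `Gls`, `Ast` for players). The per-90 denominator here is `Min / 90`; the dataset's own `90s` column would be an alternative. The 90-minute cut-off mirrors the "506 players with 90+ minutes" message. As a check against Section 6, Liverpool's 86 goals in 38 matches give 2.26 goals per game and 25 wins give a 65.79% win rate.

```python
import pandas as pd


def add_team_features(teams):
    """Per-game team metrics named as in the summary above."""
    t = teams.copy()
    t["Goals_Per_Game"] = t["GF"] / t["MP"]
    t["Goals_Against_Per_Game"] = t["GA"] / t["MP"]
    t["Goal_Difference_Per_Game"] = t["GD"] / t["MP"]
    t["Win_Rate"] = t["W"] / t["MP"] * 100  # percentage of matches won
    t["Points_Per_Game"] = t["Pts"] / t["MP"]
    return t


def add_player_features(players, min_minutes=90):
    """Per-90 player metrics for players with at least `min_minutes` played."""
    p = players[players["Min"] >= min_minutes].copy()
    nineties = p["Min"] / 90  # minutes expressed as full-match units
    p["Goals_Per_90"] = p["Gls"] / nineties
    p["Assists_Per_90"] = p["Ast"] / nineties
    p["Goal_Contributions_Per_90"] = (p["Gls"] + p["Ast"]) / nineties
    return p
```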
Section 5: Final League Table and Standings Analysis
The core analysis begins by reconstructing the final Premier League table. This section breaks down the season's outcomes, identifying the champions, teams qualifying for European competitions, and the relegated clubs. This provides the foundational context for all subsequent team performance analysis.
🏆 PREMIER LEAGUE 2024-25 FINAL STANDINGS ANALYSIS
================================================================================
📋 FINAL LEAGUE TABLE:
==========================================================================================
Position Squad MP W D L GF GA GD Pts
1 Liverpool 38 25 9 4 86 41 45 84
2 Arsenal 38 20 14 4 69 34 35 74
3 Manchester City 38 21 8 9 72 44 28 71
4 Chelsea 38 20 9 9 64 43 21 69
5 Newcastle Utd 38 20 6 12 68 47 21 66
6 Aston Villa 38 19 9 10 58 51 7 66
7 Nott'ham Forest 38 19 8 11 58 46 12 65
8 Brighton 38 16 13 9 66 59 7 61
9 Bournemouth 38 15 11 12 58 46 12 56
10 Brentford 38 16 8 14 66 57 9 56
11 Fulham 38 15 9 14 54 54 0 54
12 Crystal Palace 38 13 14 11 51 51 0 53
13 Everton 38 11 15 12 42 44 -2 48
14 West Ham 38 11 10 17 46 62 -16 43
15 Manchester Utd 38 11 9 18 44 54 -10 42
16 Wolves 38 12 6 20 54 69 -15 42
17 Tottenham 38 11 5 22 64 65 -1 38
18 Leicester City 38 6 7 25 33 80 -47 25
19 Ipswich Town 38 4 10 24 36 82 -46 22
20 Southampton 38 2 6 30 26 86 -60 12
Section 5.1: European Qualification and Relegation Zones
============================================================
🎯 QUALIFICATION AND RELEGATION ZONES
============================================================
🏆 CHAMPIONS LEAGUE QUALIFIERS (Top 5):
1. Liverpool - 84 points
2. Arsenal - 74 points
3. Manchester City - 71 points
4. Chelsea - 69 points
5. Newcastle Utd - 66 points
🥈 EUROPA LEAGUE QUALIFIERS (6th-7th):
6. Aston Villa - 66 points
7. Nott'ham Forest - 65 points
⬇️ RELEGATED TEAMS:
18. Leicester City - 25 points
19. Ipswich Town - 22 points
20. Southampton - 12 points
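The zones above depend only on final position. A sketch, assuming the cut-offs printed in this output and ordering teams by points, then goal difference, then goals scored (the Premier League's first three tie-breakers), which is what places Newcastle Utd above Aston Villa on 66 points. The helper names are illustrative.

```python
import pandas as pd

# Cut-offs as printed above: (first position, last position, label)
ZONES = [(1, 5, "Champions League"), (6, 7, "Europa League"), (18, 20, "Relegated")]


def assign_zone(position):
    """Map a league position to its zone label."""
    for low, high, label in ZONES:
        if low <= position <= high:
            return label
    return "Mid-table"


def zone_table(table):
    """Sort by Pts, then GD, then GF (all descending) and label each team's zone."""
    t = table.sort_values(["Pts", "GD", "GF"], ascending=False).reset_index(drop=True)
    t["Position"] = range(1, len(t) + 1)
    t["Zone"] = t["Position"].map(assign_zone)
    return t
```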
Section 5.2: League Table Visualization
Section 6: Team Performance Analysis
This section moves beyond the final standings to conduct a detailed examination of team performance. The analysis is broken down into three key areas: attacking prowess, defensive solidity, and overall team efficiency, using both traditional and advanced metrics like Expected Goals (xG).
🔥 ATTACKING PERFORMANCE ANALYSIS
============================================================
⚽ TOP 5 ATTACKING TEAMS:
Squad GF MP Goals_Per_Game
Liverpool 86 38 2.26
Manchester City 72 38 1.89
Arsenal 69 38 1.82
Newcastle Utd 68 38 1.79
Brighton 66 38 1.74
📈 TOP 5 GOAL OVERPERFORMERS (vs Expected Goals):
Squad GF xG xG_Difference
Nott'ham Forest 58 45.5 12.5
Wolves 54 43.7 10.3
Arsenal 69 59.9 9.1
Brighton 66 58.7 7.3
Brentford 66 59.0 7.0
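Both rankings above reduce to a sort once the difference column exists. A sketch using the team columns `GF` and `xG`; rounding to one decimal keeps floating-point noise (e.g. 54 - 43.7) out of the displayed values. The same pattern applied to `GD - xGD` yields the Performance_vs_Expected column in Section 6.3. The function name is illustrative.

```python
import pandas as pd


def xg_overperformers(teams, n=5):
    """Teams ranked by goals scored minus expected goals (GF - xG)."""
    t = teams.copy()
    t["xG_Difference"] = (t["GF"] - t["xG"]).round(1)
    return t.nlargest(n, "xG_Difference")[["Squad", "GF", "xG", "xG_Difference"]]
```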
Section 6.1: Attacking Performance Visualization
📊 Creating attacking performance visualizations...
✅ Attacking visualizations created with team labels!
Section 6.2: Defensive Performance Analysis
============================================================
🛡️ DEFENSIVE PERFORMANCE ANALYSIS
============================================================
🛡️ BEST DEFENSIVE TEAMS (Fewest Goals Conceded):
Squad GA MP Goals_Against_Per_Game
Arsenal 34 38 0.89
Liverpool 41 38 1.08
Chelsea 43 38 1.13
Manchester City 44 38 1.16
Everton 44 38 1.16
📊 BEST GOAL DIFFERENCE:
Squad GD GF GA
Liverpool 45 86 41
Arsenal 35 69 34
Manchester City 28 72 44
Chelsea 21 64 43
Newcastle Utd 21 68 47
Section 6.3: Team Efficiency Analysis
============================================================
⚡ TEAM EFFICIENCY ANALYSIS
============================================================
⚡ MOST EFFICIENT TEAMS (Points per Game):
Squad Points_Per_Game Win_Rate Goals_Per_Game
Liverpool 2.21 65.79 2.26
Arsenal 1.95 52.63 1.82
Manchester City 1.87 55.26 1.89
Chelsea 1.82 52.63 1.68
Newcastle Utd 1.74 52.63 1.79
📈 BIGGEST OVERPERFORMERS (Actual vs Expected):
Squad GD xGD Performance_vs_Expected
Nott'ham Forest 12 -3.4 15.4
Arsenal 35 25.5 9.5
Manchester City 28 20.4 7.6
Brentford 9 3.6 5.4
Tottenham -1 -4.5 3.5
Section 7: Statistical Insights and Correlations
To understand the underlying drivers of success, this section applies statistical methods to the team data. We explore the relationships between different performance metrics using correlation heatmaps and rank-based visualizations to identify which factors are most strongly associated with winning points and a positive goal difference.
🔍 CALCULATING STATISTICAL CORRELATIONS
============================================================
⭐ FACTORS MOST CORRELATED WITH POINTS:
----------------------------------------
• Points_Per_Game : 1.000
• Pts/MP : 1.000
• Win_Rate : 0.988
• W : 0.988
• Goal_Difference_Per_Game: 0.970
• GD : 0.970
• xGD/90 : 0.944
🎯 FACTORS MOST CORRELATED WITH GOAL DIFFERENCE:
----------------------------------------
• Goal_Difference_Per_Game: 1.000
• xGD : 0.975
• xGD/90 : 0.975
• Points_Per_Game : 0.970
• Pts : 0.970
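Several entries in the points list are rescalings of points itself: with every team on 38 matches, `Points_Per_Game` and `Pts/MP` both equal `Pts / 38`, so their correlation of 1.000 holds by construction rather than revealing a driver of success; `Win_Rate` likewise duplicates `W`. The sketch below is a hypothetical helper, not the notebook's code. It ranks Pearson correlations with a target and lets such derived columns be excluded. Sorting by absolute value is a design choice that also surfaces strong negative relationships, such as goals conceded.

```python
import pandas as pd


def top_correlations(df, target, n=7, exclude=()):
    """Pearson correlations of numeric columns with `target`, strongest first."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(labels=[target, *exclude], errors="ignore")
    # Order by magnitude so large negative correlations are not hidden
    return corr.reindex(corr.abs().sort_values(ascending=False).index).head(n)
```

For example, `top_correlations(teams, "Pts", exclude=["Points_Per_Game", "Pts/MP"])` would drop the two tautological entries.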
Section 7.1.1: Correlation Heatmap Visualization
📊 Creating enhanced correlation visualizations...
✅ Enhanced correlation visualizations created!
Section 7.1.2: Team Rankings Heatmap
📊 Creating comprehensive team rankings heatmap...
✅ Team rankings heatmap created!
🏆 CATEGORY LEADERS:
• Overall: Liverpool
• Attack: Liverpool
• Defense: Arsenal
• Efficiency: Liverpool
• Consistency: Liverpool
Section 7.2: League Statistical Summary
============================================================
📈 LEAGUE STATISTICAL SUMMARY
============================================================
🏟️ SEASON OVERVIEW:
• Total Goals Scored: 1,115
• Total Games Played: 380
• Average Goals per Game: 2.93
📊 POINTS DISTRIBUTION:
• Average Points: 52.4
• Standard Deviation: 18.6
• Points Range: 72 points
• League Competitiveness: ⚖️ Moderately Competitive
⚽ GOALS ANALYSIS:
• Highest Scoring Team: Liverpool (86 goals)
• Best Defensive Team: Arsenal (34 conceded)
• Goal Difference Range: -60 to +45
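These season totals follow directly from the final table. A sketch, assuming the team columns `GF`, `MP`, and `Pts`: because each match appears twice in the table (once per side), the number of matches is half the sum of `MP`, 760 / 2 = 380, and 1,115 / 380 ≈ 2.93 goals per game. The standard deviation uses pandas' default sample formula (ddof=1), which matches the 18.6 shown. The function name is illustrative.

```python
import pandas as pd


def league_summary(teams):
    """Season-level aggregates computed from the final league table."""
    total_goals = int(teams["GF"].sum())
    games = int(teams["MP"].sum() // 2)  # every match is counted once per team
    return {
        "total_goals": total_goals,
        "games": games,
        "goals_per_game": round(total_goals / games, 2),
        "avg_points": round(teams["Pts"].mean(), 1),
        "points_std": round(teams["Pts"].std(), 1),  # sample std, ddof=1
        "points_range": int(teams["Pts"].max() - teams["Pts"].min()),
    }
```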
Section 7.3: Performance Distribution Visualization
📊 Creating performance distribution visualizations...
✅ Performance distribution visualizations created with team names!
Section 8: Comprehensive Premier League Player Analysis
This section provides detailed analysis of individual player performance across all positions. We examine top performers, efficiency metrics, and positional effectiveness to identify standout players and performance patterns.
🔍 LOADING AND PREPARING PREMIER LEAGUE PLAYER DATA
======================================================================
✅ Found 574 Premier League players!
Teams represented: 20
Positions: {'DF': 186, 'MF': 112, 'FW': 85, 'FW,MF': 60, 'GK': 44, 'MF,FW': 44, 'DF,MF': 16, 'MF,DF': 13, 'FW,DF': 7, 'DF,FW': 7}
📊 Active players (90+ minutes): 506
Section 8.1: Top Scorers and Goal Contributors
======================================================================
⚽ TOP GOAL SCORERS AND CREATORS ANALYSIS
======================================================================
🔥 TOP 15 GOAL SCORERS:
Player Squad Pos Gls Ast MP Goals_Per_90 xG Goals_vs_xG
Mohamed Salah Liverpool FW 29 18 38 0.77 25.2 3.8
Alexander Isak Newcastle Utd FW 23 6 34 0.75 20.3 2.7
Erling Haaland Manchester City FW 22 3 31 0.72 22.0 0.0
Bryan Mbeumo Brentford FW 20 7 38 0.53 12.3 7.7
Chris Wood Nott'ham Forest FW 20 3 36 0.61 13.4 6.6
Yoane Wissa Brentford FW 19 4 35 0.59 18.5 0.5
Ollie Watkins Aston Villa FW 16 8 38 0.55 15.3 0.7
Matheus Cunha Wolves MF,FW 15 6 33 0.52 8.6 6.4
Cole Palmer Chelsea MF,FW 15 8 37 0.42 17.3 -2.3
Jean-Philippe Mateta Crystal Palace FW 14 2 37 0.48 13.5 0.5
Jørgen Strand Larsen Wolves FW 14 4 35 0.49 10.3 3.7
Jarrod Bowen West Ham FW,MF 13 8 34 0.39 8.6 4.4
Luis Díaz Liverpool FW 13 5 36 0.49 12.0 1.0
Liam Delap Ipswich Town FW 12 2 37 0.42 9.3 2.7
Raúl Jiménez Fulham FW 12 3 38 0.43 12.0 0.0
🎯 TOP 15 ASSIST PROVIDERS:
Player Squad Pos Ast Gls MP Assists_Per_90 xAG Assists_vs_xAG
Mohamed Salah Liverpool FW 18 29 38 0.48 14.2 3.8
Jacob Murphy Newcastle Utd FW 12 8 35 0.46 8.9 3.1
Anthony Elanga Nott'ham Forest FW,MF 11 6 38 0.40 5.7 5.3
Mikkel Damsgaard Brentford MF,FW 10 2 38 0.31 8.4 1.6
Bruno Fernandes Manchester Utd MF 10 8 36 0.30 8.5 1.5
Antonee Robinson Fulham DF 10 0 36 0.28 4.2 5.8
Morgan Rogers Aston Villa FW,MF 10 8 37 0.29 7.8 2.2
Bukayo Saka Arsenal FW,MF 10 6 25 0.52 7.6 2.4
Son Heung-min Tottenham FW 9 7 30 0.38 8.2 0.8
Jarrod Bowen West Ham FW,MF 8 13 34 0.24 6.8 1.2
Eberechi Eze Crystal Palace MF,FW 8 8 34 0.28 6.2 1.8
Morgan Gibbs-White Nott'ham Forest MF 8 7 34 0.26 5.0 3.0
Cole Palmer Chelsea MF,FW 8 15 37 0.23 10.9 -2.9
Sávio Manchester City FW,MF 8 1 29 0.41 6.9 1.1
Ollie Watkins Aston Villa FW 8 16 38 0.28 3.3 4.7
🏅 TOP 15 GOAL CONTRIBUTORS (Goals + Assists):
Player Squad Pos Gls Ast Goal_Contributions Goal_Contributions_Per_90 MP
Mohamed Salah Liverpool FW 29 18 47 1.25 38
Alexander Isak Newcastle Utd FW 23 6 29 0.95 34
Bryan Mbeumo Brentford FW 20 7 27 0.71 38
Erling Haaland Manchester City FW 22 3 25 0.82 31
Ollie Watkins Aston Villa FW 16 8 24 0.83 38
Cole Palmer Chelsea MF,FW 15 8 23 0.65 37
Yoane Wissa Brentford FW 19 4 23 0.71 35
Chris Wood Nott'ham Forest FW 20 3 23 0.70 36
Jarrod Bowen West Ham FW,MF 13 8 21 0.64 34
Matheus Cunha Wolves MF,FW 15 6 21 0.73 33
Jacob Murphy Newcastle Utd FW 8 12 20 0.76 35
Luis Díaz Liverpool FW 13 5 18 0.67 36
Bruno Fernandes Manchester Utd MF 8 10 18 0.54 36
Justin Kluivert Bournemouth MF 12 6 18 0.69 34
Morgan Rogers Aston Villa FW,MF 8 10 18 0.52 37
⚡ MOST EFFICIENT SCORERS (Goals per 90 min, 500+ minutes):
Player Squad Pos Gls Goals_Per_90 Min xG_Per_90
Jáder Durán Aston Villa FW 7 0.99 638 0.69
Mohamed Salah Liverpool FW 29 0.77 3371 0.67
Alexander Isak Newcastle Utd FW 23 0.75 2756 0.66
Rodrigo Muniz Fulham FW 8 0.75 964 0.54
Erling Haaland Manchester City FW 22 0.72 2736 0.72
Richarlison Tottenham FW 4 0.71 504 0.66
Ryan Sessegnon Fulham FW,DF 4 0.62 580 0.25
Chris Wood Nott'ham Forest FW 20 0.61 2959 0.41
Yoane Wissa Brentford FW 19 0.59 2919 0.57
Ollie Watkins Aston Villa FW 16 0.55 2598 0.53
Section 8.2: Positional Performance Analysis
======================================================================
📍 PERFORMANCE ANALYSIS BY POSITION
======================================================================
📊 STATISTICS BY POSITION:
Gls Ast Goal_Contributions_Per_90 xG_Per_90 \
count sum mean sum mean mean mean
Pos
DF 168 118 0.70 151 0.90 0.08 0.04
DF,FW 5 5 1.00 5 1.00 0.18 0.08
DF,MF 16 8 0.50 16 1.00 0.13 0.08
FW 72 452 6.28 173 2.40 0.46 0.34
FW,DF 4 6 1.50 3 0.75 0.35 0.17
FW,MF 54 179 3.31 141 2.61 0.38 0.25
GK 42 0 0.00 9 0.21 0.01 0.00
MF 93 168 1.81 181 1.95 0.20 0.12
MF,DF 12 18 1.50 17 1.42 0.13 0.08
MF,FW 40 127 3.18 107 2.68 0.32 0.21
xAG_Per_90 MP
mean mean
Pos
DF 0.05 21.07
DF,FW 0.16 22.80
DF,MF 0.07 16.88
FW 0.14 24.71
FW,DF 0.10 13.00
FW,MF 0.17 23.89
GK 0.00 18.29
MF 0.11 25.37
MF,DF 0.10 25.58
MF,FW 0.16 23.75
⭐ TOP PERFORMERS BY POSITION:
DF (Top Contributor):
🏆 Rayan Aït-Nouri (Wolves)
4G + 7A = 11 contributions
0.32 per 90min
DF,FW (Top Contributor):
🏆 Keane Lewis-Potter (Brentford)
1G + 3A = 4 contributions
0.12 per 90min
DF,MF (Top Contributor):
🏆 Matheus Nunes (Manchester City)
1G + 6A = 7 contributions
0.38 per 90min
FW (Top Contributor):
🏆 Mohamed Salah (Liverpool)
29G + 18A = 47 contributions
1.25 per 90min
FW,DF (Top Contributor):
🏆 Ryan Sessegnon (Fulham)
4G + 2A = 6 contributions
0.94 per 90min
FW,MF (Top Contributor):
🏆 Jarrod Bowen (West Ham)
13G + 8A = 21 contributions
0.64 per 90min
GK (Top Contributor):
🏆 Ederson (Manchester City)
0G + 4A = 4 contributions
0.16 per 90min
MF (Top Contributor):
🏆 Bruno Fernandes (Manchester Utd)
8G + 10A = 18 contributions
0.54 per 90min
MF,DF (Top Contributor):
🏆 Jack Hinshelwood (Brighton)
5G + 2A = 7 contributions
0.34 per 90min
MF,FW (Top Contributor):
🏆 Cole Palmer (Chelsea)
15G + 8A = 23 contributions
0.65 per 90min
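The positional table and the top-performer list above are both group-by operations on the raw `Pos` label. The printed table has multi-level columns (`count`/`sum`/`mean`), which a dictionary of aggregation lists produces; the sketch below uses pandas named aggregation instead, a design choice that yields flat column names. The helper names are illustrative.

```python
import pandas as pd


def position_summary(players):
    """Aggregate output per raw 'Pos' label; hybrids like 'FW,MF' stay separate."""
    return players.groupby("Pos").agg(
        Players=("Player", "count"),
        Gls_sum=("Gls", "sum"),
        Ast_sum=("Ast", "sum"),
        Contrib_per_90_mean=("Goal_Contributions_Per_90", "mean"),
    )


def top_contributor_by_position(players):
    """Player with the most goals + assists under each position label."""
    p = players.assign(Goal_Contributions=players["Gls"] + players["Ast"])
    # idxmax returns the index label of the first maximum in each group
    idx = p.groupby("Pos")["Goal_Contributions"].idxmax()
    return p.loc[idx, ["Pos", "Player", "Squad", "Gls", "Ast", "Goal_Contributions"]]
```

Keeping hybrid labels separate is why groups such as `FW,DF` contain only a handful of players; collapsing to the first listed position with `players["Pos"].str.split(",").str[0]` is one alternative.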
Section 8.3: Shooting and Attacking Metrics
======================================================================
🎯 SHOOTING ANALYSIS
======================================================================
🎯 BEST SHOT ACCURACY (10+ shots):
Player Squad Sh SoT Shot_Accuracy Gls
Nathan Broadhead Ipswich Town 13 8 61.5 2
Jørgen Strand Larsen Wolves 54 33 61.1 14
Riccardo Calafiori Arsenal 10 6 60.0 2
Justin Devenny Crystal Palace 10 6 60.0 1
Donyell Malen Aston Villa 15 9 60.0 3
Ethan Pinnock Brentford 10 6 60.0 2
Ryan Sessegnon Fulham 14 8 57.1 4
Sammie Szmodics Ipswich Town 21 12 57.1 4
Marcus Rashford Manchester Utd 16 9 56.2 4
Jack Hinshelwood Brighton 18 10 55.6 5
💥 BEST CONVERSION RATE (10+ shots):
Player Squad Sh Gls Conversion_Rate xG
Chris Wood Nott'ham Forest 65 20 30.8 13.4
Michael Keane Everton 10 3 30.0 0.6
Ryan Sessegnon Fulham 14 4 28.6 1.6
Jack Hinshelwood Brighton 18 5 27.8 2.7
Trevoh Chalobah Crystal Palace 11 3 27.3 1.2
Jørgen Strand Larsen Wolves 54 14 25.9 10.3
Iliman Ndiaye Everton 35 9 25.7 6.2
Bryan Mbeumo Brentford 79 20 25.3 12.3
James Mcatee Manchester City 12 3 25.0 2.8
Marcus Rashford Manchester Utd 16 4 25.0 1.7
📈 BIGGEST OVERPERFORMERS (Goals vs xG):
Player Squad Gls xG Goals_vs_xG
Bryan Mbeumo Brentford 20 12.3 7.7
Chris Wood Nott'ham Forest 20 13.4 6.6
Matheus Cunha Wolves 15 8.6 6.4
Jarrod Bowen West Ham 13 8.6 4.4
Alex Iwobi Fulham 9 4.7 4.3
Mateo Kovačić Manchester City 6 1.9 4.1
Mohamed Salah Liverpool 29 25.2 3.8
Jørgen Strand Larsen Wolves 14 10.3 3.7
Amad Diallo Manchester Utd 8 4.7 3.3
James Maddison Tottenham 9 5.8 3.2
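Shot accuracy and conversion rate are simple ratios, computed only for players above a minimum shot count so that tiny samples do not dominate. A sketch with the 10-shot threshold used above; for example, Nathan Broadhead's 8 shots on target from 13 attempts give 61.5%. `Gls` includes penalties, so a non-penalty conversion rate could use the `G-PK` column instead. The function name is illustrative.

```python
import pandas as pd


def shooting_metrics(players, min_shots=10):
    """Shot accuracy and conversion rate for players with at least `min_shots`."""
    p = players[players["Sh"] >= min_shots].copy()
    p["Shot_Accuracy"] = (p["SoT"] / p["Sh"] * 100).round(1)    # % of shots on target
    p["Conversion_Rate"] = (p["Gls"] / p["Sh"] * 100).round(1)  # % of shots scored
    return p
```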
Section 8.4: Defensive Performance Analysis
======================================================================
🛡️ DEFENSIVE PERFORMANCE ANALYSIS
======================================================================
⚔️ TOP TACKLERS:
Player Squad Pos Tkl TklW Int Clr MP
Idrissa Gana Gueye Everton MF 133 80 48 36 37
Daniel Muñoz Crystal Palace DF 123 80 44 108 37
João Gomes Wolves MF 116 71 25 33 36
Noussair Mazraoui Manchester Utd DF 115 68 34 95 37
Moisés Caicedo Chelsea MF,DF 114 73 49 60 38
Alexis Mac Allister Liverpool MF 95 58 22 29 35
Antonee Robinson Fulham DF 95 61 62 133 36
Elliot Anderson Nott'ham Forest MF 92 56 31 76 37
André Wolves MF 91 55 37 30 33
Tyrick Mitchell Crystal Palace DF 91 55 19 108 37
Neco Williams Nott'ham Forest DF 90 57 31 137 35
Rayan Aït-Nouri Wolves DF 89 57 26 75 37
Mateus Fernandes Southampton MF 89 48 28 35 36
Thomas Partey Arsenal MF,DF 89 53 35 49 35
Victor Bernth Kristiansen Leicester City DF 87 51 42 75 30
🔍 TOP INTERCEPTORS:
Player Squad Pos Int Tkl Clr MP
Aaron Wan-Bissaka West Ham DF 66 70 125 36
Antonee Robinson Fulham DF 62 95 133 36
Ryan Gravenberch Liverpool MF 60 69 59 37
Jan Bednarek Southampton DF 56 36 190 30
Virgil van Dijk Liverpool DF 56 38 190 37
Maxence Lacroix Crystal Palace DF 54 68 207 35
Dean Huijsen Bournemouth DF 51 36 198 32
Moisés Caicedo Chelsea MF,DF 49 114 60 38
Christian Nørgaard Brentford MF 49 79 70 34
Idrissa Gana Gueye Everton MF 48 133 36 37
Carlos Baleba Brighton MF 46 79 44 34
James Justin Leicester City DF 46 58 141 36
Milos Kerkez Bournemouth DF 45 52 106 38
Fabian Schär Newcastle Utd DF 45 36 138 34
Joško Gvardiol Manchester City DF 44 58 122 37
🏃 MOST DEFENSIVE ACTIONS PER 90:
Player Squad Pos Defensive_Actions_Per_90 Tkl Int Clr
Harry Toffolo Nott'ham Forest DF 12.67 5 2 12
Willy Boly Nott'ham Forest DF 12.35 6 3 12
Welington Southampton DF 11.85 21 7 36
James Hill Bournemouth DF 11.00 15 9 31
Morato Nott'ham Forest DF 10.89 22 10 78
Charlie Taylor Southampton DF 10.75 7 5 31
Dean Huijsen Bournemouth DF 10.56 36 51 198
Philip Billing Bournemouth MF 10.50 14 5 2
Jan Bednarek Southampton DF 10.04 36 56 190
Ben Chilwell Crystal Palace DF 10.00 7 6 16
Woyo Coulibaly Leicester City DF 10.00 6 5 1
James Tarkowski Everton DF 9.78 64 41 213
Murillo Nott'ham Forest DF 9.55 53 36 249
Caleb Okoli Leicester City DF 9.52 25 15 79
Maxence Lacroix Crystal Palace DF 9.51 68 54 207
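The per-90 figure above appears to be tackles plus interceptions plus clearances divided by 90-minute units: Dean Huijsen's 36 + 51 + 198 = 285 actions at 10.56 per 90 imply about 27 full matches. The same arithmetic shows why the top of this list is noisy: Harry Toffolo's 5 + 2 + 12 = 19 actions at 12.67 per 90 come from roughly 1.5 matches' worth of minutes. A sketch with an adjustable minutes threshold; the default of 90 is an assumption matching the active-player filter, and the function name is illustrative.

```python
import pandas as pd


def defensive_actions_per_90(players, min_minutes=90):
    """(Tackles + Interceptions + Clearances) per 90 minutes, highest first."""
    p = players[players["Min"] >= min_minutes].copy()
    actions = p["Tkl"] + p["Int"] + p["Clr"]
    p["Defensive_Actions_Per_90"] = (actions / (p["Min"] / 90)).round(2)
    return p.sort_values("Defensive_Actions_Per_90", ascending=False)
```

Raising `min_minutes` (for example to 900) is a simple way to keep low-minute players from topping the ranking.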
Section 8.5: Goalkeeper Analysis
======================================================================
🥅 GOALKEEPER ANALYSIS
======================================================================
📊 GOALKEEPER STATISTICS (33 goalkeepers):
Player Squad MP W D L CS GA Save% Saves
Matz Sels Nott'ham Forest 38 19.0 8.0 11.0 13.0 46.0 73.9 119.0
David Raya Arsenal 38 20.0 14.0 4.0 13.0 34.0 74.2 86.0
Jordan Pickford Everton 38 11.0 15.0 12.0 12.0 44.0 73.0 117.0
Dean Henderson Crystal Palace 38 13.0 14.0 11.0 11.0 51.0 66.7 101.0
Robert Sánchez Chelsea 32 17.0 9.0 6.0 10.0 34.0 76.4 92.0
Ederson Manchester City 26 16.0 4.0 6.0 10.0 26.0 69.2 53.0
Alisson Liverpool 28 18.0 7.0 3.0 9.0 29.0 72.0 73.0
André Onana Manchester Utd 34 10.0 9.0 15.0 9.0 44.0 68.9 88.0
Kepa Arrizabalaga Bournemouth 31 13.0 7.0 11.0 8.0 39.0 73.9 95.0
Nick Pope Newcastle Utd 28 13.0 6.0 9.0 8.0 35.0 71.7 86.0
Emiliano Martínez Aston Villa 37 18.0 9.0 10.0 8.0 45.0 69.0 96.0
José Sá Wolves 29 11.0 5.0 13.0 7.0 48.0 63.2 69.0
Bart Verbruggen Brighton 36 14.0 13.0 9.0 7.0 58.0 65.7 87.0
Mark Flekken Brentford 37 16.0 8.0 13.0 7.0 55.0 73.4 150.0
Martin Dúbravka Newcastle Utd 10 7.0 0.0 3.0 5.0 12.0 70.7 29.0
Bernd Leno Fulham 38 15.0 9.0 14.0 5.0 54.0 67.9 106.0
Alphonse Areola West Ham 26 5.0 7.0 13.0 5.0 41.0 64.3 77.0
Guglielmo Vicario Tottenham 24 9.0 3.0 12.0 4.0 37.0 64.7 67.0
Caoimhín Kelleher Liverpool 10 7.0 2.0 1.0 4.0 12.0 67.6 24.0
Aaron Ramsdale Southampton 30 2.0 5.0 23.0 3.0 66.0 67.6 120.0
Stefan Ortega Manchester City 13 5.0 4.0 3.0 3.0 18.0 68.0 33.0
Łukasz Fabiański West Ham 14 6.0 3.0 4.0 2.0 21.0 74.6 50.0
Jakub Stolarczyk Leicester City 10 3.0 1.0 6.0 2.0 16.0 63.6 28.0
Mads Hermansen Leicester City 27 3.0 6.0 18.0 1.0 58.0 64.5 99.0
Fraser Forster Tottenham 7 1.0 2.0 4.0 1.0 15.0 69.0 27.0
Mark Travers Bournemouth 5 2.0 2.0 1.0 1.0 5.0 80.0 20.0
Arijanet Muric Ipswich Town 18 2.0 6.0 10.0 1.0 33.0 70.0 67.0
Filip Jørgensen Chelsea 6 3.0 0.0 3.0 1.0 9.0 71.4 19.0
Antonín Kinský Tottenham 6 1.0 0.0 5.0 1.0 11.0 65.6 23.0
Christian Walton Ipswich Town 7 1.0 1.0 5.0 1.0 19.0 56.4 20.0
Sam Johnstone Wolves 7 0.0 1.0 6.0 0.0 17.0 61.5 23.0
Alex McCarthy Southampton 5 0.0 0.0 5.0 0.0 13.0 70.3 24.0
Alex Palmer Ipswich Town 13 1.0 3.0 9.0 0.0 30.0 59.2 43.0
🏆 BEST CLEAN SHEET %:
Player Squad MP CS CS% GA
Martin Dúbravka Newcastle Utd 10 5.0 50.0 12.0
Caoimhín Kelleher Liverpool 10 4.0 40.0 12.0
Ederson Manchester City 26 10.0 38.5 26.0
David Raya Arsenal 38 13.0 34.2 34.0
Matz Sels Nott'ham Forest 38 13.0 34.2 46.0
Alisson Liverpool 28 9.0 32.1 29.0
Jordan Pickford Everton 38 12.0 31.6 44.0
Robert Sánchez Chelsea 32 10.0 31.3 34.0
Dean Henderson Crystal Palace 38 11.0 28.9 51.0
Nick Pope Newcastle Utd 28 8.0 28.6 35.0
🤲 BEST SAVE %:
Player Squad Saves SoTA Save% GA
Mark Travers Bournemouth 20.0 25.0 80.0 5.0
Robert Sánchez Chelsea 92.0 127.0 76.4 34.0
Łukasz Fabiański West Ham 50.0 71.0 74.6 21.0
David Raya Arsenal 86.0 120.0 74.2 34.0
Kepa Arrizabalaga Bournemouth 95.0 134.0 73.9 39.0
Matz Sels Nott'ham Forest 119.0 165.0 73.9 46.0
Mark Flekken Brentford 150.0 203.0 73.4 55.0
Jordan Pickford Everton 117.0 163.0 73.0 44.0
Alisson Liverpool 73.0 100.0 72.0 29.0
Nick Pope Newcastle Utd 86.0 120.0 71.7 35.0
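Clean-sheet percentage is clean sheets divided by matches played (Martin Dúbravka: 5 / 10 = 50.0%), so a minimum-appearance filter matters: the two leaders have played only 10 matches each. The dataset already has a `CS%` column; the sketch recomputes it to make the arithmetic explicit. The 10-match threshold is a hypothetical choice consistent with the list above, not a documented setting, and the function name is illustrative.

```python
import pandas as pd


def goalkeeper_leaderboard(players, min_matches=10, n=10):
    """Clean-sheet % for goalkeepers with at least `min_matches` appearances."""
    gk = players[(players["Pos"] == "GK") & (players["MP"] >= min_matches)].copy()
    gk["CS%"] = (gk["CS"] / gk["MP"] * 100).round(1)  # share of matches without conceding
    return gk.nlargest(n, "CS%")[["Player", "Squad", "MP", "CS", "CS%", "GA"]]
```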
Section 8.6: Player Performance Visualizations
======================================================================
📊 CREATING PLAYER PERFORMANCE VISUALIZATIONS
======================================================================
✅ Enhanced player performance visualizations created!
📋 VISUALIZATION LEGEND:
🟡 Yellow boxes: Top overall contributors
🔵 Blue boxes: Top goal scorers
🟢 Green boxes: Top assist providers / xG overperformers
🔴 Red boxes: xG underperformers
Section 8.7: Team Player Contributions
======================================================================
🏟️ TEAM PLAYER CONTRIBUTIONS ANALYSIS
======================================================================
🏆 TEAM GOAL CONTRIBUTIONS:
Gls Ast Goal_Contributions Players_Count xG xAG
Squad
Liverpool 85 65 150 22 84.1 62.3
Arsenal 67 55 122 22 62.0 46.6
Manchester City 71 51 122 25 69.8 54.9
Newcastle Utd 66 50 116 23 65.0 46.5
Brentford 65 44 109 22 60.5 42.9
Chelsea 61 47 108 26 69.3 53.5
Tottenham 61 46 107 28 59.9 45.0
Brighton 64 41 105 28 59.5 40.3
Aston Villa 56 45 101 28 57.6 42.0
Nott'ham Forest 57 42 99 22 46.7 33.8
Bournemouth 57 41 98 24 65.4 44.1
Fulham 53 44 97 23 49.9 37.6
Wolves 53 42 95 24 44.0 35.2
Crystal Palace 48 38 86 24 60.7 46.9
West Ham 43 29 72 25 48.2 34.2
Manchester Utd 42 29 71 30 53.5 39.8
Everton 39 27 66 23 42.2 32.7
Ipswich Town 35 26 61 30 35.0 24.2
Leicester City 33 25 58 27 32.9 25.2
Southampton 25 16 41 30 33.0 24.9
⚽ TOP PERFORMER PER TEAM (by goals scored):
Arsenal | Kai Havertz | 9G 3A
Aston Villa | Ollie Watkins | 16G 8A
Bournemouth | Justin Kluivert | 12G 6A
Brentford | Bryan Mbeumo | 20G 7A
Brighton | João Pedro | 10G 6A
Chelsea | Cole Palmer | 15G 8A
Crystal Palace | Jean-Philippe Mateta | 14G 2A
Everton | Iliman Ndiaye | 9G 0A
Fulham | Raúl Jiménez | 12G 3A
Ipswich Town | Liam Delap | 12G 2A
Leicester City | Jamie Vardy | 9G 4A
Liverpool | Mohamed Salah | 29G 18A
Manchester City | Erling Haaland | 22G 3A
Manchester Utd | Amad Diallo | 8G 6A
Newcastle Utd | Alexander Isak | 23G 6A
Nott'ham Forest | Chris Wood | 20G 3A
Southampton | Paul Onuachu | 4G 1A
Tottenham | Brennan Johnson | 11G 3A
West Ham | Jarrod Bowen | 13G 8A
Wolves | Matheus Cunha | 15G 6A
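Both lists above are group-by operations on `Squad`. The per-team list appears to rank by goals rather than goals plus assists: Arsenal's entry is Kai Havertz (9G 3A) rather than Bukayo Saka (6G 10A). The sketch below follows that reading. Summed player goals can fall slightly short of a club's GF (Liverpool 85 vs 86), most likely because own goals are credited to no player and only filtered players are included. The helper names are illustrative.

```python
import pandas as pd


def team_contributions(players):
    """Sum goals and assists per club and count the players included."""
    t = players.groupby("Squad").agg(
        Gls=("Gls", "sum"),
        Ast=("Ast", "sum"),
        Players_Count=("Player", "count"),
    )
    t["Goal_Contributions"] = t["Gls"] + t["Ast"]
    return t.sort_values("Goal_Contributions", ascending=False)


def top_scorer_per_team(players):
    """Highest scorer at each club; ties go to the first row encountered."""
    idx = players.groupby("Squad")["Gls"].idxmax()
    return players.loc[idx, ["Squad", "Player", "Gls", "Ast"]].reset_index(drop=True)
```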
Section 9: Applying Data Science - Inference & Modeling
This section elevates the project from a descriptive analysis to a data science investigation. Instead of only observing what happened, we use statistical tools to understand why it might have happened and to quantify the relationships between different performance variables. This includes formal hypothesis testing, building an explanatory regression model, and using unsupervised learning to discover player archetypes.
Section 9.1: Statistical Hypothesis Testing (Inferential Statistics)
Hypothesis: Do teams that finish in the Top 4 (the traditional Champions League places; this season a fifth place also qualified) have a significantly higher mean Goal Difference (GD) than the rest of the league?
🔬 HYPOTHESIS TEST: Does Goal Difference define Top 4 teams?
============================================================
H₀ (Null Hypothesis): Top 4 teams and the rest of the league have the same mean Goal Difference.
H₁ (Alternative Hypothesis): Their mean Goal Difference differs.
------------------------------------------------------------
T-statistic: 5.16
P-value: 0.0002
✅ Conclusion: The result is statistically significant (p < 0.05), so we reject the null hypothesis. Top 4 teams have a significantly different (higher) mean Goal Difference.
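The printed statistic can be reproduced from the goal differences in the final table with Welch's t-test, which does not assume equal variances: a Top 4 mean GD of 32.25 against -8.06 for the other 16 teams gives t ≈ 5.16, whereas a pooled-variance test gives roughly 3.3, which suggests Welch's version was used. The hypothesis asks whether GD is higher, so a one-sided alternative (`alternative="greater"`, available in SciPy 1.6 and later) is also shown. With only four teams in one group, the result is best read as describing this season rather than as a general law.

```python
import numpy as np
from scipy import stats

# Goal differences copied from the final league table in Section 5
top4_gd = np.array([45, 35, 28, 21])
rest_gd = np.array([21, 7, 12, 7, 12, 9, 0, 0, -2, -16, -10, -15, -1, -47, -46, -60])

# Welch's t-test (equal_var=False): two-sided, matching H₁ above
t_stat, p_two_sided = stats.ttest_ind(top4_gd, rest_gd, equal_var=False)

# One-sided version matching the "higher GD" wording of the hypothesis
_, p_one_sided = stats.ttest_ind(top4_gd, rest_gd, equal_var=False, alternative="greater")

print(f"T-statistic: {t_stat:.2f}")
print(f"Two-sided p: {p_two_sided:.4f}, one-sided p: {p_one_sided:.4f}")
```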
Section 9.2: Explanatory Modeling with Linear Regression
Goal: Build a model to explain how Goals For (GF), Goals Against (GA), and Expected Goal Difference (xGD) contribute to the final Points (Pts) tally.
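The summary below has the layout of StatsModels' OLS results, which would typically come from `statsmodels.formula.api.ols("Pts ~ GF + GA + xGD", data=teams).fit().summary()`. The sketch shows the underlying computation, ordinary least squares with an intercept, in plain NumPy; it returns the coefficients and R-squared but not the standard errors and p-values StatsModels adds. `teams` is assumed to hold the team columns from Section 3.1, and the function name is illustrative.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit of y on X plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept and
    the rest follow the column order of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r_squared = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef, r_squared
```

Usage, under the same assumption: `coef, r2 = ols_fit(teams[["GF", "GA", "xGD"]], teams["Pts"])`.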
EXPLANATORY MODEL: What factors drive league points?
============================================================
OLS Regression Results
==============================================================================
Dep. Variable: Pts R-squared: 0.942
Model: OLS Adj. R-squared: 0.931
Method: Least Squares F-statistic: 86.38
Date: Tue, 08 Jul 2025 Prob (F-statistic): 4.25e-10
Time: 15:33:11 Log-Likelihood: -57.857
No. Observations: 20 AIC: 123.7
Df Residuals: 16 BIC: 127.7
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 57.4780 11.684 4.920 0.000 32.710 82.246
GF 0.6552 0.203 3.231 0.005 0.225 1.085
GA -0.7471 0.227 -3.290 0.005 -1.229 -0.266
xGD -0.0407 0.219 -0.186 0.854 -0.504 0.423
==============================================================================
Omnibus: 8.548 Durbin-Watson: 2.172
Prob(Omnibus): 0.014 Jarque-Bera (JB): 6.408
Skew: -0.909 Prob(JB): 0.0406
Kurtosis: 5.095 Cond. No. 848.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
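📊 INTERPRETATION:
- R-squared (0.942): the model explains about 94% of the variance in Points across the 20 teams (adjusted R² 0.931).
- Coef: the estimated change in Points for a one-unit increase in that variable, holding the others constant. Each extra goal scored is associated with about +0.66 points, and each extra goal conceded with about -0.75 points.
- P>|t|: a low p-value (< 0.05) suggests the coefficient differs from zero. GF and GA are significant (p = 0.005). xGD is not (p = 0.854), plausibly because it carries little information beyond what GF and GA already provide.
- Diagnostics: Prob(Omnibus) = 0.014 and Prob(JB) = 0.041 point to non-normal residuals. With only 20 observations, the standard errors and p-values should be read with caution.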
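The near-zero, non-significant xGD coefficient is what one would expect if xGD largely duplicates information already carried by GF and GA. Variance inflation factors (VIFs) are one way to check this. Below is a minimal sketch, assuming a team-level DataFrame with columns `Pts`, `GF`, `GA` and `xGD`. The `Intercept` label in the summary is consistent with the statsmodels formula API, but that is an inference, not confirmed. The demo data is synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

def fit_points_model(teams: pd.DataFrame):
    """Fit Pts ~ GF + GA + xGD by OLS and compute a VIF for each predictor."""
    model = smf.ols("Pts ~ GF + GA + xGD", data=teams).fit()
    exog = model.model.exog          # design matrix, including the intercept column
    names = model.model.exog_names   # ['Intercept', 'GF', 'GA', 'xGD']
    vifs = {
        name: variance_inflation_factor(exog, i)
        for i, name in enumerate(names)
        if name != "Intercept"
    }
    return model, vifs

# Synthetic 20-team data in which xGD closely tracks GF - GA (not real figures).
rng = np.random.default_rng(0)
gf = rng.uniform(25, 90, 20)
ga = rng.uniform(30, 85, 20)
demo = pd.DataFrame({
    "GF": gf,
    "GA": ga,
    "xGD": gf - ga + rng.normal(0, 5, 20),
    "Pts": 52 + 0.7 * gf - 0.75 * ga + rng.normal(0, 3, 20),
})
model, vifs = fit_points_model(demo)
print(model.params.round(3))
print({name: round(v, 1) for name, v in vifs.items()})
```

A common rule of thumb reads VIFs above roughly 5 to 10 as a sign of problematic collinearity. Refitting without xGD is a simple way to see whether the fit depends on it.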
Section 9.3: Unsupervised Learning for Player Profiling (Clustering)
Goal: Identify different profiles of attacking players (e.g., "Pure Finishers", "Creative Forwards", "All-Rounders") using K-Means clustering.
👥 PLAYER PROFILING: Identifying attacker archetypes
============================================================
✅ Successfully clustered 67 attackers into 3 profiles.
--- Cluster 0: Profile ---
Goals_Per_90 0.434474
Assists_Per_90 0.132803
Sh/90 2.689615
PrgC_stats_possession 34.269231
KP 20.423077
dtype: float64
Sample Players in this Profile:
Player Squad
Beto Everton
Dominic Calvert-Lewin Everton
Liam Delap Ipswich Town
--- Cluster 1: Profile ---
Goals_Per_90 0.384251
Assists_Per_90 0.243794
Sh/90 2.850526
PrgC_stats_possession 107.578947
KP 51.684211
dtype: float64
Sample Players in this Profile:
Player Squad
Harvey Barnes Newcastle Utd
Matheus Cunha Wolves
Kevin De Bruyne Manchester City
--- Cluster 2: Profile ---
Goals_Per_90 0.130380
Assists_Per_90 0.118158
Sh/90 1.477727
PrgC_stats_possession 50.045455
KP 26.772727
dtype: float64
Sample Players in this Profile:
Player Squad
Cameron Archer Southampton
Jean-Ricner Bellegarde Wolves
Mikkel Damsgaard Brentford
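In the profiles above, `PrgC_stats_possession` and `KP` have cluster means of roughly 20 to 108, while the three per-90 features stay below about 3. Unscaled K-Means distances would therefore be dominated by those two columns. Below is a minimal sketch that standardises the features first, assuming a DataFrame with the column names printed above. Whether the notebook scaled its features, and which `random_state` it used, is not shown, so this does not reproduce the three profiles above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Feature names as printed in the cluster profiles above.
FEATURES = ["Goals_Per_90", "Assists_Per_90", "Sh/90", "PrgC_stats_possession", "KP"]

def profile_attackers(attackers: pd.DataFrame, k: int = 3, seed: int = 42):
    """Cluster attackers with K-Means on standardised features.

    Returns the input with a 'Cluster' column and per-cluster feature means
    on the original (unscaled) scale.
    """
    X = StandardScaler().fit_transform(attackers[FEATURES])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labelled = attackers.assign(Cluster=km.fit_predict(X))
    profiles = labelled.groupby("Cluster")[FEATURES].mean()
    return labelled, profiles

# Synthetic attackers with made-up numbers in the same columns.
rng = np.random.default_rng(1)
n = 67
demo = pd.DataFrame({
    "Player": [f"Player {i}" for i in range(n)],
    "Goals_Per_90": rng.uniform(0.05, 0.6, n),
    "Assists_Per_90": rng.uniform(0.0, 0.35, n),
    "Sh/90": rng.uniform(1.0, 3.5, n),
    "PrgC_stats_possession": rng.uniform(10, 150, n),
    "KP": rng.uniform(5, 70, n),
})
labelled, profiles = profile_attackers(demo)
print(profiles.round(3))
print(labelled["Cluster"].value_counts().sort_index())
```

To check whether 3 profiles is a reasonable choice, `sklearn.metrics.silhouette_score` can be compared across several values of `k` on the scaled features.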
Section 10: Season Summary and Export
This section compiles a summary of the findings. It brings the analysis together in a presentation-ready format, with exportable results and an executive summary.
================================================================================
🏁 PREMIER LEAGUE 2024-25 COMPREHENSIVE SEASON SUMMARY
================================================================================
🏆 SEASON HIGHLIGHTS:
👑 Champion: Liverpool
• Final Points: 84
• Goal Difference: +45
• Win Rate: 65.8%
⬇️ Relegated Teams:
18. Leicester City - 25 points
19. Ipswich Town - 22 points
20. Southampton - 12 points
📊 CATEGORY WINNERS:
⚽ Best Attack: Liverpool (86 goals)
🛡️ Best Defense: Arsenal (34 conceded)
🏆 Most Wins: Liverpool (25 wins)
📈 LEAGUE STATISTICS:
• Total Goals: 1,115
• Average Goals per Game: 2.93
• Points Spread: 72 points
• Most Competitive Positions: Top 4 & Relegation battles
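The headline figures above come from simple aggregations over the final table. A 20-team season has 20 × 38 / 2 = 380 matches, so 1,115 goals give 1,115 / 380 ≈ 2.93 goals per game. Liverpool's 25 wins in 38 matches give 25 / 38 ≈ 65.8%, and the points spread is 84 - 12 = 72. Below is a minimal sketch of such a summary, assuming hypothetical FBref-style columns `Rk`, `Squad`, `MP`, `W`, `GF`, `GA` and `Pts`.

```python
import pandas as pd

def summarize_season(table: pd.DataFrame, n_relegated: int = 3) -> dict:
    """Compute headline season figures from a final league table."""
    ranked = table.sort_values("Rk")
    champion = ranked.iloc[0]
    total_goals = int(table["GF"].sum())   # every goal is some team's "goal for"
    matches = int(table["MP"].sum()) // 2  # every match is counted by two teams
    return {
        "champion": champion["Squad"],
        "win_rate": champion["W"] / champion["MP"],
        "relegated": ranked["Squad"].tail(n_relegated).tolist(),
        "best_attack": table.loc[table["GF"].idxmax(), "Squad"],
        "best_defense": table.loc[table["GA"].idxmin(), "Squad"],
        "most_wins": table.loc[table["W"].idxmax(), "Squad"],
        "total_goals": total_goals,
        "goals_per_game": total_goals / matches,
        "points_spread": int(table["Pts"].max() - table["Pts"].min()),
    }

# Toy 4-team double round-robin (6 matches each, 12 in total); made-up numbers.
toy = pd.DataFrame({
    "Rk": [1, 2, 3, 4],
    "Squad": ["Team A", "Team B", "Team C", "Team D"],
    "MP": [6, 6, 6, 6],
    "W": [4, 3, 2, 1],
    "GF": [12, 9, 6, 3],
    "GA": [4, 6, 9, 11],
    "Pts": [13, 10, 7, 4],
})
summary = summarize_season(toy, n_relegated=1)
print(summary)
```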
============================================================
💾 EXPORTING ANALYSIS RESULTS
============================================================
✅ Export completed successfully!
📁 Files saved to 'premier_league_2024_25_analysis' directory:
• final_league_table.csv
• enhanced_team_statistics.csv
• executive_summary.txt
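The export step above writes three files into one folder. Below is a minimal sketch using `pathlib` and pandas. The function name and signature are hypothetical; only the folder and file names come from the output above. The explicit UTF-8 encoding matters because the summary text contains emoji, which some platform default encodings cannot write.

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_results(league_table: pd.DataFrame, team_stats: pd.DataFrame,
                   summary_text: str,
                   out_dir: str = "premier_league_2024_25_analysis") -> list:
    """Write the three files listed above into `out_dir` and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    paths = [
        out / "final_league_table.csv",
        out / "enhanced_team_statistics.csv",
        out / "executive_summary.txt",
    ]
    league_table.to_csv(paths[0], index=False)
    team_stats.to_csv(paths[1], index=False)
    # Explicit UTF-8 so emoji in the summary are written correctly on any OS.
    paths[2].write_text(summary_text, encoding="utf-8")
    return paths

# Demo into a temporary folder, with made-up one-row tables.
with tempfile.TemporaryDirectory() as tmp:
    written = export_results(
        pd.DataFrame({"Squad": ["Team A"], "Pts": [13]}),
        pd.DataFrame({"Squad": ["Team A"], "GD": [8]}),
        "🏁 Season summary",
        out_dir=f"{tmp}/premier_league_2024_25_analysis",
    )
    all_written = all(p.exists() for p in written)
    summary_back = written[2].read_text(encoding="utf-8")
    print([p.name for p in written], all_written)
```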
Project Conclusion & Final Thoughts
This project demonstrates an end-to-end data analysis and data science workflow applied to sports analytics. It combines descriptive statistics, visualizations, hypothesis testing, regression modeling, and clustering to draw team-level and player-level insights from the Premier League 2024-25 season data.
The structured approach—from data acquisition and cleaning to team and player analysis, and finally to statistical modeling—provides a robust and reproducible framework that can be adapted for future seasons or other sports leagues.
Final Project Summary
📊 ANALYSIS SCOPE COVERED:
- ✅ Complete season review and final standings
- ✅ Detailed team performance metrics and comparisons
- ✅ Comprehensive player performance analysis across all positions
- ✅ Statistical insights and correlation analysis
- ✅ Data science applications (hypothesis testing, regression, clustering)
- ✅ Interactive visualizations and data storytelling
🚀 TECHNICAL SKILLS DEMONSTRATED:
- Data acquisition and API integration (Kaggle)
- Advanced data wrangling and feature engineering (Pandas)
- Statistical analysis and modeling (SciPy, StatsModels, Scikit-learn)
- Interactive dashboard creation (Plotly)
- Data visualization (Matplotlib, Seaborn)
- Professional coding practices, with analysis steps organized into reusable functions
💼 PROJECT VALUE:
- 🎯 Perfect for demonstrating both Data Analysis and Data Science expertise.
- 📈 Shows end-to-end analytical thinking, from descriptive to inferential analysis.
- 🏆 Explores an industry-relevant domain (sports analytics) with real-world data.
- 🔧 Highlights technical versatility across a wide range of popular data science tools.
📚 Data Sources & Attributions
Primary Data Sources
This analysis was made possible through the following high-quality datasets:
Team Statistics:
- Dataset: Premier League 2024-2025 Team Statistics
- Source: Kaggle Dataset by @sattvikyadav
- URL: kaggle.com/datasets/sattvikyadav/premier-league-2024-2025-team-statistics
- License: Open Dataset License
- Usage: Complete team performance metrics, final league standings, and advanced statistics
Player Statistics:
- Dataset: Football Players Stats 2024-2025
- Source: Kaggle Dataset by @hubertsidorowicz
- URL: kaggle.com/datasets/hubertsidorowicz/football-players-stats-2024-2025
- License: Open Dataset License
- Usage: Individual player performance data across all Premier League teams
Data Acknowledgments
- All statistical data is sourced from official Premier League records and verified third-party providers
- Team and player performance metrics reflect the complete 2024-25 Premier League season
- Expected Goals (xG) and advanced metrics sourced from professional football analytics providers
🛠️ Technical Stack & Tools
Programming & Analysis
- Python 3.x - Primary programming language
- Jupyter Notebook - Interactive development environment
- Google Colaboratory - Cloud-based notebook execution platform
Data Science Libraries
- pandas 1.5+ - Data manipulation and analysis
- NumPy 1.21+ - Numerical computing and array operations
- SciPy - Statistical analysis and hypothesis testing
- Scikit-learn - Machine learning algorithms and statistical modeling
- StatsModels - Advanced statistical analysis and regression modeling
Visualization Libraries
- Matplotlib 3.5+ - Static data visualization
- Seaborn 0.11+ - Statistical data visualization
- Plotly 5.0+ - Interactive charts and dashboards
Data Acquisition
- Kaggle API - Secure dataset download and management
- Python requests - HTTP library for data fetching
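The notebook's download code is not reproduced in this section. Below is a hedged sketch of Kaggle API acquisition, using the dataset slugs from the URLs in the Data Sources section. `KaggleApi.authenticate()` and `dataset_download_files(..., unzip=True)` belong to the official `kaggle` package. The folder layout, the `download_datasets` helper and the `DryRunApi` stand-in are illustrative assumptions.

```python
import tempfile
from pathlib import Path

# Dataset slugs taken from the Kaggle URLs listed in the Data Sources section.
DATASETS = {
    "teams": "sattvikyadav/premier-league-2024-2025-team-statistics",
    "players": "hubertsidorowicz/football-players-stats-2024-2025",
}

def download_datasets(dest: str = "data", api=None) -> dict:
    """Download and unzip each dataset into its own sub-folder of `dest`."""
    if api is None:
        # Requires the `kaggle` package and credentials in ~/.kaggle/kaggle.json
        # (or the KAGGLE_USERNAME / KAGGLE_KEY environment variables).
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()
    paths = {}
    for name, slug in DATASETS.items():
        target = Path(dest) / name
        target.mkdir(parents=True, exist_ok=True)
        api.dataset_download_files(slug, path=str(target), unzip=True)
        paths[name] = target
    return paths

class DryRunApi:
    """Hypothetical stand-in that records calls instead of downloading."""
    def __init__(self):
        self.calls = []

    def dataset_download_files(self, slug, path, unzip):
        self.calls.append((slug, path, unzip))

# Real download (needs credentials and network): paths = download_datasets("data")
dry = DryRunApi()
paths = download_datasets(dest=tempfile.mkdtemp(), api=dry)
print([slug for slug, _, _ in dry.calls])
```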
📖 Methodology References
Statistical Methods
- Correlation Analysis: Pearson correlation coefficients for metric relationships
- Hypothesis Testing: Independent t-tests for group comparisons
- Linear Regression: Ordinary Least Squares (OLS) for explanatory modeling
- Clustering Analysis: K-Means clustering for player profiling
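Below is a minimal sketch of the Pearson correlation step listed above. It uses `scipy.stats.pearsonr` for a single pair of metrics, which also returns a p-value, and `DataFrame.corr` for a full matrix. The column names and numbers are made up for illustration.

```python
import pandas as pd
from scipy import stats

def correlate(df: pd.DataFrame, x: str, y: str):
    """Pearson r and its two-sided p-value for two numeric columns."""
    clean = df[[x, y]].dropna()
    r, p = stats.pearsonr(clean[x], clean[y])
    return r, p

# Made-up team metrics for illustration only.
demo = pd.DataFrame({
    "xG": [30.1, 45.2, 52.8, 61.0, 70.4, 82.3],
    "GF": [28, 41, 55, 58, 74, 86],
    "GA": [70, 62, 50, 49, 40, 34],
})
r, p = correlate(demo, "xG", "GF")
print(f"r = {r:.3f}, p = {p:.4f}")
corr = demo.corr(method="pearson")
print(corr.round(2))
```

With only 20 teams in a season, a single outlier can move r noticeably, so a scatter plot is worth checking alongside the coefficient.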
Sports Analytics Framework
- Expected Goals (xG) methodology follows industry-standard football analytics practices
- Per-90-minute metrics calculated using official playing time data
- Performance efficiency ratios based on established sports science literature
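A per-90 rate divides a season total by the number of 90-minute units played. For example, 15 goals in 2,700 minutes (30 full matches) is 15 / 30 = 0.5 goals per 90. Below is a minimal sketch, assuming a hypothetical `Min` minutes column. The minimum-minutes filter is an illustrative assumption, not necessarily one the notebook applied.

```python
import pandas as pd

def per_90(df: pd.DataFrame, stat: str, minutes_col: str = "Min",
           min_minutes: int = 1) -> pd.Series:
    """Per-90 rate: stat / (minutes / 90).

    Rows below `min_minutes` are set to NaN, so players with zero or very few
    minutes do not get an undefined or extremely noisy rate.
    """
    nineties = df[minutes_col] / 90
    rate = df[stat] / nineties
    return rate.where(df[minutes_col] >= min_minutes)

# Hypothetical players for illustration.
demo = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Min": [2700, 450, 0],
    "Gls": [15, 1, 0],
})
demo["Goals_Per_90"] = per_90(demo, "Gls", min_minutes=900)
print(demo)
```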
Data Science Best Practices
- Cross-validation techniques for model validation
- Feature engineering following domain expertise principles
- Interactive visualization design based on data storytelling principles
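This section does not show how cross-validation was applied. One possibility, sketched here as an assumption rather than the notebook's method, is leave-one-out cross-validation of a Points regression like the one in Section 9.2, using scikit-learn. Each fold holds out a single team, and R² is undefined on a one-sample test fold, so mean absolute error is used instead. The predictors and data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_mae(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out mean absolute error of an ordinary linear regression."""
    scores = cross_val_score(
        LinearRegression(), X, y,
        cv=LeaveOneOut(),
        scoring="neg_mean_absolute_error",  # sklearn maximises scores, so MAE is negated
    )
    return float(-scores.mean())

# Synthetic stand-in for a 20-team (GF, GA) -> Pts table; not real data.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(25, 90, 20), rng.uniform(30, 85, 20)])
y = 52 + 0.7 * X[:, 0] - 0.75 * X[:, 1] + rng.normal(0, 3, 20)
mae = loo_mae(X, y)
print(f"LOO MAE: {mae:.2f} points")
```

Comparing this out-of-sample error with the in-sample fit indicates how much of the high R² might be overfitting when there are only 20 observations.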
🏆 Project Information
Project Scope
This project demonstrates end-to-end data science capabilities including:
- Secure data acquisition and preprocessing
- Exploratory Data Analysis (EDA) and statistical inference
- Advanced visualization and interactive dashboard creation
- Machine learning applications in sports analytics
- Professional reporting and insight generation
Educational Purpose
This analysis was conducted for educational and portfolio purposes, showcasing:
- Technical proficiency in Python and data science tools
- Domain expertise in sports analytics
- Statistical analysis and modeling capabilities
⚖️ Legal & Ethical Considerations
Data Usage Compliance
- All datasets used are publicly available under open data licenses
- Data usage complies with Kaggle Terms of Service and dataset-specific licenses
- No personally identifiable information (PII) was processed or stored
- Analysis conducted in accordance with data protection principles
Disclaimer
- This analysis is for educational and demonstration purposes only
- Statistical findings reflect historical data and should not be used for commercial betting or gambling
- All insights and conclusions are based on available data and may not reflect complete season context
🙏 Acknowledgments
Special Thanks
- Kaggle Community for providing high-quality, accessible sports datasets
- Premier League for maintaining comprehensive statistical records
- Open Source Community for developing the excellent tools that made this analysis possible
Inspiration
This project was inspired by the growing field of sports analytics and the desire to apply data science techniques to understand football performance dynamics. Special recognition to the broader sports analytics community for pioneering statistical approaches to football analysis.