PremiershipPlayerAnalysis

Using Python to scrape some basic player information from www.premierleague.com and then using Pandas to analyse that data. Note: my understanding is that the squad data on this site can change at any time, so your results might be different.

Improvement: calculate age to a finer degree than just whole years.

This was developed in Jupyter Notebook, and this walkthrough will assume you are doing the same.

Once you have run the scraping code:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    original = pd.DataFrame(playersList)       # Convert the scraped data into a Pandas DataFrame
    original.to_csv('premiershipplayers.csv')  # Keep a backup of the data to save time later if required
    df2 = original.copy()                      # Working copy of the DataFrame (just in case)
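The scraping code itself isn't shown here, so the exact structure of `playersList` is an assumption; a list of dicts, one per player, fits the columns reported by `df2.info()`. The sketch below uses two made-up players to show that `to_csv` also writes the index, so the backup should be reloaded with `index_col=0` (otherwise pandas adds an extra `Unnamed: 0` column):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the scraped data: one dict per player
playersList = [
    {'club': 'Arsenal', 'name': 'Player A'},
    {'club': 'Burnley', 'name': 'Player B'},
]
original = pd.DataFrame(playersList)

# Temporary folder for this example; in the notebook you would just use 'premiershipplayers.csv'
path = os.path.join(tempfile.mkdtemp(), 'premiershipplayers.csv')
original.to_csv(path)  # the index is written as the first, unnamed column

naive = pd.read_csv(path)                  # index comes back as an extra 'Unnamed: 0' column
restored = pd.read_csv(path, index_col=0)  # index comes back as the index

print(list(naive.columns))     # ['Unnamed: 0', 'club', 'name']
print(list(restored.columns))  # ['club', 'name']
```

Reloading this way gives the same column layout as the original scrape, although pandas may infer different dtypes for numeric-looking text when reading the CSV back.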


    df2.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 578 entries, 0 to 577
    Data columns (total 11 columns):
     #   Column       Non-Null Count  Dtype 
    ---  ------       --------------  ----- 
     0   club         578 non-null    object
     1   name         578 non-null    object
     2   shirtNo      572 non-null    object
     3   nationality  562 non-null    object
     4   dob          562 non-null    object
     5   height       500 non-null    object
     6   weight       474 non-null    object
     7   appearances  578 non-null    object
     8   goals        578 non-null    object
     9   wins         578 non-null    object
     10  losses       578 non-null    object
    dtypes: object(11)
    memory usage: 49.8+ KB

*** A total of 578 players. ***

6 without shirt number

16 without nationality listed

16 without dob listed

78 without height listed

104 without weight listed
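Each of these counts is the total number of rows minus the column's non-null count, and pandas can report them directly with `isna().sum()`. A minimal sketch on three made-up rows (not the scraped data):

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; None/NaN marks a value the site did not list
df = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],
    'shirtNo': ['1', None, '9'],
    'height': ['180cm', np.nan, np.nan],
})

missing = df.isna().sum()          # number of missing values per column
same_thing = len(df) - df.count()  # total rows minus non-null count

print(missing)
```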

Clean Up the Data

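  1. Remove spaces and newlines from the dob, appearances, goals, wins and losses columns

  2. Change the type of dob to date

  3. Change the type of appearances, goals, wins and losses to int

  4. Remove the unit from height and convert it to a number

  5. Create an age column from dob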

     import numpy as np
     import pandas as pd
     from datetime import date

     # 1. Remove newlines and surrounding spaces from the text columns
     for col in ['dob', 'appearances', 'goals', 'wins', 'losses']:
         df2[col] = df2[col].str.replace('\n', '').str.strip(' ')

     # 2. Change type of dob to date (missing dobs become NaT)
     df2['dob'] = pd.to_datetime(df2['dob'], format='%d/%m/%Y').dt.date

     # 3. Change type of appearances, goals, wins, losses to numbers
     for col in ['appearances', 'goals', 'wins', 'losses']:
         df2[col] = pd.to_numeric(df2[col])

     # 4. Drop the last two characters of height (presumably the 'cm' unit) and convert to a number
     df2['height'] = pd.to_numeric(df2['height'].str[:-2])

     # 5. Create age column (whole years)
     today = date.today()

     def age(born):
         if pd.isnull(born):
             return np.nan  # no dob listed
         # Subtract one if this year's birthday hasn't happened yet
         return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

     df2['age'] = df2['dob'].apply(age)
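The improvement mentioned at the top (age to a finer degree than whole years) could be approached by counting days instead of only comparing birthdays. A minimal sketch using only the standard library; `age_years` and `age_exact` are hypothetical helper names, the reference date is fixed so the output is reproducible, and dividing by 365.25 is an approximation that averages out leap years rather than handling them exactly:

```python
from datetime import date


def age_years(born: date, today: date) -> int:
    """Whole years, same rule as age(): subtract one if this year's birthday is still to come."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def age_exact(born: date, today: date) -> float:
    """Approximate age in years: days lived divided by the average year length of 365.25 days."""
    return (today - born).days / 365.25


ref = date(2022, 6, 1)     # fixed reference date (assumption, for reproducibility)
born = date(1990, 12, 25)  # made-up date of birth

print(age_years(born, ref))            # 31
print(round(age_exact(born, ref), 2))  # 31.43
```

In the notebook, `age_exact` would still need the same missing-dob check as `age` before being applied to `df2['dob']`.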
    

10 Oldest Players

    df2.sort_values('age',ascending=False).head(10)


10 Youngest Players

    df2.sort_values('age',ascending=True).head(10)

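Players without a dob have an age of NaN, and `sort_values` places NaN rows last in either direction by default, so they don't crowd out either list (the same applies to the 78 players without a height in the tallest and shortest lists). `nlargest` and `nsmallest` are an alternative that simply skips missing values. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up data: Player C has no dob listed, so his age is NaN
df = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'age': [35, 19, np.nan, 27],
})

oldest = df.sort_values('age', ascending=False).head(2)   # NaN row is placed last
youngest = df.sort_values('age', ascending=True).head(2)  # ...in this direction too

# nlargest / nsmallest pick the same rows and ignore NaN
print(df.nlargest(2, 'age'))
print(df.nsmallest(2, 'age'))
```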

Squad Sizes

    df2.groupby(['club'])['club'].count().sort_values(ascending=False)

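`value_counts` is a shorter way to get the same per-club counts, already sorted largest first. A sketch on made-up rows comparing the two:

```python
import pandas as pd

# Made-up squads for illustration
df = pd.DataFrame({'club': ['Arsenal', 'Burnley', 'Arsenal', 'Chelsea', 'Arsenal', 'Chelsea']})

by_groupby = df.groupby(['club'])['club'].count().sort_values(ascending=False)
by_value_counts = df['club'].value_counts()  # sorted by count, largest first

print(by_value_counts)
```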

Average Player Age per Club

    ax = df2.groupby(['club'])['age'].mean().sort_values(ascending=False).plot.bar()
    ax.set_ylim(20, 30)  # zoom in on the 20-30 range so differences between clubs are visible


Burnley appear to have not only one of the highest average player ages but also the lowest number of registered players.
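Both figures behind this observation can be put side by side with named aggregation (`agg(new_name=(column, function))`), which makes claims like this easier to check. A sketch on made-up clubs, since the real numbers depend on when the site was scraped:

```python
import numpy as np
import pandas as pd

# Made-up players for illustration; one age is missing (no dob listed)
df = pd.DataFrame({
    'club': ['Club A', 'Club A', 'Club B', 'Club B', 'Club B'],
    'name': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'age': [30, 32, 20, 24, np.nan],
})

summary = (
    df.groupby('club')
      .agg(players=('name', 'count'),  # squad size ('name' is never missing)
           avg_age=('age', 'mean'))    # mean skips missing ages
      .sort_values('avg_age', ascending=False)
)
print(summary)
```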

Top 10 Premiership Appearances

    df2.sort_values('appearances',ascending=False).head(10)


Collective Premiership Appearances per Club

    df2.groupby(['club'])['appearances'].sum().sort_values(ascending=False)


    df2.groupby(['club'])['appearances'].sum().sort_values(ascending=False).plot.bar()


10 Tallest Players

    df2.sort_values('height',ascending=False).head(10)


10 Shortest Players

    df2.sort_values('height',ascending=True).head(10)


Nationality totals of Players

    pd.set_option('display.max_rows', 100)
    df2.groupby(['nationality'])['club'].count().sort_values(ascending=False)
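`groupby` drops rows whose key is missing, so the 16 players without a listed nationality are left out of these totals. A sketch on made-up rows showing the difference, with `value_counts(dropna=False)` keeping the missing ones as their own entry:

```python
import numpy as np
import pandas as pd

# Made-up rows; one player has no nationality listed
df = pd.DataFrame({
    'club': ['Arsenal', 'Arsenal', 'Burnley', 'Burnley'],
    'nationality': ['England', np.nan, 'England', 'Wales'],
})

# groupby silently drops the row with a missing nationality
totals = df.groupby(['nationality'])['club'].count().sort_values(ascending=False)

# value_counts(dropna=False) keeps it as a separate NaN entry
with_missing = df['nationality'].value_counts(dropna=False)

print(totals)
print(with_missing)
```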

Nationality totals per club

    pd.set_option('display.max_rows', 500)
    df2.groupby(['club','nationality'])['nationality'].count()
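A long listing like this is hard to scan; `pd.crosstab` is one alternative that lays the same counts out as a club-by-nationality table, with 0 where a club has nobody of that nationality. A sketch on made-up rows, using `size()` for the long form (equivalent here, since rows with a missing nationality are dropped by `groupby` either way):

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    'club': ['Arsenal', 'Arsenal', 'Burnley', 'Burnley', 'Burnley'],
    'nationality': ['England', 'France', 'England', 'England', 'Wales'],
})

# Long form: one entry per (club, nationality) pair
long_form = df.groupby(['club', 'nationality']).size()

# Wide form: one row per club, one column per nationality
table = pd.crosstab(df['club'], df['nationality'])
print(table)
```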