A repository with scraping code and a soccer dataset from understat.com.

Last update: Jan 03, 2023

UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goals (xG) stats for every shot taken in the top 5 leagues in Europe, as well as the Russian league.

After watching an awesome tutorial by McKay Johns (great channel btw, loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shots data available at Understat. The result is this dataset, which contains shots data from the 2014/2015 season through every match played in the 2020/2021 season, for the top division in the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL

Besides shots data, I also managed to scrape very detailed season stats on every single player that took part in these matches.
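
The scraping code itself lives in the repository, so the following is only a rough sketch of one way such data can be pulled out of a page. It assumes the page embeds its data as a JavaScript variable of the form `var shotsData = JSON.parse('...')` with `\xNN`-escaped characters. The variable name `shotsData`, the helper `extract_embedded_json`, and the sample payload are all assumptions made for illustration, not a description of what the repository's scripts actually do.

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Return the JSON assigned to `var_name` via JSON.parse('...').

    Hypothetical helper: assumes the payload is a single-quoted
    JavaScript string that uses \\xNN escapes for special characters.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError("variable %r not found in page" % var_name)
    # Turn every \xNN escape back into the character it stands for.
    decoded = re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        match.group(1),
    )
    return json.loads(decoded)


# Made-up sample standing in for a downloaded page (not real Understat data).
SAMPLE_HTML = (
    "<script>var shotsData = JSON.parse('"
    "\\x7B\\x22h\\x22\\x3A\\x5B\\x7B\\x22player\\x22\\x3A\\x22A\\x22,"
    "\\x22xG\\x22\\x3A\\x220.25\\x22\\x7D\\x5D\\x7D"
    "')</script>"
)

shots = extract_embedded_json(SAMPLE_HTML, "shotsData")
print(shots["h"][0]["xG"])
```

In a real run the HTML would come from an HTTP request instead of a hard-coded string, and the decoded records would be written out to .csv files.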

The datasets are split into one folder per league, so each folder has 7 .csv files for shots data and 7 .csv files for players data (1 for every season since 14/15). The full dataset, with every league and season combined, is also available in the "datasets" folder. I plan on updating the datasets every day, but I also uploaded the Python code that generates and updates the datasets. Feel free to play with it and suggest improvements (hit me up on Twitter). To update it yourself, just save "scraping" and "datasets" in the same folder, run Python with this folder as the current working directory, and then run the update.py script, which is located in "scraping".
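
One possible reading of the update steps above, as a small sketch: check that "scraping" and "datasets" sit side by side, then launch update.py with their parent folder as the working directory. The helper names `build_update_command` and `run_update` are hypothetical, and running the script as a subprocess is an assumption. It may equally be meant to be run from an interactive Python session.

```python
import subprocess
import sys
from pathlib import Path


def build_update_command(root):
    """Check the expected folder layout and return the command to run."""
    root = Path(root)
    for name in ("scraping", "datasets"):
        if not (root / name).is_dir():
            raise FileNotFoundError(
                "expected a '%s' folder inside %s" % (name, root)
            )
    script = root / "scraping" / "update.py"
    if not script.is_file():
        raise FileNotFoundError("expected %s" % script)
    # The script path is relative to root, which run_update uses as the
    # working directory.
    return [sys.executable, str(Path("scraping") / "update.py")]


def run_update(root):
    """Run update.py with `root` as the current working directory."""
    subprocess.run(build_update_command(root), cwd=str(root), check=True)
```

Calling `run_update` with the folder that contains both "scraping" and "datasets" starts the update. Any error raised by the script surfaces as a `subprocess.CalledProcessError`.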

Most of the columns in the datasets are pretty straightforward, but some aren't. So I uploaded a couple of .pdf files in "documentation", explaining every column.

Owner

douglasbc

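A collection of web crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, Baidu Index, VIP and Wanfang, Zlibraty, Oalib, novels, bidding sites, procurement sites, and Xiaohongshu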

Scrap the 42 Intranet's elearning videos in a single click

A Scrapper with python

👨🏼‍⚖️ reddit bot that turns comment chains into ace attorney scenes

Scrapping Connections' info on Linkedin

Web scrapping tool written in python3, using regex, to get CVEs, Source and URLs.

crypto currency scraping

Dex-scrapper - Hobby project for scrapping dex data on VeChain

Scrap-mtg-top-8 - A top 8 mtg scraper using python

A simple python script to fetch the latest covid info

A modern CSS selector implementation for BeautifulSoup

Nekopoi scraper using python3

A Happy and lightweight Python Package that searches Google News RSS Feed and returns a usable JSON response and scrap complete article - No need to write scrappers for articles fetching anymore

Minecraft Item Scraper

Parse feeds in Python

Collection of code files to scrap different kinds of websites.

A low-code tool that generates python crawler code based on curl or url

Web and PDF Scraper Refactoring

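The latest optimized version of a JD.com Moutai purchase bot for JD.com Moutai flash sales, with an optimized purchase process queue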

A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.