100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

Overview

100 pandas puzzles

Puzzles notebook

Solutions notebook

Inspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of pandas' power.

Since pandas is a large library with many different specialist features and functions, these excercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the excerises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.

The exercises are loosely divided in sections. Each section has a difficulty rating; these ratings are subjective, of course, but should be a seen as a rough guide as to how elaborate the required solution needs to be.

Good luck solving the puzzles!

* the list of puzzles is not yet complete! Pull requests or suggestions for additional exercises, corrections and improvements are welcomed.

Overview of puzzles

Section Name Description Difficulty
Importing pandas Getting started and checking your pandas setup Easy
DataFrame basics A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames Easy
DataFrames: beyond the basics Slightly trickier: you may need to combine two or more methods to get the right answer Medium
DataFrames: harder problems These might require a bit of thinking outside the box... Hard
Series and DatetimeIndex Exercises for creating and manipulating Series with datetime data Easy/Medium
Cleaning Data Making a DataFrame easier to work with Easy/Medium
Using MultiIndexes Go beyond flat DataFrames with additional index levels Medium
Minesweeper Generate the numbers for safe squares in a Minesweeper grid Hard
Plotting Explore pandas' part of plotting functionality to see trends in data Medium

Setting up

To tackle the puzzles on your own computer, you'll need a Python 3 environment with the dependencies (namely pandas) installed.

One way to do this is as follows. I'm using a bash shell, the procedure with Mac OS should be essentially the same. Windows, I'm not sure about.

  1. Check you have Python 3 installed by printing the version of Python:
python -V
  1. Clone the puzzle repository using Git:
git clone https://github.com/ajcr/100-pandas-puzzles.git
  1. Install the dependencies (caution: if you don't want to modify any Python modules in your active environment, consider using a virtual environment instead):
python -m pip install -r requirements.txt
  1. Launch a jupyter notebook server:
jupyter notebook --notebook-dir=100-pandas-puzzles

You should be able to see the notebooks and launch them in your web browser.

Contributors

This repository has benefitted from numerous contributors, with those who have sent puzzles and fixes listed in CONTRIBUTORS.

Thanks to everyone who has raised an issue too.

Other links

If you feel like reading up on pandas before starting, the official documentation useful and very extensive. Good places get a broader overview of pandas are:

There are may other excellent resources and books that are easily searchable and purchaseable.

Owner
Alex Riley
Alex Riley
Plot-configurations for scientific publications, purely based on matplotlib

TUEplots Plot-configurations for scientific publications, purely based on matplotlib. Usage Please have a look at the examples in the example/ directo

Nicholas Krรคmer 487 Jan 08, 2023
๐Ÿž ๐Ÿ“Š Ladybug extension to generate 2D charts

ladybug-charts Ladybug extension to generate 2D charts. Installation pip install ladybug-charts QuickStart import ladybug_charts API Documentation Loc

Ladybug Tools 3 Dec 30, 2022
Visualization ideas for data science

Nuance I use Nuance to curate varied visualization thoughts during my data scientist career. It is not yet a package but a list of small ideas. Welcom

Li Jiangchun 16 Nov 03, 2022
Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain amount of time.

Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain

Bohdan Ruban 5 Nov 08, 2022
An open-source tool for visual and modular block programing in python

PyFlow PyFlow is an open-source tool for modular visual programing in python ! Although for now the tool is in Beta and features are coming in bit by

1.1k Jan 06, 2023
These data visualizations were created for my introductory computer science course using Python

Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python

Sophia Huang 12 Oct 20, 2022
Lumen provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification

Lumen project provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification

HoloViz 120 Jan 04, 2023
A small script written in Python3 that generates a visual representation of the Mandelbrot set.

Mandelbrot Set Generator A small script written in Python3 that generates a visual representation of the Mandelbrot set. Abstract The colors in the ou

1 Dec 28, 2021
PyFlow is a general purpose visual scripting framework for python

PyFlow is a general purpose visual scripting framework for python. State Base structure of program implemented, such things as packages disco

1.8k Jan 07, 2023
Visualization Website by using Dash and Heroku

Visualization Website by using Dash and Heroku You can visit the website https://payroll-expense-analysis.herokuapp.com/ In this project, I am interes

YF Liu 1 Jan 14, 2022
Movies-chart - A CLI app gets the top 250 movies of all time from imdb.com and the top 100 movies from rottentomatoes.com

movies-chart This CLI app gets the top 250 movies of all time from imdb.com and

3 Feb 17, 2022
Make visual music sheets for thatskygame (graphical representations of the Sky keyboard)

sky-python-music-sheet-maker This program lets you make visual music sheets for Sky: Children of the Light. It will ask you a few questions, and does

21 Aug 26, 2022
Implementation of SOMs (Self-Organizing Maps) with neighborhood-based map topologies.

py-self-organizing-maps Simple implementation of self-organizing maps (SOMs) A SOM is an unsupervised method for learning a mapping from a discrete ne

Jonas Grebe 6 Nov 22, 2022
Keir&'s Visualizing Data on Life Expectancy

Keir's Visualizing Data on Life Expectancy Below is information on life expectancy in the United States from 1900-2017. You will also find information

9 Jun 06, 2022
Material for dataviz course at university of Bordeaux

Material for dataviz course at university of Bordeaux

Nicolas P. Rougier 50 Jul 17, 2022
Rubrix is a free and open-source tool for exploring and iterating on data for artificial intelligence projects.

Open-source tool for exploring, labeling, and monitoring data for AI projects

Recognai 1.5k Jan 07, 2023
Python support for Godot ๐Ÿ๐Ÿ๐Ÿ

Godot Python, because you want Python on Godot ! The goal of this project is to provide Python language support as a scripting module for the Godot ga

Emmanuel Leblond 1.4k Jan 04, 2023
๐ŸPyNode Next allows you to easily create beautiful graph visualisations and animations

PyNode Next A complete rewrite of PyNode for the modern era. Up to five times faster than the original PyNode. PyNode Next allows you to easily create

ehne 3 Feb 12, 2022
CompleX Group Interactions (XGI) provides an ecosystem for the analysis and representation of complex systems with group interactions.

XGI CompleX Group Interactions (XGI) is a Python package for the representation, manipulation, and study of the structure, dynamics, and functions of

Complex Group Interactions 67 Dec 28, 2022
Decision Border Visualizer for Classification Algorithms

dbv Decision Border Visualizer for Classification Algorithms Project description A python package for Machine Learning Engineers who want to visualize

Sven Eschlbeck 1 Nov 01, 2021