A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep

Overview

Here are the sections:

Data Science Cheatsheets

This section contains cheatsheets of basic concepts in data science that will be asked in interviews:

Data Science EBooks

This section contains books that I have read about data science and machine learning:

Data Science Question Bank

This section contains sample questions that were asked in actual data science interviews:

Data Science Case Studies

This section contains case study questions that concern designing machine learning systems to solve practical problems.

Data Science Portfolio

This section contains portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

For a more visually pleasant experience for browsing the portfolio, check out jameskle.com/data-portfolio

  • Recommendation Systems

    • Transfer Rec: My ongoing research work that intersects deep learning and recommendation systems.

    • Movie Recommendation: Designed 4 different models that recommend items on the MovieLens dataset.

    Tools: PyTorch, TensorBoard, Keras, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-Learn, Surprise, Wordcloud

  • Machine Learning

    • Trip Optimizer: Used XGBoost and evolutionary algorithms to optimize the travel time for taxi vehicles in New York City.

    • Instacart Market Basket Analysis: Tackled the Instacart Market Basket Analysis challenge to predict which products will be in a user's next order.

    Tools: Pandas, NumPy, Matplotlib, XGBoost, Geopy, Scikit-Learn

  • Computer Vision

    • Fashion Recommendation: Built a ResNet-based model that classifies and recommends fashion images in the DeepFashion database based on semantic similarity.

    • Fashion Classification: Developed 4 different Convolutional Neural Networks that classify images in the Fashion MNIST dataset.

    • Dog Breed Classification: Designed a Convolutional Neural Network that identifies dog breed.

    • Road Segmentation: Implemented a Fully-Convolutional Network for semantic segmentation task in the Kitty Road Dataset.

    Tools: TensorFlow, Keras, Pandas, NumPy, Matplotlib, Scikit-Learn, TensorBoard

  • Natural Language Processing

  • Data Analysis and Visualization

    • World Cup 2018 Team Analysis: Analysis and visualization of the FIFA 18 dataset to predict the best possible international squad lineups for 10 teams at the 2018 World Cup in Russia.

    • Spotify Artists Analysis: Analysis and visualization of musical styles from 50 different artists with a wide range of genres on Spotify.

    Tools: Pandas, NumPy, Matplotlib, Rspotify, httr, dplyr, tidyr, radarchart, ggplot2

Data Journalism Portfolio

This section contains portfolio of data journalism articles completed by me for freelance clients and self-learning purposes.

For a more visually pleasant experience for browsing the portfolio, check out jameskle.com/data-journalism

Downloadable Cheatsheets

These PDF cheatsheets come from BecomingHuman.AI.

1 - Neural Network Basics

Neural Network Basics

2 - Neural Network Graphs

Neural Network Graphs

3 - Machine Learning with Emojis

Machine Learning with Emojis

4 - Scikit-Learn With Python

Scikit-Learn With Python

5 - Python Basics

Python Basics

6 - NumPy Basics

NumPy Basics

7 - Pandas Basics

Pandas Basics

8 - Data Wrangling With Pandas

Data Wrangling With Pandas Part 1

Data Wrangling With Pandas Part 2

9 - SciPy Linear Algebra

SciPy Linear Algebra

10 - Matplotlib Basics

Matplotlib Basics

11 - Keras

Keras

12 - Big-O

Big-O

Owner
James Le
Data Journalist 📝 -> Data Scientist 📊 -> Machine Learning Researcher 🔍 -> Data Advocate 🤝
James Le
Markdown documentation generator from Google docstrings

mkgendocs A Python package for automatically generating documentation pages in markdown for Python source files by parsing Google style docstring. The

Davide Nunes 44 Dec 18, 2022
A tool that allows for versioning sites built with mkdocs

mkdocs-versioning mkdocs-versioning is a plugin for mkdocs, a tool designed to create static websites usually for generating project documentation. mk

Zayd Patel 38 Feb 26, 2022
An ongoing curated list of OS X best applications, libraries, frameworks and tools to help developers set up their macOS Laptop.

macOS Development Setup Welcome to MacOS Local Development & Setup. An ongoing curated list of OS X best applications, libraries, frameworks and tools

Paul Veillard 3 Apr 03, 2022
A Power BI/Google Studio Dashboard to analyze previous OTC CatchUps

OTC CatchUp Dashboard A Power BI/Google Studio dashboard analyzing OTC CatchUps. File Contents * ├───data ├───old summaries ─── *.md ├

11 Oct 30, 2022
The project that powers MDN.

Kuma Kuma is the platform that powers MDN (developer.mozilla.org) Development Code: https://github.com/mdn/kuma Issues: P1 Bugs (to be fixed ASAP) P2

MDN Web Docs 1.9k Dec 26, 2022
SqlAlchemy Flask-Restful Swagger Json:API OpenAPI

SAFRS: Python OpenAPI & JSON:API Framework Overview Installation JSON:API Interface Resource Objects Relationships Methods Custom Methods Class Method

Thomas Pollet 361 Nov 16, 2022
ACPOA plugin creation helper

ACPOA Plugin What is ACPOA ACPOA is the acronym for "Application Core for Plugin Oriented Applications". It's a tool to create flexible and extendable

Leikt Sol'Reihin 1 Oct 20, 2021
Word document generator with python

In this study, real world data is anonymized. The content is completely different, but the structure is the same. It was a script I prepared for the backend of a work using UiPath.

Ezgi Turalı 3 Jan 30, 2022
Explorative Data Analysis Guidelines

Explorative Data Analysis Get data into a usable format! Find out if the following predictive modeling phase will be successful! Combine everything in

Florian Rohrer 18 Dec 26, 2022
Spin-off Notice: the modules and functions used by our research notebooks have been refactored into another repository

Fecon235 - Notebooks for financial economics. Keywords: Jupyter notebook pandas Federal Reserve FRED Ferbus GDP CPI PCE inflation unemployment wage income debt Case-Shiller housing asset portfolio eq

Adriano 825 Dec 27, 2022
Python bindings to OpenSlide

OpenSlide Python OpenSlide Python is a Python interface to the OpenSlide library. OpenSlide is a C library that provides a simple interface for readin

OpenSlide 297 Dec 21, 2022
Type hints support for the Sphinx autodoc extension

sphinx-autodoc-typehints This extension allows you to use Python 3 annotations for documenting acceptable argument types and return value types of fun

Alex Grönholm 462 Dec 29, 2022
Generate YARA rules for OOXML documents using ZIP local header metadata.

apooxml Generate YARA rules for OOXML documents using ZIP local header metadata. To learn more about this tool and the methodology behind it, check ou

MANDIANT 34 Jan 26, 2022
Tutorial for STARKs with supporting code in python

stark-anatomy STARK tutorial with supporting code in python Outline: introduction overview of STARKs basic tools -- algebra and polynomials FRI low de

121 Jan 03, 2023
A pluggable API specification generator. Currently supports the OpenAPI Specification (f.k.a. the Swagger specification)..

apispec A pluggable API specification generator. Currently supports the OpenAPI Specification (f.k.a. the Swagger specification). Features Supports th

marshmallow-code 1k Jan 01, 2023
An open-source script written in python just for fun

Owersite Owersite is an open-source script written in python just for fun. It do

大きなペニスを持つ少年 7 Sep 21, 2022
Main repository for the Sphinx documentation builder

Sphinx Sphinx is a tool that makes it easy to create intelligent and beautiful documentation for Python projects (or other documents consisting of mul

5.1k Jan 04, 2023
A Python library that simplifies the extraction of datasets from XML content.

xmldataset: simple xml parsing 🗃️ XML Dataset: simple xml parsing Documentation: https://xmldataset.readthedocs.io A Python library that simplifies t

James Spurin 75 Dec 30, 2022
Contains the assignments from the course Building a Modern Computer from First Principles: From Nand to Tetris.

Contains the assignments from the course Building a Modern Computer from First Principles: From Nand to Tetris.

Matheus Rodrigues 1 Jan 20, 2022
DeltaPy - Tabular Data Augmentation (by @firmai)

DeltaPy⁠⁠ — Tabular Data Augmentation & Feature Engineering Finance Quant Machine Learning ML-Quant.com - Automated Research Repository Introduction T

Derek Snow 470 Dec 28, 2022