AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Overview

Data Science on AWS - O'Reilly Book

Open In SageMaker Studio Lab

Get the book on Amazon.com

Data Science on AWS

Book Outline

Book Outline

Quick Start Workshop (4-hours)

Workshop Paths

In this quick start hands-on workshop, you will build an end-to-end AI/ML pipeline for natural language processing with Amazon SageMaker. You will train and tune a text classifier to predict the star rating (1 is bad, 5 is good) for product reviews using the state-of-the-art BERT model for language representation. To build our BERT-based NLP text classifier, you will use a product reviews dataset where each record contains some review text and a star rating (1-5).

Quick Start Workshop Learning Objectives

Attendees will learn how to do the following:

  • Ingest data into S3 using Amazon Athena and the Parquet data format
  • Visualize data with pandas, matplotlib on SageMaker notebooks
  • Detect statistical data bias with SageMaker Clarify
  • Perform feature engineering on a raw dataset using Scikit-Learn and SageMaker Processing Jobs
  • Store and share features using SageMaker Feature Store
  • Train and evaluate a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
  • Evaluate the model using SageMaker Processing Jobs
  • Track model artifacts using Amazon SageMaker ML Lineage Tracking
  • Run model bias and explainability analysis with SageMaker Clarify
  • Register and version models using SageMaker Model Registry
  • Deploy a model to a REST endpoint using SageMaker Hosting and SageMaker Endpoints
  • Automate ML workflow steps by building end-to-end model pipelines using SageMaker Pipelines

Extended Workshop (8-hours)

Workshop Paths

In the extended hands-on workshop, you will get hands-on with advanced model training and deployment techniques such as hyper-parameter tuning, A/B testing, and auto-scaling. You will also setup a real-time, streaming analytics and data science pipeline to perform window-based aggregations and anomaly detection.

Extended Workshop Learning Objectives

Attendees will learn how to do the following:

  • Perform automated machine learning (AutoML) to find the best model from just your dataset with low-code
  • Find the best hyper-parameters for your custom model using SageMaker Hyper-parameter Tuning Jobs
  • Deploy multiple model variants into a live, production A/B test to compare online performance, live-shift prediction traffic, and autoscale the winning variant using SageMaker Hosting and SageMaker Endpoints
  • Setup a streaming analytics and continuous machine learning application using Amazon Kinesis and SageMaker

Workshop Instructions

Open In SageMaker Studio Lab

Amazon SageMaker Studio Lab is a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

1. Request Amazon SageMaker Studio Lab Account

Go to Amazon SageMaker Studio Lab, and request a free acount by providing a valid email address.

Amazon SageMaker Studio Lab Amazon SageMaker Studio Lab - Request Account

Note that Amazon SageMaker Studio Lab is currently in public preview. The number of new account registrations will be limited to ensure a high quality of experience for all customers.

2. Create Studio Lab Account

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page.

You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn't require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

3. Sign in to your Studio Lab Account

You are now ready to sign in to your account.

Amazon SageMaker Studio Lab - Sign In

4. Select your Compute instance, Start runtime, and Open project

CPU Option

Select CPU as the compute type and click Start runtime.

Amazon SageMaker Studio Lab - CPU

Once the Status shows Running, click Open project

Amazon SageMaker Studio Lab - GPU Running

5. Launch a New Terminal within Studio Lab

Amazon SageMaker Studio Lab - New Terminal

6. Clone this GitHub Repo in the Terminal

Within the Terminal, run the following:

cd ~ && git clone https://github.com/data-science-on-aws/oreilly_book

Amazon SageMaker Studio Lab - Clone Repo

7. Create data_science_on_aws Conda kernel

Within the Terminal, run the following:

cd ~/oreilly_book/ && conda env create -f environment.yml || conda env update -f environment.yml && conda activate data_science_on_aws

Amazon SageMaker Studio Lab - Create Kernel

If you see an error like the following, just ignore it. This will appear if you already have an existing Conda environment with this name. In this case, we will update the environment.

CondaValueError: prefix already exists: /home/studio-lab-user/.conda/envs/data_science_on_aws

8. Start the Workshop!

Navigate to oreilly_book/00_quickstart/ in SageMaker Studio Lab and start the workshop!

You may need to refresh your browser if you don't see the new oreilly_book/ directory.

Amazon SageMaker Studio Lab - Start Workshop

When you open the notebooks, make sure to select the data_science_on_aws kernel.

Amazon SageMaker Studio Lab - Select Kernel

Owner
Data Science on AWS
Data Science on AWS
Data Science on AWS
🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
Avocado hass time series vs predict price

AVOCADO HASS TIME SERIES VÀ PREDICT PRICE Trước khi vào Heroku muốn giao diện đẹp mọi người chuyển giúp mình theo hình bên dưới https://avocado-hass.h

hieulmsc 3 Dec 18, 2021
Predict profitability of trades based on indicator buy / sell signals

Predict profitability of trades based on indicator buy / sell signals Trade profitability analysis for trades based on various indicators signals: MAC

Tomasz Porzycki 1 Dec 15, 2021
Machine Learning Course with Python:

A Machine Learning Course with Python Table of Contents Download Free Deep Learning Resource Guide Slack Group Introduction Motivation Machine Learnin

Instill AI 6.9k Jan 03, 2023
This handbook accompanies the course: Machine Learning with Hung-Yi Lee

This handbook accompanies the course: Machine Learning with Hung-Yi Lee

RenChu Wang 472 Dec 31, 2022
This is my implementation on the K-nearest neighbors algorithm from scratch using Python

K Nearest Neighbors (KNN) algorithm In this Machine Learning world, there are various algorithms designed for classification problems such as Logistic

sonny1902 1 Jan 08, 2022
Penguins species predictor app is used to classify penguins species created using python's scikit-learn, fastapi, numpy and joblib packages.

Penguins Classification App Penguins species predictor app is used to classify penguins species using their island, sex, bill length (mm), bill depth

Siva Prakash 3 Apr 05, 2022
A collection of video resources for machine learning

Machine Learning Videos This is a collection of recorded talks at machine learning conferences, workshops, seminars, summer schools, and miscellaneous

Dustin Tran 1.5k Dec 29, 2022
SPCL 48 Dec 12, 2022
A repository to index and organize the latest machine learning courses found on YouTube.

📺 ML YouTube Courses At DAIR.AI we ❤️ open education. We are excited to share some of the best and most recent machine learning courses available on

DAIR.AI 9.6k Jan 01, 2023
Uplift modeling and causal inference with machine learning algorithms

Disclaimer This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to chang

Uber Open Source 3.7k Jan 07, 2023
Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.

Hackerank-Nested-List Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any s

Sangeeth Mathew John 2 Dec 14, 2021
The project's goal is to show a real world application of image segmentation using k means algorithm

The project's goal is to show a real world application of image segmentation using k means algorithm

2 Jan 22, 2022
Time-series momentum for momentum investing strategy

Time-series-momentum Time-series momentum strategy. You can use the data_analysis.py file to find out the best trigger and window for a given asset an

Victor Caldeira 3 Jun 18, 2022
Stacked Generalization (Ensemble Learning)

Stacking (stacked generalization) Overview ikki407/stacking - Simple and useful stacking library, written in Python. User can use models of scikit-lea

Ikki Tanaka 192 Dec 23, 2022
A python fast implementation of the famous SVD algorithm popularized by Simon Funk during Netflix Prize

⚡ funk-svd funk-svd is a Python 3 library implementing a fast version of the famous SVD algorithm popularized by Simon Funk during the Neflix Prize co

Geoffrey Bolmier 171 Dec 19, 2022
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu

PyCaret 6.7k Jan 08, 2023
This repo includes some graph-based CTR prediction models and other representative baselines.

Graph-based CTR prediction This is a repository designed for graph-based CTR prediction methods, it includes our graph-based CTR prediction methods: F

Big Data and Multi-modal Computing Group, CRIPAC 47 Dec 30, 2022
Machine Learning for Time-Series with Python.Published by Packt

Machine-Learning-for-Time-Series-with-Python Become proficient in deriving insights from time-series data and analyzing a model’s performance Links Am

Packt 124 Dec 28, 2022
Real-time domain adaptation for semantic segmentation

Advanced-Machine-Learning This repository contains the code for the project Real

Andrea Cavallo 1 Jan 30, 2022