This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities

Overview

MLOps with Vertex AI

This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities. The example use Keras to implement the ML model, TFX to implement the training pipeline, and Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Setup your MLOps environment on Google Cloud.

  2. Start your AI Notebook instance.

  3. Open the JupyterLab then open a new Terminal

  4. Clone the repository to your AI Notebook instance:

    git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
    cd mlops-with-vertex-ai
    
  5. Install the required Python packages:

    pip install tfx==1.2.0 --user
    pip install -r requirements.txt
    

    NOTE: You can ignore the pip dependencies issues. These will be fixed when upgrading to subsequent TFX version.


  6. Upgrade the gcloud components:

    sudo apt-get install google-cloud-sdk
    gcloud components update
    

Dataset Management

The Chicago Taxi Trips dataset is one of public datasets hosted with BigQuery, which includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating Vertex AI Dataset resource using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.

ML Development

We experiment with creating a Custom Model using 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model with Vertex AI using a pre-built container.
  4. Upload the exported model from Cloud Storage to Vertex AI.
  5. Extract and visualize experiment parameters from Vertex AI Metadata.
  6. Use Vertex AI for hyperparameter tuning.

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.

Training Operationalization

The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:

  1. Clone the repository to the build environment.
  2. Run unit tests.
  3. Run a local e2e test of the TFX pipeline.
  4. Build the ML container image for pipeline steps.
  5. Compile the pipeline.
  6. Upload the pipeline to Cloud Storage.

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed with respect to a trigger. We use Cloud Functions and Cloud Pub/Sub as a triggering mechanism. The Cloud Function listens to the Pub/Sub topic, and runs the training pipeline given a message sent to the Pub/Sub topic. The Cloud Function is implemented in src/pipeline_triggering.

The 05-continuous-training notebook covers:

  1. Creating a Cloud Pub/Sub topic.
  2. Deploying a Cloud Function.
  3. Triggering the pipeline.

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyper-parameters using hyperparam_gen custom python component.
  2. Extract data from BigQuery using BigQueryExampleGen component.
  3. Validate the raw data using StatisticsGen and ExampleValidator component.
  4. Process the data using on Dataflow Transform component.
  5. Train a custom model with Vertex AI using Trainer component.
  6. Evaluate and validate the custom model using ModelEvaluator component.
  7. Save the blessed to model registry location in Cloud Storage using Pusher component.
  8. Upload the model to Vertex AI using vertex_model_pusher custom python component.

Model Deployment

The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in build/model-deployment.yaml file, and consists of the following steps:

  1. Test model interface.
  2. Create an endpoint in Vertex AI.
  3. Deploy the model to the endpoint.
  4. Test the Vertex AI endpoint.

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Use the Vertex AI endpoint for online prediction.
  2. Use the Vertex AI uploaded model for batch prediction.
  3. Run the batch prediction using Vertex Pipelines.

Model Monitoring

After a model is deployed in for prediction serving, continuous monitoring is set up to ensure that the model continue to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Set skew and drift threshold.
  2. Create a monitoring job for all the models under and endpoint.
  3. List the monitoring jobs.
  4. List artifacts produced by monitoring job.
  5. Pause and delete the monitoring job.

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines in Cloud Console.

Disclaimer

This is not an official Google product but sample code provided for an educational purpose.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner
Google Cloud Platform
Google Cloud Platform
OBG-FCN - implementation of 'Object Boundary Guided Semantic Segmentation'

OBG-FCN This repository is to reproduce the implementation of 'Object Boundary Guided Semantic Segmentation' in http://arxiv.org/abs/1603.09742 Object

Jiu XU 3 Mar 11, 2019
Synthetic LiDAR sequential point cloud dataset with point-wise annotations

SynLiDAR dataset: Learning From Synthetic LiDAR Sequential Point Cloud This is official repository of the SynLiDAR dataset. For technical details, ple

78 Dec 27, 2022
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)

Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF

Lei Lin 21 Dec 18, 2022
Weakly-supervised object detection.

Wetectron Wetectron is a software system that implements state-of-the-art weakly-supervised object detection algorithms. Project CVPR'20, ECCV'20 | Pa

NVIDIA Research Projects 342 Jan 05, 2023
A script depending on VASP output for calculating Fermi-Softness.

Fermi softness calculation for Vienna Ab initio Simulation Package (VASP) Update 1.1.0: Big update: Rewrote the code. Use Bader atomic division instea

qslin 11 Nov 08, 2022
scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022
TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

ICNet_tensorflow This repo provides a TensorFlow-based implementation of paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images,"

HsuanKung Yang 406 Nov 27, 2022
The original implementation of TNDM used in the NeurIPS 2021 paper (no longer being updated)

TNDM - Targeted Neural Dynamical Modeling Note: This code is no longer being updated. The official re-implementation can be found at: https://github.c

1 Jul 21, 2022
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Kai Zhang 1.2k Dec 29, 2022
Multi-Content GAN for Few-Shot Font Style Transfer at CVPR 2018

MC-GAN in PyTorch This is the implementation of the Multi-Content GAN for Few-Shot Font Style Transfer. The code was written by Samaneh Azadi. If you

Samaneh Azadi 422 Dec 04, 2022
Codebase for BMVC 2021 paper "Text Based Person Search with Limited Data"

Text Based Person Search with Limited Data This is the codebase for our BMVC 2021 paper. Please bear with me refactoring this codebase after CVPR dead

Xiao Han 33 Nov 24, 2022
Image Completion with Deep Learning in TensorFlow

Image Completion with Deep Learning in TensorFlow See my blog post for more details and usage instructions. This repository implements Raymond Yeh and

Brandon Amos 1.3k Dec 23, 2022
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 195 Dec 29, 2022
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Blended Diffusion for Text-driven Editing of Natural Images Blended Diffusion for Text-driven Editing of Natural Images Omri Avrahami, Dani Lischinski

328 Dec 30, 2022
tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

tf2onnx converts TensorFlow (tf-1.x or tf-2.x), tf.keras and tflite models to ONNX via command line or python api.

Open Neural Network Exchange 1.8k Jan 08, 2023
Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

Nader Akoury 27 Dec 20, 2022
Biomarker identification for COVID-19 Severity in BALF cells Single-cell RNA-seq data

scBALF Covid-19 dataset Analysis Here is the Github page that has the codes for the bioinformatics pipeline described in the paper COVID-Datathon: Bio

Nami Niyakan 2 May 21, 2022