data-science-on-gcp
Source code accompanying book:
- Data Science on the Google Cloud Platform, 2nd Edition, Valliappa Lakshmanan, O'Reilly, Jan 2022. Branch: edition2 [being built]
- Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly, Jan 2017. Branch: edition1_tf2 (also: main)
Try out the code on Google Cloud Platform
The code on Qwiklabs (see below) is continually tested, and this repo is kept up-to-date. The code should work as-is for you; however, readers report three very common problems:
- Ch 2: Data download fails. The Bureau of Transportation website that hosts the airline dataset periodically goes down or changes availability because of government furloughs and the like. Please use the instructions in 02_ingest/README.md to copy the data from my bucket. The remaining chapters work off the data in the bucket, so they will be fine.
- Ch 3: Permission errors. These typically occur because we expect you to copy the airline data to your own bucket; you don't have write access to gs://cloud-training-demos-ml/. The instructions will tell you to change the bucket name to one that you own. Please do that.
- Ch 4, 10: Dataflow doesn't do anything. The real-time simulation requires that you run simulate.py and the Dataflow pipeline simultaneously. If the Dataflow pipeline is not progressing, make sure that simulate.py is still running.
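The bucket-name change behind the Ch 3 permission errors can be sketched as a small helper. This is only an illustration, not code from the book or the repo: the function name `retarget_gcs_uri`, the bucket `my-project-flights`, and the object path in the example are all made up here. Only the bucket name `cloud-training-demos-ml` comes from the text above.

```python
from urllib.parse import urlsplit

# Read-only bucket named in this README; readers cannot write to it.
SOURCE_BUCKET = "cloud-training-demos-ml"


def retarget_gcs_uri(uri: str, my_bucket: str, source_bucket: str = SOURCE_BUCKET) -> str:
    """Return `uri` with the read-only source bucket swapped for `my_bucket`.

    URIs that point at any other bucket are returned unchanged.
    (Hypothetical helper, written for illustration only.)
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {uri!r}")
    if parts.netloc != source_bucket:
        return uri
    # parts.path keeps its leading "/", so the object path is preserved as-is.
    return f"gs://{my_bucket}{parts.path}"


if __name__ == "__main__":
    # Both the object path and "my-project-flights" are placeholders;
    # substitute a path from the instructions and a bucket that you own.
    print(retarget_gcs_uri("gs://cloud-training-demos-ml/some/object.csv", "my-project-flights"))
```

Any write that still targets `gs://cloud-training-demos-ml/` after this kind of substitution will continue to fail with a permission error, so it is worth checking every output path in a script, not just the first one.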
If the code doesn't work for you, I recommend that you try the corresponding Qwiklab lab to see if there is some step that you missed. If you still have problems, please leave feedback in Qwiklabs, or file an issue in this repo.
Try out the code on Qwiklabs
- Data Science on the Google Cloud Platform Quest
- Data Science on Google Cloud Platform: Machine Learning Quest
Purchase book
Read on-line or download PDF of book
Updates to book
I updated the book in Nov 2019 with TensorFlow 2.0, Cloud Functions, and BigQuery ML.