Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Discord rich-presence implementation for VALORANT

not working on v1 anymore in favor of v2, but if there's any big bugs i'll try to fix them valorant-rich-presence-client Discord rich presence extensi

colinh 278 Jan 08, 2023
Linkvertise-Bypass - Bypass Linkvertise advertisement

Linkvertise-Bypass Bypass Linkvertise advertisement 📕 instructions Copy And Pas

Flex Tools 4 Jun 10, 2022
⚡️ Get notified as soon as your next CPU, GPU, or game console is in stock

Inventory Hunter This bot helped me snag an RTX 3070... hopefully it will help you get your hands on your next CPU, GPU, or game console. Requirements

Eric Marti 1.1k Dec 26, 2022
Acid's Utilities is a bot for my Discord server that alerts when I go live, welcomes new users, has some awesome games and so much more!

Acid's Utilities Acid's Utilities is a bot for my Discord server that alerts when I go live, welcomes new users, has some awesome games and so much mo

AcidFilms (Fin Stuart) 3 Nov 19, 2021
Get notifications in your Discord server of any software releases from Apple.

Apple Releases Get notifications in your Discord server of any software releases from Apple. Running To locally host your own instance, create a Disco

adam 17 Oct 22, 2022
An elegant mirai-api-http v2 Python SDK.

Ariadne 一个适用于 mirai-api-http v2 的 Python SDK。 本项目适用于 mirai-api-http 2.0 以上版本。 目前仍处于开发阶段,内部接口可能会有较大的变化。 安装 poetry add graia-ariadne 或 pip install graia

Graia Project 259 Jan 02, 2023
⚡ PoC: Hide a c&c botnet in the discord client. (Proof Of Concept)

👨‍💻 Discord Self Bot 👨‍💻 A Discord Self-Bot in Python by natrix Installation Run: selfbot.bat Python: version : 3.8 Modules

0хVιcнy#1337 37 Oct 21, 2022
Baota-docker - Deploying baota panel via docker

baota-docker Deploying baota panel via docker. 通过docker一键部署宝塔面板。 一、前言 好像很多人对这个感兴

Mr. Cat 15 Dec 12, 2022
This Server Cloner can clone the server you want with all the perms of roles in every particular channel.

Server-Cloner-with-perms 🚀 This Server Cloner can clone the server you want with all the perms of roles in every particular channel. Features Clone C

Gripz 0 Feb 17, 2022
WhatsApp Multi Device Client

WhatsApp Multi Device Client

23 Nov 18, 2022
Beyonic API Python official client library simplified examples using Flask, Django and Fast API.

Beyonic API Python official client library simplified examples using Flask, Django and Fast API.

Enumerate Microsoft 365 Groups in a tenant with their metadata

Enumerate Microsoft 365 Groups in a tenant with their metadata Description The all_groups.py script allows to enumerate all Microsoft 365 Groups in a

Clément Notin 46 Dec 26, 2022
A Advanced Powerful, Smart And Intelligent Group Management Bot With New And Powerful Features

Vegeta Robot A Advanced Powerful, Smart And Intelligent Group Management Bot With New And Powerful Features ... Written with Pyrogram and Telethon...

⚡ CT_PRO ⚡ 9 Nov 16, 2022
Compulsory join Telegram Bot

mussjoin About Compulsory join Telegram Bot this Telegram Bot Application can be added users to Telegram Channel or Group compulsorily. in addition wh

Hamed Mohammadvand 4 Dec 03, 2021
Telegram bot for stream music or video on telegram

KYURA MUSIC Telegram bot for stream music or video on telegram, powered by PyTgCalls and Pyrogram Help Need Help me to translate this repo, click the

0 Dec 08, 2022
⚡ ʑɠ ცơɬ Is One Of The Fastest & Smoothest Bot On Telegram Based on Telethon ⚡

『ʑɠ ცơɬ』 ⚡ ʑɠ ცơɬ Is One Of The Fastest & Smoothest Bot On Telegram Based on Telethon ⚡ Status Of Bot Telegram 🏪 Dєρℓογ το нєяοκυ Variables APP_ID =

ʑɑʑɓɦɑɪ 0 Feb 12, 2022
Security Monkey monitors AWS, GCP, OpenStack, and GitHub orgs for assets and their changes over time.

NOTE: Security Monkey is in maintenance mode and will be end-of-life in 2020. For AWS users, please make use of AWS Config. For GCP users, please make

Netflix, Inc. 4.3k Jan 09, 2023
Balanced API library in python.

Balanced Online Marketplace Payments v1.x requires Balanced API 1.1. Use v0.x for Balanced API 1.0. Installation pip install balanced Usage View Bala

Balanced 70 Oct 04, 2022
Minimal Python client for the Iris API, built on top of Authlib and httpx.

🕸️ Iris Python Client Minimal Python client for the Iris API, built on top of Authlib and httpx. Installation pip install dioptra-iris-client Usage f

Dioptra 1 Jan 28, 2022
Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

LUSID® Python SDK This is the Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilit

FINBOURNE 6 Dec 24, 2022