Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

Overview

time-series-kafka-demo

img

Mock stream producer for time series data using Kafka.

I walk through this tutorial and others here on GitHub and on my Medium blog. Here is a friend link for open access to the article on Towards Data Science: Make a mock “real-time” data stream with Python and Kafka. I'll always add friend links on my GitHub tutorials for free Medium access if you don't have a paid Medium membership (referral link).

If you find any of this useful, I always appreciate contributions to my Saturday morning fancy coffee fund!

This repo demos how to convert a csv file of timestamped data into a real-time stream useful for testing streaming analytics. An example input file with random time series data and a script for generating the file are included in the data directory.

The producer and consumer Python scripts use Confluent's Kafka client for Python, which is installed in the Docker image built with the accompanying Dockerfile, if you choose to use it.

Requires Docker and Docker Compose.

Usage

Clone repo and cd into directory.

git clone https://github.com/mtpatter/time-series-kafka-demo.git
cd time-series-kafka-demo

Start the Kafka broker

docker compose up --build

Build a Docker image (optionally, for the producer and consumer)

From the main root directory:

docker build -t "kafkacsv" .

If you want to use Docker for the python scripts, this should now work:

docker run -it --rm kafkacsv python bin/sendStream.py -h

Start a consumer

To start a consumer for printing all messages in real-time from the stream "my-stream":

python bin/processStream.py my-stream

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/processStream.py my-stream

Produce a time series stream

Send time series from data/data.csv to topic “my-stream”, and speed it up by a factor of 10.

python bin/sendStream.py data/data.csv my-stream --speed 10

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/sendStream.py data/data.csv my-stream --speed 10

Shut down and clean up

Stop the consumer with Return and Ctrl+C.

Shutdown Kafka broker system:

docker compose down
Automatic links from code examples to reference documentation

sphinx-codeautolink Automatic links from Python code examples to reference documentation at the flick of a switch! sphinx-codeautolink analyses the co

Felix Hildén 41 Dec 17, 2022
Python document object mapper (load python object from JSON and vice-versa)

lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta

Aurélien Amilin 24 Nov 09, 2022
Count the number of lines of code in a directory, minus the irrelevant stuff

countloc Simple library to count the lines of code in a directory (excluding stuff like node_modules) Simply just run: countloc node_modules args to

Anish 4 Feb 14, 2022
Python code for working with NFL play by play data.

nfl_data_py nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout. Includes im

82 Jan 05, 2023
freeCodeCamp Scientific Computing with Python Project for Certification.

Polygon_Area_Calculator freeCodeCamp Python Project freeCodeCamp Scientific Computing with Python Project for Certification. In this project you will

Rajdeep Mondal 1 Dec 23, 2021
🐱‍🏍 A curated list of awesome things related to Hugo themes.

awesome-hugo-themes Automated deployment @ 2021-10-12 06:24:07 Asia/Shanghai &sorted=updated Theme Author License GitHub Stars Updated Blonde wamo MIT

13 Dec 12, 2022
Generating a report CSV and send it to an email - Python / Django Rest Framework

Generating a report in CSV format and sending it to a email How to start project. Create a folder in your machine Create a virtual environment python3

alexandre Lopes 1 Jan 17, 2022
A `:github:` role for Sphinx

sphinx-github-role A github role for Sphinx. Usage Basic usage MyST: :caption: index.md See {github}`astrojuanlu/sphinx-github-role#1`. reStructuredT

Juan Luis Cano Rodríguez 4 Nov 22, 2022
Minimal reproducible example for `mkdocstrings` Python handler issue

Minimal reproducible example for `mkdocstrings` Python handler issue

Hayden Richards 0 Feb 17, 2022
Generate a backend and frontend stack using Python and json-ld, including interactive API documentation.

d4 - Base Project Generator Generate a backend and frontend stack using Python and json-ld, including interactive API documentation. d4? What is d4 fo

Markus Leist 3 May 03, 2022
Some code that takes a pipe-separated input and converts that into a table!

tablemaker A program that takes an input: a | b | c # With comments as well. e | f | g h | i |jk And converts it to a table: ┌───┬───┬────┐ │ a │ b │

CodingSoda 2 Aug 30, 2022
Legacy python processor for AsciiDoc

AsciiDoc.py This branch is tracking the alpha, in-progress 10.x release. For the stable 9.x code, please go to the 9.x branch! AsciiDoc is a text docu

AsciiDoc.py 178 Dec 25, 2022
Generate modern Python clients from OpenAPI

openapi-python-client Generate modern Python clients from OpenAPI 3.x documents. This generator does not support OpenAPI 2.x FKA Swagger. If you need

555 Jan 02, 2023
Resource hub for Obsidian resources.

Obsidian Community Vault Welcome! This is an experimental vault that is maintained by the Obsidian community. For best results we recommend downloadin

Obsidian Community 320 Jan 02, 2023
Dynamic Resume Generator

Dynamic Resume Generator

Quinten Lisowe 15 May 19, 2022
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Curvenote 15 Dec 21, 2022
Lightweight, configurable Sphinx theme. Now the Sphinx default!

What is Alabaster? Alabaster is a visually (c)lean, responsive, configurable theme for the Sphinx documentation system. It is Python 2+3 compatible. I

Jeff Forcier 670 Dec 19, 2022
A web app builds using streamlit API with python backend to analyze and pick insides from multiple data formats.

Data-Analysis-Web-App Data Analysis Web App can analysis data in multiple formates(csv, txt, xls, xlsx, ods, odt) and gives shows you the analysis in

Kumar Saksham 19 Dec 09, 2022
Obmovies - A short guide on setting up the system and environment dependencies required for ob's Movies database

Obmovies - A short guide on setting up the system and environment dependencies required for ob's Movies database

1 Jan 04, 2022
Material for the ros2 crash course

Material for the ros2 crash course

Emmanuel Dean 1 Jan 22, 2022