PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.

Overview

PdpCLI

Actions Status Python version PyPI version License

Quick Links

Introduction

PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline powered by pdpipe from a configuration file. You can also extend pipeline stages and data readers / writers by using your own python scripts.

Features

  • Process pandas DataFrame from CLI without wrting Python scripts
  • Support multiple configuration file formats: YAML, JSON, Jsonnet
  • Read / write data files in the following formats: CSV, TSV, JSON, JSONL, pickled DataFrame
  • Import / export data with multiple protocols: S3 / Databse (MySQL, Postgres, SQLite, ...) / HTTP(S)
  • Extensible pipeline and data readers / writers

Installation

Installing the library is simple using pip.

$ pip install "pdpcli[all]"

Tutorial

Basic Usage

  1. Write a pipeline config file config.yml like below. The type fields under pipeline correspond to the snake-cased class names of the PdpipelineStages. Other fields such as stage and columns are the parameters of the __init__ methods of the corresponging classes. Internally, this configuration file is converted to Python objects by colt.
pipeline:
  type: pipeline
  stages:
    drop_columns:
      type: col_drop
      columns:
        - name
        - job

    encode:
      type: one_hot_encode
      columns: sex

    tokenize:
      type: tokenize_text
      columns: content

    vectorize:
      type: tfidf_vectorize_token_lists
      column: content
      max_features: 10
  1. Build a pipeline by training on train.csv. The following command generages a pickled pipeline file pipeline.pkl after training. If you specify a URL of file path, it will be automatically downloaded and cached.
$ pdp build config.yml pipeline.pkl --input-file https://github.com/altescy/pdpcli/raw/main/tests/fixture/data/train.csv
  1. Apply the fitted pipeline to test.csv and get output of a processed file processed_test.jsonl by the following command. PdpCLI automatically detects the output file format based on the file name. In this example, the processed DataFrame will be exported as the JSON-Lines format.
$ pdp apply pipeline.pkl https://github.com/altescy/pdpcli/raw/main/tests/fixture/data/test.csv --output-file processed_test.jsonl
  1. You can also directly run the pipeline from a config file without fitting pipeline.
$ pdp apply config.yml test.csv --output-file processed_test.jsonl
  1. It is possible to override or add parameters by adding command line arguments:
pdp apply config.yml test.csv pipeline.stages.drop_columns.column=name

Data Reader / Writer

PdpCLI automatically detects a suitable data reader / writer based on a given file name. If you need to use the other data reader / writer, add a reader or writer config to config.yml. The following config is an exmaple to use SQL data reader. SQL reader fetches records from the specified database and converts them into a pandas DataFrame.

reader:
    type: sql
    dsn: postgres://${env:POSTGRES_USER}:${env:POSTGRES_PASSWORD}@your.posgres.server/your_database

Config files are interpreted by OmegaConf, so ${env:...} is interpolated by environment variables.

Prepare yuor SQL file query.sql to fetch data from the database:

select * from your_table limit 1000

You can execute the pipeline with SQL data reader via:

$ POSTGRES_USER=user POSTGRES_PASSWORD=password pdp apply config.yml query.sql

Plugins

By using plugins, you can extend PdpCLI. This plugin feature enables you to use your own pipeline stages, data readers / writers and commands.

Add a new stage

  1. Write your plugin script mypdp.py like below. Stage.register(" ") registers your pipeline stages, and you can specify these stages by writing type: in your config file.
import pdpcli

@pdpcli.Stage.register("print")
class PrintStage(pdpcli.Stage):
    def _prec(self, df):
        return True

    def _transform(self, df, verbose):
        print(df.to_string(index=False))
        return df
  1. Update config.yml to use your plugin.
pipeline:
    type: pipeline
    stages:
        drop_columns:
        ...

        print:
            type: print

        encode:
        ...
  1. Execute command with --module mypdp and you can see the processed DataFrame after running drop_columns.
$ pdp apply config.yml test.csv --module mypdp

Add a new command

You can also add new commands not only stages.

  1. Add the following script to mypdp.py. This greet command prints out a greeting message with your name.
@pdpcli.Subcommand.register(
    name="greet",
    description="say hello",
    help="say hello",
)
class GreetCommand(pdpcli.Subcommand):
    requires_plugins = False

    def set_arguments(self):
        self.parser.add_argument("--name", default="world")

    def run(self, args):
        print(f"Hello, {args.name}!")
  1. To register this command, you need to create the .pdpcli_plugins file in which module names are listed for each line. Due to module importing order, the --module option is unavailable for command registration.
.pdpcli_plugins ">
$ echo "mypdp" > .pdpcli_plugins
  1. Run the following command and get a message like below. By using the .pdpcli_plugins file, it is is not needed to add the --module option to a command line for each execution.
$ pdp greet --name altescy
Hello, altescy!
You might also like...
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases.
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases.

gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

CLI program that allows you to change your Alacritty config with one command without editing the config file.
CLI program that allows you to change your Alacritty config with one command without editing the config file.

Pycritty Change your alacritty config on the fly! Installation: pip install pycritty By default, only the program itself will be installed, but you ca

This is a CLI program which can help you generate your own QR Code.

Python-QR-code-generator This is a CLI program which can help you generate your own QR Code. Single.py This will allow you only to input a single mess

A CLI tool that scans through a directory and organizes all loose files into folders by file type.
A CLI tool that scans through a directory and organizes all loose files into folders by file type.

Organizer CLI Organizer CLI is a python command line tool that goes through a given directory and organizes all un-folder bound files into folders by

tox-server is a command line tool which runs tox in a loop and calls it with commands from a remote CLI.

Tox Server tox-server is a command line tool which runs tox in a loop and calls it with commands from a remote CLI. It responds to commands via ZeroMQ

Pyrdle - Play Wordle in the CLI. Write an algorithm to play Wordle for you. Ruin all of the fun you've been having
Pyrdle - Play Wordle in the CLI. Write an algorithm to play Wordle for you. Ruin all of the fun you've been having

Pyrdle - Play Wordle in the CLI. Write an algorithm to play Wordle for you. Ruin all of the fun you've been having

flora-dev-cli (fd-cli) is command line interface software to interact with flora blockchain.

Install git clone https://github.com/Flora-Network/fd-cli.git cd fd-cli python3 -m venv venv source venv/bin/activate pip install -e . --extra-index-u

Python-Stock-Info-CLI: Get stock info through CLI by passing stock ticker.
Python-Stock-Info-CLI: Get stock info through CLI by passing stock ticker.

Python-Stock-Info-CLI Get stock info through CLI by passing stock ticker. Installation Use the following command to install the required modules at on

Releases(v0.4.1)
Owner
Yasuhiro Yamaguchi
Research Engineer / NLP / ML
Yasuhiro Yamaguchi
Ideas on how to quickly learn to build command-line tools

CLI-Bootcamp Ideas on how to quickly learn to build command-line tools Part 1-Bash Week1: Using Linux Lesson 1: Using Linux Shell Lab Lesson 2: How sh

Noah Gift 10 Apr 18, 2022
Terminal with builtin ortholinear keyboard and touch screen as a home automation interface.

OLKB-Terminal Terminal with builtin ortholinear keyboard and touch screen as a home automation interface. Features Step and STLs available for non-com

Jeff Eberl 50 Oct 07, 2022
Another (unofficial) Qt CLI Installer on multi-platforms

Another Qt installer(aqt) Release: Documentation: Test status: and Coverage: This is a utility alternative to the official graphical Qt installer, for

Hiroshi Miura 528 Jan 02, 2023
A CLI messenger for the Signum community.

A CLI messenger for the Signum community. Built for people who like using terminal for their work and want to communicate with other users in the Signum community.

Jush 5 Mar 18, 2022
A python script that enables a raspberry pi sd card through the CLI and automates the process of configuring network details and ssh.

This project is one script (wpa_helper.py) written in python that will allow for the user to automate the proccess of setting up a new boot disk and configuring ssh and network settings for the pi

Theo Kirby 6 Jun 24, 2021
An easy-to-bundle GTK terminal emulator.

EasyTerm An easy-to-bundle GTK terminal emulator. This project is meant to be used as a dependency for other

Bottles 4 May 15, 2022
Python CLI for accessing CSCI320 PDM Database

p320_14 Python CLI for accessing CSCI320 PDM Database Authors: Aidan Mellin Dan Skigen Jacob Auger Kyle Baptiste Before running the application for th

Aidan Mellin 1 Nov 23, 2021
Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)

Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).

1 Oct 31, 2021
Display Images in your terminal with python

A python library to display images in the terminal

Pranav Baburaj 57 Dec 30, 2022
Chameleon is yet another PowerShell obfuscation tool designed to bypass AMSI and commercial antivirus solutions.

Chameleon is yet another PowerShell obfuscation tool designed to bypass AMSI and commercial antivirus solutions. The tool has been developed as a Python port of the Chimera project, by tokioneon_.

332 Dec 26, 2022
Jupyter notebook client in neovim

🪐 Jupyter-Nvim Read jupyter notebooks in neovim Note: The plugin is still in alpha stage 👾 Usage Just open any *.ipynb file and voila! ✨ Contributin

Ahmed Khalf 85 Dec 29, 2022
💥 Share files easily over your local network from the terminal!

Fileshare 📨 Share files easily over your local network from the terminal! 📨 Installation # clone the repo $ git clone https://github.com/dopevog/fil

Dopevog 11 Sep 10, 2021
Tool for HackMyVM platform

HMV-cli It is a tool for the HackMyVM platform. With this tool you will be able to see the machines you have pending, filter by difficulty, download d

bitc0de 11 Sep 19, 2022
The Prisma Cloud CLI is a command line interface for Prisma Cloud by Palo Alto Networks.

Prisma Cloud CLI The Prisma Cloud CLI is a command line interface for Prisma Cloud by Palo Alto Networks. Support This project has been developed by P

Palo Alto Networks 13 Oct 14, 2022
Dead simple CLI tool to try Python packages - It's never been easier! :package:

try - It's never been easier to try Python packages try is an easy-to-use cli tool to try out Python packages. Features Install specific package versi

Timo Furrer 659 Dec 28, 2022
A simple web-based SSH client.

Kommander A simple web-based SSH client. It supports: entering SSH login details (including private key and custom ports) and connecting user authenti

KingWaffleIII 2 Jan 01, 2022
NudeNet wrapper made to provide a simple cli interface to the library

Nudenet Wrapper. Small warpper script for NudeNet Made to provide a small and easy to use cli interface with the library. You can indicate a single im

1 Oct 20, 2021
A dec-bin converter uses 2's complement.

2's Complement Dec-Bin Converter A dec-bin converter uses 2's complement. Visit my Medium Post. What is 2's complement? Two's complement is the most c

Khaw Chi Hun (Jacky) 9 Mar 01, 2022
A python Ethereum utilities command-line tool.

peth-cli A python Ethereum utilities command-line tool. After wasting the all day trying to install seth and failed, I took another day to write this.

Moon 55 Nov 15, 2022
Shellmon is a tool used to create and control a webshell remotely, created using the Python3

An Simple PHP Webshell Manager Description Shellmon is a tool used to create and control a webshell remotely, created using the Python3 programming la

22XploiterCrew 12 Dec 30, 2022