Simple, lightweight config handling through Python data classes, with JSON serialization and deserialization.

Overview

👩‍✈️ Coqpit

Work in progress... 🌡️

Why I need this

What I need from an ML configuration library...

  1. Fixing a general config schema in Python to guide users about expected values.

    Python is good but not universal. Sometimes you train an ML model and use it on a different platform, so your model configuration file needs to be importable from other programming languages.

  2. Simple dynamic value and type checking with default values.

    If you are a beginner in an ML project, it is hard to guess the right values for your experiment. It is therefore important to have default values and to know the range and type of input expected for each field.

  3. Ability to decompose large configs.

    As you define more fields for the training dataset, data preprocessing, model parameters, etc., your config file tends to get quite large. In most cases it can be decomposed into smaller configs, which improves flexibility and readability.

  4. Inheritance and nested configurations.

    These help keep configurations consistent and easier to maintain.

  5. Ability to override values from the command line when necessary.

    For instance, you might need to define a path for your dataset, and this changes for almost every run. The user should be able to override this value easily from the command line.

    It also allows easy hyper-parameter search without changing your original code. Basically, you can run different models with different parameters using only command line arguments.

  6. Defining dynamic or conditional config values.

    Sometimes you need to define certain values depending on other values. Using Python lets you define the underlying logic for such config values.

  7. No dependencies.

    You don't want to install a ton of libraries just for configuration management. If you install one, it had better be plain Python.
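Points 2 and 6 above can be illustrated with nothing but the standard library. The following is a minimal sketch, not Coqpit's API: the class `TrainConfig` and its fields are hypothetical and exist only to show a range check, a value derived from another field, and a JSON round trip that other languages could read.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TrainConfig:
    # Hypothetical fields, chosen only for illustration.
    batch_size: int = 32
    lr: float = 1e-3
    warmup_steps: Optional[int] = None  # derived from batch_size when left unset

    def __post_init__(self):
        # Simple type and range check with a readable error message.
        if not isinstance(self.batch_size, int) or not 1 <= self.batch_size <= 4096:
            raise ValueError(
                f"batch_size must be an int in [1, 4096], got {self.batch_size!r}"
            )
        # Conditional value: fill in a default that depends on another field.
        if self.warmup_steps is None:
            self.warmup_steps = 100 * self.batch_size


config = TrainConfig(batch_size=64)
# JSON is the language-neutral representation of the config.
as_json = json.dumps(asdict(config), indent=4)
restored = TrainConfig(**json.loads(as_json))
```

Keeping the logic in `__post_init__` means every construction path, including loading from JSON, runs the same checks.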

🔍 Examples

👉 Serialization

import os
from dataclasses import asdict, dataclass, field
from coqpit import Coqpit, check_argument
from typing import List, Union


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_c: str = "Coqpit is great!"

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_c', c, restricted=True)


@dataclass
class NestedConfig(Coqpit):
    val_d: int = 10
    val_e: int = None
    val_f: str = "Coqpit is great!"
    sc_list: List[SimpleConfig] = None
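    # use a factory: a dataclass instance is a mutable default, which recent Python versions reject
    sc: SimpleConfig = field(default_factory=SimpleConfig)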
    union_var: Union[List[SimpleConfig], SimpleConfig] = field(default_factory=lambda: [SimpleConfig(),SimpleConfig()])

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_d', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_e', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_f', c, restricted=True)
        check_argument('sc_list', c, restricted=True, allow_none=True)
        check_argument('sc', c, restricted=True, allow_none=True)


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    # init 🐸 dataclass
    config = NestedConfig()

    # save to a json file
    config.save_json(os.path.join(file_path, 'example_config.json'))
    # load a json file
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the json file.
    config2.load_json(os.path.join(file_path, 'example_config.json'))
    # now they should have the same values.
    assert config == config2

    # pretty print the dataclass (pprint() prints the fields itself)
    config.pprint()

    # export values to a dict
    config_dict = config.to_dict()
    # create a new config with different values than the defaults
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the exported values from the previous config.
    config2.from_dict(config_dict)
    # now they should have the same values.
    assert config == config2
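To make the round trip above concrete, here is a stdlib-only sketch of the same idea: dump a nested dataclass to a JSON file, then update a default instance in place from that file. It is an illustration under assumptions, not Coqpit's implementation; `Inner`, `Outer`, and `update_from_dict` are hypothetical names.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class Inner:
    val_a: int = 10


@dataclass
class Outer:
    val_d: int = 10
    inner: Inner = field(default_factory=Inner)


def update_from_dict(obj, data):
    """Overwrite fields of ``obj`` in place with values from ``data`` (hypothetical helper)."""
    for fld in fields(obj):
        if fld.name not in data:
            continue
        current = getattr(obj, fld.name)
        value = data[fld.name]
        if is_dataclass(current) and isinstance(value, dict):
            update_from_dict(current, value)  # recurse into nested configs
        else:
            setattr(obj, fld.name, value)


original = Outer(val_d=42, inner=Inner(val_a=7))
path = os.path.join(tempfile.mkdtemp(), "config.json")

# save: asdict() turns nested dataclasses into nested dicts
with open(path, "w", encoding="utf-8") as handle:
    json.dump(asdict(original), handle, indent=4)

# load: start from defaults, then overwrite with the stored values
loaded = Outer()
with open(path, encoding="utf-8") as handle:
    update_from_dict(loaded, json.load(handle))
```

Updating in place, rather than constructing a new object from the dict, keeps nested fields as dataclass instances instead of plain dicts.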

👉 argparse handling and parsing.

import argparse
import os
import sys
from dataclasses import asdict, dataclass, field
from typing import List

from coqpit.coqpit import Coqpit, check_argument


@dataclass
class SimplerConfig(Coqpit):
    val_a: int = field(default=None, metadata={'help': 'this is val_a'})


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = field(default=10,
                       metadata={'help': 'this is val_a of SimpleConfig'})
    val_b: int = field(default=None, metadata={'help': 'this is val_b'})
    val_c: str = "Coqpit is great!"
    mylist_with_default: List[SimplerConfig] = field(
        default_factory=lambda:
        [SimplerConfig(val_a=100),
         SimplerConfig(val_a=999)],
        metadata={'help': 'list of SimplerConfig'})

    # mylist_without_default: List[SimplerConfig] = field(default=None, metadata={'help': 'list of SimplerConfig'})  # NOT SUPPORTED YET!

    def check_values(self, ):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b',
                       c,
                       restricted=True,
                       min_val=128,
                       max_val=4058,
                       allow_none=True)
        check_argument('val_c', c, restricted=True)


def main():
    file_path = os.path.dirname(os.path.abspath(__file__))

    # initial config
    config = SimpleConfig()
    config.pprint()

    # reference config that we like to match with the config above
    config_ref = SimpleConfig(val_a=222,
                              val_b=999,
                              val_c='this is different',
                              mylist_with_default=[
                                  SimplerConfig(val_a=222),
                                  SimplerConfig(val_a=111)
                              ])

    # create and init argparser with Coqpit
    parser = argparse.ArgumentParser()
    parser = config.init_argparse(parser)
    parser.print_help()
    args = parser.parse_args()

    # update the config from the parsed arguments
    config.from_argparse(args)
    config.pprint()
    # check the current config with the reference config
    assert config == config_ref


if __name__ == '__main__':
    sys.argv.extend(['--coqpit.val_a', '222'])
    sys.argv.extend(['--coqpit.val_b', '999'])
    sys.argv.extend(['--coqpit.val_c', 'this is different'])
    sys.argv.extend(['--coqpit.mylist_with_default.0.val_a', '222'])
    sys.argv.extend(['--coqpit.mylist_with_default.1.val_a', '111'])
    main()
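The dotted flags in the example (`--coqpit.val_a`, `--coqpit.mylist_with_default.0.val_a`) address nested fields by path. The sketch below shows one way such a mapping can be built with plain `argparse` and dataclasses. It is not how Coqpit implements `init_argparse`; the helpers `add_arguments` and `apply_arguments`, the `cfg` prefix, and the classes are all hypothetical, and lists are left out for brevity.

```python
import argparse
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class Inner:
    val_a: int = 1


@dataclass
class Outer:
    val_b: int = 2
    name: str = "demo"
    inner: Inner = field(default_factory=Inner)


def add_arguments(parser, obj, prefix="cfg"):
    """Register one --prefix.field flag per leaf field (hypothetical helper)."""
    for fld in fields(obj):
        value = getattr(obj, fld.name)
        dotted = f"{prefix}.{fld.name}"
        if is_dataclass(value):
            add_arguments(parser, value, dotted)  # nested config: extend the path
        else:
            # default=None lets us tell "not passed" apart from a real value.
            parser.add_argument(f"--{dotted}", dest=dotted, type=type(value), default=None)


def apply_arguments(args, obj, prefix="cfg"):
    """Copy every flag that was actually passed back onto the config object."""
    for fld in fields(obj):
        value = getattr(obj, fld.name)
        dotted = f"{prefix}.{fld.name}"
        if is_dataclass(value):
            apply_arguments(args, value, dotted)
        elif getattr(args, dotted) is not None:
            setattr(obj, fld.name, getattr(args, dotted))


config = Outer()
parser = argparse.ArgumentParser()
add_arguments(parser, config)
args = parser.parse_args(["--cfg.val_b", "5", "--cfg.inner.val_a", "9"])
apply_arguments(args, config)
```

Fields that are not passed on the command line keep their configured values, so the same script serves both default runs and hyper-parameter sweeps.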

🤸‍♀️ Merging coqpits

from dataclasses import dataclass
from coqpit.coqpit import Coqpit


@dataclass
class CoqpitA(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"


@dataclass
class CoqpitB(Coqpit):
    val_d: int = 25
    val_e: int = 257
    val_f: float = -10.21
    val_g: str = "Coqpit is really great!"


if __name__ == '__main__':
    coqpita = CoqpitA()
    coqpitb = CoqpitB()
    # merge the fields of coqpita into coqpitb
    coqpitb.merge(coqpita)
    # val_a is declared only on CoqpitA; after the merge it is reachable from coqpitb
    print(coqpitb.val_a)
    # pprint() prints the fields itself
    coqpitb.pprint()
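Both classes above define `val_d`, so a merge has to pick one value. The example does not say which one Coqpit keeps, so the sketch below only illustrates the general idea with plain dataclasses and dicts, under the stated assumption that values from the merged-in object win. `merged_view`, `A`, and `B` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass
class A:
    val_a: int = 10
    val_d: float = 10.21


@dataclass
class B:
    val_d: int = 25
    val_e: int = 257


def merged_view(base, other):
    """Return the fields of ``base`` overlaid with those of ``other``.

    Assumption for this sketch: on a name clash, ``other`` wins.
    """
    merged = asdict(base)
    merged.update(asdict(other))
    return merged


combined = merged_view(B(), A())
```

If the opposite precedence is wanted, swapping the two arguments is enough.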
Comments
  • Allow file-like objects when saving and loading

    Allow users to save the configs to arbitrary locations through file-like objects. Would e.g. simplify coqui-ai/TTS#683 without adding an fsspec dependency to this library.

    opened by agrinh 6
  • Latest PR causes an issue when a `Serializable` has default None

    https://github.com/coqui-ai/coqpit/blob/5379c810900d61ae19d79b73b03890fa103487dd/coqpit/coqpit.py#L539

    @reuben I am on it but if you have an easy fix go for it. Right now it breaks all the TTS trainings.

    opened by erogol 2
  • [feature request] change the `arg_perfix` of coqpit

    Is it possible to change the arg_perfix when using Coqpit object to another value / empty string? I see the option is supported in the code by changing arg_perfix, but not sure how to access it using the proposed API.

    Thanks for the package, looks very useful!

    opened by mosheman5 1
  • Setup CI to push new tags to PyPI automatically

    I'm gonna add a workflow to automatically upload new tags to PyPI. @erogol when you have a chance could you transfer the coqpit project on PyPI to the coqui user?[0] Then you can add your personal account as a maintainer also, so you don't have to change your local setup.

    In the meantime I'll iterate on testpypi.

    [0] https://pypi.org/user/coqui/

    opened by reuben 1
  • Fix rsetattr

    rsetattr() is updated to pass the new test cases below.

    I don't know if it is the right solution. It might be that rsetattr gets confused when coqpit is used as a prefix.

    opened by erogol 0
  • [feature request] Warning when unexpected key is loaded but not present in class

    Here is a toy scenario where it would be nice to have a warning:

    from dataclasses import dataclass
    from coqpit import Coqpit
    
    @dataclass
    class SimpleConfig(Coqpit):
        val_a: int = 10
        val_b: int = None
    
    if __name__ == "__main__":
        config = SimpleConfig()
    
        tmp_config = config.to_dict()
        tmp_config["unknown_key"] = "Ignored value"
        config.from_dict(tmp_config)
        print(config.to_json())
    

    Here, the value of config.to_json() is

    {
        "val_a": 10,
        "val_b": null
    }
    

    This is the expected behaviour, but we should get a warning that some keys were ignored (IMO).

    feature request 
    opened by WeberJulian 6
  • [feature request] Add `is_defined`

    Use coqpit.is_defined('field') to check if "field" in coqpit and coqpit.field is not None:

    It is a common condition when you parse out a coqpit object.

    feature request 
    opened by erogol 0
  • Allow grouping of argparse fields according to subclassing

    When using inheritance to extend config definitions the resulting ArgumentParser has all fields flattened out. It would be nice to group fields by class and allow some control over ordering.

    opened by reuben 2
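Two of the feature requests above, the warning on unknown keys and `is_defined`, are easy to picture with plain dataclasses. This is only a sketch of the requested behaviour, not code from Coqpit: `load_with_warning` and the free function `is_defined` are hypothetical, and the config class here does not inherit from `Coqpit`.

```python
import warnings
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SimpleConfig:
    val_a: int = 10
    val_b: Optional[int] = None


def load_with_warning(config, data):
    """Set known fields from ``data`` and warn about every key that is ignored."""
    known = {fld.name for fld in fields(config)}
    for key, value in data.items():
        if key in known:
            setattr(config, key, value)
        else:
            warnings.warn(f"Ignoring unknown config key: {key!r}")


def is_defined(config, name):
    """True if ``name`` exists on the config and its value is not None."""
    return getattr(config, name, None) is not None


config = SimpleConfig()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    load_with_warning(config, {"val_a": 20, "unknown_key": "Ignored value"})
```

After the call, `caught` holds one warning for `unknown_key`, while `val_a` has been updated and `val_b` is still undefined.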
Releases(v0.0.17)
Owner
Eren Gölge
AI researcher @Coqui.ai