Tools for test driven data-wrangling and data validation.

Overview

datatest: Test driven data-wrangling and data validation

Datatest helps to speed up and formalize data-wrangling and data validation tasks. It implements a system of validation methods, difference classes, and acceptance managers. Datatest can help you:

  • Clean and wrangle data faster and more accurately.
  • Maintain a record of checks and decisions regarding important data sets.
  • Distinguish between ideal criteria and acceptable deviation.
  • Validate the input and output of data pipeline components.
  • Measure progress of data preparation tasks.
  • On-board new team members with an explicit and structured process.

Datatest can be used directly in your own projects or as part of a testing framework like pytest or unittest. It has no hard dependencies; it's tested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3; and is freely available under the Apache License, version 2.

Documentation:
Official:

Code Examples

Validating a Dictionary of Lists

from datatest import validate, accepted, Invalid


data = {
    'A': [1, 2, 3, 4],
    'B': ['x', 'y', 'x', 'x'],
    'C': ['foo', 'bar', 'baz', 'EMPTY']
}

validate(data.keys(), {'A', 'B', 'C'})

validate(data['A'], int)

validate(data['B'], {'x', 'y'})

with accepted(Invalid('EMPTY')):
    validate(data['C'], str.islower)

Validating a Pandas DataFrame

import pandas as pd
from datatest import register_accessors, accepted, Invalid


register_accessors()
df = pd.read_csv('data.csv')

df.columns.validate({'A', 'B', 'C'})

df['A'].validate(int)

df['B'].validate({'x', 'y'})

with accepted(Invalid('EMPTY')):
    df['C'].validate(str.islower)

Installation

The easiest way to install datatest is to use pip:

pip install datatest

If you are upgrading from version 0.11.0 or newer, use the --upgrade option:

pip install --upgrade datatest

Upgrading From Version 0.9.6

If you have an existing codebase of older datatest scripts, you should upgrade using the following steps:

  • Install datatest 0.10.0 first:

    pip install --force-reinstall datatest==0.10.0
  • Run your existing code and check for DeprecationWarnings.

  • Update the parts of your code that use deprecated features.

  • Once your code is running without DeprecationWarnings, install the latest version of datatest:

    pip install --upgrade datatest
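
The second step above depends on actually seeing the DeprecationWarnings, and Python's default warning filters can hide them when they are not triggered directly by code in `__main__`. The following is a minimal standard-library sketch of collecting such warnings explicitly. `old_style_check()` is a hypothetical stand-in for code that calls a deprecated feature; it is not part of datatest.

```python
import warnings


def old_style_check():
    """Hypothetical stand-in for code that calls a deprecated API."""
    warnings.warn("old_style_check() is deprecated",
                  DeprecationWarning, stacklevel=2)
    return True


def collect_deprecations(func):
    """Call func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure DeprecationWarnings are recorded, not filtered out.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]


messages = collect_deprecations(old_style_check)
print(messages)
```

For a whole script, running it as `python -W error::DeprecationWarning your_script.py` turns every DeprecationWarning into an exception, which makes the remaining uses hard to miss.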

Stuntman Mike

If you need bug-fixes or features that are not available in the current stable release, you can "pip install" the development version directly from GitHub:

pip install --upgrade https://github.com/shawnbrown/datatest/archive/master.zip

All of the usual caveats for a development install apply: only use this version if you can tolerate some instability or if you know exactly what you're doing. While care is taken to never break the build, it can happen.

Safety-first Clyde

If you need to review and test packages before installing, you can install datatest manually.

Download the latest source distribution from the Python Package Index (PyPI):

https://pypi.org/project/datatest/#files

Unpack the file (replacing X.Y.Z with the appropriate version number) and review the source code:

tar xvfz datatest-X.Y.Z.tar.gz

Change to the unpacked directory and run the tests:

cd datatest-X.Y.Z
python setup.py test

Don't worry if some of the tests are skipped. Tests for optional data sources (like pandas DataFrames or NumPy arrays) are skipped when the related third-party packages are not installed.
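
As an illustration of how tests for optional packages can be skipped rather than fail, here is a minimal sketch using only the standard library. It is not necessarily how datatest's own test suite does it; the test class and its contents are hypothetical.

```python
import importlib.util
import unittest

# True only if pandas can be found in the current environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


class TestOptionalPandas(unittest.TestCase):
    @unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
    def test_column_values(self):
        import pandas as pd  # Imported lazily, only when available.
        df = pd.DataFrame({"A": [1, 2]})
        self.assertEqual(list(df["A"]), [1, 2])


# Run the single test and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalPandas)
result = unittest.TestResult()
suite.run(result)

print("skipped:", len(result.skipped), "failures:", len(result.failures))
```

When pandas is missing, the test is reported as skipped and the run still counts as successful, which matches the behavior described above.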

If the source code and test results are satisfactory, install the package:

python setup.py install

Supported Versions

Tested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3. Datatest is pure Python and may also run on other implementations (check using "setup.py test" before installing).

Backward Compatibility

If you have existing tests that use API features which have changed since 0.9.0, you can still run your old code by adding the following import to the beginning of each file:

from datatest.__past__ import api09

To maintain existing test code, this project makes a best-effort attempt to provide backward compatibility support for older features. The API will be improved in the future but only in measured and sustainable ways.

All of the data used at the National Committee for an Effective Congress has been checked with datatest for several years so there is, already, a large and growing codebase that relies on current features and must be maintained into the future.

Soft Dependencies

Datatest has no hard, third-party dependencies. But if you want to interface with pandas DataFrames, NumPy arrays, or other optional data sources, you will need to install the relevant packages (pandas, numpy, etc.).

Development Repository

The development repository for datatest is hosted on GitHub.


Freely licensed under the Apache License, Version 2.0

Copyright 2014 - 2021 National Committee for an Effective Congress, et al.

Comments
  • validation errors Extra(nan) or Invalid(nan)

    Shawn, I am trying your package to see if I can validate a CSV file by reading it with pandas. I am getting Extra(nan) from dt.validate.superset() or Invalid(nan) from dt.validate(). Is there a way I can include those nan values in my validation sets?

    Error looks like

    E     ValidationError: may contain only elements of given superset (10000 differences): [
                Extra(nan),
                Extra(nan),
                Extra(nan),
    

    Note: I am reading this particular column as str

    E       ValidationError: does not satisfy 'str' (10000 differences): [
                Invalid(nan),
                Invalid(nan),
                Invalid(nan),
                Invalid(nan),
    

    Let me know if you find a solution or can help me debug

    opened by upretip 5
  • Crashes pytest-xdist processes (NOTE: See comments for fix.)

    Hi, all! I ran into a problem when running my tests with pytest-xdist.

    macOS (also checked on Debian), Python 3.8.2

    pytest==5.4.3 pytest-xdist==1.33.0 datatest==0.9.6

    from datatest import accepted, Extra, validate as __validate
    
    
    def test_should_passed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 1}, "")
    
    
    def test_should_failed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 2}, "")
    
    
    if __name__ == '__main__':
        import sys, pytest
        sys.exit(pytest.main(['/Users/qa/PycharmProjects/qa/test123.py', '-vvv', '-n', '1', '-s']))
    

    Output:

    test123.py::test_should_passed 
    [gw0] PASSED test123.py::test_should_passed 
    test123.py::test_should_failed !!!!!!!!!!!!!!!!!!!! <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
    
    INTERNALERROR> Traceback (most recent call last):
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/workermanage.py", line 334, in process_from_remote
    INTERNALERROR>     rep = self.config.hook.pytest_report_from_serializable(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
    INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
    INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
    INTERNALERROR>     self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
    INTERNALERROR>     return outcome.get_result()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
    INTERNALERROR>     raise ex[1].with_traceback(ex[2])
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
    INTERNALERROR>     res = hook_impl.function(*args)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 355, in pytest_report_from_serializable
    INTERNALERROR>     return TestReport._from_json(data)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 193, in _from_json
    INTERNALERROR>     kwargs = _report_kwargs_from_json(reportdict)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 485, in _report_kwargs_from_json
    INTERNALERROR>     reprtraceback = deserialize_repr_traceback(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 468, in deserialize_repr_traceback
    INTERNALERROR>     repr_traceback_dict["reprentries"] = [
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 469, in <listcomp>
    INTERNALERROR>     deserialize_repr_entry(x) for x in repr_traceback_dict["reprentries"]
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 464, in deserialize_repr_entry
    INTERNALERROR>     _report_unserialization_failure(entry_type, TestReport, reportdict)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 206, in _report_unserialization_failure
    INTERNALERROR>     raise RuntimeError(stream.getvalue())
    INTERNALERROR> RuntimeError: '----------------------------------------------------------------------------------------------------'
    INTERNALERROR> 'INTERNALERROR: Unknown entry type returned: DatatestReprEntry'
    INTERNALERROR> "report_name: <class '_pytest.reports.TestReport'>"
    INTERNALERROR> {'$report_type': 'TestReport',
    INTERNALERROR>  'duration': 0.002020120620727539,
    INTERNALERROR>  'item_index': 1,
    INTERNALERROR>  'keywords': {'qa': 1, 'test123.py': 1, 'test_should_failed': 1},
    INTERNALERROR>  'location': ('test123.py', 8, 'test_should_failed'),
    INTERNALERROR>  'longrepr': {'chain': [({'extraline': None,
    INTERNALERROR>                           'reprentries': [{'data': {'lines': ['    def '
    INTERNALERROR>                                                               'test_should_failed():',
    INTERNALERROR>                                                               '        with '
    INTERNALERROR>                                                               'accepted(Extra):',
    INTERNALERROR>                                                               '>           '
    INTERNALERROR>                                                               '__validate({"qwe": '
    INTERNALERROR>                                                               '1}, {"qwe": 2}, '
    INTERNALERROR>                                                               '"")',
    INTERNALERROR>                                                               'E           '
    INTERNALERROR>                                                               'datatest.ValidationError: '
    INTERNALERROR>                                                               'does not '
    INTERNALERROR>                                                               'satisfy 2 (1 '
    INTERNALERROR>                                                               'difference): {',
    INTERNALERROR>                                                               'E               '
    INTERNALERROR>                                                               "'qwe': "
    INTERNALERROR>                                                               'Deviation(-1, '
    INTERNALERROR>                                                               '2),',
    INTERNALERROR>                                                               'E           }'],
    INTERNALERROR>                                                     'reprfileloc': {'lineno': 11,
    INTERNALERROR>                                                                     'message': 'ValidationError',
    INTERNALERROR>                                                                     'path': 'test123.py'},
    INTERNALERROR>                                                     'reprfuncargs': {'args': []},
    INTERNALERROR>                                                     'reprlocals': None,
    INTERNALERROR>                                                     'style': 'long'},
    INTERNALERROR>                                            'type': 'DatatestReprEntry'}],
    INTERNALERROR>                           'style': 'long'},
    INTERNALERROR>                          {'lineno': 11,
    INTERNALERROR>                           'message': 'datatest.ValidationError: does not '
    INTERNALERROR>                                      'satisfy 2 (1 difference): {\n'
    INTERNALERROR>                                      "    'qwe': Deviation(-1, 2),\n"
    INTERNALERROR>                                      '}',
    INTERNALERROR>                           'path': '/Users/qa/PycharmProjects/qa/test123.py'},
    INTERNALERROR>                          None)],
    INTERNALERROR>               'reprcrash': {'lineno': 11,
    INTERNALERROR>                             'message': 'datatest.ValidationError: does not '
    INTERNALERROR>                                        'satisfy 2 (1 difference): {\n'
    INTERNALERROR>                                        "    'qwe': Deviation(-1, 2),\n"
    INTERNALERROR>                                        '}',
    INTERNALERROR>                             'path': '/Users/qa/PycharmProjects/qa/test123.py'},
    INTERNALERROR>               'reprtraceback': {'extraline': None,
    INTERNALERROR>                                 'reprentries': [{'data': {'lines': ['    def '
    INTERNALERROR>                                                                     'test_should_failed():',
    INTERNALERROR>                                                                     '        '
    INTERNALERROR>                                                                     'with '
    INTERNALERROR>                                                                     'accepted(Extra):',
    INTERNALERROR>                                                                     '>           '
    INTERNALERROR>                                                                     '__validate({"qwe": '
    INTERNALERROR>                                                                     '1}, '
    INTERNALERROR>                                                                     '{"qwe": '
    INTERNALERROR>                                                                     '2}, "")',
    INTERNALERROR>                                                                     'E           '
    INTERNALERROR>                                                                     'datatest.ValidationError: '
    INTERNALERROR>                                                                     'does not '
    INTERNALERROR>                                                                     'satisfy 2 '
    INTERNALERROR>                                                                     '(1 '
    INTERNALERROR>                                                                     'difference): '
    INTERNALERROR>                                                                     '{',
    INTERNALERROR>                                                                     'E               '
    INTERNALERROR>                                                                     "'qwe': "
    INTERNALERROR>                                                                     'Deviation(-1, '
    INTERNALERROR>                                                                     '2),',
    INTERNALERROR>                                                                     'E           '
    INTERNALERROR>                                                                     '}'],
    INTERNALERROR>                                                           'reprfileloc': {'lineno': 11,
    INTERNALERROR>                                                                           'message': 'ValidationError',
    INTERNALERROR>                                                                           'path': 'test123.py'},
    INTERNALERROR>                                                           'reprfuncargs': {'args': []},
    INTERNALERROR>                                                           'reprlocals': None,
    INTERNALERROR>                                                           'style': 'long'},
    INTERNALERROR>                                                  'type': 'DatatestReprEntry'}],
    INTERNALERROR>                                 'style': 'long'},
    INTERNALERROR>               'sections': []},
    INTERNALERROR>  'nodeid': 'test123.py::test_should_failed',
    INTERNALERROR>  'outcome': 'failed',
    INTERNALERROR>  'sections': [],
    INTERNALERROR>  'testrun_uid': 'c913bf205a874a50a237dcf40d482d06',
    INTERNALERROR>  'user_properties': [],
    INTERNALERROR>  'when': 'call',
    INTERNALERROR>  'worker_id': 'gw0'}
    INTERNALERROR> 'Please report this bug at https://github.com/pytest-dev/pytest/issues'
    INTERNALERROR> '----------------------------------------------------------------------------------------------------'
    [gw0] node down: <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
    [gw0] FAILED test123.py::test_should_failed 
    
    replacing crashed worker gw0
    [gw1] darwin Python 3.8.3 cwd: /Users/qa/PycharmProjects/qa
    INTERNALERROR> Traceback (most recent call last):
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 191, in wrap_session
    INTERNALERROR>     session.exitstatus = doit(config, session) or 0
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 247, in _main
    INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
    INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
    INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
    INTERNALERROR>     self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
    INTERNALERROR>     return outcome.get_result()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
    INTERNALERROR>     raise ex[1].with_traceback(ex[2])
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
    INTERNALERROR>     res = hook_impl.function(*args)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
    INTERNALERROR>     self.loop_once()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 135, in loop_once
    INTERNALERROR>     call(**kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 263, in worker_runtest_protocol_complete
    INTERNALERROR>     self.sched.mark_test_complete(node, item_index, duration)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/scheduler/load.py", line 151, in mark_test_complete
    INTERNALERROR>     self.node2pending[node].remove(item_index)
    INTERNALERROR> KeyError: <WorkerController gw0>
    

    But if I change the second test like this, everything works fine:

    def test_should_failed():
        try:
            with accepted(Extra):
                __validate({"qwe": 1}, {"qwe": 2}, "")
        except:
            raise ValueError
    

    I don't know exactly where I should file a bug/issue about this :)

    bug 
    opened by VasilyevAA 3
  • AcceptedExtra not working as expected with dicts

    I expected with AcceptedExtra(): to ignore missing keys in dicts, but instead it raises a Deviation from None.

    Here is an example:

    actual = {'a': 1, 'b': 2}
    expected = {'b': 2}
    with AcceptedExtra():
        validate(actual, requirement=expected)
    

    The output is:

    E           ValidationError: does not satisfy mapping requirements (1 difference): {
                    'a': Deviation(+1, None),
                }
    

    Thanks for the cool package, by the way!

    opened by TheOtherDude 3
  • Add pytest framework trove classifier

    Adding the trove classifier will signal that datatest also acts as a pytest plugin. This will also help https://plugincompat.herokuapp.com to find it and list it as a plugin and do regular installation checks.

    For further details see this recently merged PR from hypothesis: https://github.com/HypothesisWorks/hypothesis/pull/1306

    opened by obestwalter 3
  • Magic Reduction

    Issue #7 exposes the degree of magic that is currently present in the DataTestCase methods. Removing (or at least reducing) magic where possible would make the behavior easier to understand and explain.

    In cases where small amounts of magic are useful, methods should be renamed to better reflect what's happening.

    Illustrating the Problem

    This "magic" version:

    def test_active(self):
        self.assertDataSet('active', {'Y', 'N'})
    

    ...is roughly equivalent to:

    def test_active(self):
        subject = self.subject.set('active')
        self.assertEqual(subject, {'Y', 'N'})
    

    The magic version requires detailed knowledge about the method before a newcomer can guess what's happening. The latter example is more explicit and easier to reason about.

    Having said this, the magic versions of DataTestCase's methods can save a lot of typing. So what I plan to do is:

    1. Fully implement assertEqual() integration (see issue #7) as well as other standard unittest methods (assertGreater(), etc.).
    2. Rename the existing methods to clearly denote that they run on the subject data (e.g., assertDataSum() → assertSubjectSum(), etc.).
    enhancement 
    opened by shawnbrown 3
  • Unique Method

    Hey Shawn - one of the problems you were speaking about at PyCon 2016 was looking to guarantee that all integers in a list were unique, in an efficient way for large sets of data?

    enhancement 
    opened by RyPeck 3
  • Fix syntax of `python_requires`

    >=2.6.* isn't valid syntax for python_requires (see PEP 440).

    This was causing an alpha release of Poetry to fail to install this package. I think they're going to fix it in future releases, but regardless it'd be helpful if this syntax was fixed.

    opened by ajhynes7 2
  • pytest_runtest_makereport crashes on test exceptions

    If an exception is thrown within a test that uses the test_db_engine fixture, the pytest_runtest_makereport function crashes. The reason is that it uses Node's deprecated get_marker function, instead of the new get_closest_marker function. See details about this change in pytest here: https://docs.pytest.org/en/latest/mark.html#updating-code

    opened by avshalomt2 2
  • Explore ways to optimize validation and allowance flow.

    Once major pieces are in place, explore ways of optimizing the validation/allowance process. Look to implement the following possible improvements:

    • Use lazy evaluation in validate and assertion functions by returning generators instead of fully calculated containers.
    • Create optimized _validate...() functions for faster testing (short-circuit evaluation and Boolean return values) rather than using _compare...() functions in all cases.
    opened by shawnbrown 2
  • Squint objects not handled properly when used as requirements.

    Squint objects are not being evaluated properly by datatest.validate() function:

    import datatest
    import squint
    
    # Create a Select object.
    select = squint.Select([['A', 'B'], ['x', 1], ['y', 2], ['z', 3]])
    
    # Compare data to itself--passes as expected.
    datatest.validate(
    	select({'A': {'B'}}),
    	select({'A': {'B'}}).fetch(),  # <- Shouldn't be necessary.
    )
    
    # Compare data to itself--fails, unexpectedly.
    datatest.validate(
    	select({'A': {'B'}}),
    	select({'A': {'B'}}),  # <- Not properly handled!
    )
    

    In the code above, the second call to datatest.validate() should pass but, instead, fails with the following message:

    Traceback (most recent call last):
      File "<input>", line 3, in <module>
    	select({'A': {'B'}}),  # <- Not properly handled!
      File "~/datatest-project/datatest/validation.py", line 291, in __call__
    	raise err
    datatest.ValidationError: does not satisfy mapping requirements (3 differences): {
    	'x': [Invalid(1)],
    	'y': [Invalid(2)],
    	'z': [Invalid(3)],
    }
    
    bug 
    opened by shawnbrown 1
  • Selector.load_data() silently fails on missing file.

    The following should raise an error:

    >>> import datatest
    >>> select = datatest.Selector()
    >>> select = select.load_data('nonexistent_file.csv')
    
    bug 
    opened by shawnbrown 1
  • How to validate Pandas data type "Int64"?

    Pandas recently introduced IntegerArrays which allow integer types to also store a NaN-like value pandas.NA.

    Is there a way to use datatest to validate that a pandas.DataFrame's column is of type Int64, i.e. that all values are of that type?

    I tried df["mycolumn"].validate(pd.arrays.IntegerArray) and df["mycolumn"].validate(pd.Int64Dtype) to no avail.

    opened by PanCakeConnaisseur 0
  • Understanding Pandas validation

    Hello, apologies if this is the wrong place to ask this question.

    I am stumped on how datatest's validation mechanism is passing the following example:

    dt.validate(pd.DataFrame(), pd.DataFrame({"A": [1]}))
    

    The documentation states:

    For validation, DataFrame objects using the default index type are treated as sequences.

    Shouldn't I be getting the same result as dt.validate([], [1])? What am I missing?

    opened by schlich 1
  • Improve existing or create another Deviation-like difference

    Hello @shawnbrown, it would be nice to also show the actual value along with the deviation and the expected value. It would also be nice to be able to see the percentage deviation along with the absolute deviation. Thanks!

    opened by a-chernov 0
  • Improve error message for @working_directory decorator

    If working_directory() is used as a decorator but the developer forgets to call it with a path, the error message can be confusing because the function is passed in implicitly (via decorator handling):

    >>> from datatest import working_directory
    >>>
    >>> @working_directory
    ... def foo():
    ...     return True
    ...
    TypeError: stat: path should be string, bytes, os.PathLike or integer, not function
    

    This misuse is easily detectable in the code and it would be good to improve the error message to help users understand their mistake.

    opened by shawnbrown 0
  • NaT issue

    Greetings, @shawnbrown

    to be short,

    My pd.Series looks like this:

    Date
    0          NaT
    1          NaT
    2          NaT
    3   2010-12-31
    4   2010-12-31
    Name: Date, dtype: datetime64[ns]

    The type of NaT is <class 'pandas._libs.tslibs.nattype.NaTType'>. When I use the following code:

    with accepted(Extra(pd.NaT)):
        validate(data, requirement)

    I found that the NaTs are not recognized. I tried many types of Extra and tried using a function, but all failed.

    here I need your help. Thanks for your work.

    opened by Belightar 5
  • Investigate Support for DataFrame-Protocol

    Keep an eye on wesm/dataframe-protocol#1 and see if it makes sense to change datatest's normalization to support a DataFrame-protocol instead of Dataframes specifically.

    enhancement 
    opened by shawnbrown 0
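
The Extra(nan) and Invalid(nan) report above, and possibly the NaT one, involve values that do not compare equal to themselves. As general background (this is plain Python behavior, not a description of datatest internals), the sketch below shows why a NaN read from a file usually fails a set-membership check, along with one hypothetical helper, `in_allowed()`, that treats all float NaNs as equivalent.

```python
import math

nan_a = float("nan")
nan_b = float("nan")   # A second, distinct NaN object.

# NaN is never equal to anything, including another NaN.
nans_equal = (nan_a == nan_b)

allowed = {"x", "y", nan_a}
same_object_found = nan_a in allowed   # Found only via the identity shortcut.
other_nan_found = nan_b in allowed     # A different NaN object is not found.


def in_allowed(value, allowed_values):
    """Membership test that treats any float NaN as matching any other NaN.

    Hypothetical helper for illustration; not part of datatest.
    """
    if isinstance(value, float) and math.isnan(value):
        return any(isinstance(v, float) and math.isnan(v)
                   for v in allowed_values)
    return value in allowed_values


print(nans_equal, same_object_found, other_nan_found,
      in_allowed(nan_b, allowed))
```

The 0.10.0 release notes below mention dedicated NaN support and how-to documentation, which remain the authoritative reference for handling these values in datatest itself.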
Releases(0.11.1)
  • 0.11.1(Jan 4, 2021)

    • Fixed validation, predicate, and difference handling of non-comparable objects.
    • Fixed bug in normalization of Queries from squint package.
    • Changed failure output to improve error reporting with pandas accessors.
    • Changed predicate failure message to quote code objects using backticks.
  • 0.11.0(Dec 18, 2020)

    • Removed deprecated decorators: skip(), skipIf(), skipUnless() (use unittest.skip(), etc. instead).
    • Removed deprecated aliases Selector and ProxyGroup.
    • Removed the long-deprecated allowed interface.
    • Removed deprecated acceptances: "specific", "limit", etc.
    • Removed deprecated Select, Query, and Result API. Use squint instead:
      • https://pypi.org/project/squint/
    • Removed deprecated get_reader() function. Use get-reader instead:
      • https://pypi.org/project/get-reader/
  • 0.10.0(Dec 17, 2020)

    • Fixed bug where ValidationErrors were crashing pytest-xdist workers.

    • Added tighter Pandas integration using Pandas' extension API.

      After calling the new register_accessors() function, your existing DataFrame, Series, Index, and MultiIndex objects will have a validate() method that can be used instead of the validate() function:

      import pandas as pd
      import datatest as dt
      
      dt.register_accessors()  # <- Activate Pandas integration.
      
      df = pd.DataFrame(...)
      df[['A', 'B']].validate((str, int))  # <- New accessor method.
      
    • Changed Pandas validation behavior:

      • DataFrame and Series: These objects are treated as sequences when they use a RangeIndex index (this is the default type assigned when no index is specified). And they are treated as dictionaries when they use an index of any other type--the index values become the dictionary keys.

      • Index and MultiIndex: These objects are treated as sequences.

    • Changed repr behavior of Deviation to make timedeltas more readable.

    • Added Predicate matching support for NumPy types np.character, np.integer, np.floating, and np.complexfloating.

    • Added improved NaN handling:

      • Added NaN support to accepted.keys(), accepted.args(), and validate.interval().
      • Improved existing NaN support for difference comparisons.
      • Added how-to documentation for NaN handling.
    • Added data handling support for squint.Select objects.

    • Added deprecation warnings for soon-to-be-removed functions and classes:

      • Added DeprecationWarning to get_reader function. This function is now available from the get-reader package on PyPI:

        https://pypi.org/project/get-reader/

      • Added DeprecationWarning to Select, Query, and Result classes. These classes will be deprecated in the next release but are now available from the squint package on PyPI:

        https://pypi.org/project/squint/

    • Changed validate.subset() and validate.superset() behavior:

      The semantics are now inverted. This behavior was flipped to more closely match user expectations. The previous semantics were used because they reflect the internal structure of datatest more precisely. But these are implementation details, and they are not as important as having a more intuitive API.

    • Added a temporary warning when using the new subset() and superset() methods to alert users to the new behavior. This warning will be removed from future versions of datatest.

    • Added Python 3.9 and 3.10 testing and support.

    • Removed Python 3.1 testing and support. If you were still using this version of Python, please email me--this is a story I need to hear.
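
    The NaN handling added in this release addresses a general property of floating point values: NaN never compares equal to anything, including itself, so a plain equality check reports a difference even when both sides are NaN. The sketch below uses only the standard library to show the problem; same_value() is a hypothetical helper written for illustration and is not part of datatest.

```python
import math


def same_value(a, b):
    """Return True if a and b are equal, or if both are float NaN.

    Hypothetical helper for illustration only (not a datatest function).
    """
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


observed = [1.0, float('nan'), 3.0]
expected = [1.0, float('nan'), 3.0]

# Element-wise equality treats the two NaN values as different.
naive = [a == b for a, b in zip(observed, expected)]

# A NaN-aware comparison treats them as matching.
nan_aware = [same_value(a, b) for a, b in zip(observed, expected)]

print(naive)      # [True, False, True]
print(nan_aware)  # [True, True, True]
```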

  • 0.9.6(Jun 3, 2019)

    • Changed acceptance API to make it both less verbose and more expressive:

      • Consolidated specific-instance and class-based acceptances into a single interface.

      • Added a new accepted.tolerance() method that subsumes the behavior of accepted.deviation() by supporting Missing and Extra quantities in addition to Deviation objects.

      • Deprecated old methods:

        Old Syntax                 New Syntax
        accepted.specific(...)     accepted(...)
        accepted.missing()         accepted(Missing)
        accepted.extra()           accepted(Extra)
        NO EQUIVALENT              accepted(CustomDifferenceClass)
        accepted.deviation(...)    accepted.tolerance(...)
        accepted.limit(...)        accepted.count(...)
        NO EQUIVALENT              accepted.count(..., scope='group')

        Other methods--accepted.args(), accepted.keys(), etc.--remain unchanged.

    • Changed validation to generate Deviation objects for a broader definition of quantitative values (like datetime objects)--not just for subclasses of numbers.Number.

    • Changed handling for pandas.Series objects to treat them as sequences instead of mappings.

    • Added handling for DBAPI2 cursor objects to automatically unwrap single-value rows.

    • Removed acceptance classes from datatest namespace--these were inadvertently added in a previous version but were never part of the documented API. They can still be referenced via the acceptances module:

      from datatest.acceptances import ...
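
    To illustrate what "unwrapping single-value rows" means for DBAPI2 cursors, the sketch below uses the standard library's sqlite3 module. A one-column query returns each row as a 1-tuple, and the hypothetical helper unwrap_single_value_rows() yields the bare value instead. It is an assumption about the general idea, not datatest's actual code.

```python
import sqlite3


def unwrap_single_value_rows(cursor):
    """Yield bare values for one-column results, whole rows otherwise.

    Hypothetical helper for illustration only (not a datatest function).
    """
    single = cursor.description is not None and len(cursor.description) == 1
    for row in cursor:
        yield row[0] if single else row


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a TEXT, b INTEGER)')
conn.executemany('INSERT INTO t VALUES (?, ?)', [('x', 1), ('y', 2), ('z', 3)])

# A one-column query normally produces 1-tuples.
raw = conn.execute('SELECT a FROM t ORDER BY b').fetchall()

# Unwrapping gives the bare values, which are easier to validate.
unwrapped = list(unwrap_single_value_rows(conn.execute('SELECT a FROM t ORDER BY b')))

# Multi-column rows are left untouched.
pairs = list(unwrap_single_value_rows(conn.execute('SELECT a, b FROM t ORDER BY b')))

conn.close()

print(raw)        # [('x',), ('y',), ('z',)]
print(unwrapped)  # ['x', 'y', 'z']
print(pairs)      # [('x', 1), ('y', 2), ('z', 3)]
```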

  • 0.9.5(May 1, 2019)

    • Changed difference objects to make them hashable (can now be used as set members or as dict keys).
    • Added __slots__ to difference objects to reduce memory consumption.
    • Changed name of Selector class to Select (Selector now deprecated).
    • Changed language and class names from allowed and allowance to accepted and acceptance to bring datatest more in line with manufacturing and engineering terminology. The existing allowed API is now deprecated.
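
    The first two changes in this release (hashable difference objects and __slots__) rely on standard Python mechanisms. The sketch below shows the general pattern with a hypothetical ExampleDifference class; the class name and its attributes are assumptions for illustration and do not reproduce datatest's difference classes.

```python
class ExampleDifference(object):
    """Hypothetical value object; not one of datatest's difference classes."""

    # __slots__ prevents a per-instance __dict__, which reduces memory use.
    __slots__ = ('_args',)

    def __init__(self, *args):
        self._args = args

    def __eq__(self, other):
        return type(self) is type(other) and self._args == other._args

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Equal objects must hash equally, so hash the same data used by __eq__.
        return hash((type(self).__name__, self._args))


# Hashable objects can be set members: duplicates collapse.
unique = {ExampleDifference('a'), ExampleDifference('a'), ExampleDifference('b')}

# They can also be used as dictionary keys.
counts = {ExampleDifference('a'): 2, ExampleDifference('b'): 1}

print(len(unique))                                     # 2
print(counts[ExampleDifference('a')])                  # 2
print(hasattr(ExampleDifference('a'), '__dict__'))     # False
```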
  • 0.9.4(Apr 21, 2019)

    • Added Python 3.8 testing and support.
    • Added new validate methods (moved from how-to recipes into core module):
      • Added approx() method to require approximate numeric equality.
      • Added fuzzy() method to require strings by approximate match.
      • Added interval() method to require elements within a given interval.
      • Added set(), subset(), and superset() methods for explicit membership checking.
      • Added unique() method to require unique elements.
      • Added order() method to require elements by relative order.
    • Changed default sequence validation to check elements by index position rather than checking by relative order.
    • Added fuzzy-matching allowance to allow strings by approximate match.
    • Added Predicate class to formalize behavior--also provides inverse-matching with the inversion operator (~).
    • Added new methods to Query class:
      • Added unwrap() to remove single-element containers and return their unwrapped contents.
      • Added starmap() to unpack grouped arguments when applying a function to elements.
    • Fixed improper use of assert statements with appropriate conditional checks and error behavior.
    • Added requirement class hierarchy (using BaseRequirement). This gives users a cleaner way to implement custom validation behavior and makes the underlying codebase easier to maintain.
    • Changed name of ProxyGroup to RepeatingContainer.
    • Changed "How To" examples to use the new validation methods.
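
    The ideas behind the approx(), fuzzy(), and interval() checks listed above can be sketched with the standard library alone. The helper names, the places=7 default, and the cutoff=0.6 default below are assumptions chosen for illustration; datatest's own methods may use different rules and parameters.

```python
import difflib


def is_approx(value, expected, places=7):
    """True if the difference rounds to zero at `places` decimal places."""
    return round(abs(value - expected), places) == 0


def is_fuzzy(value, expected, cutoff=0.6):
    """True if the strings' similarity ratio is at least `cutoff`."""
    ratio = difflib.SequenceMatcher(a=value, b=expected).ratio()
    return ratio >= cutoff


def in_interval(value, low, high):
    """True if value lies in the closed interval [low, high]."""
    return low <= value <= high


print(is_approx(0.1 + 0.2, 0.3))              # True
print(is_approx(0.31, 0.3))                   # False
print(is_fuzzy('MISSISIPPI', 'MISSISSIPPI'))  # True
print(is_fuzzy('apple', 'zzzzz'))             # False
print(in_interval(5, 1, 10))                  # True
print(in_interval(11, 1, 10))                 # False
```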
  • 0.9.3(Jan 29, 2019)

    • Changed bundled pytest plugin to version 0.1.3:
      • This update adds testing and support for latest versions of Pytest and Python (now tested using Pytest 3.3 to 4.1 and Python 2.7 to 3.7).
      • Changed handling for 'mandatory' marker to support older and newer Pytest versions.
  • 0.9.2(Aug 8, 2018)

    Improved data handling features and support for Python 3.7:

    • Changed Query class:
      • Added flatten() method to serialize dictionary results.
      • Added to_csv() method to quickly save results as a CSV file.
      • Changed reduce() method to accept initializer_factory as an optional argument.
      • Changed filter() method to support predicate matching.
    • Added True and False as predicates to support "truth value testing" on arbitrary objects (to match on truthy or falsy).
    • Added ProxyGroup class for performing the same operations on groups of objects at the same time (a common need when testing against reference data).
    • Changed Selector class keyword filtering to support predicate matching.
    • Added handling to get_reader() to support datatest's Selector and Result objects.
    • Fixed get_reader() bug that prevented encoding-fallback recovery when reading from StringIO buffers in Python 2.
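
    The "truth value testing" predicates mentioned above can be pictured with a small matcher. This is a simplified sketch, not datatest's predicate handling: matches() is a hypothetical helper, and the order in which it tries each kind of predicate is an assumption.

```python
def matches(predicate, value):
    """Hypothetical, simplified predicate matcher (illustration only)."""
    if predicate is True:             # match any truthy value
        return bool(value)
    if predicate is False:            # match any falsy value
        return not value
    if isinstance(predicate, type):   # a class matches its instances
        return isinstance(value, predicate)
    if callable(predicate):           # a function matches when it returns truthy
        return bool(predicate(value))
    return predicate == value         # anything else matches by equality


values = ['foo', '', 0, 42, None]

truthy = [v for v in values if matches(True, v)]
falsy = [v for v in values if matches(False, v)]

print(truthy)  # ['foo', 42]
print(falsy)   # ['', 0, None]
```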
  • 0.9.1(Jun 22, 2018)

    • Added improved docstrings and other documentation.
    • Changed bundled pytest plugin to version 0.1.2:
      • Added handling for a mandatory marker to support incremental testing (stops session early when a mandatory test fails).
      • Added --ignore-mandatory option to continue tests even when a mandatory test fails.
  • 0.9.0(Apr 29, 2018)

    • Added bundled version of pytest plugin to base installation.
    • Added universal composability for all allowances (using UNION and INTERSECTION via "|" and "&" operators).
    • Added allowed factory class to simplify allowance imports.
    • Changed is_valid() to valid().
    • Changed ValidationError to display differences in sorted order.
    • Added Python 2 and 3 compatible get_reader() to quickly load csv.reader-like interface for Unicode CSV, MS Excel, pandas.DataFrame, DBF, etc.
    • Added formal order of operations for allowance resolution.
    • Added formal predicate object handling.
    • Added Sphinx-tabs style docs for clear separation of pytest and unittest style examples.
    • Changed DataSource to Selector, DataQuery to Query, and DataResult to Result.
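
    Composing allowances with "|" and "&" relies on Python's operator overloading. The sketch below shows the general pattern with a hypothetical ExampleAllowance class and differences modeled as plain (kind, value) tuples. Both are assumptions for illustration and do not reflect how datatest represents allowances or differences.

```python
class ExampleAllowance(object):
    """Hypothetical composable rule; not datatest's allowance class."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, difference):
        return bool(self.predicate(difference))

    def __or__(self, other):
        # UNION: allowed if either rule allows the difference.
        return ExampleAllowance(lambda d: self(d) or other(d))

    def __and__(self, other):
        # INTERSECTION: allowed only if both rules allow the difference.
        return ExampleAllowance(lambda d: self(d) and other(d))


is_missing = ExampleAllowance(lambda d: d[0] == 'Missing')
is_deviation = ExampleAllowance(lambda d: d[0] == 'Deviation')
is_small = ExampleAllowance(
    lambda d: isinstance(d[1], (int, float)) and abs(d[1]) <= 5
)

differences = [
    ('Missing', 'foo'),
    ('Deviation', 3),
    ('Deviation', 50),
    ('Extra', 'bar'),
]

# Allow missing items, plus deviations that are small.
combined = is_missing | (is_deviation & is_small)
remaining = [d for d in differences if not combined(d)]

print(remaining)  # [('Deviation', 50), ('Extra', 'bar')]
```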
  • 0.8.3(Nov 26, 2017)

    • New module-level functions: validate() and is_valid().
    • DataQuery selections now default to a list type when no outer-container is specified.
    • New DataQuery.apply() method for group-wise function application.
    • DataSource.fieldnames attribute is now a tuple (was a list).
    • The ValidationError repr now prints a trailing comma with the last item (for ease of copy-and-paste work flow).
    • Revised sequence validation behavior provides more precise differences.
    • New truncation support for ValidationErrors with long lists of differences.
    • Excess differences in allowed_specific() definitions no longer trigger test failures.
    • New support for user-defined functions to narrow DataSource selections.
    • Better traceback hiding for pytest.
    • Fix bug in DataQuery.map() method--now converts set types into lists.
  • 0.8.2(Jun 11, 2017)

    • Implement Boolean composition for allowed_specific() context manager.
    • Add proper __repr__() support to DataSource and DataQuery.
    • Make sure DataQuery fails early if bad "select" syntax is used or if unknown columns are selected.
    • Add __copy__() method to DataQuery.
    • Change parent class of differences so they no longer inherit from Exception (this confused their intended use).
    • Restructure documentation for ease of reference.
  • 0.8.1(May 31, 2017)

    • Updated DataQuery select behavior to fail immediately when invalid syntax is used (rather than later when attempting to execute the query).
    • Improved error messages to better explain what went wrong.
  • 0.8.0(May 31, 2017)

    • Replaces old assertion methods with a single, smarter assertValid() method.
    • DataQuery implements query optimization and uses a simpler and more expressive syntax.
    • Allowances and errors have been reworked to be more expressive.
    • Allowances are now composable with bit-wise "&" and "|" operators.
  • 0.7.0.dev2(Aug 3, 2016)

    • Removes some of the internal magic and renames data assertions to more clearly indicate their intended use.
    • Restructures data allowances to provide more consistent parameters and more flexible usage.
    • Adds new method to assert unique values.
    • Adds full **fmtparams support for CSV handling.
    • Fixes comparison and allowance behavior for None vs. zero.
  • 0.6.0.dev1(May 29, 2016)
