
Overview

Visions


And these visions of data types, they kept us up past the dawn.

Visions provides an extensible suite of tools to support common data analysis operations, including:

  • type inference on unknown data
  • casting data types
  • automated data summarization
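For intuition, the three operations above can be sketched as a minimal pure-Python inference pass. This is a hypothetical stand-in for illustration only, not the visions API; visions generalizes this idea across backends and user-defined typesets.

```python
# Minimal sketch of type inference on unknown (string) data.
# Illustrative only -- not how visions is implemented.

def infer_scalar(value: str):
    """Return the most specific builtin type a string value can be cast to."""
    for caster in (int, float):
        try:
            caster(value)
            return caster
        except ValueError:
            pass
    if value.lower() in {"true", "false"}:
        return bool
    return str

def infer_column(values):
    """Infer a single type for a column: the widest per-value type."""
    order = [bool, int, float, str]  # narrow -> wide
    inferred = [infer_scalar(v) for v in values]
    return max(inferred, key=order.index)

print(infer_column(["1", "2", "3"]))    # <class 'int'>
print(infer_column(["1", "2.5", "3"]))  # <class 'float'>
print(infer_column(["1", "x"]))         # <class 'str'>
```

Once the column type is inferred, casting and summarization follow naturally: cast each value with the inferred type, then compute type-appropriate statistics.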

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/side-by-side.png

Documentation

Full documentation can be found here.

Installation

You can install visions via pip:

pip install visions

Alternatives and more details can be found in the documentation.

Supported frameworks

These frameworks are supported out-of-the-box in addition to native Python types:

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/frameworks.png

  • Numpy
  • Pandas
  • Spark

Contributing and support

Contributions to visions are welcome. For more information, please visit the Community contributions page. The GitHub issue tracker is used for reporting bugs, feature requests, and support questions.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

https://github.com/dylan-profiler/visions/raw/master/SIDNfonds.png

Comments
  • Numpy backend

    Numpy backend

    Summary

    This PR adds complete numpy backend support for the StandardSet of types. The type implementation is fully compatible with the pandas equivalent implementations with the exception of the object type.

    Caveats

Whereas pandas provides support for Optional[int] and Optional[bool], numpy doesn't; in order to support those types by default, I was forced to make object completely disjoint from any other concrete type. A similar story plays out for timezone-aware datetime objects, which also default to object in numpy.

    opened by ieaves 8
  • API and Usage

    API and Usage

    (This is part of the review in openjournals/joss-reviews#2145)

    Hi @sbrugman,

    I am currently going through the package and I found it a very interesting project. The type inference that already exists in built-in Python is frequently not enough, and I often find myself writing my own functions for it on a case-by-case basis. So, it is nice to see that this is being done.

    However, I do find myself having quite a bit of trouble using the package effectively. Funnily enough, this is mostly caused by the (in my opinion) confusing naming schemes and the structure of the namespace. It may require a bit of effort to solve (not to mention that it might create incompatibilities with previous versions), but I believe that it will greatly improve the user experience when fixed.

    So, the main problem in my opinion is that almost all definitions are stored in their own subpackages in the visions.core subpackage, often separated as well over an implementations and a model subpackage. This means that in order to access a definition, let's say VisionsTypeset, I have to import it from visions.core.model.typeset, while pre-defined typesets have to be imported from visions.core.implementations.typesets. In my opinion, it is incredibly confusing that these are stored in different subpackages/submodules, as they relate to the same thing, namely typesets, and I would expect to find all of these definitions in a visions.typesets subpackage. Preferably, all definitions the average user would use should be available either at root (visions) or a single level deep (visions.xxx).

    I also noticed that almost all definitions have the word visions in their name. I get the feeling that the reason for this is to avoid namespace clashes when someone uses a wildcard import (from visions import *). However, as wildcard imports are heavily discouraged in the community, this leads to the user writing the word visions at least twice for every definition used (for example, using the visions_integer type requires me to write visions.visions_integer, which could be simplified to visions.integer or even visions.int).

    Finally, I am not entirely sure if this has to do with the online documentation being outdated as mentioned in #21, but according to the example here, a visions_standard_set object has a method called detect_type_series. In v0.2.3, this object neither has a method called detect_type_series nor type_detect_series (the name that the stand-alone function has in visions.core.functional), but instead it is called detect_series_type. If possible, could you check and make sure that the methods and stand-alone functions use consistent naming schemes?

    Please let me know if you have any questions.

    enhancement 
    opened by 1313e 7
  • Recommended stack overflow tag for questions

    Recommended stack overflow tag for questions

    I have a question about how to use the library. I considered opening an issue, but I see in the documentation that you recommend asking questions about how to use the package on Stack Overflow. Is there a tag that you'd suggest people use when asking questions there? I don't see anything with visions as a tag, but maybe I'm just the first person to ask a question over there.

    If you think visions would be a good tag choice it would make sense to update the stack overflow ask a question link to pre-populate the question with the tag (https://stackoverflow.com/questions/ask?tags=visions).

    Thanks!

    enhancement 
    opened by sterlinm 6
  • Automate building of documentation

    Automate building of documentation

    The documentation should be rebuilt at every merge. The diffs caused by committed documentation builds convolute code reviews. The steps to build the documentation are simple and can be automated.

    Suggested solution via Github Actions (e.g.): https://github.com/marketplace/actions/sphinx-build https://github.com/ammaraskar/sphinx-action-test/blob/master/.github/workflows/default.yml

    enhancement 
    opened by sbrugman 6
  • Please push an updated version to pypi to correct dependency on attrs not attr

    Please push an updated version to pypi to correct dependency on attrs not attr

    Describe the bug: visions uses the @attr.s decorator, which comes from the attrs module, not the attr module. The master version of visions has the correct dependency, but the PyPI versions do not.

    Additional context: When using pandas_profiling, which depends on visions, I got the following error:

    AttributeError: module 'attr' has no attribute 's'
    

    which led me to post this issue.

    bug 
    opened by proinsias 5
  • Version 0.7.5

    Version 0.7.5

    0.7.5 Includes:

    • Fixes to numpy backend for complex, object, email_address, URL, boolean
    • Support for new versions of pandas ABCIndex class (previously called ABCIndexClass)
    • Updated tests for numpy backend
    • Automated Github Actions unit tests on PR
    • Additional documentation
    opened by ieaves 4
  • fail to pass the test with 0.6.1 release

    fail to pass the test with 0.6.1 release

    Describe the bug: The tests fail with the 0.6.1 release. To reproduce:

    python setup.py build
    PYTHONPATH=build/lib pytest -v
    

    Expected behavior: all tests pass.

    Additional context: error log:

    =================================== FAILURES ===================================
    _____________________ test_contains[file_mixed_ext x File] _____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: file_mixed_ext in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x File] ________________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x Image] _______________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x File] ____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x Image] ___________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _____________ test_inference[file_mixed_ext x File expected True] ______________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected File to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _____________ test_inference[file_mixed_ext x Path expected False] _____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Image expected True] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Path expected False] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Image expected True] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Path expected False] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    =============================== warnings summary ===============================
    tests/test_root.py::test_multiple_roots
      /build/python-visions/src/visions-0.6.1/build/lib/visions/typesets/typeset.py:88: UserWarning: {Generic} were isolates in the type relation map and consequently orphaned. Please add some mapping to the orphaned nodes.
        warnings.warn(message)
    
    tests/test_summarization.py::test_complex_missing_summary
      /usr/lib/python3.8/site-packages/numpy/core/_methods.py:47: ComplexWarning: Casting complex values to real discards the imaginary part
        return umr_sum(a, axis, dtype, out, keepdims, initial, where)
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    =========================== short test summary info ============================
    FAILED tests/typesets/test_complete_set.py::test_contains[file_mixed_ext x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x Image]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x Image]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x File expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Path expected False]
    ================= 11 failed, 8954 passed, 2 warnings in 15.94s =================
    

    see also the complete build log here

    bug 
    opened by hubutui 4
  • No module named 'shapely'

    No module named 'shapely'

    (This is part of the review in openjournals/joss-reviews#2145)

    When I try to execute the example given here, I get an error stating No module named 'shapely'. I see that this is a dependency of visions, but is only listed in the requirements_test.txt. You probably have to add this requirement to the requirements.txt as well.

    PS: Currently, the requirements of the package are both listed in their own separate file and in the setup.py file. To avoid confusion for yourself, it is probably better to only use either. You can read in a requirements file and use it in the setup.py file by using:

    # Get the requirements list
    with open('requirements.txt', 'r') as f:
        requirements = f.read().splitlines()
    

    Keep in mind that it is possible to link different requirements files together. For example, you can link requirements.txt and requirements_dev.txt together by adding the line -r requirements.txt to the top of requirements_dev.txt. This means that installing the requirements of requirements_dev.txt will use both files. This however won't work if you parse the file in a setup.py file. In that case, you can simply read both files and append them together if necessary.
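The manual expansion described above (reading a requirements file in setup.py while honoring `-r` includes yourself, since setuptools does not resolve them) might look like this sketch; the helper name is hypothetical:

```python
import os

def read_requirements(path):
    """Read a pip requirements file, expanding `-r other.txt` includes,
    since setup.py cannot rely on pip to resolve them."""
    requirements = []
    base = os.path.dirname(path)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            if line.startswith("-r "):
                # Recurse into the linked requirements file.
                requirements += read_requirements(os.path.join(base, line[3:].strip()))
            else:
                requirements.append(line)
    return requirements
```

It could then be used as `setup(..., install_requires=read_requirements("requirements.txt"))`, keeping the requirements files as the single source of truth.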

    opened by 1313e 4
  • Bump pyarrow from 1.0.1 to 5.0.0

    Bump pyarrow from 1.0.1 to 5.0.0

    Bumps pyarrow from 1.0.1 to 5.0.0.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 3
  • all nulls should be inferred as generic

    all nulls should be inferred as generic

    I don't think this should be expected behavior.

    In [41]: infer_type(pd.DataFrame({'x':['', '']}), StandardSet())
    Out[41]: {'x': DateTime}
    
    In [39]: infer_type(pd.DataFrame({'x':[None, None]}), StandardSet())
    Out[39]: {'x': Boolean}
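One way to avoid the misleading inferences shown above is to short-circuit columns that carry no informative values before running inference. A minimal stdlib sketch; the `GENERIC` sentinel, the null definition, and the helper names are assumptions for illustration, not visions behavior:

```python
GENERIC = "Generic"  # stand-in for visions' root Generic type

def is_null(value):
    """Treat None, NaN, and empty strings as uninformative."""
    return value is None or value == "" or value != value  # NaN != NaN

def safe_infer(values, infer):
    """Fall back to Generic when a column carries no information."""
    informative = [v for v in values if not is_null(v)]
    if not informative:
        return GENERIC
    return infer(informative)

# All-null columns never reach the inference step.
assert safe_infer([None, None], lambda vs: "Boolean") == "Generic"
assert safe_infer(["", ""], lambda vs: "DateTime") == "Generic"
```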
    
    bug 
    opened by majidaldo 3
  • comma separator

    comma separator

    New functionality

    • comma separator handling for string digits
    • new utility functionality for working with missing values

    Major Proposed Changes

    • Integer should be a strict subset of Float
    opened by ieaves 3
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi dylan-profiler/visions!

    This is a one-off, automatically generated pull request from LGTM.com. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository. We tested it before opening this pull request, so all should be working. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • Sktime semantic data types for time series & vision

    Sktime semantic data types for time series & vision

    I've recently been made aware of this excellent and imo much needed library by @lmmentel.

    The reason is its similarity to the datatypes module of sktime, which introduces semantic typing for time series related data types - we distinguish "mtypes" (machine representations) and "scitypes" (scientific types, what visions calls semantic type). More details here as reference.

    Few questions for visions devs:

    • time series are known to be a notoriously splintered field in terms of data representation, and even more when it comes to learning tasks (as in your ML example). Do you see visions moving in the direction of typing for ML?
    • would you have time to look into the sktime datatypes module and assess how similar this is to visions? If similar, we might be tempted to take a dependency on visions and contribute. Key features are mtype conversions, scitype inference, checks that also return metadata (e.g., number of time stamps in a series, which can be represented 4 different ways)
    enhancement 
    opened by fkiraly 7
  • src/visions/types/url.py passes non URLs

    src/visions/types/url.py passes non URLs

    src/visions/types/url.py does not correctly validate URLs.

    First, the example code (lines 14--19) from the docs does not return True:

    Python 3.9.4 (default, Apr  9 2021, 09:32:38)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import visions
    >>> from urllib.parse import urlparse
    >>> urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> x = [urlparse(url) for url in urls]
    >>> x in visions.URL
    False
    >>> x
    [ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment=''), ParseResult(scheme='https', netloc='github.com', path='/pandas-profiling/pandas-profiling', params='', query='', fragment='')]
    >>> import pkg_resources
    >>> pkg_resources.get_distribution("visions").version
    '0.7.4'
    

    Second, non URLs are passing:

    >>> urlparse('junk') in visions.URL
    True
    >>>
    

    The code should probably check something like the following for each element of x:

        try:
            result = urlparse(x)
            return all([result.scheme, result.netloc])
        except (ValueError, AttributeError):
            return False
    

    Finally, and this is a suggested enhancement, I think the behavior would be more useful if it handled raw strings and did the parsing internally without the caller having to supply a parser:

    urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> urls in visions.URL
    True
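A self-contained version of the suggested check, extended to accept raw strings as proposed above. The helper name is hypothetical and this is not visions' actual implementation; it requires both a scheme and a network location, which rejects inputs like 'junk':

```python
from urllib.parse import urlparse

def is_url(value) -> bool:
    """Accept a raw string or an already-parsed ParseResult and
    require both a scheme and a network location."""
    try:
        result = urlparse(value) if isinstance(value, str) else value
        return bool(result.scheme) and bool(result.netloc)
    except (ValueError, AttributeError):
        return False

assert is_url("https://github.com/pandas-profiling/pandas-profiling")
assert not is_url("junk")  # urlparse('junk') has no scheme or netloc
```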
    
    bug 
    opened by leapingllamas 3
  • How to check if a type is/is_not parent of another type ?

    How to check if a type is/is_not parent of another type ?

    Following the example of "Problem type inference".

    graph

    From one dataframe, I already make a list of type for each column. Here is the type_list:

    [Discrete,
     Nominal,
     Discrete,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Binary,
     Discrete,
     Discrete,
     Discrete,
     Nominal,
     Binary]
    

    type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

    Now, I want to check whether each type has a parent type of Categorical or Numeric.

    for column, t in zip(columns, type_list):
        if is_type_parent_of_categorical(t):
            category_job(dataframe[column])

    # Binary is a child of Categorical
    is_type_parent_of_categorical(type_list[14])  # -> True

    # Discrete is a child of Numeric, not Categorical
    is_type_parent_of_categorical(type_list[0])  # -> False
    

    How should I implement is_type_parent_of_categorical?

    My workaround seems to work because of string comparison:

    def is_type_parent_of_categorical(visions_type):
        type_str = str(visions_type)
        if type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]:
            return True
        return False
    
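    A slightly more robust variant of the workaround above (a sketch, not part of the visions API; `is_categorical_type` is a hypothetical name) compares class names instead of `str()` output, so it keeps working if the string representation ever gains a module prefix:

    ```python
    # Hypothetical helper: the set of type names treated as categorical
    CATEGORICAL_TYPE_NAMES = {"Categorical", "Ordinal", "Nominal", "Binary"}

    def is_categorical_type(visions_type) -> bool:
        # Visions types are classes, so __name__ gives the bare type name
        # without any formatting that str() might add.
        name = getattr(visions_type, "__name__", str(visions_type))
        return name in CATEGORICAL_TYPE_NAMES
    ```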
    enhancement 
    opened by ttpro1995 2
  • function: 'lowest' common type

    function: 'lowest' common type

    Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

    A common scenario when assembling horrible csvs is that the same column might be inferred as different types in different csvs, for example float in one and int in another. The worst case is to 'fall back' to string.

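    The idea can be sketched with a simple widening ladder (the ladder and function name are illustrative, not a visions API): pick the widest type any subset needed, falling back to string only when nothing narrower fits.

    ```python
    # Widening order: int can be read as float, and anything as string
    WIDENING_LADDER = [int, float, str]

    def lowest_common_type(types):
        """Return the narrowest ladder type that can represent all inputs."""
        return WIDENING_LADDER[max(WIDENING_LADDER.index(t) for t in types)]
    ```

    So a column inferred as int in one csv and float in another resolves to float, and only incompatible mixes fall back to str.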
    enhancement 
    opened by majidaldo 2
Releases(v0.7.5)
  • v0.7.5(Dec 5, 2021)

  • v0.7.4(Sep 27, 2021)

  • v0.7.2(Sep 27, 2021)

  • v0.7.1(Feb 4, 2021)

  • v0.7.0(Jan 5, 2021)

  • v0.6.4(Oct 17, 2020)

    • ENH: swifter apply for pandas backend
    • FIX: fix for issue #147
    • ENH: __version__ attribute made available
    • ENH: improved typing and CI
    • ENH: contrib types/typesets for a low-threshold contribution of types

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Oct 11, 2020)

    • ENH: Expose state using typeset.detect and typeset.infer
    • ENH: plotting of typesets improved
    • FIX: fix and test cases for #136
    • CLN: pre-commit with black, isort, pyupgrade, flake8
    • ENH: type relations are now accessible by type (e.g. Float.relations[Integer])

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Sep 22, 2020)

  • v0.5.1(Sep 22, 2020)

    • Introduce stateful type inference and casting
    • Expose test utils to users and fix diagnostic information
    • Integer consistency for the standard set
    • Use pd.BooleanDtype for newer versions of pandas
    • Latest black formatting
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Aug 16, 2020)

    API breaking changes:

    • migration to single dispatch on typeset methods
    • updated API to unify detect / infer / cast against Series and DataFrames
    • improvements to boolean type
    Source code(tar.gz)
    Source code(zip)
  • v0.4.6(Jul 28, 2020)

  • v0.4.5(Jul 28, 2020)

  • v0.4.4(May 11, 2020)

  • v0.4.3(May 10, 2020)

  • v0.4.2(May 10, 2020)

    Support for Files and Images, rewritten summarization functions

    • Renamed ExistingPath to File
    • Renamed ImagePath to Image
    • Version bump to 0.4.2
    • Summaries: return series instead of dict
    • Categorical: unicode counts are now based on the original character distribution rather than on unique characters, which are used only as an intermediate step for performance.
    • Categorical: aggregate functions are included for string length (min, max, mean, median).
    • Path: number of unique values for the path parts are returned
    • Image: make Exif and Hash calculations optional. Also return width, height and area.
    • File: in addition to the file_size, return creation, modification and access time (which were already returned).
    Source code(tar.gz)
    Source code(zip)