
Overview

Visions


And these visions of data types, they kept us up past the dawn.

Visions provides an extensible suite of tools to support common data analysis operations, including:

  • type inference on unknown data
  • casting data types
  • automated data summarization
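For intuition, the three operations above can be sketched as a minimal pure-Python inference pass. This is a hypothetical stand-in for illustration only, not the visions API; visions generalizes this idea across backends and user-defined typesets.

```python
# Minimal sketch of type inference on unknown (string) data.
# Illustrative only -- not how visions is implemented.

def infer_scalar(value: str):
    """Return the most specific builtin type a string value can be cast to."""
    for caster in (int, float):
        try:
            caster(value)
            return caster
        except ValueError:
            pass
    if value.lower() in {"true", "false"}:
        return bool
    return str

def infer_column(values):
    """Infer a single type for a column: the widest per-value type."""
    order = [bool, int, float, str]  # narrow -> wide
    inferred = [infer_scalar(v) for v in values]
    return max(inferred, key=order.index)

print(infer_column(["1", "2", "3"]))    # <class 'int'>
print(infer_column(["1", "2.5", "3"]))  # <class 'float'>
print(infer_column(["1", "x"]))         # <class 'str'>
```

Once the column type is inferred, casting and summarization follow naturally: cast each value with the inferred type, then compute type-appropriate statistics.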

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/side-by-side.png

Documentation

Full documentation can be found here.

Installation

You can install visions via pip:

pip install visions

Alternatives and more details can be found in the documentation.

Supported frameworks

These frameworks are supported out-of-the-box in addition to native Python types:

https://github.com/dylan-profiler/visions/raw/develop/docsrc/source/_static/frameworks.png

  • Numpy
  • Pandas
  • Spark

Contributing and support

Contributions to visions are welcome. For more information, please visit the Community contributions page. The GitHub issue tracker is used for reporting bugs, feature requests, and support questions.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

https://github.com/dylan-profiler/visions/raw/master/SIDNfonds.png

Comments
  • Numpy backend

    Numpy backend

    Summary

    This PR adds complete numpy backend support for the StandardSet of types. The type implementation is fully compatible with the pandas equivalent implementations with the exception of the object type.

    Caveats

Whereas pandas provides support for Optional[int] and Optional[bool], numpy doesn't; in order to support those types by default, I was forced to make object completely disjoint from any other concrete type. A similar story plays out for timezone-aware datetime objects, which also default to object in numpy.

    opened by ieaves 8
  • API and Usage

    API and Usage

    (This is part of the review in openjournals/joss-reviews#2145)

    Hi @sbrugman,

    I am currently going through the package and I found it a very interesting project. The type inference that already exists in built-in Python is frequently not enough, and I often find myself writing my own functions for it on a case-by-case basis. So, it is nice to see that this is being done.

    However, I do find myself having quite a bit of trouble using the package effectively. Funnily enough, this is mostly caused by the (in my opinion) confusing naming schemes and the structure of the namespace. It may require a bit of effort to solve (not to mention that it might create incompatibilities with previous versions), but I believe that it will greatly improve the user experience when fixed.

    So, the main problem in my opinion is that almost all definitions are stored in their own subpackages in the visions.core subpackage, often separated as well over an implementations and a model subpackage. This means that in order to access a definition, let's say VisionsTypeset, I have to import it from visions.core.model.typeset, while pre-defined typesets have to be imported from visions.core.implementations.typesets. In my opinion, it is incredibly confusing that these are stored in different subpackages/submodules, as they relate to the same thing, namely typesets, and I would expect to find all of these definitions in a visions.typesets subpackage. Preferably, all definitions the average user would use should be available either at root (visions) or a single level deep (visions.xxx).

    I also noticed that almost all definitions have the word visions in their name. I get the feeling that the reason for this is to avoid namespace clashes when someone uses a wildcard import (from visions import *). However, as wildcard imports are heavily discouraged in the community, this leads to the user writing the word visions at least twice for every definition used (for example, using the visions_integer type requires me to write visions.visions_integer, which could be simplified to visions.integer or even visions.int).

    Finally, I am not entirely sure if this has to do with the online documentation being outdated as mentioned in #21, but according to the example here, a visions_standard_set object has a method called detect_type_series. In v0.2.3, this object neither has a method called detect_type_series nor type_detect_series (the name that the stand-alone function has in visions.core.functional), but instead it is called detect_series_type. If possible, could you check and make sure that the methods and stand-alone functions use consistent naming schemes?

    Please let me know if you have any questions.

    enhancement 
    opened by 1313e 7
  • Recommended stack overflow tag for questions

    Recommended stack overflow tag for questions

    I have a question about how to use the library. I considered opening an issue, but I see in the documentation that you recommend asking questions about how to use the package on Stack Overflow. Is there a tag that you'd suggest people use when asking questions there? I don't see anything with visions as a tag, but maybe I'm just the first person to ask a question over there.

    If you think visions would be a good tag choice it would make sense to update the stack overflow ask a question link to pre-populate the question with the tag (https://stackoverflow.com/questions/ask?tags=visions).

    Thanks!

    enhancement 
    opened by sterlinm 6
  • Automate building of documentation

    Automate building of documentation

    The documentation should be rebuilt at every merge. The diffs caused by committed documentation builds convolute code reviews. The steps to build the documentation are simple and can be automated.

    Suggested solution via Github Actions (e.g.): https://github.com/marketplace/actions/sphinx-build https://github.com/ammaraskar/sphinx-action-test/blob/master/.github/workflows/default.yml

    enhancement 
    opened by sbrugman 6
  • Please push an updated version to pypi to correct dependency on attrs not attr

    Please push an updated version to pypi to correct dependency on attrs not attr

    Describe the bug: visions uses the @attr.s decorator, which comes from the attrs module, not the attr module. The master version of visions has the correct dependency, but the PyPI versions do not.

    Additional context: When using pandas_profiling, which depends on visions, I got the following error:

    AttributeError: module 'attr' has no attribute 's'
    

    which led me to post this issue.

    bug 
    opened by proinsias 5
  • Version 0.7.5

    Version 0.7.5

    0.7.5 Includes:

    • Fixes to numpy backend for complex, object, email_address, URL, boolean
    • Support for new versions of pandas ABCIndex class (previously called ABCIndexClass)
    • Updated tests for numpy backend
    • Automated Github Actions unit tests on PR
    • Additional documentation
    opened by ieaves 4
  • fail to pass the test with 0.6.1 release

    fail to pass the test with 0.6.1 release

    Describe the bug: The tests fail with the 0.6.1 release. To reproduce:

    python setup.py build
    PYTHONPATH=build/lib pytest -v
    

    Expected behavior: all tests pass.

    Additional context: error log:

    =================================== FAILURES ===================================
    _____________________ test_contains[file_mixed_ext x File] _____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: file_mixed_ext in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x File] ________________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _______________________ test_contains[image_png x Image] _______________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x File] ____________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = File, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in File; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    ___________________ test_contains[image_png_missing x Image] ___________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, member = True
    
        @pytest.mark.parametrize(**get_contains_cases(series, contains_map, typeset))
        def test_contains(series, type, member):
            """Test the generated combinations for "series in type"
        
            Args:
                series: the series to test
                type: the type to test against
                member: the result
            """
            result, message = contains(series, type, member)
    >       assert result, message
    E       AssertionError: image_png_missing in Image; expected True, got False
    E       assert False
    
    tests/typesets/test_complete_set.py:190: AssertionError
    _____________ test_inference[file_mixed_ext x File expected True] ______________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = File, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected File to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _____________ test_inference[file_mixed_ext x Path expected False] _____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: file_mixed_ext, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of file_mixed_ext expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Image expected True] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    _______________ test_inference[image_png x Path expected False] ________________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2    /build/python-visions/src/visions-0.6.1/build/...
    Name: image_png, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Image expected True] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Image, typeset = CompleteSet, difference = False
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Image to be True (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    ___________ test_inference[image_png_missing x Path expected False] ____________
    
    series = 0    /build/python-visions/src/visions-0.6.1/build/...
    1    /build/python-visions/src/visions-0.6.1/build/...
    2       ...c/visions-0.6.1/build/...
    4                                                 None
    Name: image_png_missing, dtype: object
    type = Path, typeset = CompleteSet, difference = True
    
        @pytest.mark.parametrize(**get_inference_cases(series, inference_map, typeset))
        def test_inference(series, type, typeset, difference):
            """Test the generated combinations for "inference(series) == type"
        
            Args:
                series: the series to test
                type: the type to test against
            """
            result, message = infers(series, type, typeset, difference)
    >       assert result, message
    E       AssertionError: inference of image_png_missing expected Path to be False (typeset=CompleteSet)
    E       assert False
    
    tests/typesets/test_complete_set.py:317: AssertionError
    =============================== warnings summary ===============================
    tests/test_root.py::test_multiple_roots
      /build/python-visions/src/visions-0.6.1/build/lib/visions/typesets/typeset.py:88: UserWarning: {Generic} were isolates in the type relation map and consequently orphaned. Please add some mapping to the orphaned nodes.
        warnings.warn(message)
    
    tests/test_summarization.py::test_complex_missing_summary
      /usr/lib/python3.8/site-packages/numpy/core/_methods.py:47: ComplexWarning: Casting complex values to real discards the imaginary part
        return umr_sum(a, axis, dtype, out, keepdims, initial, where)
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    =========================== short test summary info ============================
    FAILED tests/typesets/test_complete_set.py::test_contains[file_mixed_ext x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png x Image]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x File]
    FAILED tests/typesets/test_complete_set.py::test_contains[image_png_missing x Image]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x File expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[file_mixed_ext x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png x Path expected False]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Image expected True]
    FAILED tests/typesets/test_complete_set.py::test_inference[image_png_missing x Path expected False]
    ================= 11 failed, 8954 passed, 2 warnings in 15.94s =================
    

    see also the complete build log here

    bug 
    opened by hubutui 4
  • No module named 'shapely'

    No module named 'shapely'

    (This is part of the review in openjournals/joss-reviews#2145)

    When I try to execute the example given here, I get an error stating No module named 'shapely'. I see that this is a dependency of visions, but is only listed in the requirements_test.txt. You probably have to add this requirement to the requirements.txt as well.

    PS: Currently, the requirements of the package are both listed in their own separate file and in the setup.py file. To avoid confusion for yourself, it is probably better to only use either. You can read in a requirements file and use it in the setup.py file by using:

    # Get the requirements list
    with open('requirements.txt', 'r') as f:
        requirements = f.read().splitlines()
    

    Keep in mind that it is possible to link different requirements files together. For example, you can link requirements.txt and requirements_dev.txt together by adding the line -r requirements.txt to the top of requirements_dev.txt. This means that installing the requirements of requirements_dev.txt will use both files. This however won't work if you parse the file in a setup.py file. In that case, you can simply read both files and append them together if necessary.
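The manual expansion described above (reading a requirements file in setup.py while honoring `-r` includes yourself, since setuptools does not resolve them) might look like this sketch; the helper name is hypothetical:

```python
import os

def read_requirements(path):
    """Read a pip requirements file, expanding `-r other.txt` includes,
    since setup.py cannot rely on pip to resolve them."""
    requirements = []
    base = os.path.dirname(path)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            if line.startswith("-r "):
                # Recurse into the linked requirements file.
                requirements += read_requirements(os.path.join(base, line[3:].strip()))
            else:
                requirements.append(line)
    return requirements
```

It could then be used as `setup(..., install_requires=read_requirements("requirements.txt"))`, keeping the requirements files as the single source of truth.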

    opened by 1313e 4
  • Bump pyarrow from 1.0.1 to 5.0.0

    Bump pyarrow from 1.0.1 to 5.0.0

    Bumps pyarrow from 1.0.1 to 5.0.0.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 3
  • all nulls should be inferred as generic

    all nulls should be inferred as generic

    I don't think this should be expected behavior.

    In [41]: infer_type(pd.DataFrame({'x':['', '']}), StandardSet())
    Out[41]: {'x': DateTime}
    
    In [39]: infer_type(pd.DataFrame({'x':[None, None]}), StandardSet())
    Out[39]: {'x': Boolean}
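One way to avoid the misleading inferences shown above is to short-circuit columns that carry no informative values before running inference. A minimal stdlib sketch; the `GENERIC` sentinel, the null definition, and the helper names are assumptions for illustration, not visions behavior:

```python
GENERIC = "Generic"  # stand-in for visions' root Generic type

def is_null(value):
    """Treat None, NaN, and empty strings as uninformative."""
    return value is None or value == "" or value != value  # NaN != NaN

def safe_infer(values, infer):
    """Fall back to Generic when a column carries no information."""
    informative = [v for v in values if not is_null(v)]
    if not informative:
        return GENERIC
    return infer(informative)

# All-null columns never reach the inference step.
assert safe_infer([None, None], lambda vs: "Boolean") == "Generic"
assert safe_infer(["", ""], lambda vs: "DateTime") == "Generic"
```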
    
    bug 
    opened by majidaldo 3
  • comma separator

    comma separator

    New functionality

    • comma separator handling for string digits
    • new utility functionality for working with missing values

    Major Proposed Changes

    • Integer should be a strict subset of Float
    opened by ieaves 3
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi dylan-profiler/visions!

    This is a one-off, automatically generated pull request from LGTM.com. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository. We tested it before opening this pull request, so all should be working. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • Sktime semantic data types for time series & vision

    Sktime semantic data types for time series & vision

    I've recently been made aware of this excellent and imo much needed library by @lmmentel.

    The reason is its similarity to the datatypes module of sktime, which introduces semantic typing for time series related data types - we distinguish "mtypes" (machine representations) and "scitypes" (scientific types, what visions calls semantic type). More details here as reference.

    Few questions for visions devs:

    • time series are known to be a notoriously splintered field in terms of data representation, and even more when it comes to learning tasks (as in your ML example). Do you see visions moving in the direction of typing for ML?
    • would you have time to look into the sktime datatypes module and assess how similar this is to visions? If similar, we might be tempted to take a dependency on visions and contribute. Key features are mtype conversions, scitype inference, checks that also return metadata (e.g., number of time stamps in a series, which can be represented 4 different ways)
    enhancement 
    opened by fkiraly 7
  • src/visions/types/url.py passes non URLs

    src/visions/types/url.py passes non URLs

    src/visions/types/url.py does not correctly validate URLs.

    First, the example code (lines 14--19) from the docs does not return True:

    Python 3.9.4 (default, Apr  9 2021, 09:32:38)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import visions
    >>> from urllib.parse import urlparse
    >>> urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> x = [urlparse(url) for url in urls]
    >>> x in visions.URL
    False
    >>> x
    [ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment=''), ParseResult(scheme='https', netloc='github.com', path='/pandas-profiling/pandas-profiling', params='', query='', fragment='')]
    >>> import pkg_resources
    >>> pkg_resources.get_distribution("visions").version
    '0.7.4'
    

    Second, non URLs are passing:

    >>> urlparse('junk') in visions.URL
    True
    >>>
    

    The code should probably check something like the following for each element of x:

        try:
            result = urlparse(x)
            return all([result.scheme, result.netloc])
        except (ValueError, AttributeError):
            return False
    

    Finally, and this is a suggested enhancement, I think the behavior would be more useful if it handled raw strings and did the parsing internally without the caller having to supply a parser:

    urls = ['http://www.cwi.nl:80/%7Eguido/Python.html', 'https://github.com/pandas-profiling/pandas-profiling']
    >>> urls in visions.URL
    True
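A self-contained version of the suggested check, extended to accept raw strings as proposed above. The helper name is hypothetical and this is not visions' actual implementation; it requires both a scheme and a network location, which rejects inputs like 'junk':

```python
from urllib.parse import urlparse

def is_url(value) -> bool:
    """Accept a raw string or an already-parsed ParseResult and
    require both a scheme and a network location."""
    try:
        result = urlparse(value) if isinstance(value, str) else value
        return bool(result.scheme) and bool(result.netloc)
    except (ValueError, AttributeError):
        return False

assert is_url("https://github.com/pandas-profiling/pandas-profiling")
assert not is_url("junk")  # urlparse('junk') has no scheme or netloc
```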
    
    bug 
    opened by leapingllamas 3
  • How to check if a type is/is_not parent of another type ?

    How to check if a type is/is_not parent of another type ?

    Following the example of "Problem type inference".

    graph

    From one dataframe, I already make a list of type for each column. Here is the type_list:

    [Discrete,
     Nominal,
     Discrete,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Nominal,
     Binary,
     Discrete,
     Discrete,
     Discrete,
     Nominal,
     Binary]
    

    type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

    Now, I want to check whether each type has a parent type of Categorical or Numeric.

    for column, t in zip(columns, type_list):
        if is_type_parent_of_categorical(t):
            category_job(dataframe[column])

    # Binary is a child of Categorical
    is_type_parent_of_categorical(type_list[14])  # -> True

    # Discrete is a child of Numeric, not Categorical
    is_type_parent_of_categorical(type_list[0])  # -> False
    

    How should I implement is_type_parent_of_categorical?

    My workaround seems to work because of string comparison:

    def is_type_parent_of_categorical(visions_type):
        type_str = str(visions_type)
        if type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]:
            return True
        return False
    
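    A slightly more robust variant of the workaround above (a sketch, not part of the visions API; `is_categorical_type` is a hypothetical name) compares class names instead of `str()` output, so it keeps working if the string representation ever gains a module prefix:

    ```python
    # Hypothetical helper: the set of type names treated as categorical
    CATEGORICAL_TYPE_NAMES = {"Categorical", "Ordinal", "Nominal", "Binary"}

    def is_categorical_type(visions_type) -> bool:
        # Visions types are classes, so __name__ gives the bare type name
        # without any formatting that str() might add.
        name = getattr(visions_type, "__name__", str(visions_type))
        return name in CATEGORICAL_TYPE_NAMES
    ```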
    enhancement 
    opened by ttpro1995 2
  • function: 'lowest' common type

    function: 'lowest' common type

    Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

    A common scenario when assembling horrible csvs is that the same column might be inferred as different types in different csvs, for example float in one and int in another. The worst case is to 'fall back' to string.

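    The idea can be sketched with a simple widening ladder (the ladder and function name are illustrative, not a visions API): pick the widest type any subset needed, falling back to string only when nothing narrower fits.

    ```python
    # Widening order: int can be read as float, and anything as string
    WIDENING_LADDER = [int, float, str]

    def lowest_common_type(types):
        """Return the narrowest ladder type that can represent all inputs."""
        return WIDENING_LADDER[max(WIDENING_LADDER.index(t) for t in types)]
    ```

    So a column inferred as int in one csv and float in another resolves to float, and only incompatible mixes fall back to str.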
    enhancement 
    opened by majidaldo 2
Releases(v0.7.5)
  • v0.7.5(Dec 5, 2021)

  • v0.7.4(Sep 27, 2021)

  • v0.7.2(Sep 27, 2021)

  • v0.7.1(Feb 4, 2021)

  • v0.7.0(Jan 5, 2021)

  • v0.6.4(Oct 17, 2020)

    • ENH: swifter apply for pandas backend
    • FIX: fix for issue #147
    • ENH: __version__ attribute made available
    • ENH: improved typing and CI
    • ENH: contrib types/typesets for a low-threshold contribution of types

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Oct 11, 2020)

    • ENH: Expose state using typeset.detect and typeset.infer
    • ENH: plotting of typesets improved
    • FIX: fix and test cases for #136
    • CLN: pre-commit with black, isort, pyupgrade, flake8
    • ENH: type relations are now accessible by type (e.g. Float.relations[Integer])

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Sep 22, 2020)

  • v0.5.1(Sep 22, 2020)

    • Introduce stateful type inference and casting
    • Expose test utils to users and fix diagnostic information
    • Integer consistency for the standard set
    • Use pd.BooleanDtype for newer versions of pandas
    • Latest black formatting
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Aug 16, 2020)

    API breaking changes:

    • migration to single dispatch on typeset methods
    • updated API to unify detect / infer / cast against Series and DataFrames
    • improvements to boolean type
    Source code(tar.gz)
    Source code(zip)
  • v0.4.6(Jul 28, 2020)

  • v0.4.5(Jul 28, 2020)

  • v0.4.4(May 11, 2020)

  • v0.4.3(May 10, 2020)

  • v0.4.2(May 10, 2020)

    Support for Files and Images, rewritten summarization functions

    • Renamed ExistingPath to File
    • Renamed ImagePath to Image
    • Version bump to 0.4.2
    • Summaries: return series instead of dict
    • Categorical: unicode counts are now based on the original character distribution rather than on unique characters, which are used only as an intermediate step for performance.
    • Categorical: aggregate functions are included for string length (min, max, mean, median).
    • Path: number of unique values for the path parts are returned
    • Image: make Exif and Hash calculations optional. Also return width, height and area.
    • File: in addition to the file_size, return creation, modification and access time (which were already returned).
    Source code(tar.gz)
    Source code(zip)