Create HTML profiling reports from pandas DataFrame objects

Overview

Pandas Profiling

Documentation | Slack | Stack Overflow

Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
  • File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information.
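
To make the quantile and descriptive statistics listed above concrete, here is a minimal sketch that computes a few of them for a single numeric column using only Python's standard library. It illustrates the definitions and is not the code pandas-profiling runs internally; the sample data and variable names are made up.

```python
import statistics

# Made-up sample column, purely for illustration.
values = [2, 4, 4, 5, 7, 9, 10, 12]

# Quantile statistics. The "inclusive" method interpolates linearly
# between neighbouring data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
value_range = max(values) - min(values)
iqr = q3 - q1  # interquartile range

# Descriptive statistics.
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation
cv = std / mean                 # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(q1, median, q3, value_range, iqr, mean, mad)
```

In the report these numbers are shown per column, and only for column types where they are meaningful.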

Announcements

Version v2.11.0 released featuring an exciting integration with Great Expectations that many of you requested (see details below).

Spark backend in progress: We can happily announce that we're nearing v1 for the Spark backend for generating profile reports. Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project directly through GitHub Sponsors! Please help me to continue to support this package. It's extra exciting that GitHub matches your contribution for the first year.

Find more information here:

February 20, 2021 💘


Contents: Examples | Installation | Documentation | Large datasets | Command line usage | Advanced usage | Integrations | Support | Types | How to contribute | Editor Integration | Dependencies


Examples

The following examples can give you an impression of what the package can do:

Specific features:

Tutorials:

Installation

Using pip

You can install using the pip package manager by running

pip install pandas-profiling[notebook]

Alternatively, you can install the latest version directly from GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda

You can install using the conda package manager by running

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page.

Install by navigating to the proper directory and running:

python setup.py install

Documentation

The documentation for pandas_profiling can be found here. Previous documentation is still available here.

Getting started

Start by loading in your pandas DataFrame, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

To generate the report, run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Explore deeper

You can configure the profile report in any way you like. The example code below loads the explorative configuration file, which includes many features for text (length distribution, unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in exactly which settings were used, you can compare it with the default configuration file.

profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

Learn more about configuring pandas-profiling on the Advanced usage page.

Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook. There are two interfaces (see animations below): through widgets and through an HTML report.

Notebook Widgets

This is achieved by simply displaying the report. In the Jupyter Notebook, run:

profile.to_widgets()

The HTML report can also be embedded directly in a Jupyter notebook. Run the following code:

profile.to_notebook_iframe()

Saving the report

If you want to generate an HTML report file, assign the ProfileReport to a variable and use the to_file() method:

profile.to_file("your_report.html")

Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Large datasets

Version 2.4 introduces minimal mode.

This is a default configuration that disables expensive computations (such as correlations and dynamic binning).

Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)
profile.to_file("output.html")

Command line usage

For standard formatted CSV files that can be read immediately by pandas, you can use the pandas_profiling executable.

Run the following for information about options and arguments.

pandas_profiling -h

Advanced usage

A set of options is available in order to adapt the report generated.

  • title (str): Title for the report ('Pandas Profiling Report' by default).
  • pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
  • progress_bar (bool): If True, pandas-profiling will display a progress bar.
  • infer_dtypes (bool): When True (default), the dtypes of variables are inferred by visions using its typeset logic (for instance, a column that has integers stored as strings will be analyzed as if it were numeric).
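
Two of the options above describe simple rules that can be sketched in plain Python. The helpers below are hypothetical and only illustrate the documented behaviour (pool_size of zero meaning "use all CPUs", and integer-like strings being treated as numeric); they are not part of pandas-profiling or visions, whose actual logic is more elaborate.

```python
import os


def resolve_pool_size(pool_size: int = 0) -> int:
    """Hypothetical helper: 0 means 'use every available CPU'."""
    if pool_size < 0:
        raise ValueError("pool_size must be zero or positive")
    return pool_size if pool_size > 0 else (os.cpu_count() or 1)


def looks_like_integers(values) -> bool:
    """Hypothetical check: True if every non-missing value is a string that parses as an int."""
    present = [v for v in values if v is not None]
    if not present:
        return False
    for value in present:
        if not isinstance(value, str):
            return False
        try:
            int(value)
        except ValueError:
            return False
    return True


print(resolve_pool_size(4))                      # explicit worker count is kept
print(looks_like_integers(["1", "2", None]))     # integers stored as strings
print(looks_like_integers(["1", "x"]))           # mixed content stays non-numeric
```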

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

The configuration docs are on the Advanced usage page.

Example

profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file("output.html")

Integrations

Great Expectations

Profiling your data is closely related to data validation: often validation rules are defined in terms of well-known statistics. For that purpose, pandas-profiling integrates with Great Expectations. This is a world-class open-source library that helps you to maintain data quality and improve communication about data between teams. Great Expectations allows you to create Expectations (which are basically unit tests for your data) and Data Docs (conveniently shareable HTML data reports). pandas-profiling features a method to create a suite of Expectations based on the results of your ProfileReport, which you can store, and use to validate another (or future) dataset.

You can find more details on the Great Expectations integration here
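
The idea that validation rules can be derived from profile statistics can be sketched without either library. The snippet below is a conceptual illustration only: build_expectations and validate are hypothetical helpers, the data is made up, and none of this is the Great Expectations or pandas-profiling API.

```python
def build_expectations(reference):
    """Derive a simple min/max rule per column from a reference dataset.

    `reference` maps column names to lists of numbers.
    """
    return {
        column: {"min": min(values), "max": max(values)}
        for column, values in reference.items()
    }


def validate(dataset, expectations):
    """Return {column: bool}, True when every value lies inside the expected range."""
    results = {}
    for column, rule in expectations.items():
        values = dataset.get(column, [])
        results[column] = bool(values) and all(
            rule["min"] <= value <= rule["max"] for value in values
        )
    return results


# Rules learned from one dataset...
reference = {"age": [23, 35, 41, 58], "fare": [7.25, 71.28, 8.05]}
expectations = build_expectations(reference)

# ...are used to check another (or future) dataset.
new_batch = {"age": [30, 44], "fare": [5.0, 12.5]}
report = validate(new_batch, expectations)
print(report)
```

Here the fare column fails because 5.0 falls below the minimum observed in the reference data.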

Supporting open source

Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without the support of our gracious sponsors.

Lambda Labs

Lambda workstations, servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. Lambda Cloud offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN.

We would like to thank our generous GitHub Sponsors supporters who make pandas-profiling possible:

Martin Sotir, Brian Lee, Stephanie Rivera, abdulAziz, gramster

More info if you would like to appear here: GitHub Sponsor page

Types

Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float etc.). pandas-profiling currently recognizes the following types: Boolean, Numerical, Date, Categorical, URL, Path, File and Image.

We have developed a type system for Python, tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of the code of your analysis. Future versions of pandas-profiling will have extended type support through visions!
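
The difference between a logical type and a semantic type can be shown with a small sketch. The function below is a hypothetical, much simplified stand-in written for this illustration; it covers only four of the types listed above and is not how visions or pandas-profiling infer types.

```python
from urllib.parse import urlparse


def is_url(value) -> bool:
    """True for strings with an http(s) scheme and a host part."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def infer_semantic_type(values):
    """Hypothetical classifier: returns a type name, or None for an empty column."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # bool is a subclass of int in Python, so test for Boolean first.
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "Numerical"
    if all(is_url(v) for v in present):
        return "URL"
    return "Categorical"


# Both columns hold strings (same logical type) but differ semantically.
print(infer_semantic_type(["https://example.com", "http://example.org/a"]))
print(infer_semantic_type(["red", "blue", "red"]))
```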

Contributing

Read about getting involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the pandas-profiling Slack. Join the Slack community.

Editor integration

PyCharm integration

  1. Install pandas-profiling via the instructions above
  2. Locate your pandas-profiling executable.
    • On macOS / Linux / BSD:
      $ which pandas_profiling
      (example) /usr/local/bin/pandas_profiling
    • On Windows:
      $ where pandas_profiling
      (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
  3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools
  4. Click the + icon to add a new external tool
  5. Insert the following values
    • Name: Pandas Profiling
    • Program: The location obtained in step 2
    • Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
    • Working Directory: $ProjectFileDir$

To use the PyCharm Integration, right click on any dataset file:

External Tools > Pandas Profiling.
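
For reference, the command that this external tool assembles can be sketched in Python. The helper below is hypothetical: it mirrors the Arguments setting from step 5 (input file, then a report path built from the file name without any extensions), and it assumes POSIX-style paths and that the executable is on the PATH.

```python
from pathlib import PurePosixPath


def build_command(dataset_path, executable="pandas_profiling"):
    """Hypothetical helper mirroring the PyCharm macros from step 5."""
    path = PurePosixPath(dataset_path)
    # $FileNameWithoutAllExtensions$: drop everything after the first dot.
    stem = path.name.split(".")[0]
    # $FileDir$/<stem>_report.html
    report_path = path.parent / f"{stem}_report.html"
    return [executable, str(path), str(report_path)]


command = build_command("/data/wine.csv")
print(command)
```

The resulting list could be handed to subprocess.run to produce the same report outside the editor.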

Other integrations

Other editor integrations may be contributed via pull requests.

Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename                Requirements
requirements.txt        Package requirements
requirements-dev.txt    Requirements for development
requirements-test.txt   Requirements for testing
setup.py                Requirements for Widgets etc.

Comments
  • pandas-profiling not compatible with pandas v1.0

    Describe the bug

    pandas-profiling is not compatible with pandas v1.0. The key class "ProfileReport" raises "TypeError: concat() got an unexpected keyword argument 'join_axes'" because the join_axes argument of concat() was removed in pandas v1.0. https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html?highlight=concat

    To Reproduce

    import pandas as pd
    import pandas_profiling

    def test_issueXXX():
        df = pd.read_csv(r'')  # file path left empty in the original report
        pf = pandas_profiling.ProfileReport(df)

    TypeError: concat() got an unexpected keyword argument 'join_axes'

    Version information:

    • Python version: 3.7.
    • Environment: Command Line and Pycharm
    • pandas-profiling: 1.4.1
    • pandas: 1.0
    bug 🐛
    opened by mantou16 27
  • AttributeError: 'DataFrame' object has no attribute 'profile_report'

    Describe the bug

    Running the example in the README generates an error.

    To Reproduce

    Running:

    import numpy as np
    import pandas as pd
    import pandas_profiling
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df.profile_report()
    

    in a Jupyter notebook gives:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-16-f9a7584e785c> in <module>
    ----> 1 df.profile_report()
    
    ~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
       5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
       5066                 return self[name]
    -> 5067             return object.__getattribute__(self, name)
       5068 
       5069     def __setattr__(self, name, value):
    
    AttributeError: 'DataFrame' object has no attribute 'profile_report'
    

    Version information: alabaster==0.7.12 anaconda-client==1.7.2 anaconda-navigator==1.9.7 anaconda-project==0.8.2 asn1crypto==0.24.0 astroid==2.2.5 astropy==3.1.2 atomicwrites==1.3.0 attrs==19.1.0 Babel==2.6.0 backcall==0.1.0 backports.os==0.1.1 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.7.1 bitarray==0.8.3 bkcharts==0.2 bleach==3.1.0 bokeh==1.0.4 boto==2.49.0 Bottleneck==1.2.1 certifi==2019.3.9 cffi==1.12.2 chardet==3.0.4 Click==7.0 cloudpickle==0.8.0 clyent==1.2.2 colorama==0.4.1 conda==4.6.14 conda-build==3.17.8 conda-verify==3.1.1 confuse==1.0.0 contextlib2==0.5.5 cryptography==2.6.1 cycler==0.10.0 Cython==0.29.6 cytoolz==0.9.0.1 dask==1.1.4 decorator==4.4.0 defusedxml==0.5.0 distributed==1.26.0 docutils==0.14 entrypoints==0.3 et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2 future==0.17.1 gevent==1.4.0 glob2==0.6 gmpy2==2.0.8 greenlet==0.4.15 h5py==2.9.0 heapdict==1.0.0 hpat==0.28.1 html5lib==1.0.1 htmlmin==0.1.12 idna==2.8 imageio==2.5.0 imagesize==1.1.0 importlib-metadata==0.0.0 ipykernel==5.1.0 ipyparallel==6.2.4 ipython==7.4.0 ipython-genutils==0.2.0 ipywidgets==7.4.2 isort==4.3.16 itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.3 jeepney==0.4 Jinja2==2.10 jsonschema==3.0.1 jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0 jupyter-core==4.4.0 jupyterlab==0.35.4 jupyterlab-server==0.2.0 keyring==18.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1 libarchive-c==2.8 lief==0.9.0 lightgbm==2.2.3 llvmlite==0.28.0 locket==0.2.0 lxml==4.3.2 MarkupSafe==1.1.1 matplotlib==3.0.3 mccabe==0.6.1 missingno==0.4.1 mistune==0.8.4 mkl-fft==1.0.10 mkl-random==1.0.2 mock==2.0.0 more-itertools==6.0.0 mpi4py==3.0.1 mpmath==1.1.0 msgpack==0.6.1 multipledispatch==0.6.0 navigator-updater==0.2.1 nbconvert==5.4.1 nbformat==4.4.0 networkx==2.2 nltk==3.4 nose==1.3.7 notebook==5.7.8 numba==0.43.1 numerapi==1.5.1 numerox==3.7.0 numexpr==2.6.9 numpy==1.16.2 numpydoc==0.8.0 olefile==0.46 openpyxl==2.6.1 packaging==19.0 pandas==0.24.2 
pandas-profiling==1.4.1 pandocfilters==1.4.2 parso==0.3.4 partd==0.3.10 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1 pbr==5.2.0 pep8==1.7.1 pexpect==4.6.0 phik==0.9.8 pickleshare==0.7.5 Pillow==5.4.1 pkginfo==1.5.0.1 plotly==3.8.1 pluggy==0.9.0 ply==3.11 prometheus-client==0.6.0 prompt-toolkit==2.0.9 psutil==5.6.1 ptyprocess==0.6.0 py==1.8.0 pyarrow==0.11.1 pycodestyle==2.5.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1 pycurl==7.43.0.2 pyflakes==2.1.1 Pygments==2.3.1 pylint==2.3.1 pyodbc==4.0.26 pyOpenSSL==19.0.0 pyparsing==2.3.1 pyrsistent==0.14.11 PySocks==1.6.8 pytest==4.3.1 pytest-arraydiff==0.3 pytest-astropy==0.5.0 pytest-doctestplus==0.3.0 pytest-openfiles==0.3.2 pytest-pylint==0.14.0 pytest-remotedata==0.3.1 python-dateutil==2.8.0 python-igraph==0.7.1.post6 pytz==2018.9 PyWavelets==1.0.2 PyYAML==5.1 pyzmq==18.0.0 QtAwesome==0.5.7 qtconsole==4.4.3 QtPy==1.7.0 requests==2.21.0 retrying==1.3.3 rope==0.12.0 ruamel-yaml==0.15.46 scikit-image==0.14.2 scikit-learn==0.20.3 scipy==1.2.1 seaborn==0.9.0 SecretStorage==3.1.1 Send2Trash==1.5.0 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1 sortedcollections==1.1.2 sortedcontainers==2.1.0 soupsieve==1.8 Sphinx==1.8.5 sphinxcontrib-websupport==1.1.0 spyder==3.3.3 spyder-kernels==0.4.2 SQLAlchemy==1.3.1 statsmodels==0.9.0 sympy==1.3 tables==3.5.1 tblib==1.3.2 terminado==0.8.1 testpath==0.4.2 toolz==0.9.0 tornado==6.0.2 tqdm==4.31.1 traitlets==4.3.2 typed-ast==1.4.0 unicodecsv==0.14.1 urllib3==1.24.1 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.4.2 wrapt==1.11.1 wurlitzer==1.0.2 xlrd==1.2.0 XlsxWriter==1.1.5 xlwt==1.3.0 zict==0.1.4 zipp==0.3.3

    bug 🐛
    opened by bdch1234 22
  • Plotting a response variable on the histograms

    Hey,

    Great job with pandas-profiling, I love it. I think it would be great to have an extra parameter to specify a response column. Plotting the average response for every bin of the histograms (for each variable) would make obvious trends/correlations visible and would be useful for any regression problem (it might be trickier for classification, where the responses are discrete). What do you think?

    Thanks!

    feature request 💬
    opened by Optimox 17
  • feat: added filter to locate columns

    This is a follow-up PR to the PR made earlier (#1096). Closes #638. I have changed the input from a text field to a dropdown as per @fabclmnt's suggestion.

    Here's how it looks and works now:

    https://user-images.githubusercontent.com/57868024/194428807-a7642deb-6ba5-4404-95ef-3e9605ba10cd.mp4

    The dropdown isn't visible due to restrictions on the screen-recorder, here's an image of it in action for reference.

    image

    P.S. I'm sorry for the hassle in the previous PR, I haven't worked with git very much. Thank you for your patience.

    opened by g-kabra 16
  • Potential incompatibility with Pandas 1.4.0

    Describe the bug

    Pandas version 1.4.0 was released a few days ago and some tests started failing. I was able to reproduce this with a minimal example, which fails with Pandas 1.4.0 and works with Pandas 1.3.5.

    To Reproduce

    import pandas as pd
    import pandas_profiling
    
    data = {"col1": [1, 2], "col2": [3, 4]}
    dataframe = pd.DataFrame(data=data)
    
    profile = pandas_profiling.ProfileReport(dataframe, minimal=False)
    profile.to_html()
    

    When running with Pandas 1.4.0, I get the following traceback:

    Traceback (most recent call last):
      File "/tmp/bug.py", line 8, in <module>
        profile.to_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 368, in to_html
        return self.html
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 185, in html
        self._html = self._render_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 287, in _render_html
        report = self.report
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 179, in report
        self._report = get_report_structure(self.config, self.description_set)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 161, in description_set
        self._description_set = describe_df(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/describe.py", line 71, in describe
        series_description = get_series_descriptions(
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 92, in pandas_get_series_descriptions
        for i, (column, description) in enumerate(
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 72, in multiprocess_1d
        return column, describe_1d(config, series, summarizer, typeset)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 50, in pandas_describe_1d
        return summarizer.summarize(config, series, dtype=vtype)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summarizer.py", line 37, in summarize
        _, _, summary = self.handle(str(dtype), config, series, {"type": str(dtype)})
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 62, in handle
        return op(*args)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 17, in func2
        res = g(*x)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 65, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 82, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 205, in pandas_describe_categorical_1d
        summary.update(length_summary_vc(value_counts))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 162, in length_summary_vc
        "median_length": weighted_median(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/utils_pandas.py", line 13, in weighted_median
        w_median = (data[weights == np.max(weights)])[0]
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
    

    If I change minimal from False to True, the script passes.
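
The traceback above ends in a weighted median of string lengths, weighted by how often each value occurs. As a point of reference, here is a minimal pure-Python sketch of a (lower) weighted median that avoids label-based indexing altogether. The function is written for this illustration under that definition and is not the implementation in utils_pandas.py.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    half = sum(weights) / 2
    cumulative = 0
    # Walk through the values in ascending order, accumulating their weights.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


# Made-up value counts for a categorical column: {"red": 3, "green": 1, "blue": 1}
lengths = [len("red"), len("green"), len("blue")]
counts = [3, 1, 1]
median_length = weighted_median(lengths, counts)
print(median_length)
```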

    Version information:

    Failing environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.4.0 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.4.0
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Working environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.3.5 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.3.5
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Let me know if I can provide more details and thank you for your good work!

    bug 🐛
    opened by Lothiraldan 15
  • TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

        stats['range'] = stats['max'] - stats['min']
    TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
    

    I got this error

    bug 🐛 information requested ❔ help wanted 🙋
    opened by eyadsibai 15
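
The error above appears when the minimum and maximum of a boolean column are subtracted, because NumPy does not define `-` for booleans. A minimal sketch of one way around it is to cast boolean extremes to integers first. The helper name safe_range is hypothetical and this is not necessarily the fix adopted in pandas-profiling.

```python
import numpy as np


def safe_range(minimum, maximum):
    """Hypothetical helper: compute max - min, casting boolean extremes to int first."""
    if isinstance(minimum, (bool, np.bool_)) or isinstance(maximum, (bool, np.bool_)):
        return int(maximum) - int(minimum)
    return maximum - minimum


flags = np.array([True, False, True])
stats = {"min": flags.min(), "max": flags.max()}  # numpy.bool_ scalars
stats["range"] = safe_range(stats["min"], stats["max"])
print(stats["range"])
```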
  • 2.10.0 -  TraitError: The 'value' trait of a HTML instance must be a unicode string...

    Describe the bug

    Hi there - Looks like the latest release (2.10.0) has broken the to_widgets functionality as outlined in the Getting started section of the docs. Confirmed that rolling back to 2.9.0 does not produce the issue.

    To Reproduce

    # pandas_profiling==2.10.0
    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=["a", "b", "c", "d", "e"]
    )
    
    profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
    
    profile.to_widgets()
    
    

    Returns:

    TraitError: The 'value' trait of a HTML instance must be a unicode string, but a value of Numeric <class 'visions.types.type.VisionsBaseTypeMeta'> was specified.
    

    Version information: 2.10.0

    Additional context

    opened by rynmccrmck 14
  • ZeroDivisionError when using version 1.4.1

    There was a change in behavior between versions 1.4.0 and 1.4.1 where some calls to ProfileReport that previously succeeded will now raise a ZeroDivisionError.

    An example reproduction is to take the following code and run it in a Jupyter notebook cell:

    import pandas
    import pandas_profiling
    
    import IPython
    
    df = pandas.DataFrame({'c': 'v'}, index=['c'])
    report = pandas_profiling.ProfileReport(df)
    IPython.core.display.HTML(report.html)
    

    With version 1.4.0 this produced an HTML report, but with version 1.4.1 it produces the following stack trace:

    ZeroDivisionError                         Traceback (most recent call last)
    <ipython-input-2-ffb5392b4284> in <module>()
          5 
          6 df = pandas.DataFrame({'c': 'v'}, index=['c'])
    ----> 7 report = pandas_profiling.ProfileReport(df)
          8 IPython.core.display.HTML(report.html)
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/__init__.pyc in __init__(self, df, **kwargs)
         67 
         68         self.html = to_html(sample,
    ---> 69                             description_set)
         70 
         71         self.description_set = description_set
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/report.pyc in to_html(sample, stats_object)
        192 
        193     # Add plot of matrix correlation
    --> 194     pearson_matrix = plot.correlation_matrix(stats_object['correlations']['pearson'], 'Pearson')
        195     spearman_matrix = plot.correlation_matrix(stats_object['correlations']['spearman'], 'Spearman')
        196     correlations_html = templates.template('correlations').render(
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/plot.pyc in correlation_matrix(corrdf, title, **kwargs)
        134     plt.title(title, size=18)
        135     plt.colorbar(matrix_image)
    --> 136     axes_cor.set_xticks(np.arange(0, corrdf.shape[0], corrdf.shape[0] * 1.0 / len(labels)))
        137     axes_cor.set_yticks(np.arange(0, corrdf.shape[1], corrdf.shape[1] * 1.0 / len(labels)))
        138     axes_cor.set_xticklabels(labels, rotation=90)
    
    ZeroDivisionError: float division by zero
    
    opened by ojarjur 14
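
The failing line computes tick positions with a step of corrdf.shape[0] * 1.0 / len(labels), which cannot work when there are no labels (or when the step itself is zero). The report does not confirm the root cause, so the following is only a sketch of a guarded version of that computation, written in plain Python with a hypothetical helper name.

```python
def tick_positions(size, n_labels):
    """Evenly spaced tick positions in [0, size), or no ticks when there is nothing to label."""
    if size == 0 or n_labels == 0:
        return []
    step = size / n_labels
    return [i * step for i in range(n_labels)]


print(tick_positions(4, 4))
print(tick_positions(4, 2))
print(tick_positions(0, 0))
```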
  • pandas_profiling.utils.cache

    ModuleNotFoundError: No module named 'pandas_profiling.utils'

    To Reproduce

    Version information:

    Additional context

    information requested ❔ 
    opened by ajaimes07 13
  • This call to matplotlib.use() has no effect because the backend has already

    /home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/pandas_profiling/base.py:20: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.

    The backend was originally set to 'module://ipykernel.pylab.backend_inline' by the following code:

      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "main", fname, loader, pkg_name)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in
        app.launch_new_instance()
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
        app.start()
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
        ioloop.IOLoop.instance().start()
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
        super(ZMQIOLoop, self).start()
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
        handler_func(fd_obj, events)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
        return fn(*args, **kwargs)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
        self._handle_recv()
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
        self._run_callback(callback, msg)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
        callback(*args, **kwargs)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
        return fn(*args, **kwargs)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
        return self.dispatch_shell(stream, msg)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
        handler(stream, idents, msg)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
        user_expressions, allow_stdin)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
        res = shell.run_cell(code, store_history=store_history, silent=silent)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
        return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
        interactivity=interactivity, compiler=compiler, result=result)
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
        if self.run_code(code, result):
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 8, in
        import matplotlib.pyplot as plt
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 69, in
        from matplotlib.backends import pylab_setup
      File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/backends/init.py", line 14, in
        line for line in traceback.format_stack()

    matplotlib.use('Agg')

    opened by iweey 13
  • IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    Describe the bug

    Running the example below raises IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices.

    This happens with the latest version on conda-forge.

    To Reproduce

    wine.csv

    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.read_csv("wine.csv")
    
    profile = ProfileReport(df, title="Pandas Profiling Report")
    
    profile.to_file("tmp.html")
    

    Version information:

    • Python 3.9
    • pandas-profiling 3.1.0 pyhd8ed1ab_0 conda-forge
    • pandas 1.4.2 py39h1832856_1 conda-forge
    bug 🐛
    opened by darenr 12
  • Feature Request

    Feature Request

    Missing functionality

    Override templates of the html flavour.

    Proposed feature

    Allow overriding (some) templates in src/pandas_profiling/report/presentation/flavours/html/templates/ to personalize pdp.

    Alternatives considered

    I monkey-patched pdp to support overriding templates, e.g. to change the layout a bit. jinja2 supports this, but monkey patching isn't a clean way to do it.

    from pandas_profiling.report.presentation.flavours.html import templates
    from pandas_profiling.report.formatters import fmt, fmt_badge, fmt_numeric, fmt_percent
    import jinja2
    from jinja2 import ChoiceLoader, FileSystemLoader
    
    # Placeholder: directory that holds the overriding templates
    some_path = "path/to/custom/templates"
    
    # Look in the custom directory first, then fall back to the packaged templates
    templates.package_loader = ChoiceLoader([
        FileSystemLoader(some_path),
        templates.package_loader,
    ])
    
    # Rebuild the environment with the new loader and re-register the filters
    templates.jinja2_env = jinja2.Environment(
        lstrip_blocks=True, trim_blocks=True, loader=templates.package_loader
    )
    templates.jinja2_env.filters["is_list"] = lambda x: isinstance(x, list)
    templates.jinja2_env.filters["fmt_badge"] = fmt_badge
    templates.jinja2_env.filters["fmt_percent"] = fmt_percent
    templates.jinja2_env.filters["fmt_numeric"] = fmt_numeric
    templates.jinja2_env.filters["fmt"] = fmt
    

    Additional context

    No response

    needs-triage 
    opened by prhbrt 0
  • Interaction plots for time series data

    Interaction plots for time series data

    Missing functionality

    Interaction plots for numeric time series variables.

    Proposed feature

    Calculate interaction plots for both numeric and numeric time series variables. Is there a setting to enable this?

    Alternatives considered

    I considered setting tsmode=False, but then I lose the autocorrelation plots.

    needs-triage 
    opened by MauritsDescamps 0
  • Bug Report: KeyError: 'max_length' when comparing two profile_report (`minimal=True` is used to generate these report)

    Bug Report: KeyError: 'max_length' when comparing two profile_report (`minimal=True` is used to generate these report)

    Current Behaviour

    There is an error message:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /var/folders/60/6qphmx_d7x7_11vpj8524vf40000gn/T/ipykernel_17862/709405443.py in
          7 
          8 # Save report to file
    ----> 9 comparison_report.to_file("comparison.html")
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
        307                 create_html_assets(self.config, output_file)
        308 
    --> 309             data = self.to_html()
        310 
        311             if output_file.suffix != ".html":
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_html(self)
        418 
        419         """
    --> 420         return self.html
        421 
        422     def to_json(self) -> str:
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in html(self)
        229     def html(self) -> str:
        230         if self._html is None:
    --> 231             self._html = self._render_html()
        232         return self._html
        233 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in _render_html(self)
        337         from pandas_profiling.report.presentation.flavours import HTMLReport
        338 
    --> 339         report = self.report
        340 
        341         with tqdm(
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in report(self)
        223     def report(self) -> Root:
        224         if self._report is None:
    --> 225             self._report = get_report_structure(self.config, self.description_set)
        226         return self._report
        227 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in get_report_structure(config, summary)
        376                 items=list(summary["variables"]),
        377                 item=Container(
    --> 378                     render_variables_section(config, summary),
        379                     sequence_type="accordion",
        380                     name="Variables",
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in render_variables_section(config, dataframe_summary)
        157         variable_type = summary["type"]
        158         render_map_type = render_map.get(variable_type, render_map["Unsupported"])
    --> 159         template_variables.update(render_map_type(config, template_variables))
        160 
        161         # Ignore these
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical(config, summary)
        405 
        406     if length:
    --> 407         length_table, length_histo = render_categorical_length(config, summary, varid)
        408         overview_items.append(length_table)
        409 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical_length(config, summary, varid)
         61             {
         62                 "name": "Max length",
    ---> 63                 "value": fmt_number(summary["max_length"]),
         64                 "alert": False,
         65             },
    
    KeyError: 'max_length'

    Expected Behaviour

    Run without error

    Data Description

    I'm running the code for dataset comparison. The original code in that link works well, but when I set minimal=True to create the reports and then compare them, an error is raised.

    Code that reproduces the bug

    import pandas as pd
    from pandas_profiling import ProfileReport
    
    train_df = pd.read_csv("train.csv")
    train_report = ProfileReport(train_df, title="Train", minimal=True)
    
    test_df = pd.read_csv("test.csv")
    test_report = ProfileReport(test_df, title="Test", minimal=True)
    
    comparison_report = train_report.compare(test_report)
    comparison_report.to_file("comparison.html")
    

    pandas-profiling version

    v3.5.0

    Dependencies

    pandas                       1.4.2
    pandas-profiling             3.5.0
    

    OS

    Mac

    Checklist

    • [X] There is not yet another bug report for this issue in the issue tracker
    • [X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
    • [X] The issue has not been resolved by the entries listed under Common Issues.
    needs-triage 
    opened by xiaoye-hua 0
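The traceback above ends in a plain dictionary lookup, `summary["max_length"]`, inside the categorical renderer. A minimal sketch of that failure pattern follows, assuming (this is an inference from the traceback, not something the report confirms) that summaries built with `minimal=True` do not contain the string-length statistics while the renderer still asks for them. The names `minimal_summary`, `full_summary`, `LENGTH_ROWS`, `length_rows_strict` and `length_rows_tolerant` are hypothetical and are not part of pandas-profiling.

```python
# Hypothetical stand-ins for per-variable summaries; not the library's real data.
minimal_summary = {"type": "Categorical", "n_distinct": 3}
full_summary = {
    "type": "Categorical",
    "n_distinct": 3,
    "max_length": 12,
    "min_length": 2,
}

# (label shown in the table, key expected in the summary)
LENGTH_ROWS = [("Max length", "max_length"), ("Min length", "min_length")]


def length_rows_strict(summary):
    """Mirrors the failing pattern: every key is assumed to be present."""
    return [
        {"name": name, "value": summary[key], "alert": False}
        for name, key in LENGTH_ROWS
    ]


def length_rows_tolerant(summary):
    """Skips statistics that were never computed instead of raising KeyError."""
    return [
        {"name": name, "value": summary[key], "alert": False}
        for name, key in LENGTH_ROWS
        if key in summary
    ]
```

With the hypothetical `minimal_summary`, the strict variant raises `KeyError: 'max_length'`, while the tolerant variant returns an empty list of rows. Whether skipping the rows or computing the missing statistics is the right fix is a design decision for the maintainers.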
  • Does pandas-profiling work in Jupyter Notebooks on AWS?

    Does pandas-profiling work in Jupyter Notebooks on AWS?

    Does pandas-profiling work in Jupyter Notebooks on AWS? I understand there are a lot of configuration differences that can lead to issues but whenever I try to produce a profiling report, I get the following errors when I run:

    profile = ProfileReport(df, title='myreport')
    profile.to_file('s3://myfolder/myreport.html')
    
    Summarize dataset:  97%|█████████▋| 427/438 [01:14<00:01,  8.03it/s, Calculate auto correlation]
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py:315: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
      return func(*args, **kwargs)
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
      warnings.warn("The input array could not be properly "
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:4881: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
      warnings.warn(stats.ConstantInputWarning(warn_msg))
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/correlations.py:67: UserWarning: There was an attempt to calculate the auto correlation, but this failed.
    To hide this warning, disable the calculation
    (using `df.profile_report(correlations={"auto": {"calculate": False}})`
    If this is problematic for your use case, please report this as an issue:
    https://github.com/ydataai/pandas-profiling/issues
    (include the error message: 'No data; `observed` has size 0.')
      warnings.warn(
    Summarize dataset:  98%|█████████▊| 428/438 [28:20<32:48, 196.80s/it, Calculate spearman correlation]
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py:315: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
      return func(*args, **kwargs)
    Summarize dataset:  98%|█████████▊| 430/438 [30:55<21:07, 158.47s/it, Calculate kendall correlation]
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:5218: RuntimeWarning: overflow encountered in long_scalars
      (2 * xtie * ytie) / m + x0 * y0 / (9 * m * (size - 2)))
    /home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:5219: RuntimeWarning: invalid value encountered in sqrt
      z = con_minus_dis / np.sqrt(var)
    Summarize dataset:  99%|█████████▊| 432/438 [45:40<00:38,  6.34s/it, Calculate phi_k correlation]
    ---------------------------------------------------------------------------
    _RemoteTraceback                          Traceback (most recent call last)
    _RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/queues.py", line 125, in _feed
        obj_ = dumps(obj, reducers=reducers)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 211, in dumps
        dump(obj, buf, reducers=reducers, protocol=protocol)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 204, in dump
        _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 632, in dump
        return Pickler.dump(self, obj)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/_memmapping_reducer.py", line 446, in __call__
        for dumped_filename in dump(a, filename):
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 553, in dump
        NumpyPickler(f, protocol=protocol).dump(value)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/pickle.py", line 487, in dump
        self.save(obj)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 352, in save
        wrapper.write_array(obj, self)
      File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 134, in write_array
        pickler.file_handle.write(chunk.tobytes('C'))
    OSError: [Errno 28] No space left on device
    """
    
    The above exception was the direct cause of the following exception:
    
    PicklingError                             Traceback (most recent call last)
    <ipython-input-9-34649000e9e9> in <module>
          1 profile = ProfileReport(df_perf_18, title="MyReport")
    ----> 2 profile.to_file(f"s3://sf-puas-prod-use1-pc/fire/research/home_telematics/adt/analysis/MyReport.html")
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
        307                 create_html_assets(self.config, output_file)
        308 
    --> 309             data = self.to_html()
        310 
        311             if output_file.suffix != ".html":
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in to_html(self)
        418 
        419         """
    --> 420         return self.html
        421 
        422     def to_json(self) -> str:
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in html(self)
        229     def html(self) -> str:
        230         if self._html is None:
    --> 231             self._html = self._render_html()
        232         return self._html
        233 
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in _render_html(self)
        337         from pandas_profiling.report.presentation.flavours import HTMLReport
        338 
    --> 339         report = self.report
        340 
        341         with tqdm(
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in report(self)
        223     def report(self) -> Root:
        224         if self._report is None:
    --> 225             self._report = get_report_structure(self.config, self.description_set)
        226         return self._report
        227 
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in description_set(self)
        205     def description_set(self) -> Dict[str, Any]:
        206         if self._description_set is None:
    --> 207             self._description_set = describe_df(
        208                 self.config,
        209                 self.df,
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/describe.py in describe(config, df, summarizer, typeset, sample)
         93         pbar.total += len(correlation_names)
         94 
    ---> 95         correlations = {
         96             correlation_name: progress(
         97                 calculate_correlation, pbar, f"Calculate {correlation_name} correlation"
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/describe.py in <dictcomp>(.0)
         94 
         95         correlations = {
    ---> 96             correlation_name: progress(
         97                 calculate_correlation, pbar, f"Calculate {correlation_name} correlation"
         98             )(config, df, correlation_name, series_description)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/utils/progress_bar.py in inner(*args, **kwargs)
          9     def inner(*args, **kwargs) -> Any:
         10         bar.set_postfix_str(message)
    ---> 11         ret = fn(*args, **kwargs)
         12         bar.update()
         13         return ret
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/correlations.py in calculate_correlation(config, df, correlation_name, summary)
        105     correlation = None
        106     try:
    --> 107         correlation = correlation_measures[correlation_name].compute(
        108             config, df, summary
        109         )
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py in __call__(self, *args, **kwargs)
        313         func = self[tuple(func(arg) for func, arg in zip(self.type_checkers, args))]
        314         try:
    --> 315             return func(*args, **kwargs)
        316         except TypeError as ex:
        317             raise DispatchError(f"Function {func.__code__}") from ex
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/pandas/correlations_pandas.py in pandas_phik_compute(config, df, summary)
        152         from phik import phik_matrix
        153 
    --> 154         correlation = phik_matrix(df[selected_cols], interval_cols=list(intcols))
        155 
        156     return correlation
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/phik/phik.py in phik_matrix(df, interval_cols, bins, quantile, noise_correction, dropna, drop_underflow, drop_overflow, verbose, njobs)
        254         verbose=verbose,
        255     )
    --> 256     return phik_from_rebinned_df(
        257         data_binned,
        258         noise_correction,
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/phik/phik.py in phik_from_rebinned_df(data_binned, noise_correction, dropna, drop_underflow, drop_overflow, njobs)
        164         ]
        165     else:
    --> 166         phik_list = Parallel(n_jobs=njobs)(
        167             delayed(_calc_phik)(co, data_binned[list(co)], noise_correction)
        168             for co in itertools.combinations_with_replacement(
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/parallel.py in __call__(self, iterable)
       1096 
       1097             with self._backend.retrieval_context():
    -> 1098                 self.retrieve()
       1099             # Make sure that we get a last message telling us we are done
       1100             elapsed_time = time.time() - self._start_time
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/parallel.py in retrieve(self)
        973             try:
        974                 if getattr(self._backend, 'supports_timeout', False):
    --> 975                     self._output.extend(job.get(timeout=self.timeout))
        976                 else:
        977                     self._output.extend(job.get())
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
        565         AsyncResults.get from multiprocessing."""
        566         try:
    --> 567             return future.result(timeout=timeout)
        568         except CfTimeoutError as e:
        569             raise TimeoutError from e
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
        436                     raise CancelledError()
        437                 elif self._state == FINISHED:
    --> 438                     return self.__get_result()
        439 
        440                 self._condition.wait(timeout)
    
    ~/SageMaker/.envs/mykernel/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
        388         if self._exception:
        389             try:
    --> 390                 raise self._exception
        391             finally:
        392                 # Break a reference cycle with the exception in self._exception
    
    PicklingError: Could not pickle the task to send it to the workers.
    

    I'm on the latest version of pandas-profiling (just installed it today).

    question/discussion ❓ information requested ❔ 
    opened by JohnTravolski 3
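The remote traceback above bottoms out in `OSError: [Errno 28] No space left on device`, raised while joblib was writing arrays to disk for its worker processes during the phi_k correlation. The sketch below is one possible way to check free space before profiling and to point joblib at a roomier directory through its `JOBLIB_TEMP_FOLDER` environment variable. The helper names `free_gib` and `pick_joblib_temp_folder`, the candidate directories and the 1 GiB threshold are assumptions chosen for illustration; none of them come from the report.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def pick_joblib_temp_folder(candidates, min_free_gib=1.0):
    """Return the first existing directory with enough free space, else None."""
    for path in candidates:
        if os.path.isdir(path) and free_gib(path) >= min_free_gib:
            return path
    return None


# Hypothetical candidates: the system temp directory, then the home directory.
candidates = [tempfile.gettempdir(), os.path.expanduser("~")]
folder = pick_joblib_temp_folder(candidates)

if folder is not None:
    # Must be set before joblib starts its workers, i.e. before profiling.
    os.environ["JOBLIB_TEMP_FOLDER"] = folder
```

If no directory has enough room, skipping the expensive step may also help. By analogy with the `correlations={"auto": {"calculate": False}}` option quoted in the warning above, the phi_k calculation can presumably be disabled with `correlations={"phi_k": {"calculate": False}}`.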
  • bug: variables list is causing a misconfiguration in the UI variables section

    Current Behaviour

    [image]

    Expected Behaviour

    It would be easier on the eyes if the list were rendered as pill buttons instead, just like the ones in "Overview".

    [image]

    Example:

    [image]

    Data Description

    https://pandas-profiling.ydata.ai/examples/master/features/united_report.html

    pandas-profiling version

    vdev

    Checklist

    • [X] There is not yet another bug report for this issue in the issue tracker
    • [X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
    • [X] The issue has not been resolved by the entries listed under Common Issues.
    bug 🐛
    opened by rivanfebrian123 7
Releases(v3.6.2)
  • v3.6.2(Jan 2, 2023)

  • v3.6.1(Dec 23, 2022)

  • v3.6.0(Dec 21, 2022)

    3.6.0 (2022-12-21)

    Bug Fixes

    • add css to cope with large tables (7f42f87)
    • adjust categoricals layout (f0bb45a)
    • categorical data not being obscured in the common values plot (40236bc)
    • compare report ignoring config parameter (3d60556)
    • compare report warnings always showing the last alert type (6b3c13d)
    • comparison fails when duplicates are disabled (#1208) (6d19620)
    • do not raise exception for percentage formatter (3ea626d)
    • enforce recomputation of description sets (a9fd1c8)
    • error comparing only one precomputed profile (00646cd)
    • html: sensible cloud-platform notebook html rendering (b22ece2)
    • ignoring config of precomputed reports (6478c40)
    • only compute auto correlation when no config is specified (d5d4f58)
    • remove malfunctioning hook (e2593f5)
    • remove unused test (2170338)
    • return the proper type for widgets (4c0b358)
    • set compute default to false (c70e491)
    • solve mypy error (9c4266e)
    • solve mypy issue (e3e7788)
    • uses colors from the specified config (c0c556d)
    • utils: use 'urllib.request' instead of 'requests' (#1177) (e4d020b), closes #1168

    Features

    • add heatmap values as a table under correlations (fc5da9e)
    • allow to specify the configuration for the comparison report (ad725b0)
    • design improvements on the correlations section (e5cd8cf)
    • implement imbalanced warning (ce84c81)
    • update variables layout (#1207) (cf0e0a7)
  • v3.5.0(Nov 22, 2022)

    3.5.0 (2022-11-22)

    Bug Fixes

    Features

  • v3.4.0(Oct 20, 2022)

    3.4.0 (2022-10-20)

    Bug Fixes

    Features

  • v3.3.0(Sep 7, 2022)

  • v3.2.0(May 2, 2022)

  • v3.1.0(Sep 27, 2021)

  • v3.0.0(May 11, 2021)

  • v2.13.0(May 8, 2021)

  • v2.12.0(May 5, 2021)

  • v2.11.0(Feb 20, 2021)

  • v2.10.1(Feb 7, 2021)

  • v2.10.0rc1(Jan 5, 2021)

  • v2.9.0(Sep 3, 2020)

  • v2.9.0rc1(Jul 12, 2020)

    This release candidate improves the handling of sensitive data and reduces technical debt with various fixes. The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.

    A warm thank you to everyone who has contributed to this release: @gauravkumar37 @Jooong @smaranjitghose @XavierBanos Tam Nguyen @andycraig @mgorsk1 @mbh86 @MHUNCHO @GaelVaroquaux @AmauryLepicard @baluyotraf @pvojnisek @abegong

  • v2.8.0(May 12, 2020)

    pandas-profiling now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions and scanning for truncated images or those containing EXIF information. Moreover, the text analysis features have been reworked to provide more informative statistics.

    Read the changelog v2.8.0 for more details.

    Contributors: @loopyme @Bradley-Butcher @willemhendriks @IscaAy @frellnick @dataverz @ieaves

  • v2.7.1(May 11, 2020)

  • v2.7.0(May 7, 2020)

    Announcement and changelog are available in the documentation.

    We are grateful to @loopyme and @kyleYang for creating parts of the features in this release.

    Thanks to all contributors who made this release possible: @1313e @dataprofessor @neomatrix369 @jiangfangfangxm @WesleyTheGeolien @NickYi1990 @ricgu8086.

  • v2.6.0(Apr 13, 2020)

    Dependency policy

    The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (a high guarantee that things work), but it requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and will specify minimum versions instead.
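
    To make the difference concrete, here is a minimal sketch of how a pinned requirement and a minimum-version requirement behave once a newer release of a dependency is installed. The helper `satisfies` and the version numbers are hypothetical and only handle plain dotted numeric versions; real installers such as pip implement the full version-specifier rules.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.0.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, requirement):
    """Check an installed version against '==X.Y.Z' (pinned) or '>=X.Y.Z' (minimum)."""
    operator, wanted = requirement[:2], requirement[2:]
    if operator == "==":
        return parse_version(installed) == parse_version(wanted)
    if operator == ">=":
        return parse_version(installed) >= parse_version(wanted)
    raise ValueError(f"unsupported requirement: {requirement!r}")


# Illustrative version numbers, not taken from the project's requirements.
pinned = "==0.25.3"
minimum = ">=0.25.3"

print(satisfies("1.0.3", pinned))   # a newer release conflicts with the pin
print(satisfies("1.0.3", minimum))  # a newer release satisfies the minimum
```

    With a pin, every new upstream release needs a matching update on our side; with a minimum version, newer releases are accepted by default.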

    Pandas v1

    Early releases of pandas v1 exhibited many regressions that broke functionality (as acknowledged by the authors here). At this point, pandas is more stable and we notice high demand for compatibility, so we now support the latest pandas versions. To ensure compatibility with both major versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

    Python 3.6+ features

    Python 3.6 introduces f-strings and insertion-ordered dicts, which we now rely on. This means that from pandas-profiling 2.6 onward, you need at least Python 3.6. Users who cannot upgrade can stay on pandas-profiling 2.5.0, but unfortunately will not benefit from updates or maintenance.
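
    As a short illustration of the two language features named above (the variable names are made up for this example): f-strings embed expressions directly in string literals, and dicts keep their insertion order. Note that in Python 3.6 the ordering was a CPython implementation detail; it became a language guarantee in Python 3.7.

```python
column = "age"
n_missing = 3
n_rows = 120

# f-string: expressions and format specs are evaluated inline.
message = f"{column}: {n_missing / n_rows:.1%} missing"

# Dicts preserve insertion order, so keys come back in the order they were added.
summary = {}
summary["type"] = "numeric"
summary["missing"] = n_missing
summary["rows"] = n_rows
keys = list(summary)

print(message)
print(keys)
```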

    Extended continuous integration

    Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability. Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.

  • v2.5.0(Feb 14, 2020)

    • Progress bar added (#224)
    • Character analysis for Text/NLP (#278)
    • Themes: configuration and demos (Orange, Dark)
    • Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.
    • Toggle descriptions at correlations.

    Deprecation:

    • This is the last version to support Python 3.5.

    Stability:

    • The order of columns changed when sort="None" (#377, fixed).
    • Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)
    • Improved mixed type detection (#351)
    • Refactor of report structures.
    • Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).
    • Distinct counts exclude NaNs.
    • Fixed alerts in notebooks.

    Other improvements:

    • Warnings are now sorted.
    • Links to Binder and Google Colab are added for notebooks (#349)
    • The overview section is tabbed.
  • v2.4.0(Jan 8, 2020)

    The v2.4.0 release decouples the data structure of reports from the actual rendering. It is now much simpler to change the user interface, whether the user works in a Jupyter notebook, a webpage or a native application, or just wants a JSON view of the data.
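
    The idea of separating a report's data structure from its rendering can be sketched as follows. This is not pandas-profiling's internal code; the `description` dict and the two renderer functions are hypothetical and use only the standard library.

```python
import html
import json

# Hypothetical, renderer-agnostic description of one report section.
description = {
    "title": "Overview",
    "items": {"Number of variables": 12, "Missing cells": 3},
}


def render_json(section):
    """Serialize the structure as-is for programmatic consumers."""
    return json.dumps(section, sort_keys=True)


def render_html(section):
    """Render the same structure as a small HTML fragment."""
    rows = "".join(
        f"<tr><th>{html.escape(name)}</th><td>{html.escape(str(value))}</td></tr>"
        for name, value in section["items"].items()
    )
    return f"<h2>{html.escape(section['title'])}</h2><table>{rows}</table>"


print(render_json(description))
print(render_html(description))
```

    Supporting another output format then means writing one more function over the same structure, without touching how the statistics are computed.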

    We are also proud to announce that we have been accepted into the GitHub Sponsors programme. If you want to see me continue working on this project, you are cordially invited to support me through this programme. To boost community funding, GitHub will match your contribution!

    Other improvements:

    • extended configuration with better defaults, including minimal mode for big data (#258, #310)
    • more example datasets
    • rejection of highly correlated variables is generalized (#284, #299)
    • many structural and stability improvements (#254, #274, #239)

    Special thanks to @marco-cardoso @ajupton @lvwerra @gliptak @neomatrix369 for their contributions.

  • v2.3.0(Jul 27, 2019)

    • (Experimental) Support for "path" type
    • Fix numeric precision (#225)
    • Force labels in missing values diagram for large number of columns (#222)
    • Add pull request template
    • Add Census Dataset from the UCI ML Repository

    Thanks @bensdm and @huaiweicheng for your valuable contributions to this version!

  • v2.2.0(Jul 22, 2019)

    New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.

    • Added variable bin sizing via Bayesian Blocks (feature request [#216])
    • PyCharm integration, console attempts to detect file type.
    • Fixed bug [#215].
    • Updated the missingno package to 0.4.2, fixing the font size in the bar diagram.
    • Various optimizations

    Thanks to: @Utsav37 @mansenfranzen @jakevdp

  • v2.1.2(Jul 11, 2019)

  • v2.1.1(Jul 11, 2019)

  • v2.1.0(Jul 6, 2019)

    The pandas-profiling release version 2.1.0 includes:

    • Correlations: correlation calculations are now more fault tolerant ([#51] and [#197]), correlation names in the report are clarified.
    • Jupyter Notebook: rendering a profiling report is done inside the srcdoc attribute (which fixes [#199]), a full-width option is added and the column layout is improved.
    • User experience: The table styling and sample section formatting are improved.
    • Warnings: detection added for categorical variables that are suspected to be of the datetime type.
    • Documentation and community:
      • The Contribution page helps users that want to contribute.
      • Typos fixed [#195]. Thank you, @abhilashshakti.
      • Added more examples.
    • Other bugfixes and improvements:
      • Add version information to console interface.
      • Fix: Remove one-time used logger [#202]
      • Fix: Dealing with string indices [#200]
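
    The `srcdoc` approach mentioned under Jupyter Notebook above can be sketched with the standard library alone. The function name `iframe_with_srcdoc` and its default sizes are assumptions made for this example, not the package's actual implementation; the key step is escaping the report HTML so that it can sit inside an attribute value.

```python
import html


def iframe_with_srcdoc(report_html, width="100%", height="800px"):
    """Wrap a complete HTML document in an iframe via the srcdoc attribute."""
    # quote=True also escapes double quotes, which would otherwise end the attribute.
    escaped = html.escape(report_html, quote=True)
    return (
        f'<iframe width="{width}" height="{height}" '
        f'srcdoc="{escaped}" frameborder="0"></iframe>'
    )


snippet = iframe_with_srcdoc('<html><body><p class="x">Report</p></body></html>')
print(snippet)
```

    In a notebook, a string like this could be displayed with `IPython.display.HTML`; because the report lives in the attribute, no separate file needs to be served.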

    Contributors: @abhilashshakti @adamrossnelson @manycoding @InsciteAnalytics

  • v2.0.3(Jun 23, 2019)

  • v2.0.2(Jun 22, 2019)

    Revised version structure, fixed recursion preventing installation of dependencies ([#184]).

    The setup.py file used to import utils from the package prior to installation. This caused errors when the dependencies were not yet present.
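
    One common way to avoid this class of problem is to read metadata such as the version from a file as plain text instead of importing the package. The sketch below assumes a hypothetical `version.py` containing a `__version__ = "..."` line; it is a general pattern, not necessarily the exact change made in this release.

```python
import re
import tempfile
from pathlib import Path


def read_version(version_file):
    """Extract __version__ from a file without importing (and executing) the package."""
    text = Path(version_file).read_text(encoding="utf-8")
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"no __version__ found in {version_file}")
    return match.group(1)


# Demonstration with a throwaway file standing in for the package's version module.
with tempfile.TemporaryDirectory() as tmp:
    version_path = Path(tmp) / "version.py"
    version_path.write_text('__version__ = "2.0.2"\n', encoding="utf-8")
    version = read_version(version_path)

print(version)
```

    Because the file is only parsed as text, none of the package's own imports run, so missing dependencies cannot break the installation step.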

  • v2.0.1(Jun 21, 2019)

A Python package for manipulating 2-dimensional tabular data structures

datatable This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame

H2O.ai 1.6k Jan 05, 2023
The easy way to write your own flavor of Pandas

Pandas Flavor The easy way to write your own flavor of Pandas Pandas 0.23 added a (simple) API for registering accessors with Pandas objects. Pandas-f

Zachary Sailer 260 Jan 01, 2023
High performance datastore for time series and tick data

Arctic TimeSeries and Tick store Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-

Man Group 2.9k Dec 23, 2022
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

swifter A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner. Blog posts Release 1.0.0 Fir

Jason Carpenter 2.2k Jan 04, 2023
The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs

pandas-log The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common funct

Eyal Trabelsi 206 Dec 13, 2022
NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

Blaze 3.1k Jan 01, 2023
Modin: Speed up your Pandas workflows by changing a single line of code

Scale your pandas workflows by changing one line of code To use Modin, replace the pandas import: # import pandas as pd import modin.pandas as pd Inst

8.2k Jan 01, 2023
cuDF - GPU DataFrame Library

cuDF - GPU DataFrames NOTE: For the latest stable README.md ensure you are on the main branch. Built based on the Apache Arrow columnar memory format,

RAPIDS 5.2k Dec 31, 2022
Pandas Google BigQuery

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Python for Data 348 Jan 03, 2023
Universal 1d/2d data containers with Transformers functionality for data analysis.

XPandas (extended Pandas) implements 1D and 2D data containers for storing type-heterogeneous tabular data of any type, and encapsulates feature extra

The Alan Turing Institute 25 Mar 14, 2022
Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 01, 2023
sqldf for pandas

pandasql pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R. pandasql seeks to provide a more familiar

yhat 1.2k Jan 09, 2023
A pure Python implementation of Apache Spark's RDD and DStream interfaces.

pysparkling Pysparkling provides a faster, more responsive way to develop programs for PySpark. It enables code intended for Spark applications to exe

Sven Kreiss 254 Dec 06, 2022
Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

What is Vaex? Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular data

vaex io 7.7k Jan 01, 2023
Koalas: pandas API on Apache Spark

pandas API on Apache Spark Explore Koalas docs » Live notebook · Issues · Mailing list Help Thirsty Koalas Devastated by Recent Fires The Koalas proje

Databricks 3.2k Jan 04, 2023