Visualize and compare datasets, target values and associations, with one line of code.

Overview

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!

Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application.

The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.

Usage and parameters are described below. You can also find an article describing its features in depth and see examples in action here.

Sweetviz development is still ongoing! Please let me know if you run into any data, compatibility or install issues! Thank you for reporting any BUGS in the issue tracking system here, and I welcome your feedback and questions on usage/features in the brand-new GitHub "Discussions" tab right here!

Examples

Example report using the Titanic dataset

Article describing its features in depth

Features

  • Target analysis
    • Shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features
  • Visualize and compare
    • Distinct datasets (e.g. training vs test data)
    • Intra-set characteristics (e.g. male versus female)
  • Mixed-type associations
    • Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) datatypes seamlessly, to provide maximum information for all data types.
  • Type inference
    • Automatically detects numerical, categorical and text features, with optional manual overrides
  • Summary information
    • Type, unique values, missing values, duplicate rows, most frequent values
    • Numerical analysis:
      • min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
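To make the numerical summary above concrete, here is a minimal sketch that computes several of the listed statistics using only the Python standard library. It is an illustration, not Sweetviz's own implementation; the function name `numeric_summary` and the sample values are made up for this example.

```python
import statistics


def numeric_summary(values):
    """Compute a few of the numerical statistics listed above for a list of numbers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    stdev = statistics.stdev(data)  # sample standard deviation
    q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "quartiles": (q1, q2, q3),
        "mean": mean,
        "sum": sum(data),
        "std": stdev,
        # Median absolute deviation: the median distance from the median.
        "median_abs_dev": statistics.median(abs(x - median) for x in data),
        # Coefficient of variation: spread relative to the mean.
        "coef_variation": stdev / mean if mean else float("nan"),
    }


# Hypothetical sample values, for illustration only.
summary = numeric_summary([22, 38, 26, 35, 35, 54, 2, 27])
for name, value in summary.items():
    print(f"{name}: {value}")
```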

Upgrading

Some people have experienced mixed results when upgrading through pip. To update to the latest version from an existing install, it is recommended to run pip uninstall sweetviz first, then simply install again.

Installation

Sweetviz currently supports Python 3.6+ and Pandas 0.25.3+. Reports are output using the base "os" module, so custom environments such as Google Colab which require custom file operations are not yet supported, although I am looking into a solution.

Using pip

The best way to install sweetviz (other than from source) is to use pip:

pip install sweetviz

Installation issues & fixes

In some rare cases, users have reported errors such as ModuleNotFoundError: No module named 'sweetviz' and AttributeError: module 'sweetviz' has no attribute 'analyze'. In those cases, we suggest the following:

  • Make sure none of your scripts are named sweetviz.py, as that interferes with the library itself. Delete or rename that script (and any associated .pyc files), and try again.
  • Try uninstalling the library using pip uninstall sweetviz, then reinstalling
  • The issue may stem from using multiple versions of Python, or from OS permissions. The following Stack Overflow articles have resolved many of the reported issues: Article 1, Article 2, Article 3
  • If all else fails, post a bug issue here on GitHub. Thank you for taking the time, it may help resolve the issue for you and everyone else!
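The first suggestion above (a local script named sweetviz.py shadowing the library) can be checked with a few lines of code. This is a sketch using only the standard library; the helper name `find_shadowing_files` is hypothetical and not part of Sweetviz.

```python
from pathlib import Path


def find_shadowing_files(directory, module_name="sweetviz"):
    """Return files in `directory` that could shadow an installed package of the same name."""
    folder = Path(directory)
    candidates = [folder / f"{module_name}.py"]
    # Stale bytecode can live next to the script...
    candidates += sorted(folder.glob(f"{module_name}*.pyc"))
    # ...or inside the __pycache__ folder.
    cache_dir = folder / "__pycache__"
    if cache_dir.is_dir():
        candidates += sorted(cache_dir.glob(f"{module_name}.*.pyc"))
    return [path for path in candidates if path.is_file()]


# Check the current working directory, which is where scripts are usually run from.
for path in find_shadowing_files("."):
    print(f"Rename or delete: {path}")
```

An empty result means no conflicting script was found in that directory; other directories on the import path would need to be checked separately.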

Basic Usage

Creating a report is a quick 2-line process:

  1. Create a DataframeReport object using one of: analyze(), compare() or compare_intra()
  2. Use a show_xxx() function to render the report. You can now use either html or notebook report options, as well as scaling: (more info on these options below)

Step 1: Create the report

There are 3 main functions for creating reports:

  • analyze(...)
  • compare(...)
  • compare_intra(...)

Analyzing a single dataframe (and its optional target feature)

To analyze a single dataframe, simply use the analyze(...) function, then the show_html(...) function:

import sweetviz as sv

my_report = sv.analyze(my_dataframe)
my_report.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

When run, this will output a 1080p widescreen HTML app in your default browser.

Optional arguments

The analyze() function can take multiple other arguments:

analyze(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto'):
  • source: Either the data frame (as in the example) or a tuple containing the data frame and a name to show in the report. e.g. my_df or [my_df, "Training"]
  • target_feat: A string representing the name of the feature to be marked as "target". Only BOOLEAN and NUMERICAL features can be targets for now.
  • feat_cfg: A FeatureConfig object representing features to be skipped, or to be forced a certain type in the analysis. The arguments can either be a single string or list of strings. Parameters are skip, force_cat, force_num and force_text. The "force_" arguments override the built-in type detection. They can be constructed as follows:
feature_config = sv.FeatureConfig(skip="PassengerId", force_text=["Age"])
  • pairwise_analysis: Correlations and other associations can take quadratic time (n^2) to complete. The default setting ("auto") will run without warning until a data set contains "association_auto_threshold" features. Past that threshold, you need to explicitly pass the parameter pairwise_analysis="on" (or ="off") since processing that many features would take a long time. This parameter also covers the generation of the association graphs (based on Drazen Zaric's concept):

Pairwise sample
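To give a sense of the quadratic growth mentioned for pairwise_analysis, the following sketch counts the unordered feature pairs for a few dataset widths. It is plain arithmetic and makes no assumption about the actual value of "association_auto_threshold"; the function name `pair_count` is made up for this example.

```python
def pair_count(n_features):
    """Number of unordered feature pairs for a dataset with n_features columns."""
    return n_features * (n_features - 1) // 2


for n in (10, 50, 100, 200):
    print(f"{n:>4} features -> {pair_count(n):>6} pairs")
```

Asymmetrical measures need both directions of each pair, so the real amount of work can be up to twice these counts.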

Comparing two dataframes (e.g. Test vs Training sets)

To compare two data sets, simply use the compare() function. Its parameters are the same as analyze(), except with an inserted second parameter to cover the comparison dataframe. It is recommended to use the [dataframe, "name"] format of parameters to better differentiate between the base and compared dataframes. (e.g. [my_df, "Train"] vs my_df)

my_report = sv.compare([my_dataframe, "Training Data"], [test_df, "Test Data"], "Survived", feature_config)

Comparing two subsets of the same dataframe (e.g. Male vs Female)

Another way to get great insights is to use the comparison functionality to split your dataset into 2 sub-populations.

Support for this is built in through the compare_intra() function. This function takes a boolean series as one of the arguments, as well as an explicit "name" tuple for naming the (true, false) resulting datasets. Note that internally, this creates 2 separate dataframes to represent each resulting group. As such, it is more of a shorthand function of doing such processing manually.

my_report = sv.compare_intra(my_dataframe, my_dataframe["Sex"] == "male", ["Male", "Female"], feature_config)

Step 2: Show the report

Once you have created your report object (e.g. my_report in the examples above), simply call one of its two "show" functions:

show_html()

show_html(  filepath='SWEETVIZ_REPORT.html', 
            open_browser=True, 
            layout='widescreen', 
            scale=None)

show_html(...) will create and save an HTML report at the given file path. There are options for:

  • layout: Either 'widescreen' or 'vertical'. The widescreen layout displays details on the right side of the screen, as the mouse goes over each feature. The new (as of 2.0) vertical layout is more compact horizontally and enables expanding each detail area upon clicking.
  • scale: Use a floating-point number (scale= 0.8 or None) to scale the entire report. This is very useful to fit reports to any output.
  • open_browser: Enables the automatic opening of a web browser to show the report. Since under some circumstances this is not desired (or causes issues with some IDEs), you can disable it here.

show_notebook()

show_notebook(  w=None, 
                h=None, 
                scale=None,
                layout='widescreen',
                filepath=None)

show_notebook(...) is new as of 2.0 and will embed an IFRAME element showing the report right inside a notebook (e.g. Jupyter, Google Colab, etc.).

Note that since notebooks are generally a more constrained visual environment, it is probably a good idea to use custom width/height/scale values (w, h, scale) and even set custom default values in an INI override (see below). The options are:

  • w (width): Sets the width of the output window for the report (the full report may not fit; use layout and/or scale for the report itself). Can be as a percentage string (w="100%") or number of pixels (w=900).
  • h (height): Sets the height of the output window for the report. Can be as a number of pixels (h=700) or "Full" to stretch the window to be as tall as all the features (h="Full").
  • scale: Same as for show_html, above.
  • layout: Same as for show_html, above.
  • filepath: An optional file path for an output HTML report.

Customizing defaults: the Config file

The package contains an INI file for configuration. You can override any setting by providing your own then calling this before creating a report:

sv.config_parser.read("Override.ini")

IMPORTANT #1: it is best to load overrides before any other command, as many of the INI options are used in the report generation.

IMPORTANT #2: always set the header (e.g. [General]) before the value, otherwise there will be an error.
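The header requirement can be reproduced with Python's standard configparser module. This sketch assumes that sv.config_parser behaves like a regular configparser.ConfigParser, which this document does not state explicitly; the INI content is passed as a string for brevity.

```python
import configparser

parser = configparser.ConfigParser()

# With the section header in place, the override parses as expected.
parser.read_string("[General]\nuse_cjk_font = 1\n")
cjk_enabled = parser.getint("General", "use_cjk_font")
print("use_cjk_font =", cjk_enabled)

# Without a header, configparser refuses the content.
try:
    configparser.ConfigParser().read_string("use_cjk_font = 1\n")
    header_error = None
except configparser.MissingSectionHeaderError as exc:
    header_error = type(exc).__name__
print("Error without header:", header_error)
```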

Most useful config overrides

You can look into the file sweetviz_defaults.ini for what can be overridden (warning: much of it is a work in progress and not well documented), but the most useful overrides are as follows.

Default report layout, size

Override any of these (by putting them in your own INI, again do not forget the header), to avoid having to set them every time you do a "show" command:

Important: note the double '%' if specifying a percentage

[Output_Defaults]
html_layout = widescreen
html_scale = 1.0
notebook_layout = vertical
notebook_scale = 0.9
notebook_width = 100%%
notebook_height = 700

New: Chinese, Japanese, Korean (CJK) character support

[General]
use_cjk_font = 1 

Will switch the font in the graphs to use a CJK-compatible font. Although this font is not as compact, it will get rid of any warnings and "unknown character" symbols for these languages.

Remove Sweetviz logo

[Layout]
show_logo = 0

Will remove the Sweetviz logo from the top of the page.
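As an illustration of the overrides above, the sketch below writes a small override file to a temporary folder and reads it back with the standard configparser module, which shows why the percent sign has to be doubled. It assumes sv.config_parser follows standard configparser interpolation rules; the temporary path is only for the example.

```python
import configparser
import os
import tempfile

OVERRIDE_TEXT = """\
[Output_Defaults]
notebook_layout = vertical
notebook_width = 100%%
notebook_height = 700

[Layout]
show_logo = 0
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    override_path = os.path.join(tmp_dir, "Override.ini")
    with open(override_path, "w", encoding="utf-8") as handle:
        handle.write(OVERRIDE_TEXT)

    parser = configparser.ConfigParser()
    files_read = parser.read(override_path)

# "%%" is an escaped percent sign and is read back as a single "%".
notebook_width = parser.get("Output_Defaults", "notebook_width")
show_logo = parser.getint("Layout", "show_logo")
print(notebook_width, show_logo)

# A single "%" is treated as the start of an interpolation and fails on access.
bad_parser = configparser.ConfigParser()
bad_parser.read_string("[Output_Defaults]\nnotebook_width = 100%\n")
try:
    bad_parser.get("Output_Defaults", "notebook_width")
    single_percent_error = None
except configparser.InterpolationSyntaxError as exc:
    single_percent_error = type(exc).__name__
print("Error with single %:", single_percent_error)
```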

Correlation/Association analysis

A major source of insight and unique feature of Sweetviz' associations graph and analysis is that it unifies in a single graph (and detail views):

  • Numerical correlation (between numerical features)
  • Uncertainty coefficient (for categorical-categorical)
  • Correlation ratio (for categorical-numerical)

Squares represent categorical-feature-related variables and circles represent numerical-numerical correlations. Note that the trivial diagonal is left empty, for clarity.
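For the categorical-numerical case listed above, the correlation ratio compares how far each category's mean lies from the overall mean with the total spread of the values. The following is a textbook-style sketch in plain Python, not Sweetviz's own code; the function name and sample data are made up for this example.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical sequence."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)
    # Weighted squared distance of each category mean from the overall mean.
    between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    # Total squared distance of every value from the overall mean.
    total = sum((value - overall_mean) ** 2 for value in values)
    return math.sqrt(between / total) if total else 0.0


# Hypothetical data: the category fully explains the value in the first case,
# and explains nothing in the second.
eta_full = correlation_ratio(["a", "a", "b", "b"], [1, 1, 5, 5])
eta_none = correlation_ratio(["a", "a", "b", "b"], [1, 5, 1, 5])
print(eta_full, eta_none)
```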

IMPORTANT: categorical-categorical associations (provided by the SQUARES showing the uncertainty coefficient) are ASYMMETRICAL, meaning that each row represents how much the row title (on the left) gives information on each column. For example, "Sex", "Pclass" and "Fare" are the elements that give the most information on "Survived".

For the Titanic dataset, this information is rather symmetrical but it is not always the case!
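The asymmetry described above can be seen in a small, self-contained sketch of the uncertainty coefficient (Theil's U), written with the standard library only. It follows the usual textbook definition and is not taken from Sweetviz's source; the function names and the cabin/side data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy of a sequence of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in Counter(values).values())


def conditional_entropy(target, given):
    """Entropy of `target` that remains once `given` is known."""
    total = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (count / total) * math.log(count / given_counts[g])
        for (_, g), count in joint.items()
    )


def uncertainty_coefficient(target, given):
    """Fraction of the uncertainty in `target` that is removed by knowing `given`."""
    h_target = entropy(target)
    if h_target == 0:
        return 1.0  # nothing left to explain
    return (h_target - conditional_entropy(target, given)) / h_target


# Hypothetical data: each cabin sits on exactly one side of the ship.
cabin = ["A", "B", "C", "D"]
side = ["left", "left", "right", "right"]

u_side_given_cabin = uncertainty_coefficient(side, cabin)
u_cabin_given_side = uncertainty_coefficient(cabin, side)
print(u_side_given_cabin, u_cabin_given_side)
```

Knowing the cabin fully determines the side, while knowing the side only narrows down the cabin, so the two directions give different values.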

Correlations are also displayed in the detail section of each feature, with the target value highlighted when applicable. e.g.:

Associations detail

Finally, it is worth noting these correlation/association methods shouldn’t be taken as gospel as they make some assumptions on the underlying distribution of data and relationships. However they can be a very useful starting point.

Troubleshooting / FAQ

  • Installation issues

Please see the "Installation issues & fixes" section at the top of this document

  • Asian characters, "RuntimeWarning: Glyph ### missing from current font"

See section above regarding CJK characters support. If you find the need for additional character types, definitely post a request in the issue tracking system.

  • ...any other issues

Development is ongoing so absolutely feel free to report any issues and/or suggestions in the issue tracking system here or in our forum (you should be able to log in with your GitHub account!)

Contribute

This is my first open-source project! I built it to be the most useful tool possible and help as many people as possible with their data science work. If it is useful to you, your contribution is more than welcome and can take many forms:

1. Spread the word!

A STAR here on GitHub, and a Twitter or Instagram post are the easiest contribution and can potentially help grow this project tremendously! If you find this project useful, these quick actions from you would mean a lot and could go a long way.

Kaggle notebooks/posts, Medium articles, YouTube video tutorials and other content take more time but will help all the more!

2. Report bugs & issues

I expect there to be many quirks once the project is used by more and more people with a variety of new (& "unclean") data. If you found a bug, please open a new issue here.

3. Suggest and discuss usage/features

To make Sweetviz as useful as possible we need to hear what you would like it to do, or what it could do better! Head on to our Discourse server and post your suggestions there; no login required!

4. Contribute to the development

I definitely welcome any help I can get on this project; simply get in touch on the issue tracker and/or our Discourse forum.

Please note that after a hectic development period, the code itself right now needs a bit of cleanup. :)

Special thanks & related materials

I want Sweetviz to be a hub of the best of what's out there, a way to get the most valuable information and visualization, without reinventing the wheel.

As such, I want to point out some of those great resources that were inspiring and integrated into Sweetviz:

And of course, very special thanks to everyone who has contributed on GitHub, through reports, feedback and commits!

Comments
  • ValueError: index must be monotonic increasing or decreasing

    I am able to generate the same report on Titanic data as in the Medium articles. However, when I try to test the Boston housing data, I get the errors as below:

    ValueError                                Traceback (most recent call last)
    ~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in get_slice_bound(self, label, side, kind)
       5166         try:
    -> 5167             return self._searchsorted_monotonic(label, side)
       5168         except ValueError:

    ~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
       5127
    -> 5128         raise ValueError("index must be monotonic increasing or decreasing")
       5129

    ValueError: index must be monotonic increasing or decreasing

    During handling of the above exception, another exception occurred:

    KeyError                                  Traceback (most recent call last)
    in
    ----> 1 my_report = sv.analyze(dfx)

    Any ideas on the error?

    Thanks.

    bug 
    opened by phillip1029 18
  • FloatingPointError: divide by zero encountered in true_divide

    I ran into a "FloatingPointError: divide by zero encountered in true_divide" in the pairwise feature portion of the code. Apparently there was a divide by zero issue in the cov part of the underlying code.

    The trace of the error is as follows:

    file: sv_public.py, line 13, in analyze, pairwise_analysis, feat_cfg)
    file: dataframe_report.py, line 243, in __init__, self.process_associations(features_to_process, source_target_series, compare_target series
    file: dataframe_report.py, line 423, in process_associations, feature.source.corr(other.source, method='pearson')
    file: series.py line 2254, in corr, this.values, other.values, method=method, min_periods=min_periods
    file: nanops.py, line 69, in _f, return f(*args,*kwargs)
    file: nanops.py, line 1240, in nancorr, return f(a,b)
    file: nanops.py, line 1256, in _pearson, return np.corrcoef(a,b)[0,1]
    file: <array_function internals>, line 6, in corrcoef
    file: function_base.py,line 2526 in corrcoef, c=cov(x,y,rowvar)
    file: <array_function internals>, line 6, in cov
    file: function_base.py, line 2455, in cov, c=np.true_divide(1,fact)

    My dataframe had some empty strings where nulls should have been, but there were other columns that had similar features, but they never threw this error.

    bug 
    opened by jmcneal84 17
  • Integer feature with values 1 and 2 cannot be handled as categorical?

    Hey guys, I'm getting an error when handling integer columns but the error message is not very clear for me to understand what is going on. So far it looks like a bug to me. Here it goes.

    We start by importing basic stuff and generate a pandas dataframe with 4 columns containing random real numbers, plus an integer column named 'target' with values 1 and 2.

    import sweetviz as sv
    import pandas as pd
    import numpy as np
    
    np.random.seed(42)
    np_data = np.random.randn(10, 4)
    df = pd.DataFrame(np_data, columns=['col1', 'col2', 'col3', 'col4'])
    df['target'] = 1.0
    df['target'].iloc[5:] = 2.
    df['target'] = df['target'].astype(int)
    

    Taking a look at the original types of the dataframe (df.dtypes), we have as a result:

    col1      float64
    col2      float64
    col3      float64
    col4      float64
    target      int32
    dtype: object

    Error: TypeError

    compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])
    compareReport.show_html()
    

    gives this message:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-54-8e3e89553904> in <module>
          1 #feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
    ----> 2 compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])#, feat_cfg=feature_config, target_feat='target')
          3 compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"
    
    ~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
         42     report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
         43                                       [data_false, names[1]],
    ---> 44                                       pairwise_analysis, feat_cfg)
         45     return report
         46 
    
    ~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
        215             # start = time.perf_counter()
        216             self.progress_bar.set_description(':' + f.source.name + '')
    --> 217             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
        218             self.progress_bar.update(1)
        219             # print(f"DONE FEATURE------> {f.source.name}"
    
    ~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\series_analyzer.py in analyze_feature_to_dictionary(to_process)
         92         compare_type = determine_feature_type(to_process.compare,
         93                                               to_process.compare_counts,
    ---> 94                                               returned_feature_dict["type"], "COMPARED")
         95         if compare_type != FeatureType.TYPE_ALL_NAN and \
         96             source_type != FeatureType.TYPE_ALL_NAN:
    
    ~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\type_detection.py in determine_feature_type(series, counts, must_be_this_type, which_dataframe)
         73             var_type = FeatureType.TYPE_TEXT
         74         else:
    ---> 75             raise TypeError(f"Cannot force series '{series.name}' in {which_dataframe} to be from its type {var_type} to\n"
         76                             f"DESIRED type {must_be_this_type}. Check documentation for the possible coercion possibilities.\n"
         77                             f"This can be solved by changing the source data or is sometimes caused by\n"
    
    TypeError: Cannot force series 'target' in COMPARED to be from its type FeatureType.TYPE_CAT to
    DESIRED type FeatureType.TYPE_BOOL. Check documentation for the possible coercion possibilities.
    This can be solved by changing the source data or is sometimes caused by
    a feature type mismatch between source and compare dataframes.
    

    If I explicitly supply the feat_cfg argument the result is the same.

    feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
    compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"], feat_cfg=feature_config)
    compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"
    

    However, if I add 10 to the 'target' column (it will now have 11 and 12 as values), the report is generated without errors. Am I missing something or it is indeed a bug?

    bug 
    opened by shgo 11
  • cast key to string

    In some cases the key is a boolean value not a string. A keyerror is produced when a boolean value appears in key. Reference #42

    I was able to recreate the issue as user described and was able to fix by casting key as string. It seems like the key should always be a string.
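The failure mode described in this comment can be illustrated in isolation. The dictionary and helper below are hypothetical and are not taken from the Sweetviz code base; they only show why a boolean key misses a string-keyed entry and how casting the key avoids the KeyError.

```python
# Hypothetical lookup table whose keys are stored as strings.
counts_by_label = {"True": 12, "False": 30}


def lookup(counts, key):
    """Look up `key` after casting it to a string, so True and "True" both match."""
    return counts[str(key)]


try:
    counts_by_label[True]  # a boolean key does not match the string key "True"
    raw_lookup_failed = False
except KeyError:
    raw_lookup_failed = True

print(raw_lookup_failed, lookup(counts_by_label, True))
```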

    opened by a246530 10
  • TypeError: DatetimeIndex cannot perform the operation sum

    I've a dataset which has date_time column of the format: 2020-07-12 11:37:25

    I get the following error:

    :date_time:                        |███                  | [ 14%]   00:00  -> (00:03 left)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-77-cbd387f7f43e> in <module>()
          1 #analyzing the dataset
    ----> 2 techglares_report = sv.analyze(df)
    
    6 frames
    /usr/local/lib/python3.6/dist-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
         11             pairwise_analysis: str = 'auto'):
         12     report = sweetviz.DataframeReport(source, target_feat, None,
    ---> 13                                       pairwise_analysis, feat_cfg)
         14     return report
         15 
    
    /usr/local/lib/python3.6/dist-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
        214             # start = time.perf_counter()
        215             self.progress_bar.set_description(':' + f.source.name + '')
    --> 216             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
        217             self.progress_bar.update(1)
        218             # print(f"DONE FEATURE------> {f.source.name}"
    
    /usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
         90 
         91     # Establish base stats
    ---> 92     add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
         93     if to_process.compare is not None:
         94         add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)
    
    /usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
         42     base_stats = updated_dict["base_stats"]
         43     num_total = counts["num_rows_total"]
    ---> 44     num_zeros = series[series == 0].sum()
         45     non_nan = counts["num_rows_with_data"]
         46     base_stats["total_rows"] = num_total
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
      11180             skipna=skipna,
      11181             numeric_only=numeric_only,
    > 11182             min_count=min_count,
      11183         )
      11184 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
       3901             numeric_only=numeric_only,
       3902             filter_type=filter_type,
    -> 3903             **kwds,
       3904         )
       3905 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
       1058         if func is None:
       1059             raise TypeError(
    -> 1060                 f"{type(self).__name__} cannot perform the operation {name}"
       1061             )
       1062         return func(skipna=skipna, **kwds)
    
    TypeError: DatetimeIndex cannot perform the operation sum
    

    I'm running sweetviz on Google Colab.

    Is there any way to solve this error?

    opened by vidyap-xgboost 9
  • Charset utf-8

    First of all it’s awesome! Many thanks for your effort on data visualization! There is a small issue maybe, the html report lacks a meta tag showing the charset as “utf-8”; by adding it, the report can correctly show the MBCS characters and will catch eyes of more global analysts. Thanks again! Hope this project goes better!

    opened by 95Key 9
  • show_html() doesn't shows the output jupyter notebook / lab

    Hi there,

    I try to use sweetviz in local:

    • Ubuntu 20.04

    And in anaconda enterprise:

    • K8s with centOS

    Both lead to the same issue. The display of the output in jupyter lab and notebook isn't visible.

    Local:

    • Jupyter_lab=2.0

    AE:

    • jupyter=1.0.0
    • jupyter_client=5.3.3
    • jupyter_console=6.0.0
    • jupyter_core=4.5.0
    • Jupyter_lab=1.1.3
    • ipython=7.8.0

    The report has been generated but not display.

    How to fix it?

    Best

    report output 
    opened by Christophe-pere 8
  • show_html Generate a Strange Layout of Analysis

    Hey there, this is a great package and it is pretty handy. However, I run into a strange layout issue when generating the plot.

    sweet = stz.analyze(data)
    sweet.show_html()

    Above is the code I used, and I attached the result's layout as a png file below.

    1: Would you kindly inform me of the option for layout in show_html()?
    2: How I should solve this issue?

    Thank you so much!

    SWEETVIZ_REPORT-html

    bug report output can't repro issue closing as cannot repro and no more reports 
    opened by HaoVJiang 7
  • error Font family ['STIXGeneral'] not found. Falling back to DejaVu Sans. occure

    hi,

    sweetviz does not work for a special table

    i get the following error

    AAfindfont: Font family ['STIXGeneral'] not found. Falling back to DejaVu Sans.
    findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans.
    findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans.
    findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans.
    findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans.
    ...
    RecursionError: maximum recursion depth exceeded in comparison

    how can i resolve this ?

    thanks for any hint

    bug 
    opened by fleschgordon 6
  • Error message in pip install sweetviz

    Hi, attempting to install sweetviz using pip install sweetviz, but kept encountering following error message (reproduced below) Am using pandas version 1.0.1. Kindly advise, thanks.

    Installing collected packages: importlib-resources, pandas, tqdm, sweetviz
      Attempting uninstall: pandas
        Found existing installation: pandas 1.0.1
        Uninstalling pandas-1.0.1:
    ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\users\\65943\\anaconda3\\lib\\site-packages\\~andas\\_libs\\algos.cp37-win_amd64.pyd'
    Consider using the --user option or check the permissions.

    opened by AngShengJun 6
  • error in graph_associations.py line 210, ValueError: cannot convert float NaN to integer

    Error thrown up during analyze(dataframe), right after :PAIRWISE DONE: and Creating Associations graph...

    Traceback (most recent call last):

    File "", line 1, in
      myreport = sv.analyze(df)

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\sv_public.py", line 13, in analyze
      pairwise_analysis, feat_cfg)

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\dataframe_report.py", line 246, in __init__
      self._association_graphs["all"] = GraphAssoc(self, "all", self._associations)

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 165, in __init__
      f = corrplot(graph_data, dataframe_report)

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 410, in corrplot
      dataframe_report = dataframe_report

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 318, in heatmap
      cur_size[1] / 2, facecolor=value_to_color(color[index]),

    File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 210, in value_to_color
      ind = int(val_position * (n_colors - 1)) # target index in the color palette

    ValueError: cannot convert float NaN to integer
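The final line of this traceback fails because Python cannot convert NaN to an integer. The sketch below reproduces that behavior and shows one possible guard; the function `palette_index` is a hypothetical stand-in written for illustration, not the library's value_to_color function, and returning None for NaN is just one design choice among several.

```python
import math


def palette_index(val_position, n_colors):
    """Map a position in [0, 1] to a palette index, or None when the position is NaN."""
    if math.isnan(val_position):
        return None  # let the caller decide how to draw a missing association
    return int(val_position * (n_colors - 1))


try:
    int(float("nan"))
    nan_conversion_error = None
except ValueError as exc:
    nan_conversion_error = str(exc)

print(nan_conversion_error)
print(palette_index(float("nan"), 256), palette_index(0.5, 11))
```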

    bug 
    opened by cnblevins 6
  • json files.

    I am a fan of this library. I have used it and would like to use it for all EDA; however, it is giving an error with a dataframe built from a json file. After making a dataframe, I didn't think this would be a problem.

    opened by gozdeydd 0
  • iteritems & mad deprecated

    My code:

    import sweetviz as sv
    my_report = sv.analyze(df)
    my_report.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

    Warnings:

    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\dataframe_report.py:74: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      all_source_names = [cur_name for cur_name, cur_series in source_df.iteritems()]
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\dataframe_report.py:109: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      filtered_series_names_in_source = [cur_name for cur_name, cur_series in source_df.iteritems()

    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_cat.py:28: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      for item in category_counts.iteritems():
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_text.py:19: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      for item in to_process.source_counts["value_counts_without_nan"].iteritems():
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_text.py:19: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      for item in to_process.source_counts["value_counts_without_nan"].iteritems():
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_cat.py:28: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
      for item in category_counts.iteritems():
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_numeric.py:25: FutureWarning: The 'mad' method is deprecated and will be removed in a future version. To compute the same result, you may do (df - df.mean()).abs().mean().
      stats["mad"] = series.mad()
    C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_numeric.py:25: FutureWarning: The 'mad' method is deprecated and will be removed in a future version. To compute the same result, you may do (df - df.mean()).abs().mean().
      stats["mad"] = series.mad()

    opened by tallgaijin 0
  • sweetviz shows wrong target rate for numerical variable

    I am trying to plot the distribution of a variable together with the target rate for each of its values, but Sweetviz shows the wrong target rate. Reproducible code is below.

    import pandas as pd
    import sweetviz as sv
    
    # 40 rows: var1 takes the values 0, 1, 2 and 3, ten rows each
    var1 = [0.]*10 + [1.]*10 + [2]*10 + [3]*10
    # Target rate per var1 value: 0.8, 0.6, 0.2, 0.0
    target = [0]*2 + [1]*8 + [0]*4 + [1]*6 + [0]*8 + [1]*2 + [0]*10
    df = pd.DataFrame({'var1':var1, 'target':target})
    
    # Force var1 to be treated as numerical
    fc = sv.FeatureConfig(force_num=['var1'])
    report = sv.analyze([df, 'Train'], target_feat='target', feat_cfg=fc, pairwise_analysis='off')
    report.show_html('report.html')
    report.show_notebook('report.html')
    
    image

    I know that if var1 is forced to categorical, the output is correct. But that is not useful to me, because Sweetviz sorts categorical charts by category size rather than by axis label.

    image

    How can I make this work while keeping the variable numerical?
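As a reference point for what the chart should show, the expected target rate per var1 value can be computed directly from the repro data, independent of Sweetviz. This is a minimal sketch in plain Python; the helper name `target_rate_by_value` is hypothetical.

```python
from collections import defaultdict

# Same data as in the repro above
var1 = [0.0] * 10 + [1.0] * 10 + [2.0] * 10 + [3.0] * 10
target = [0] * 2 + [1] * 8 + [0] * 4 + [1] * 6 + [0] * 8 + [1] * 2 + [0] * 10


def target_rate_by_value(values, targets):
    """Return {value: mean of target} for each distinct value, sorted by value."""
    totals = defaultdict(lambda: [0, 0])  # value -> [sum of target, row count]
    for value, t in zip(values, targets):
        totals[value][0] += t
        totals[value][1] += 1
    return {value: s / n for value, (s, n) in sorted(totals.items())}


rates = target_rate_by_value(var1, target)
```

With the DataFrame from the repro, `df.groupby('var1')['target'].mean()` gives the same per-value rates in pandas.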

    opened by shreeprasadbhat 0
  • Add an argument to silence the progress bar

    Is it possible to add an argument to silence the progress bar?

    We want to use Sweetviz in an automated pipeline and store the report in a database. We already have a lot of logs in our process, so we would love to get rid of the progress bar output. We could deactivate tqdm before loading Sweetviz, but that would also affect other parts of our process.

    One solution might be to add an argument in DataframeReport.__init__ and set self.progress_bar to a fake logger.
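Pending such an argument, a scoped workaround could be to redirect stderr only around the report-generating call. This is a minimal sketch using the standard library; it assumes the progress bar is written through `sys.stderr` (tqdm's default stream) and is created inside the block, which is an assumption about Sweetviz internals. `quiet_stderr` and `noisy_step` are hypothetical names, and `noisy_step` merely stands in for the analysis call.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def quiet_stderr():
    """Temporarily send everything written to sys.stderr into a throwaway buffer."""
    sink = io.StringIO()
    with contextlib.redirect_stderr(sink):
        yield sink


def noisy_step():
    # Stand-in for a call that draws a progress bar on stderr (hypothetical).
    print("[####] 100%", file=sys.stderr)
    return "done"


with quiet_stderr() as captured:
    result = noisy_step()
```

Unlike disabling tqdm globally, the redirection ends with the `with` block, so other pipeline steps keep their progress output. The trade-off is that any other stderr output produced inside the block is hidden as well.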

    opened by LexABzH 0
  • Not reading override.ini - to remove logo

    Hi,

    I duplicated the default .ini file as override.ini, set "show_logo = 0" under the layout section of the copy, and inserted the line sv.config_parser.read("override.ini") into my code right after the import. However, Sweetviz still appears to read the default .ini file rather than the new one.

    If I set "show_logo = 0" in the default .ini file instead, it works (the logo no longer shows). Any advice?

    Thanks.
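One thing worth checking here is whether the override file is found at all: the standard library's `ConfigParser.read` silently skips files it cannot open and returns the list of files it actually parsed. This is a minimal sketch with a stand-alone parser; the `[Layout]` section name and the helper `read_override` are assumptions for illustration, not confirmed Sweetviz details.

```python
import configparser
import os
import tempfile


def read_override(parser, path):
    """Read an override INI file and fail loudly if it was not found."""
    loaded = parser.read(path)  # list of files that were actually parsed
    if not loaded:
        raise FileNotFoundError(f"override file not read: {os.path.abspath(path)}")
    return loaded


# Stand-in for the library defaults (section and option names are assumptions).
parser = configparser.ConfigParser()
parser.read_string("[Layout]\nshow_logo = 1\n")

with tempfile.TemporaryDirectory() as tmp:
    override_path = os.path.join(tmp, "override.ini")
    with open(override_path, "w") as fh:
        fh.write("[Layout]\nshow_logo = 0\n")
    read_override(parser, override_path)

show_logo = parser["Layout"].getint("show_logo")

# A missing file raises no error; read() just returns an empty list.
missing = configparser.ConfigParser().read("no_such_file.ini")
```

If `sv.config_parser` is a regular `configparser.ConfigParser`, printing the return value of `sv.config_parser.read("override.ini")` would show whether the relative path resolved from the current working directory. Section names are also case-sensitive in `configparser`, so the section header in the override file has to match the default file exactly.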

    opened by thongfam 0
  • Use html correlation heatmap (Associations) instead of picture.

    With more than 100 features, none of the labels are legible in the current correlation map.

    image

    But if the heatmap is created with seaborn or plain pandas, the user can zoom the HTML to read the labels clearly.

    image image

    Furthermore, using HTML+JS can provide hover information on heatmap cells.
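The idea can be sketched without any plotting library: render the association matrix as an HTML table whose cells are colored by value and carry a `title` attribute, which browsers show on hover. This is a minimal sketch, not how Sweetviz renders its Associations graph; `heatmap_html` is a hypothetical helper, and the labels and values are made up for illustration.

```python
import html


def heatmap_html(labels, matrix):
    """Render a square matrix of values in [-1, 1] as an HTML table heatmap."""

    def color(v):
        # Blue for positive, red for negative; opacity scales with |v|.
        rgb = "0,90,200" if v >= 0 else "200,40,40"
        return f"rgba({rgb},{min(abs(v), 1.0):.2f})"

    header = "".join(f"<th>{html.escape(name)}</th>" for name in labels)
    rows = []
    for row_name, row in zip(labels, matrix):
        cells = "".join(
            f'<td style="background:{color(v)}" '
            f'title="{html.escape(row_name)} vs {html.escape(col_name)}: {v:.2f}">'
            f"{v:.2f}</td>"
            for col_name, v in zip(labels, row)
        )
        rows.append(f"<tr><th>{html.escape(row_name)}</th>{cells}</tr>")
    body = "".join(rows)
    return f"<table><tr><th></th>{header}</tr>{body}</table>"


# Illustrative values only, not real results
labels = ["age", "fare", "survived"]
matrix = [[1.0, 0.1, -0.08], [0.1, 1.0, 0.26], [-0.08, 0.26, 1.0]]
page = heatmap_html(labels, matrix)
```

Because the output is text in a table rather than a raster image, labels stay sharp when the page is zoomed, which is the main point of the request.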

    opened by PaleNeutron 0
Releases(v2.1.4)
Owner
Francois Bertrand