Port of dplyr and other related R packages in python, using pipda.


datar

Documentation | Reference Maps | Notebook Examples | API | Blog

Unlike other similar packages in Python that just mimic the piping syntax, datar follows the API designs of the original packages as closely as possible and is tested thoroughly with cases from the original packages, so those who are familiar with those R packages need minimal effort to transition to Python.

Installation

pip install -U datar
# to make sure dependencies are up to date
# pip install -U varname pipda datar

datar requires Python 3.7.1+ and uses pandas (1.2+) as its backend.

Example usage

from datar import f
from datar.dplyr import mutate, filter, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter, if_else, tibble

df = tibble(
    x=range(4),
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""

df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""

df >> filter(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""

df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""

# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar.base import sin, pi
from plotnine import ggplot, aes, geom_line, theme_classic

df = tibble(x=numpy.linspace(0, 2*pi, 500))
(df >>
  mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
  ggplot(aes(x='x', y='y')) +
  theme_classic() +
  geom_line(aes(color='sign'), size=1.2))

# easy to integrate with other libraries
# for example: klib
import klib
from pipda import register_verb
from datar.datasets import iris
from datar.dplyr import pull

dist_plot = register_verb(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()

CLI interface

See datar-cli

Example:

❯ datar import table2 | datar head
       country    year        type      count
      <object> <int64>    <object>    <int64>
0  Afghanistan    1999       cases        745
1  Afghanistan    1999  population   19987071
2  Afghanistan    2000       cases       2666
3  Afghanistan    2000  population   20595360
4       Brazil    1999       cases      37737
5       Brazil    1999  population  172006362
❯ datar import table2 | \
    datar mutate --count "if_else(f.year==1999, f.count*2, f.count)"
        country    year        type       count
       <object> <int64>    <object>     <int64>
0   Afghanistan    1999       cases        1490
1   Afghanistan    1999  population    39974142
2   Afghanistan    2000       cases        2666
3   Afghanistan    2000  population    20595360
4        Brazil    1999       cases       75474
5        Brazil    1999  population   344012724
6        Brazil    2000       cases       80488
7        Brazil    2000  population   174504898
8         China    1999       cases      424516
9         China    1999  population  2545830544
10        China    2000       cases      213766
11        China    2000  population  1280428583
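
The mutate expression above doubles count only in rows where year is 1999. A minimal plain-Python sketch of the same row-wise rule, using the first four table2 rows from the head output (the tuple layout and variable names are illustrative assumptions, not part of datar-cli):

```python
# First four rows of table2, copied from the `datar head` output above.
rows = [
    ("Afghanistan", 1999, "cases", 745),
    ("Afghanistan", 1999, "population", 19987071),
    ("Afghanistan", 2000, "cases", 2666),
    ("Afghanistan", 2000, "population", 20595360),
]

# Same rule as if_else(f.year==1999, f.count*2, f.count), applied row by row.
mutated = [
    (country, year, kind, count * 2 if year == 1999 else count)
    for country, year, kind, count in rows
]

for row in mutated:
    print(row)
```

The resulting counts can be compared against rows 0 to 3 of the mutate output above.
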
Comments
  • Any way to stop the re package being overwritten?

    re is needed for regular expressions, so it has to be imported after datar:

    from datar.all import *
    import re
    

    I always import re first. Then it doesn't work after being overwritten. Sometimes I use re in a function like this:

    def test(x, y):
        re.sub(..)
        re.replace(..)
        return ...
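
One way around the clash, sketched under assumptions: bind the standard-library module to an alias that the star import does not export. The module `stand_in_all` below is a hypothetical stand-in built on the fly so the snippet runs without datar; it only mimics a package whose star import exports a name `re`, as this issue describes.

```python
import re
import sys
import types

# Hypothetical stand-in for a package whose star import exports a name `re`.
stand_in = types.ModuleType("stand_in_all")
stand_in.re = lambda z: complex(z).real
stand_in.__all__ = ["re"]
sys.modules["stand_in_all"] = stand_in


def run_imports(statements):
    """Execute import statements in a fresh namespace and return it."""
    namespace = {}
    for statement in statements:
        exec(statement, namespace)
    return namespace


# Importing `re` first and star-importing afterwards rebinds the name `re`.
shadowed = run_imports(["import re", "from stand_in_all import *"])

# An alias survives, as long as the star import exports no name `regex`.
safe = run_imports(["import re as regex", "from stand_in_all import *"])

print(shadowed["re"] is re)                  # the stdlib module was replaced
print(safe["regex"].sub("a", "b", "cat"))    # the alias still does regex work
```

With datar itself the same idea would read `import re as regex`, or `import datar.all as d` to avoid the star import altogether.
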
    
    
    question 
    opened by antonio-yu 27
  • f.duplicated() not working in filter

    Sometimes I want to keep all the duplicated rows. In pandas, this is done with mtcars[mtcars.duplicated(keep=False)]. In datar, it does not work.

    from datar.all import * 
    from datar.datasets import mtcars
    
    mtcars >> select('cyl','hp','gear','disp')>> filter(f.duplicated(keep=False))
    

    But it works in the following two ways.

    # 1  f.series 
    
    mtcars >> select('cyl','hp','gear','disp')>> filter(f.cyl.duplicated(keep=False))
    
    # 2 select all the columns 
    
    mtcars >> select('cyl','hp','gear','disp')>> filter(f['cyl'].duplicated(keep=False))
    

    It seems that only a Series can be passed to filter.
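
For reference, `keep=False` in pandas marks every member of a duplicated group, not just the later repeats. A plain-Python sketch of that selection rule (the rows are made-up values, not taken from mtcars, and the code does not use pandas or datar):

```python
from collections import Counter

# Made-up (cyl, hp, gear) rows; two groups of exact duplicates.
rows = [(6, 110, 4), (6, 110, 4), (4, 93, 4), (8, 175, 3), (4, 93, 4)]

# Count how often each full row occurs.
counts = Counter(rows)

# keep=False semantics: True for every row that occurs more than once.
mask = [counts[row] > 1 for row in rows]

# Filtering with the mask keeps all members of each duplicated group.
kept = [row for row, is_duplicated in zip(rows, mask) if is_duplicated]
print(mask)
print(kept)
```
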

    optim 
    opened by antonio-yu 27
  • feature request: have simple way to create functions for use inside verbs that are unaware of groupedness

    Discussed in https://github.com/pwwang/datar/discussions/136

    Originally posted by ftobin on August 26, 2022:

    Older versions of datar used pipda with a fairly easy-to-use way of creating vectorized functions:

    @register_func(None, context=Context.EVAL)
    def weighted_mean(
        x: NumericOrIter,
        w: NumericOrIter = None,
        na_rm: bool = False,
    ):
        ...
    

    From what I gather from the current methodology (see the current weighted_mean()), multiple functions are needed to handle the grouped and non-grouped versions of the dataframe.

    It would be great to have an approach that allows me to be "dplyr-ish" and just write a vectorized function that is unaware of groupedness, similar to how functions used inside dplyr verbs work over both grouped and non-grouped tibbles.
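
To make the request concrete, here is a minimal plain-Python sketch of the separation being asked for: one function that knows nothing about groups, plus a generic helper that applies it per group. Both the body of `weighted_mean` and the helper `apply_by_group` are hypothetical illustrations; neither is datar or pipda API.

```python
from collections import defaultdict


def weighted_mean(x, w=None):
    """Group-unaware weighted mean over plain sequences (illustrative body)."""
    x = list(x)
    w = [1.0] * len(x) if w is None else list(w)
    total_weight = sum(w)
    if total_weight == 0:
        return float("nan")
    return sum(xi * wi for xi, wi in zip(x, w)) / total_weight


def apply_by_group(keys, func, *columns):
    """Hypothetical helper: call `func` once per group on the sliced columns."""
    row_ids = defaultdict(list)
    for i, key in enumerate(keys):
        row_ids[key].append(i)
    result = {}
    for key, ids in row_ids.items():
        sliced = [[column[i] for i in ids] for column in columns]
        result[key] = func(*sliced)
    return result


group = ["a", "a", "b"]
x = [1.0, 3.0, 10.0]
w = [1.0, 3.0, 2.0]

overall = weighted_mean(x, w)                            # ungrouped call
per_group = apply_by_group(group, weighted_mean, x, w)   # same function, per group
print(overall, per_group)
```

The point of the design is that `weighted_mean` is written once; grouping is handled entirely by the caller.
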

    enhancement 
    opened by pwwang 24
  • piping to verbs is not passing the dataframe argument

    Piping into verbs like mutate seems to be broken in 0.9.0. Passing in the dataframe directly works, but still generates a warning.

    >>> import datar.all
    >>> import datar.datasets
    >>> datar.datasets.mtcars >> datar.all.mutate(x = 3)
    /home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/utils.py:68: VerbCallingCheckWarning: Failed to detect AST node calling `mutate`, assuming a normal call.
      warnings.warn(
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/verb.py", line 216, in __call__
        raise TypeError(f"{self.__name__}() missing at least one argument.")
    TypeError: mutate() missing at least one argument.
    >>> datar.__version__
    '0.9.0'
    >>> import pipda
    >>> pipda.__version__
    '0.7.6'
    
    >>> datar.all.mutate(datar.datasets.mtcars, x = 3)
    /home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/utils.py:68: VerbCallingCheckWarning: Failed to detect AST node calling `mutate`, assuming a normal call.
      warnings.warn(
                              mpg     cyl      disp      hp      drat        wt      qsec      vs      am    gear    carb       x
                        <float64> <int64> <float64> <int64> <float64> <float64> <float64> <int64> <int64> <int64> <int64> <int64>
    Mazda RX4                21.0       6     160.0     110      3.90     2.620     16.46       0       1       4       4       3
    Mazda RX4 Wag            21.0       6     160.0     110      3.90     2.875     17.02       0       1       4       4       3
    
    
    opened by ftobin 15
  • `datar` not working on RStudio notebooks

    I am trying to run datar code in RStudio R notebooks, but the code does not run. I would like to use R notebooks to highlight parts of the datar code and run them separately.

    @pwwang, what is the issue with running datar in RStudio?

    doc needs more info 
    opened by rleyvasal 14
  • Lubridate commands

    Hi @pwwang , do you have plans to add lubridate commands to datar?

    I am trying to convert the Date column on stock time series data to date time with datar mutate.

    Data from yahoo finance

    import pandas as pd
    from datar.all import *
    
    aapl = pd.read_csv("AAPL.csv")
    
    aapl.Date = pd.to_datetime(aapl.Date.astype('str')) # with pandas this works to change the data type to datetime
    
    aapl = aapl >> mutate(Date = as_datetime(f.Date))  # this does not work and shows error message
    
    aapl = aapl >> mutate(Date = as_date(f.Date))  #this does not work and does not show error message
    
    
    doc enhancement 
    opened by rleyvasal 14
  • ImportError: cannot import name 'VarnameException'

    Issue

    I just upgraded datar from 0.4.3 to 0.4.4 with pip install -U datar and got the error ImportError: cannot import name 'VarnameException' when importing datar with from datar.all import *.

    The error message is below:

    ImportError: cannot import name 'VarnameException' from 'varname' (C:\Anaconda3\lib\site-packages\varname\__init__.py)
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-268be173473d> in <module>
    ----> 1 from datar.all import *
          2 from datar.datasets import mtcars
          3 mtcars  >> mutate(mpl = f.mpg/4)
    
    C:\Anaconda3\lib\site-packages\datar\all.py in <module>
          7 from .base import *
          8 from .base import _warn as _
    ----> 9 from .datar import *
         10 from .dplyr import _no_warn as _
         11 from .dplyr import _builtin_names as _dplyr_builtin_names
    
    C:\Anaconda3\lib\site-packages\datar\datar\__init__.py in <module>
          1 """Specific verbs/funcs from this package"""
          2 
    ----> 3 from .verbs import get, flatten, drop_index
          4 from .funcs import itemgetter
    
    C:\Anaconda3\lib\site-packages\datar\datar\verbs.py in <module>
          9 from ..core.contexts import Context
         10 from ..core.grouped import DataFrameGroupBy
    ---> 11 from ..dplyr import select, slice_
         12 
         13 
    
    C:\Anaconda3\lib\site-packages\datar\dplyr\__init__.py in <module>
          4 from .across import across, c_across, if_all, if_any
          5 from .arrange import arrange
    ----> 6 from .bind import bind_cols, bind_rows
          7 from .context import (
          8     cur_column,
    
    C:\Anaconda3\lib\site-packages\datar\dplyr\bind.py in <module>
         15 from ..core.names import repair_names
         16 from ..core.grouped import DataFrameGroupBy
    ---> 17 from ..tibble import tibble
         18 
         19 
    
    C:\Anaconda3\lib\site-packages\datar\tibble\__init__.py in <module>
          1 """APIs for R-tibble"""
    ----> 2 from .tibble import tibble, tibble_row, tribble, zibble
          3 from .verbs import (
          4     enframe,
          5     deframe,
    
    C:\Anaconda3\lib\site-packages\datar\tibble\tibble.py in <module>
          4 
          5 from pandas import DataFrame
    ----> 6 from varname import argname, varname, VarnameException
          7 
          8 import pipda
    
    ImportError: cannot import name 'VarnameException' from 'varname' (C:\Anaconda3\lib\site-packages\varname\__init__.py)
    

    Expected

    Expect datar to work without issue after upgrading to new versions.
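
An import error like this can come from mismatched versions of datar and its dependencies; the installation notes above suggest `pip install -U varname pipda datar` to bring them up to date together. A small sketch for checking what is installed, using only the standard library (`importlib.metadata`, available from Python 3.8; the helper name `installed_versions` is made up for this example):

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["datar", "pipda", "varname"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```
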

    opened by rleyvasal 13
  • Operator `&` losing index

    When case_when is used, the output is not as expected.

    Code to replicate

    mtcars >> mutate(gas_milage = case_when(
                                f.mpg >21  and f.mpg <= 22, "ok",
                                f.mpg >22, "best",
                                True, "other"
    
    ))
    

    Issue: The last row in the output does not meet the condition f.mpg >21 and f.mpg <= 22, but it is still given the "ok" label.

    Expected result

    Only rows meeting the f.mpg >21 and f.mpg <= 22 condition should be labeled "ok"; all other rows not meeting any condition should be labeled "other".
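
Independently of the index problem named in the title, the reproduction uses Python's `and`, which cannot be overloaded: when the left operand is truthy, `a and b` simply evaluates to `b`. The sketch below shows the effect with a hypothetical stand-in class `Expr` (not datar's own expression type), assuming the expression objects are truthy like ordinary Python objects:

```python
class Expr:
    """Hypothetical stand-in for a deferred column expression."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr(f"({self.text} > {other})")

    def __le__(self, other):
        return Expr(f"({self.text} <= {other})")

    def __and__(self, other):
        # `&` can be overloaded, so both sides are kept.
        return Expr(f"({self.text} & {other.text})")

    def __repr__(self):
        return self.text


mpg = Expr("mpg")

# `and` is evaluated by Python itself: the truthy left side is discarded.
with_and = mpg > 21 and mpg <= 22

# `&` goes through __and__; parentheses are needed because & binds tighter.
with_amp = (mpg > 21) & (mpg <= 22)

print(with_and)
print(with_amp)
```

Under that assumption, the first condition in the reproduction reduces to `f.mpg <= 22`, which would explain rows at or below 21 being labeled "ok"; writing `(f.mpg > 21) & (f.mpg <= 22)` keeps both comparisons.
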

    bug 
    opened by rleyvasal 11
  • Function to show the translation to `pandas`

    Hey @pwwang , does datar have a function to display the translation to pandas commands?

    I believe it would be a very useful addition for many reasons, one being that it would greatly help datar and pandas users work together on a project.

    In R, an analogous function would be dplyr::show_query(), which shows the translation to SQL, like in the example below:

    df <- dbplyr::lazy_frame(mtcars)
    df |> dplyr::select(mpg) |>  dplyr::show_query()
    #> <SQL>
    #> SELECT `mpg`
    #> FROM `df`
    

    Thank you

    opened by GitHunter0 9
  • pipe operator doesn't work in plain python prompt

    It seems that the pipe operator doesn't work when using datar in Anaconda virtual environments. Here's a snippet from running the example code in the Anaconda prompt:

    from datar.all import f, mutate, filter, if_else, tibble

    [2021-08-03 19:57:12][datar][WARNING] Builtin name "filter" has been overriden by datar.

    df = tibble(
        x=range(4),
        y=['zero', 'one', 'two', 'three']
    )
    

    C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\utils.py:159: UserWarning: Failed to fetch the node calling the function, call it with the original function.
      warnings.warn(

    df >> mutate(z=f.x)

    C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\utils.py:159: UserWarning: Failed to fetch the node calling the function, call it with the original function.
      warnings.warn(
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\register.py", line 396, in wrapper
        return calling_rule(generic, args, kwargs, envdata)
      File "C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\_calling.py", line 93, in verb_calling_rule3
        return generic(*args, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\conda_start\lib\functools.py", line 872, in wrapper
        raise TypeError(f'{funcname} requires at least '
    TypeError: _not_implemented requires at least 1 positional argument

    mutate(df, z=f.x)

    >         x        y       z
    >   <int64> <object> <int64>
    > 0       0     zero       0
    > 1       1      one       1
    > 2       2      two       2
    > 3       3    three       3
    

    pandas 1.2.3 python 3.8.1

    ps thanks a lot for the package, hopefully the issue can be closed soon :)

    opened by thegiordano 9
  • Piping syntax not running in raw python REPL

    @GitHunter0

    I'm just having an issue with multi-line execution of datar code in VScode.

    If I run this line by line, it works smoothly.

    from datar.all import (f, mutate, tibble, fct_infreq, fct_inorder, pull)
    df = tibble(var=['b','b','b','c','a','a'])
    df = df >> mutate(fct_var = f['var'].astype("category"))
    

    However, if I select all the lines and execute them, it returns:

    C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\utils.py:161: UserWarning: Failed to fetch the node calling the function, call it with the original function.
    
    >>> df = df >> mutate(fct_var = f['var'].astype("category"))
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\register.py", line 396, in wrapper
        return calling_rule(generic, args, kwargs, envdata)
      File "C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\_calling.py", line 93, in verb_calling_rule3
        return generic(*args, **kwargs)
      File "C:\Users\user_name\miniconda3\envs\py38\lib\functools.py", line 872, in wrapper
        raise TypeError(f'{funcname} requires at least '
    TypeError: _not_implemented requires at least 1 positional argument
    

    Originally posted by @GitHunter0 in https://github.com/pwwang/datar/discussions/48#discussioncomment-1286583

    doc raw python repl 
    opened by pwwang 8
  • [BUG] relocate() with any_of() does not order column names correctly

    datar version checks

    • [X] I have checked that this issue has not already been reported.

    • [X] I have confirmed this bug exists on the latest version of datar and its backends.

    Issue Description

    When a column (in this case 'x2') is not present, relocate() with any_of() does not respect the order passed.

    import datar.all as d
    from datar import f
    import pandas as pd
    
    df = pd.DataFrame({'x1': [1], 'x4': [4], 'x3': [3]})
    
    df >> d.relocate(d.any_of(['x1', 'x2', 'x3', 'x4']))
    #>        x1      x4      x3
      <int64> <int64> <int64>
    0       1       4       3
    

    Expected Behavior

    The new order should be 'x1', 'x3', 'x4'
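
The expected behavior can be written down as a plain-Python sketch. The two helpers below are illustrative stand-ins that mirror the names in the issue; they are not datar's implementation, and they work on lists of column names only.

```python
def any_of(requested, available):
    """Keep the requested names that exist, in the order they were requested."""
    return [name for name in requested if name in available]


def relocate_to_front(columns, selected):
    """Move the selected columns to the front, preserving the selection order."""
    rest = [name for name in columns if name not in selected]
    return list(selected) + rest


columns = ["x1", "x4", "x3"]                       # column order of df in the issue
selected = any_of(["x1", "x2", "x3", "x4"], columns)
new_order = relocate_to_front(columns, selected)
print(new_order)
```

The missing column 'x2' is skipped without disturbing the order of the remaining requested names.
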

    Installed Versions

    python      : 3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC v.1916 64 bit (AMD64)]
    datar       : 0.11.1
    simplug     : 0.2.2
    executing   : 1.2.0
    pipda       : 0.11.0
    datar-numpy : 0.1.0
    numpy       : 1.23.5
    datar-pandas: 0.2.0
    pandas      : 1.5.2

    bug 
    opened by GitHunter0 0
  • `TibbleGrouped` object is not expandable in VSCode jupyter data viewer

    When I create grouped data with datar's group_by(), I get a DataFrameGroupBy element instead of a DataFrame. This is undesirable in VSCode because a DataFrameGroupBy cannot be clicked in the Variables window to see the entire dataframe, whereas mtcars can be clicked to expose the full dataset because it is a DataFrame.

    The code below creates grouped data in datar and in pandas; however, datar creates a DataFrameGroupBy instead of a DataFrame.

    from datar.all import *
    from datar.datasets import mtcars
    datar_group = mtcars >> group_by(f.hp) >> count()
    pandas_group = mtcars.groupby('hp').size().reset_index().rename(columns = {0:"n"})
    

    3rd-party 
    opened by rleyvasal 9
Releases(0.11.1)
  • 0.11.1(Dec 15, 2022)

    • 🐛 Fix get_versions() not showing plugin versions
    • 🐛 Fix plugins not loaded when loading datasets
    • 🚸 Add github issue templates

    What's Changed

    • 0.11.1 by @pwwang in https://github.com/pwwang/datar/pull/164

    Full Changelog: https://github.com/pwwang/datar/compare/0.11.0...0.11.1

  • 0.11.0(Dec 15, 2022)

    • 📝 Add testimonials and backend badges in README.md
    • 🐛 Load entrypoint plugins only when APIs are called (#162)
    • 💥 Rename other module to misc

    What's Changed

    • 0.11.0 by @pwwang in https://github.com/pwwang/datar/pull/163

    Full Changelog: https://github.com/pwwang/datar/compare/0.10.3...0.11.0

  • 0.10.3(Dec 9, 2022)

    • ⬆️ Bump simplug to 0.2.2
    • ✨ Add apis.other.array_ufunc to support numpy ufuncs
    • 💥 Change hook data_api to load_dataset
    • ✨ Allow backend for c[]
    • ✨ Add DatarOperator.with_backend() to select backend for operators
    • ✅ Add tests
    • 📝 Update docs for backend supports

    What's Changed

    • 0.10.3 by @pwwang in https://github.com/pwwang/datar/pull/160

    Full Changelog: https://github.com/pwwang/datar/compare/0.10.2...0.10.3

  • 0.10.2(Dec 7, 2022)

    • 🚑 Fix false warning when importing from all

    What's Changed

    • 0.10.2 by @pwwang in https://github.com/pwwang/datar/pull/159

    Full Changelog: https://github.com/pwwang/datar/compare/0.10.1...0.10.2

  • 0.10.1(Dec 5, 2022)

    • Bump simplug to 0.2

    What's Changed

    • 0.10.1 by @pwwang in https://github.com/pwwang/datar/pull/158

    Full Changelog: https://github.com/pwwang/datar/compare/0.10.0...0.10.1

  • 0.10.0(Dec 2, 2022)

    • Detach backend support, so that more backends can be supported more easily in the future
      • numpy backend: https://github.com/pwwang/datar-numpy
      • pandas backend: https://github.com/pwwang/datar-pandas
    • Adopt pipda 0.10 so that functions can be pipeable (#148)
    • Support pandas 1.5+ (#144), but v1.5.0 excluded (see pandas-dev/pandas#48645)

    What's Changed

    • 0.10.0 by @pwwang in https://github.com/pwwang/datar/pull/157

    Full Changelog: https://github.com/pwwang/datar/compare/0.9.1...0.10.0

  • 0.9.1(Oct 13, 2022)

  • 0.9.0(Sep 14, 2022)

    Fixes

    • Fix weighted_mean not handling group variables with NaN values (#137)
    • Fix weighted_mean on NA raising error instead of returning NA (#139)
    • Fix pandas .groupby() used internally not inheriting sort, dropna and observed (#138, #142)
    • Fix mutate/summarise not counting references inside function as used for _keep "used"/"unused"
    • Fix metadata _datar of nested TibbleGrouped not frozen

    Breaking changes

    • Refactor core.factory.func_factory() (#140)
    • Use base.c[...] for range short cut, instead of f[...]
    • Use tibble.fibble() when constructing Tibble inside a verb, instead of tibble.tibble()
    • Make n a keyword-only argument for base.ntile

    Deprecation

    • Deprecate verb_factory, use register_verb from pipda instead
    • Deprecate base.data_context

    Dependencies

    • Adopt pipda v0.7.1
    • Remove varname dependency
    • Install pdtypes by default

    What's Changed

    • 0.9.0 by @pwwang in https://github.com/pwwang/datar/pull/143

    Full Changelog: https://github.com/pwwang/datar/compare/0.8.6...0.9.0

  • 0.8.6(Aug 25, 2022)

    • 🐛 Fix weighted_mean not working for grouped data (#133)
    • ✅ Add tests for weighted_mean on grouped data
    • ⚡️ Optimize distinct on existing columns (#128)

    What's Changed

    • 🔖 0.8.6 by @pwwang in https://github.com/pwwang/datar/pull/134

    Full Changelog: https://github.com/pwwang/datar/compare/0.8.5...0.8.6

  • 0.8.5(May 23, 2022)

    What's Changed

    • 🔖 0.8.5 by @pwwang in https://github.com/pwwang/datar/pull/125
      • 🐛 Fix columns missing after Join by same columns using mapping (https://github.com/pwwang/datar/issues/122)

    Full Changelog: https://github.com/pwwang/datar/compare/0.8.4...0.8.5

  • 0.8.4(May 14, 2022)

    What's Changed

    • 0.8.4 by @pwwang in https://github.com/pwwang/datar/pull/120
      • ➖ Add optional deps to extras so they aren't installed by default
      • 🎨 Give better message when optional packages not installed

    Full Changelog: https://github.com/pwwang/datar/compare/0.8.3...0.8.4

  • 0.8.3(May 13, 2022)

  • 0.8.2(May 10, 2022)

    • ♻️ Move glimpse to dplyr (as glimpse is a tidyverse-dplyr API)
    • 🐛 Fix glimpse() output not rendering in qtconsole (#117)
    • 🐛 Fix base.match() for pandas 1.3.0
    • 🐛 Allow base.match() to work with grouping data (#115)
    • 📌 Use rtoml (python-simpleconf) instead of toml (See https://github.com/pwwang/toml-bench)
    • 📌 Update dependencies
  • 0.8.1(Apr 19, 2022)

  • 0.8.0(Apr 12, 2022)

    • ✨ Support base.glimpse() (#107, machow/siuba#409)
    • 🐛 Register base.factor() and accept grouped data (#108)
    • ✨ Allow configuration file to save default options
    • 💥 Replace option warn_builtin_names with import_names_conflict (#73)
    • 🩹 Attach original __module__ to func_factory registered functions
    • ⬆️ Bump pipda to 0.5.9
  • 0.7.2(Apr 7, 2022)

    • ✨ Allow tidyr.unite() to unite multiple columns into a list, instead of join them (#105)
    • 🩹 Fix typos in argument names of tidyr.pivot_longer() (#104)
    • 🐛 Fix base.sprintf() not working with Series (#102)
  • 0.7.1(Mar 28, 2022)

    • 🐛 Fix SettingWithCopyWarning in tidyr.pivot_wider()
    • 📌 Pin deps for docs
    • 💚 Don't upload coverage in PR
    • 📝 Fix typos in docs (#99, #100) (Thanks to @pdwaggoner)
  • 0.7.0(Mar 24, 2022)

    • ✨ Support modin as backend :kissing_heart:
    • ✨ Add _return argument for datar.options()
    • 🐛 Fix tidyr.expand() when nesting(f.name) as argument
  • 0.6.4(Mar 23, 2022)

    Breaking changes

    • 🩹 Make base.ntile() labels 1-based (#92)

    Fixes

    • 🐛 Fix order_by argument for dplyr.lead-lag

    Enhancements

    • 🚑 Allow base.paste/paste0() to work with grouped data
    • 🩹 Change dtypes of base.letters/LETTERS/month_abb/month_name

    Housekeeping

    • 📝 Update and fix reference maps
    • 📝 Add environment.yml for binder to work
    • 📝 Update styles for docs
    • 📝 Update styles for API doc in notebooks
    • 📝 Update README for new description about the project and add examples from StackOverflow
  • 0.6.3(Mar 16, 2022)

    • ✨ Allow base.c() to handle groupby data
    • 🚑 Allow base.diff() to work with groupby data
    • ✨ Allow forcats.fct_inorder() to work with groupby data
    • ✨ Allow base.rep()'s arguments length and each to work with grouped data
    • ✨ Allow base.c() to work with grouped data
    • ✨ Allow base.paste()/base.paste0() to work with grouped data
    • 🐛 Force &/| operators to return boolean data
    • 🚑 Fix base.diff() not keeping empty groups
    • 🐛 Fix recycling non-ordered grouped data
    • 🩹 Fix dplyr.count()/tally()'s warning about the new name
    • 🚑 Make dplyr.n() return grouped data
    • 🐛 Make dplyr.slice() work better with rows/indices from grouped data
    • 🩹 Make dplyr.ntile() labels 1-based
    • ✨ Add datar.attrgetter(), datar.pd_str(), datar.pd_cat() and datar.pd_dt()
  • 0.6.2(Mar 12, 2022)

    • 🚑 Fix #87 boolean operator losing index
    • 🚑 Fix false alarm from rename()/relocate() for missing grouping variables (#89)
    • ✨ Add base.diff()
    • 📝 [doc] Update/Fix doc for case_when (#87)
    • 📝 [doc] Fix links in reference map
    • 📝 [doc] Update docs for dplyr.base
  • 0.6.1(Mar 9, 2022)

  • 0.6.0(Mar 7, 2022)

    • Adopt pipda 0.5.7
    • Reimplement the split-apply-combine rule to solve all performance issues
    • Drop support for pandas v1.2, require pandas v1.3+
    • Remove all base0_ options and all indices are now 0-based, except base.seq(), ranks and their variants
    • Remove messy type annotations for now, will add them back in the future
    • Move implementation of data type display for frames in terminal and notebook to pdtypes package
    • Change all arguments ending with "_" to start with it instead, to avoid confusion
    • Move module datar.stats to datar.base.stats
    • Default all na_rm arguments to True
    • Rename all ptype arguments for tidyr verbs into dtypes

    See more changes: https://pwwang.github.io/datar/CHANGELOG/#060

  • 0.5.6(Feb 3, 2022)

  • 0.5.5(Dec 28, 2021)

  • 0.5.4(Oct 21, 2021)

  • 0.5.3(Oct 5, 2021)

    • ⚡️ Optimize dplyr.arrange when data are series from the df itself
    • 🐛 Fix sub-df order of apply for grouped df (#63)
    • 📝 Update doc for argument by for join functions (#62)
    • 🐛 Fix mean() with option na_rm=False does not work (#65)
  • 0.5.2(Sep 22, 2021)

    More of a maintenance release.

    • 🔧 Add metadata for datasets
    • 🔊 Send logs to stderr, instead of stdout
    • 📌 Pin dependency versions
    • 🚨 Switch linter to flake8
    • 📝 Update some docs to fit datar-cli
  • 0.5.1(Sep 16, 2021)

    • Add documentation about "blind" environment (#45, #54, #55)
    • Change base.as_date() to return pandas datetime types instead of Python datetime types (#56)
    • Add base.as_pd_date() to be an alias of pandas.to_datetime() (#56)
    • Expose trimws to datar.all (#58)
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(Sep 3, 2021)

    Added:

    • Added forcats (#51 )
    • Added base.is_ordered(), base.nlevels(), base.ordered(), base.rank(), base.order(), base.sort(), base.tabulate(), base.append(), base.prop_table() and base.proportions()
    • Added gss_cat dataset

    Fixed:

    • Fixed an issue when Collection dealing with numpy.int_

    Enhanced:

    • Added base0_ argument for datar.get()
    • Passed __calling_env to registered functions/verbs when used internally (this makes sure the library is robust in different environments)
Retentioneering 581 Jan 07, 2023
Leverage Twitter API v2 to analyze tweet metrics such as impressions and profile clicks over time.

Tweetmetric Tweetmetric allows you to track various metrics on your most recent tweets, such as impressions, retweets and clicks on your profile. The

Mathis HAMMEL 29 Oct 18, 2022
Python utility to extract differences between two pandas dataframes.

Python utility to extract differences between two pandas dataframes.

Jaime Valero 8 Jan 07, 2023
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

Rick Lamers 1 Nov 17, 2021
University Challenge 2021 With Python

University Challenge 2021 This repository contains: The TeX file of the technical write-up describing the University / HYPER Challenge 2021 under late

2 Nov 27, 2021
TheMachineScraper 🐱‍👤 is an Information Grabber built for Machine Analysis

TheMachineScraper 🐱‍👤 is a tool made purely for analysing machine data for any reason.

doop 5 Dec 01, 2022
An Indexer that works out-of-the-box when you have less than 100K stored Documents

U100KIndexer An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with

Jina AI 7 Mar 15, 2022
An Aspiring Drop-In Replacement for NumPy at Scale

Legate NumPy is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the NumPy API on top of the Legion runtime. Using Legate NumPy you do things like run the f

Legate 502 Jan 03, 2023
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify.

An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify. The ETL process flows from AWS's S3 into staging tables in AWS Redshift.

1 Feb 11, 2022
This repo contains a simple but effective tool made using python which can be used for quality control in statistical approach.

📈 Statistical Quality Control 📉 This repo contains a simple but effective tool made using python which can be used for quality control in statistica

SasiVatsal 8 Oct 18, 2022
Vectorizers for a range of different data types

Vectorizers for a range of different data types

Tutte Institute for Mathematics and Computing 69 Dec 29, 2022
follow-analyzer helps GitHub users analyze their following and followers relationship

follow-analyzer follow-analyzer helps GitHub users analyze their following and followers relationship by providing a report in html format which conta

Yin-Chiuan Chen 2 May 02, 2022
This program analyzes a DNA sequence and outputs snippets of DNA that are likely to be protein-coding genes.

This program analyzes a DNA sequence and outputs snippets of DNA that are likely to be protein-coding genes.

1 Dec 28, 2021
Challenge proposed by IGTI in its Cloud Data Engineer bootcamp

Module 4 Challenge - Cloud Data Engineer Bootcamp - IGTI Objectives Create infrastructure as code Using a Kubernetes cluster on Azure Ingestion

Otacilio Filho 4 Jan 23, 2022
Statistical package in Python based on Pandas

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F

Raphael Vallat 1.2k Dec 31, 2022
Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Gabriele 3 Jul 05, 2022
Handle, manipulate, and convert data with units in Python

unyt A package for handling numpy arrays with units. Often writing code that deals with data that has units can be confusing. A function might return

The yt project 304 Jan 02, 2023
Utilize data analytics skills to solve real-world business problems using Humana's big data

Humana-Mays-2021-HealthCare-Analytics-Case-Competition- The goal of the project is to utilize data analytics skills to solve real-world business probl

Yongxian (Caroline) Lun 1 Dec 27, 2021
A simple and efficient tool to parallelize Pandas operations on all available CPUs

Pandaral·lel Without parallelization With parallelization Installation $ pip install pandarallel [--upgrade] [--user] Requirements On Windows, Pandara

Manu NALEPA 2.8k Dec 31, 2022