Aggregating gridded data (xarray) to polygons

Related tags

Data Analysis, xagg
Overview

xagg

Binder

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample code run!

Installation

The easiest way to install xagg is using pip. Beware though - xagg is still a work in progress; I suggest installing it into a virtual environment first (using e.g. venv, or a separate conda environment for projects using xagg).

pip install xagg

Intro

Science often happens on grids - gridded weather products, interpolated pollution data, nighttime lights, and remote sensing imagery all approximate the continuous real world, for reasons of data resolution, processing time, or ease of calculation.

However, living things don't live on grids, and rarely play, act, or observe data on grids either. Instead, humans tend to work on the county, state, township, okrug, or city level; birds tend to fly along complex migratory corridors; and rain- and watersheds follow valleys and mountains.

So, whenever we need to work with both gridded and geographic data products, we need ways of getting them to match up. We may be interested, for example, in the average temperature over a county, or the average rainfall rate over a watershed.

Enter xagg.

xagg provides an easy-to-use (2 lines!), standardized way of aggregating raster data to polygons. All you need is some gridded data in an xarray Dataset or DataArray and some polygon data in a geopandas GeoDataFrame. Both of these are easy to use for the purposes of xagg - for example, all you need to use a shapefile is to open it:

import xarray as xr
import geopandas as gpd
 
# Gridded data file (netcdf/climate data)
ds = xr.open_dataset('file.nc')

# Shapefile
gdf = gpd.read_file('file.shp')

xagg will then figure out the geographic grid (lat/lon) in ds, create polygons for each pixel, and then compute the intersections between every polygon in the shapefile and every pixel. For each polygon in the shapefile, the relative area of each covering pixel is calculated - so, for example, if a polygon (say, a US county) is the size and shape of a grid pixel, but is split halfway between two pixels, the weight for each pixel will be 0.5, and the value of the gridded variables on that polygon will just be the average of those two pixels [TO-DO: add visual example of this].
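
For intuition, here is that half-and-half example worked by hand - a tiny sketch in plain NumPy, not xagg's internals:

import numpy as np

# Toy version of the half-and-half example above (not xagg internals):
# a county overlaps two pixels, covering 50% of its area in each
pixel_values = np.array([20.0, 24.0])   # e.g. temperature in each pixel
weights = np.array([0.5, 0.5])          # relative overlap areas

# Area-weighted average = sum(w_i * v_i) / sum(w_i)
print((weights * pixel_values).sum() / weights.sum())  # 22.0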

The two lines mentioned before?

import xagg as xa

# Get overlap between pixels and polygons
weightmap = xa.pixel_overlaps(ds,gdf)

# Aggregate data in [ds] onto polygons
aggregated = xa.aggregate(ds,weightmap)

# aggregated can now be converted into an xarray dataset (using aggregated.to_dataset()), 
# or a geopandas geodataframe (using aggregated.to_dataframe()), or directly exported 
# to netcdf, csv, or shp files using aggregated.to_csv()/.to_netcdf()/.to_shp()

Researchers often need to weight their data by more than just its relative area overlap with a polygon (for example, giving more weight to pixels with larger populations). xagg has built-in support for adding an additional weight grid (another xarray DataArray) to xagg.pixel_overlaps().
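
A minimal sketch of what that looks like, reusing ds and gdf from the example above and assuming ds_pop is an xarray Dataset holding a population grid (the names ds_pop and Population are just illustrative):

import xagg as xa

# Population-weighted aggregation: pass the weight grid (an xarray DataArray)
# to pixel_overlaps; xagg combines it with the area weights
weightmap = xa.pixel_overlaps(ds, gdf, weights=ds_pop.Population)
aggregated = xa.aggregate(ds, weightmap)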

Finally, xagg allows for direct exporting of the aggregated data in several commonly used data formats (please open issues if you'd like support for something else!):

  • netcdf
  • csv for STATA, R
  • shp for QGIS, further spatial processing
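
Continuing the example above, a minimal export sketch (file names are placeholders; the methods are the ones listed in the code comment earlier):

# Export the aggregated results
aggregated.to_netcdf('aggregated.nc')   # netcdf
aggregated.to_csv('aggregated.csv')     # csv, e.g. for STATA or R
aggregated.to_shp('aggregated.shp')     # shp, e.g. for QGIS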

Best of all, xagg is flexible. Multiple variables in your dataset? xagg will aggregate them all, as long as they have at least lat/lon dimensions. Fields in your shapefile that you'd like to keep? xagg keeps all fields (for example, FIPS codes from county datasets) all the way through the final export. Weird dimension names? xagg recognizes the many variants of "lat", "Latitude", "Y", "nav_lat", "Latitude_1", etc. that the author has run into over years of working with climate data, and this list is easily expandable via a keyword argument if needed.

Please contribute - let me know what works and what doesn't, whether you think this is useful, and if so - please share!

Use Cases

Climate econometrics

Many climate econometrics studies use societal data (mortality, crop yields, etc.) at a political or administrative level (for example, counties) but climate and weather data on grids. Oftentimes, further weighting by population or agricultural density is needed.

Area-weighting of pixels onto polygons ensures that aggregating weather and climate data onto polygons occurs in a robust way. Consider a (somewhat contrived) example: an administrative region lies in relatively flat lowlands, but a pixel that only slightly overlaps the polygon primarily covers a wholly different climate (mountainous, desert, etc.). A simple mask would weight that pixel the same as any other, even though its information is not necessarily relevant to the climate of the region. Population-weighting may not always be sufficient either; consider Los Angeles, which spans several significantly different climates, all with high population densities.

xagg makes simple population- and area-weighted averaging easy, and its export functions turn the aggregated data into output that can be used directly in STATA or R for further calculations.

Left to do

  • Testing, bug fixes, stability checks, etc.
  • Share widely! I hope this will be helpful to a wide group of natural and social scientists who have to work with both gridded and polygon data!
Comments
  • Speedup for large grids - mod gdf_pixels in create_raster_polgons

    In create_raster_polygons, the for loop that assigns individual polygons to gdf_pixels essentially renders xagg unusable for larger high-res grids because it is so slow. Here I propose eliminating the for loop and replacing it with a lambda apply. Big improvement for large grids!
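
    The idea, roughly (a generic sketch of the pattern, not xagg's actual code):

    import pandas as pd
    import geopandas as gpd
    from shapely.geometry import Polygon

    # Sketch: given per-pixel bounds, build one polygon per pixel with .apply()
    # instead of appending inside a Python for-loop
    bounds = pd.DataFrame({'lon0': [0.0, 1.0], 'lon1': [1.0, 2.0],
                           'lat0': [0.0, 0.0], 'lat1': [1.0, 1.0]})
    geoms = bounds.apply(lambda r: Polygon([(r.lon0, r.lat0), (r.lon1, r.lat0),
                                            (r.lon1, r.lat1), (r.lon0, r.lat1)]),
                         axis=1)
    gdf_pixels = gpd.GeoDataFrame(bounds, geometry=geoms, crs='EPSG:4326')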

    opened by kerriegeil 10
  • dot product implementation

    Starting this pull request. This is code that implements a dot-product approach for doing the aggregation. See #2

    This works for my application but I have not run the tests on this yet.
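
    Conceptually, the aggregation becomes a single matrix-vector product between normalized overlap weights and the pixel values (an illustration of the approach, not the code in this PR):

    import numpy as np

    # rows = polygons, columns = pixels, entries = relative overlap areas
    overlaps = np.array([[0.5, 0.5, 0.0],
                         [0.0, 0.2, 0.8]])
    pixel_values = np.array([20.0, 24.0, 30.0])

    # Normalize each row, then one dot product aggregates every polygon at once
    weights = overlaps / overlaps.sum(axis=1, keepdims=True)
    print(weights @ pixel_values)  # [22.  28.8]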

    opened by jsadler2 9
  • work for one geometry?

    I ran into IndexError: single positional indexer is out-of-bounds (Traceback below)

    I have a dataset with one variable over CONUS and I'm trying to weight to one geom e.g. a county.

    I'll try to make a reproducible example.

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-83-5cd8fd54cbfc> in <module>
          1 weightmap = xa.pixel_overlaps(ds, gdf, subset_bbox=True)
    ----> 2 aggregated = xa.aggregate(ds, weightmap)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/core.py in aggregate(ds, wm)
        434                 #   the grid have just nan values for this variable
        435                 # in both cases; the "aggregated variable" is just a vector of nans.
    --> 436                 if not np.isnan(wm.agg.iloc[poly_idx,:].pix_idxs).all():
        437                     # Get the dimensions of the variable that aren't "loc" (location)
        438                     other_dims = [k for k in np.atleast_1d(ds[var].dims) if k != 'loc']
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
        887                     # AttributeError for IntervalTree get_value
        888                     return self.obj._get_value(*key, takeable=self._takeable)
    --> 889             return self._getitem_tuple(key)
        890         else:
        891             # we by definition only have the 0th axis
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
       1448     def _getitem_tuple(self, tup: Tuple):
       1449 
    -> 1450         self._has_valid_tuple(tup)
       1451         with suppress(IndexingError):
       1452             return self._getitem_lowerdim(tup)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
        721         for i, k in enumerate(key):
        722             try:
    --> 723                 self._validate_key(k, i)
        724             except ValueError as err:
        725                 raise ValueError(
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
       1356             return
       1357         elif is_integer(key):
    -> 1358             self._validate_integer(key, axis)
       1359         elif isinstance(key, tuple):
       1360             # a tuple should already have been caught by this point
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
       1442         len_axis = len(self.obj._get_axis(axis))
       1443         if key >= len_axis or key < -len_axis:
    -> 1444             raise IndexError("single positional indexer is out-of-bounds")
       1445 
       1446     # -------------------------------------------------------------------
    
    IndexError: single positional indexer is out-of-bounds
    
    opened by raybellwaves 5
  • dot product implementation for overlaps breaks xagg for high res grids

    I'm finding that the implementation of the dot product for computing weighted averages in core.py/aggregate eats up far too much memory for high-res grids; it's the wm.overlap_da that requires it. I am unable to allocate enough memory to make it through core.py/aggregate for many of the datasets I'm processing on an HPC system. I had no issue with the previous aggregate function before commit 4c5cc6503efde05153181e15bc5f7fe6bb92bd07. The dot product method looks a lot cleaner in the code, but is there another benefit?

    opened by kerriegeil 3
  • work with xr.DataArray's

    When providing an xr.DataArray to xa.pixel_overlaps(da, gdf) you get the Traceback below.

    Couple of ideas for fixes:

    • in the code parse it to an xr.Dataset
    • Don't use .keys() but use .dims() instead
    AttributeError                            Traceback (most recent call last)
    <ipython-input-74-f5cd39618cec> in <module>
    ----> 1 weightmap = xa.pixel_overlaps(da, gdf)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/wrappers.py in pixel_overlaps(ds, gdf_in, weights, weights_target, subset_bbox)
         58     print('creating polygons for each pixel...')
         59     if subset_bbox:
    ---> 60         pix_agg = create_raster_polygons(ds,subset_bbox=gdf_in,weights=weights)
         61     else:
         62         pix_agg = create_raster_polygons(ds,subset_bbox=None,weights=weights)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/core.py in create_raster_polygons(ds, mask, subset_bbox, weights, weights_target)
        148     # Standardize inputs
        149     ds = fix_ds(ds)
    --> 150     ds = get_bnds(ds)
        151     #breakpoint()
        152     # Subset by shapefile bounding box, if desired
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/aux.py in get_bnds(ds, edges, wrap_around_thresh)
        196         # to [0,360], but it's not tested yet.
        197 
    --> 198     if ('lat' not in ds.keys()) | ('lon' not in ds.keys()):
        199         raise KeyError('"lat"/"lon" not found in [ds]. Make sure the '+
        200                        'geographic dimensions follow this naming convention.')
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xarray/core/common.py in __getattr__(self, name)
        237                 with suppress(KeyError):
        238                     return source[name]
    --> 239         raise AttributeError(
        240             "{!r} object has no attribute {!r}".format(type(self).__name__, name)
        241         )
    
    AttributeError: 'DataArray' object has no attribute 'keys'
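
    A rough sketch of the first suggested fix (wrapping the DataArray in a Dataset before calling xagg; da and gdf are the objects from the call above, and the fallback name 'var' is arbitrary):

    import xarray as xr
    import xagg as xa

    # Convert the DataArray to a single-variable Dataset before handing it to xagg
    if isinstance(da, xr.DataArray):
        ds = da.to_dataset(name=da.name or 'var')

    weightmap = xa.pixel_overlaps(ds, gdf)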
    
    opened by raybellwaves 2
  • fix export to dataset issue, insert export tests

    .to_dataset() was not working due to too many layers of lists in the agg.agg geodataframe. This issue has been fixed by replacing an index with np.squeeze() instead. The broader problem may be that there are too many unnecessary layers of lists in the agg.agg geodataframe, which should be simplified in the next round of backend cleanup.

    Furthermore, there are now tests for .to_dataset() and .to_dataframe()

    opened by ks905383 1
  • speed improvement for high res grids in create_raster_polygons

    Hi there, I'm a first timer when it comes to contributing to someone else's repo so please let me know if I need to fix/change anything. I've got a handful of small changes that greatly impact the speed of xagg when using high resolution grids. Planning to submit one at a time when I have the time to spend on it. It may take me a while...

    This first one removes the hard coded 0.1 degree buffer for selecting a subset bounding box in create_raster_polygons. For high res grids this will select a much larger area than desired. The solution I propose is to change the 0.1 degree buffer to twice the max grid spacing.
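
    Roughly what "twice the max grid spacing" means (illustrative only, not the code in this PR):

    import numpy as np

    # Derive the bounding-box buffer from the grid spacing itself
    # instead of a hard-coded 0.1 degrees
    lat = np.arange(-89.875, 90, 0.25)   # example 0.25-degree grid
    lon = np.arange(-179.875, 180, 0.25)
    buffer = 2 * max(np.abs(np.diff(lat)).max(), np.abs(np.diff(lon)).max())
    print(buffer)  # 0.5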

    opened by kerriegeil 1
  • Rename aux for windows

    As aux is a protected filename on Windows I could not install the package, and not even clone the repo without renaming the file first. This is a fix inspired by NeuralEnsemble/PyNN#678.

    opened by Hugovdberg 1
  • use aggregated.to_dataset().to_dataframe() within aggregated.to_dataframe()

    When dealing with time data, aggregated.to_dataframe() will return columns as data_var0, data_var1, etc.

    xarray has a method to convert to a dataframe http://xarray.pydata.org/en/stable/generated/xarray.DataArray.to_dataframe.html which moves coords to a MultiIndex.

    You would just have to add in the geometry and crs from the incoming GeoDataFrame to make it a geopandas GeoDataFrame.

    opened by raybellwaves 1
  • return geometry key in aggregated.to_dataframe()

    When doing aggregated.to_dataframe(), it drops the geometry column that is in the original geopandas.GeoDataFrame.

    It would be nice if it were returned, so it could be used for things such as visualization.

    Code:

    import geopandas as gpd
    import pandas as pd
    import pooch
    import xagg as xa
    import xarray as xr
    import hvplot.pandas  # noqa: F401 - registers the .hvplot accessor
    
    # Load in example data and shapefiles
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
    file = pooch.retrieve(
        "https://pubs.usgs.gov/of/2006/1187/basemaps/continents/continents.zip", None
    )
    continents = gpd.read_file("zip://" + file)
    continents
    
    wm = xa.pixel_overlaps(ds, continents)
    aggregated = xa.aggregate(ds, wm)
    aggregated.to_dataframe()
    
    pd.merge(aggregated.to_dataframe(), continents, on="CONTINENT").hvplot(c="air")
    
    opened by raybellwaves 1
  • fix index error if input gdf has own index [issue #8]

    xa.get_pixel_overlaps() creates a poly_idx column in the gdf that takes as its value the index of the input gdf. However, if there is a pre-existing index, this can lead to bad behavior, since poly_idx is used as an .iloc indexer in the gdf. This update instead makes poly_idx np.arange(0,len(gdf)), which will avoid this indexing issue (and hopefully not cause any more? I figured there would've been a reason I used the existing index if not a new one... fingers crossed).

    opened by ks905383 1
  • silence functions

    Hi, thank you a lot for the great package.

    I was wondering if it is possible to add an argument to the functions (pixel_overlaps and aggregate) to silence them if we want? I am doing aggregations for many geometries and sometimes the output becomes too crowded, especially if I try to print other things while the functions are executed.

    Thanks !

    opened by khalilT 0
  • Mistaken use of ds.var() in `core.py`?

    In core.py, there are a few loops of the form: for var in ds.var():.

    This tries to compute a variance across all dimensions, for each variable. Is that the intention? I think you just mean for var in ds:.

    Note that if any variables are of a type for which var cannot be computed (e.g., timedelta64[ns]) then aggregate fails.
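
    A small illustration of the difference (toy data, not xagg code):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({'tas': ('x', np.array([1.0, 2.0, 3.0])),
                     'pr':  ('x', np.array([0.1, 0.2, 0.3]))})

    print(ds.var())   # computes the variance of every variable (a reduction)
    print(list(ds))   # ['tas', 'pr'] - just the variable names, no computation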

    opened by jrising 3
  • Odd errors from using pixel_overlaps with a weights option

    This issue is sort of three issues that I encountered while trying to solve a problem. Fixes to any of these would work for me.

    I'm trying to use xagg with some fairly large files including a weights file, and I was getting an error during the regridding process:

    >>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
    creating polygons for each pixel...
    lat/lon bounds not found in dataset; they will be created.
    regridding weights to data grid...
    Create weight file: bilinear_1800x3600_1080x2160.nc
    zsh: illegal hardware instruction  python
    

    (at which point, python crashes)

    I decided to do the regridding myself and save the result. Here are what the data file (ds_tas) and weights file (ds_pop) look like:

    >>> ds_tas
    <xarray.Dataset>
    Dimensions:      (band: 12, x: 2160, y: 1080)
    Coordinates:
      * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12
      * x            (x) float64 -179.9 -179.8 -179.6 -179.4 ... 179.6 179.7 179.9
      * y            (y) float64 89.92 89.75 89.58 89.42 ... -89.58 -89.75 -89.92
        spatial_ref  int64 ...
    Data variables:
        band_data    (band, y, x) float32 ...
    
    >>> ds_pop
    <xarray.Dataset>
    Dimensions:     (longitude: 2160, latitude: 1080)
    Coordinates:
      * longitude   (longitude) float64 -179.9 -179.8 -179.6 ... 179.6 179.8 179.9
      * latitude    (latitude) float64 89.92 89.75 89.58 ... -89.58 -89.75 -89.92
    Data variables:
        crs         int32 ...
        Population  (latitude, longitude) float32 ...
    Attributes:
        Conventions:  CF-1.4
        created_by:   R, packages ncdf4 and raster (version 3.4-13)
        date:         2022-02-05 22:14:16
    

    The dimensions line up exactly. But xagg still wanted to regrid my weights file. My guess is that this is because the dimensions are labeled differently (and so an np.allclose fails because taking a difference between the coordinates results in a 2-D matrix).
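
    A minimal illustration of that guess (differently named dimensions broadcast instead of aligning):

    import numpy as np
    import xarray as xr

    a = xr.DataArray(np.arange(3.0), dims='latitude')
    b = xr.DataArray(np.arange(3.0), dims='y')

    # Because the dimension names differ, xarray broadcasts rather than aligns,
    # so the difference is a 2-D (3 x 3) array instead of a length-3 vector
    print((a - b).shape)  # (3, 3)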

    So I relabeled my coordinates and dimensions. This results in a new error:

    >>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
    creating polygons for each pixel...
    lat/lon bounds not found in dataset; they will be created.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/wrappers.py", line 50, in pixel_overlaps
        pix_agg = create_raster_polygons(ds,subset_bbox=None,weights=weights)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/core.py", line 127, in create_raster_polygons
        ds = get_bnds(ds)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/aux.py", line 190, in get_bnds
        bnds_tmp[1:,:] = xr.concat([ds[var]-0.5*ds[var].diff(var),
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/_typed_ops.py", line 209, in __sub__
        return self._binary_op(other, operator.sub)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/dataarray.py", line 3081, in _binary_op
        self, other = align(self, other, join=align_type, copy=False)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/alignment.py", line 349, in align
        f"arguments without labels along dimension {dim!r} cannot be "
    ValueError: arguments without labels along dimension 'lat' cannot be aligned because they have different dimension sizes: {1080, 1079}
    

    To be clear, neither of my datasets has a dimension of size 1079.

    opened by jrising 11
  • weightmap (pixel_overlaps) warnings and errors

    Hi, thanks for the helpful package.

    On a Windows machine, I'm using the package successfully on ERA5 reanalysis data although I do get a user warning when calling pixel_overlaps. It occurs after the output "calculating overlaps between pixels and output polygons...". The warning is:

    "/home/kgeil/miniconda3/envs/xagg/lib/python3.9/site-packages/xagg/core.py:308: UserWarning: keep_geom_type=True in overlay resulted in 1 dropped geometries of different geometry types than df1 has. Set keep_geom_type=False to retain all geometries overlaps = gpd.overlay(gdf_in.to_crs(epsg_set),'

    When I try generating a weight map with the exact same shapefile but on AVHRR NDVI data instead I get a full error at the same-ish location:

    "ValueError: GeoDataFrame does not support setting the geometry column where the column name is shared by multiple columns."

    It looks like something is going wrong in get_pixel_overlaps around line 323 overlap_info...

    I've tried rewriting the NDVI netcdf to be as identical as possible as the ERA5 file (same coord and dim names, etc) and both files are epsg:4326.

    Any ideas how to get past this error?

    opened by kerriegeil 3
  • Trying to install outside target directory

    Get this error when trying to install with pip on windows. Have tried to install from pypi, github, and zip. Same error in each instance. I've tried with a base python install using virtual env and with conda. ERROR: The zip file (C:\Users\profile\Downloads\xagg-main.zip) has a file (C:\Users\khafen\AppData\Local\Temp\3\pip-req-build-cok0yin6\xagg/aux.py) trying to install outside target directory (C:\Users\profile\AppData\Local\Temp\3\pip-req-build-cok0yin6)

    opened by konradhafen 2
  • add to_geodataframe

    Closes https://github.com/ks905383/xagg/issues/17

    Open to feedback here.

    I believe https://github.com/ks905383/xagg/blob/main/xagg/classes.py#L62 should say geopandas.GeoDataFrame

    but I was thinking to_dataframe could return a pandas DataFrame (no geometry and no crs), and to_geodataframe returns the geometry and crs

    opened by raybellwaves 2
Releases(v0.3.0.2)
  • v0.3.0.2(Apr 10, 2022)

    Bug fixes

    • .to_dataset() functions again
    • .read_wm() is now loaded by default

    What's Changed

    • fix export to dataset issue, insert export tests by @ks905383 in https://github.com/ks905383/xagg/pull/35
    • add read_wm() to init by @ks905383 in https://github.com/ks905383/xagg/pull/36

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.3.0.1...v0.3.0.2

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0.1(Apr 7, 2022)

    Fixes dependency error in setup.py that was preventing publication of v0.3* on conda-forge.

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.3.0...v0.3.0.1

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Apr 2, 2022)

    Performance upgrades

    Various performance upgrades, particularly for working with high resolution grids.

    In create_raster_polygons:

    • replacing the for-loop assigning pixels to polygons with a lambda apply
    • creating a flexible buffer for subsetting to the bounding box, replacing the previously hardcoded 0.1 degrees with twice the max grid spacing

    In aggregate:

    • an optional replacement of the aggregating calculation with a dot-product implementation (impl='dot_product' in pixel_overlaps() and aggregate()), which may improve performance in certain situations

    Expanded functionality

    Weightmaps can now be saved using wm.to_file() and loaded using xagg.core.read_wm(), and no longer have to be regenerated with each code run.
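
    In practice that looks roughly like this (the path is just a placeholder):

    import xagg

    # Save the weightmap once...
    wm.to_file('wm_counties')

    # ...then reload it in later runs instead of regenerating the overlaps
    wm = xagg.core.read_wm('wm_counties')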

    Bug fixes

    Various bug fixes

    What's Changed

    • speed improvement for high res grids in create_raster_polygons by @kerriegeil in https://github.com/ks905383/xagg/pull/29
    • dot product implementation by @jsadler2 in https://github.com/ks905383/xagg/pull/4
    • Speedup for large grids - mod gdf_pixels in create_raster_polgons by @kerriegeil in https://github.com/ks905383/xagg/pull/30
    • implement making dot product optional, restoring default agg behavior by @ks905383 in https://github.com/ks905383/xagg/pull/32
    • Implement a way to save weightmaps (output from pixel_overlaps) by @ks905383 in https://github.com/ks905383/xagg/pull/33

    New Contributors

    • @kerriegeil made their first contribution in https://github.com/ks905383/xagg/pull/29
    • @jsadler2 made their first contribution in https://github.com/ks905383/xagg/pull/4

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.2.6...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.6(Jan 26, 2022)

    Bug fixes:

    • #11 pixel_overlaps no longer changes the gdf_in outside of the function

    Functionality tweaks

    • added agg.to_geodataframe(), similar to agg.to_dataframe(), but keeping the geometries from the original shapefile
    • adapted xarray's ds.to_dataframe() in agg.to_dataframe(), which has better functionality
    • .csvs now export long instead of wide, using the output from ds.to_dataframe() above
    Source code(tar.gz)
    Source code(zip)
  • v0.2.5(Jul 24, 2021)

  • v0.2.4(May 14, 2021)

Owner
Kevin Schwarzwald
Researching climate variability + impacts by profession, urban expansion by studies, and transit/land use policy by interest. Moonlight as rock violinist.