
xwrf

A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset. The primary objective of xwrf is to replicate crucial I/O functionality from the wrf-python package in a way that is more convenient for users and provides seamless integration with the rest of the Pangeo software stack.


This code is highly experimental! Let the buyer beware ⚠️ ;)

Installation

xwrf may be installed with pip:

python -m pip install git+https://github.com/NCAR/xwrf.git

What is it?

Native WRF output files are not CF-compliant, which makes them awkward to work with in tools like xarray. This package provides a simple interface for reading WRF output files into xarray Dataset objects using xarray's flexible and extensible I/O backend API. For example, the following code reads in a WRF output file:

In [1]: import xarray as xr

In [2]: path = "./tests/sample-data/wrfout_d03_2012-04-22_23_00_00_subset.nc"

In [3]: ds = xr.open_dataset(path, engine="xwrf")

In [4]: # or

In [5]: # ds = xr.open_dataset(path, engine="wrf")

In [6]: ds
Out[6]:
<xarray.Dataset>
Dimensions:  (Time: 1, south_north: 546, west_east: 480)
Coordinates:
    XLONG    (south_north, west_east) float32 ...
    XLAT     (south_north, west_east) float32 ...
Dimensions without coordinates: Time, south_north, west_east
Data variables:
    Q2       (Time, south_north, west_east) float32 ...
    PSFC     (Time, south_north, west_east) float32 ...
Attributes: (12/86)
    TITLE:                            OUTPUT FROM WRF V3.3.1 MODEL
    START_DATE:                      2012-04-20_00:00:00
    SIMULATION_START_DATE:           2012-04-20_00:00:00
    WEST-EAST_GRID_DIMENSION:        481
    SOUTH-NORTH_GRID_DIMENSION:      547
    BOTTOM-TOP_GRID_DIMENSION:       32
    ...                              ...
    NUM_LAND_CAT:                    24
    ISWATER:                         16
    ISLAKE:                          -1
    ISICE:                           24
    ISURBAN:                         1
    ISOILWATER:                      14

In addition to xr.open_dataset, xwrf also supports reading multiple WRF output files at once via the xr.open_mfdataset function:

ds = xr.open_mfdataset(list_of_files, engine="xwrf", parallel=True,
                       concat_dim="Time", combine="nested")

Why not just a preprocess function?

One could achieve the same functionality with a preprocess function. However, wrf-python implements some additional I/O features under the hood that we think are worth implementing as part of a backend engine rather than a regular preprocess function.

Comments
  • First Release Blog Post

    First Release Blog Post

    Description

    I think that once we have a first release of xwrf, we should write a blog post demonstrating its use. It would be great if one of our WRF expert collaborators could spearhead this blog. Any volunteers?

    Implementation

    Personally, I think that a Jupyter Notebook is a good medium for a demonstration, and the notebook can be easily converted to a markdown doc for a blog-post.

    Tests

    N/A

    Questions

    Before embarking on this, though, we need to complete the features that we want in the first release. That said, I wouldn't be too overly excited to delay the release. Earlier is better, even if incomplete.

    enhancement 
    opened by kmpaul 32
  • Implementation of salem-style x, y, and z coordinates

    Implementation of salem-style x, y, and z coordinates

    Change Summary

    As alluded to in #2, including dimension coordinates in the grid mapping/projection space is a key feature for integrating with other tools in the ecosystem like metpy and xgcm. In this (draft) PR, I've combined code ported from salem with some of my own one-off scripts and what already exists in xwrf to meet this goal. In particular, this introduces a pyproj dependency (for CRS handling and transforming the domain center point from lon/lat to easting/northing). Matching the assumptions already present in xwrf and salem, this implementation assumes we do not have a moving domain (which simplifies things greatly). Also, this implements the c_grid_axis_shift attr as-needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.

    ~~Also, because it existed in salem and my scripts alongside the dimension coordinate handling, I also included my envisioned diagnostic field calculations. These are deliberately limited to only those four fields that require WRF-specific handling:~~

    • ~~ 'T' going to potential temperature has a magic number offset of 300 K~~
    • ~~ 'P' and 'PB' combine to form pressure, and are not otherwise used~~
    • ~~ 'PH' and 'PHB' combine to form geopotential, and are not otherwise used~~
    • ~~ Geopotential to geopotential height conversion depends on a particular value of g (9.81 m s**-2) that may not match the value used elsewhere~~

    ~~Unless I'm missing something, any other diagnostics should be derivable using these or other existing fields in a non-WRF-specific way (and so, fit outside of xwrf). If the netcdf4 backend already handles Dask chunks, then this should "just work" as it is currently written. However, I'm not sure how this should behave with respect to lazy-loading when chunks are not specified, so that is definitely a discussion to have in relation to #10.~~

    ~~Right now, no tests are included, as this is just a draft implementation to get the conversation started on how we want to approach these features. So, please do share your thoughts and ask questions!~~
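For reference, the four WRF-specific conversions listed above reduce to simple arithmetic. A minimal numpy sketch (the function name is illustrative; input names follow the WRF Registry):

```python
import numpy as np

def wrf_base_diagnostics(T, P, PB, PH, PHB, g=9.81):
    """Sketch of the four WRF-specific conversions discussed above.

    The 300 K offset and the value of g follow WRF's conventions.
    """
    theta = T + 300.0           # 'T' is perturbation potential temperature [K]
    pressure = P + PB           # perturbation + base-state pressure [Pa]
    geopotential = PH + PHB     # perturbation + base-state geopotential [m2 s-2]
    height = geopotential / g   # geopotential height [m]
    return theta, pressure, geopotential, height
```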

    Related issue number

    • Closes #3
    • Closes #11

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [ ] Documentation reflects the changes where applicable
    enhancement 
    opened by jthielen 31
  • First Release?

    First Release?

    Now that we have xwrf in a usable state, should we consider cutting its first release soon (later this week or next week)? We already have the infrastructure in place for automatically publishing the package to PyPI. One missing piece is the documentation. The infrastructure for authoring the docs is already in place (uses markdown via myst + furo theme, and the current template follows this documentation system guide). I am opening this issue to keep track of other outstanding issues that need to be addressed before the first release. Feel free to add to this list (cc @ncar-xdev/xwrf)

    • [x] Update documentation
    • [x] Publish to PyPI
    • [x] Publish to conda-forge
    opened by andersy005 27
  • Tutorial

    Tutorial

    Change Summary

    Tutorial showing xWRF usage.

    Related issue number

    • Towards #69

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [x] Documentation reflects the changes where applicable
    opened by lpilz 20
  • Tutorial on xWRF

    Tutorial on xWRF

    What is your issue?

    The aim of this issue is to track progress in creating a tutorial for xWRF. Here is the start of a list of features to be presented. Please feel free to add to this list - I'll work on implementing it over the coming days.

    • [x] general parsing/coordinate transformation (what does xwrf do?)
    • [x] interface to metpy via unit CF-conventions and pint
    • [x] destaggering data using xgcm
    • [x] vertically interpolating data using xgcm
    • [x] plotting
    opened by lpilz 17
  • Update of tutorials for v0.0.2

    Update of tutorials for v0.0.2

    Change Summary

    Added a tutorial for using xgcm with dask-data.

    Related issue number

    Closes #69

    Checklist

    • [x] Documentation reflects the changes where applicable
    documentation 
    opened by lpilz 13
  • First draft

    First draft "destagger" function

    Change Summary

    Here's an attempt at a "destaggering" function. This is based on the function in WRF-python (https://github.com/NCAR/wrf-python/blob/22fb45c54f5193b849fdff0279445532c1a6c89f/src/wrf/destag.py).

    I've tested it on "west_east_stag" and "south_north_stag" coordinates. The function takes an xarray data array and guesses the name of the staggered coordinate (it ends in "_stag"). If there is more than one (I don't think there is in WRF?), a NotImplementedError is raised.

    I'm also not sure if this should ultimately look like this at all, but I wanted to go ahead and throw this code out there.
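Stripped of the xarray bookkeeping, the core of the approach described above is just an average of adjacent values along the staggered axis. A minimal numpy sketch:

```python
import numpy as np

def destagger(data, axis):
    # Average each pair of adjacent values along `axis`, reducing a
    # staggered dimension of length n+1 to a mass-grid dimension of length n.
    left = np.take(data, range(data.shape[axis] - 1), axis=axis)
    right = np.take(data, range(1, data.shape[axis]), axis=axis)
    return 0.5 * (left + right)
```

For example, destaggering `[0.0, 2.0, 4.0]` along axis 0 yields `[1.0, 3.0]`.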

    Related issue number

    This is related to issue #35

    Checklist

    I don't have any unit tests to check this -- I'm open to ideas on how to make unit tests (do they need to be on "real" data?) Maybe that's a separate issue.

    • [ ] Unit tests for the changes exist
    • [ ] Tests pass on CI
    • [ ] Documentation reflects the changes where applicable

    I'm new to collaborating on open-source projects, and writing code for wide usage, so any feedback is welcome!

    enhancement 
    opened by bsu-wrudisill 13
  • [MISC]: Curate sample datasets

    [MISC]: Curate sample datasets

    What is your issue?

    We currently don't have great sample datasets to use for testing and documentation. It's worth curating small, exemplary datasets. We could emulate the approach used by fatiando/ensaio or xarray's tutorial module. These datasets should probably be hosted in a separate GitHub repository.

    • Option 1: A separate data package (xwrf_data)
    import xwrf_data
    import xwrf
    import xarray as xr
    
    fname = xwrf_data.fetch_foo_dataset()
    ds = xr.open_dataset(fname).wrf.diag_and_destagger()
    
    • Option 2: Tutorial module within xwrf
    import xwrf
    import xarray as xr
    
    ds = xwrf.tutorial.open_dataset('foo_dataset').wrf.diag_and_destagger()
    

    Cc @ncar-xdev/xwrf

    enhancement 
    opened by andersy005 12
  • Division of Features in Top-Level API

    Division of Features in Top-Level API

    While detailed API discussions will be ongoing based on https://github.com/NCAR/xwrf/discussions/13 and other issues/discussions that follow from that, https://github.com/NCAR/xwrf/pull/14#issuecomment-977066277 and https://github.com/NCAR/xwrf/pull/14#issuecomment-977157649 raised a more high-level API point that would be good to clear up first: what features go into the xwrf backend, and what goes elsewhere (such as a .wrf accessor)?

    Original comments:


    If so, I think this means we can't have direct Dask operations within the backend, but would rather need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they are outside the backend. Perhaps the intake-esm approach could help in that regard at least?

    Wouldn't creating custom backend arrays be overkill? Assuming we want to support reading files via the Python-netCDF4 library, we might be able to write a custom data store that borrows from xarray's NetCDF4DataStore: https://github.com/pydata/xarray/blob/5db40465955a30acd601d0c3d7ceaebe34d28d11/xarray/backends/netCDF4_.py#L291. With this custom datastore, we would have more control over what to do with variables, dimensions, attrs before passing them to xarray. Wouldn't this suffice for the data loading (without the derived quantities)?

    I think there's value in keeping the backend plugin simple (e.g. performing simple tasks such as decoding coordinates, fixing attributes/metadata, etc) and everything else outside the backend. Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

    Some of the benefits of deriving quantities outside the backend are that this approach:

    1. doesn't obfuscate what's going on,
    2. gives users the opportunity to fix aspects of the dataset that might be missed by xwrf during data loading before passing this cleaned dataset to the functionality for deriving quantities,
    3. removes the requirement for deriving quantities to be a lazy operation, i.e. if your dataset is in memory, deriving the quantity is done eagerly...

    Originally posted by @andersy005 in https://github.com/NCAR/xwrf/issues/14#issuecomment-977066277


    Some of the benefits of deriving quantities outside the backend are that this approach:

    Also, wouldn't it be beneficial for deriving quantities to be backend agnostic? I'm imagining cases in which the data have been post-processed and saved in a different format (e.g. Zarr) and you still want to be able to use the same code for deriving quantities on the fly.

    Originally posted by @andersy005 in https://github.com/NCAR/xwrf/issues/14#issuecomment-977072366


    Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

    This sounds like it factors directly into the "keep the solutions as general as possible (so that maybe also MPAS can profit from it)" discussion. However, I feel that we have to think about the user-perspective too. I don't have any set opinions on this and we should definitely discuss this maybe in a larger group too. Here some thoughts on this so far:

    I think the reason users like wrf-python is because it is an easy one-stop-shop for getting wrf output to work with python - this is especially true because lots of users are scientists and not software engineers or programmers. I personally take from this point that it would be prudent to keep the UX as easy as possible. I think this is what the Backend-approach does really well. Basically users just have to add the engine='xwrf' kwarg and then it just works (TM). Meaning that it provides the users with CF-compliant de-WRFified meteo data. Also, given that the de-WRFification of the variable data is not too difficult (it's basically just adding fields for three variables), I think the overhead in complexity wouldn't be too great. However, while I do see that it breaks the conceptual barrier between data loading (and decoding etc.) and computation, this breakage would be required in order to provide the user with meteo data rather than raw wrf fields.

    @andersy005 do you already have some other ideas on how one could handle this elegantly?

    Also, should we move this discussion to a separate issue maybe?

    Originally posted by @lpilz in https://github.com/NCAR/xwrf/issues/14#issuecomment-977157649

    opened by jthielen 10
  • Coordinate UX

    Coordinate UX

    I think this is pretty straightforward, as we just need the lat, lon and time coordinates; all others can be discarded. Unstaggering will be done in the variable initialization. However, we should be aware of moving-nest runs and keep the time-dependence of lat and lon for those occasions.

    enhancement 
    opened by lpilz 9
  • Create xWRF logo

    Create xWRF logo

    What is your issue?

    It would be nice to have a minimalistic logo for the project. Does anyone have, or know someone with, design skills? :) This would be good for the overall branding of the project once we start advertising it after the first release.

    • https://github.com/ncar-xdev/xwrf/issues/51

    Cc @ncar-xdev/xwrf

    opened by andersy005 8
  • [Bug]: ValueError when using MetPy to calculate geostrophic winds

    [Bug]: ValueError when using MetPy to calculate geostrophic winds

    What happened?

    I'm trying to use the MetPy function mpcalc.geostrophic_wind() to calculate geostrophic winds from a wrfout file.

    I'm getting "ValueError: Must provide dx/dy arguments or input DataArray with latitude/longitude coordinates", along with a warning, "warnings.warn('More than one ' + axis + ' coordinate present for variable'".

    I don't know what's causing the problem.

    Minimal Complete Verifiable Example

    import metpy.calc as mpcalc
    import xarray as xr
    import xwrf
    
    # Open the NetCDF file
    filename = "wrfout_d01_2016-10-04_12:00:00"
    ds = xr.open_dataset(filename).xwrf.postprocess()
    
    # Extract the geopotential height
    z = ds['geopotential_height']
    
    # Compute the geostrophic wind
    geo_wind_u, geo_wind_v = mpcalc.geostrophic_wind(z)
    

    Relevant log output

    /mnt/iusers01/fatpou01/sees01/w34926hb/.conda/envs/metpy_env/lib/python3.9/site-packages/metpy/xarray.py:355: UserWarning: More than one latitude coordinate present for variable "geopotential_height".
      warnings.warn('More than one ' + axis + ' coordinate present for variable'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/iusers01/fatpou01/sees01/w34926hb/.conda/envs/metpy_env/lib/python3.9/site-packages/metpy/xarray.py", line 1508, in wrapper
        raise ValueError('Must provide dx/dy arguments or input DataArray with '
    ValueError: Must provide dx/dy arguments or input DataArray with latitude/longitude coordinates.
    

    Environment

    System Information
    ------------------
    xWRF commit : None
    python      : 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50)
    [GCC 10.3.0]
    python-bits : 64
    OS          : Linux
    OS-release  : 3.10.0-1127.19.1.el7.x86_64
    machine     : x86_64
    processor   : x86_64
    byteorder   : little
    LC_ALL      : None
    LANG        : en_GB.UTF-8
    LOCALE      : ('en_GB', 'UTF-8')
    
    Installed Python Packages
    -------------------------
    cf_xarray   : 0.7.5
    dask        : 2022.11.0
    donfig      : 0.7.0
    matplotlib  : 3.6.2
    metpy       : 1.3.1
    netCDF4     : 1.6.2
    numpy       : 1.23.5
    pandas      : 1.5.1
    pint        : 0.20.1
    pooch       : v1.6.0
    pyproj      : 3.4.0
    xarray      : 2022.11.0
    xgcm        : 0.8.0
    xwrf        : 0.0.2
    

    Anything else we need to know?

    No response

    bug waiting for response 
    opened by starforge 3
  • [MISC]: Plot in metpy tutorial is missing

    [MISC]: Plot in metpy tutorial is missing

    What is your issue?

    On https://xwrf.readthedocs.io/en/latest/tutorials/metpy.html, the Skew-T plot is missing. @andersy005 is this an intermittent sphinx issue or do we have a misconfiguration somewhere?

    opened by lpilz 1
  • More comprehensive unit harmonization

    More comprehensive unit harmonization

    Change Summary

    Unit harmonization is improved by:

    • using a better map parsed from WRF Registries (yes, all of them, but not WPS)
      • translations are generated manually using a custom external tool
      • includes all versions from WRFv4.0 onwards
      • makes bracket cleaning superfluous
    • extracting this map from the config yaml to avoid clutter

    Related issue number

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [x] Documentation reflects the changes where applicable
    enhancement 
    opened by lpilz 4
  • [FEATURE]: Add functionality to organize WRF data into a DataTree

    [FEATURE]: Add functionality to organize WRF data into a DataTree

    Description

    WRF output can easily have a couple hundred data variables in a dataset, which is not ideal for interactive exploration of a dataset's contents. With DataTree, we would have a tree-like hierarchical data structure for xarray which could be used for this.

    From @lpilz in https://github.com/xarray-contrib/xwrf/issues/10:

    • Which diagnostics do we want to provide and do we want to expose them in a DataTree eventually?

    One suggestion might be:

    DataTree("root")
    |-- DataNode("2d_variables")
    |   |-- DataArrayNode("sea_surface_temperature")
    |   |-- DataArrayNode("surface_temperature")
    |   |-- DataArrayNode("surface_air_pressure")
    |   |-- DataArrayNode("air_pressure_at_sea_level")
    |   |-- DataArrayNode("air_temperature_at_2m") (?)
    |   ....
    |-- DataNode("3d_variables")
        |-- DataArrayNode("air_temperature")
        |-- DataArrayNode("air_pressure")
        |-- DataArrayNode("northward_wind")
        |-- DataArrayNode("eastward_wind")
        ....
    

    Implementation

    This would likely become a new accessor method, such as .xwrf.organize().

    Tests

    After xwrf.postprocess(), we have a post processed dataset (with likely many data variables). Then, after xwrf.organize(), we would have a DataTree with (a yet to be decided) tree-like grouping of data variables. Calling xwrf.organize() without xwrf.postprocess() would fail.

    Questions

    What form of hierarchy would we want to have, and how deep?

    • 2d_variables vs. 3d_variables?
    • semantic grouping of variables, such as thermodynamic, grid_metrics, kinematic, accumulated, etc.?
    • Parse the WRF Registry somehow and assign groups based on that?
    • some other strategy?
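    The first option above (splitting by 2d vs. 3d) can be sketched with plain Python over a mapping of variable names to dimensions. The function name and the Time-exclusion rule here are assumptions for illustration:

    ```python
    def group_by_rank(variable_dims):
        """Group variable names into '2d_variables'/'3d_variables' buckets
        by their number of non-time dimensions (sketch only)."""
        groups = {}
        for name, dims in variable_dims.items():
            spatial = [d for d in dims if d != "Time"]
            groups.setdefault(f"{len(spatial)}d_variables", []).append(name)
        return groups
    ```

    The resulting dict maps cleanly onto the tree layout proposed above.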
    enhancement 
    opened by jthielen 0
  • [META]: Support for unexpected/non-pristine wrfout datasets

    [META]: Support for unexpected/non-pristine wrfout datasets

    What is your issue?

    As encountered in #36 and https://github.com/xarray-contrib/xwrf-data/pull/34 (and perhaps elsewhere), several unexpected factors (old versions, tweaked registries, subsetting, etc.) can result in xWRF's standard functionality being unsupported or failing. While this is definitely not something to prioritize for immediate releases, it would still be nice to make as much of xWRF's functionality as possible available to users whose WRF datasets "break" xWRF's norms. So, I propose this to be a meta-issue to

    • track such unexpected/non-pristine examples
    • work towards features to enable extended compatibility and/or custom application of atomized functionality outside of the standard postprocess()
    • discuss any high-level design strategies to improve the experience of xWRF in these situations

    Running list of sub-issues

    (feel free to add/modify)

    • [ ] Missing latitude/longitude coordinates (xref #36)
      • Could be addressed by (one or both of)
        • Convenience methods to merge in coordinates from geo_em files
        • Recompute lat/lon from projection coordinates
    • [ ] Dataset grid definition attributes partially invalid due to spatial subsetting prior to postprocessing (xref https://github.com/xarray-contrib/xwrf-data/pull/34; local issue TBD)
      • Could be addressed by (one or both of)
        • Reference lat/lon being derived from XLAT/XLONG corner(s) rather than CEN_LON/CEN_LAT attrs
        • Require user input of needed info if some sanity check fails (which would also lead to support for completely missing attrs, not just CEN_LON/CEN_LAT being rendered invalid)
    enhancement 
    opened by jthielen 0
  • [MISC]: More careful consideration of different xarray options

    [MISC]: More careful consideration of different xarray options

    What is your issue?

    Test expected results under different xarray options

    In the spirit of improving the quality of our tests (xref #60), it would be nice to implement tests where different relevant xarray options are enabled (using set_options as a context manager). This would likely make it easier to catch issues like #96 .
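    A minimal example of the suggested pattern, using set_options as a context manager (the particular option and assertion are illustrative, not xwrf's actual tests):

    ```python
    import xarray as xr

    # Run an attribute-preservation check with an xarray option enabled.
    da = xr.DataArray([1.0, 2.0], attrs={"units": "K"})
    with xr.set_options(keep_attrs=True):
        result = da + 1

    assert result.attrs.get("units") == "K"
    ```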

    Xarray options in issue reports

    Not sure the best way to do this (bundle into xwrf.show_versions()? Add another copy-paste box to the issue template?), but it could help with debugging if we knew the state of xarray.get_options.

    maintenance 
    opened by jthielen 0
Releases (v0.0.2)
  • v0.0.2(Sep 21, 2022)

    What's Changed

    • Add destaggering functionality by @jthielen in https://github.com/xarray-contrib/xwrf/pull/93
    • Fix destagger attrs by @lpilz in https://github.com/xarray-contrib/xwrf/pull/97
    • Fix staggered coordinate destaggering for dataarray destagger method by @jthielen in https://github.com/xarray-contrib/xwrf/pull/101
    • Added earth-relative wind field calculation to base diagnostics by @lpilz in https://github.com/xarray-contrib/xwrf/pull/100
    • Clean up _destag_variable with respect to types and terminology by @jthielen in https://github.com/xarray-contrib/xwrf/pull/103
    • Changed wrfout file (cf. xwrf-data/#34) by @lpilz in https://github.com/xarray-contrib/xwrf/pull/102
    • More unit harmonization by @lpilz in https://github.com/xarray-contrib/xwrf/pull/105
    • Fixing a further coords attrs fail. by @lpilz in https://github.com/xarray-contrib/xwrf/pull/107
    • Clear c_grid_axis_shift from attrs when destaggering by @jthielen in https://github.com/xarray-contrib/xwrf/pull/106
    • Update of tutorials for v0.0.2 by @lpilz in https://github.com/xarray-contrib/xwrf/pull/89

    Full Changelog: https://github.com/xarray-contrib/xwrf/compare/v0.0.1...v0.0.2

  • v0.0.1(Sep 9, 2022)

    This is the first packaged release of xWRF (a lightweight interface for working with output from the Weather Research and Forecasting (WRF) model in xarray). Features in this release include:

    • An xwrf Dataset accessor with a postprocess method that can perform the following operations
      • Rename dimensions to match the CF conventions.
      • Rename variables to match the CF conventions.
      • Rename variable attributes to match the CF conventions.
      • Convert units to Pint-friendly units.
      • Decode times.
      • Include projection coordinates.
      • Collapse time dimension.
    • A tutorial module with several sample datasets
    • Documentation with several examples/tutorials

    Thank you to the following contributors for their efforts towards this release!

    • @andersy005
    • @lpilz
    • @jthielen
    • @kmpaul
    • @dcherian
    • @jukent

    Full Changelog: https://github.com/xarray-contrib/xwrf/commits/v0.0.1

Owner
National Center for Atmospheric Research
NCAR is sponsored by the National Science Foundation and managed by the University Corporation for Atmospheric Research.