Publish Xarray Datasets via a REST API.

Overview

Xpublish

Server-side: Publish an Xarray Dataset through a REST API

import xpublish  # importing xpublish registers the `rest` accessor on xarray.Dataset
ds.rest.serve(host="0.0.0.0", port=9000)

Client-side: Connect to a published dataset

The published dataset can be accessed from various kinds of client applications. Here is an example of directly accessing the data from within Python:

import xarray as xr
import zarr
from fsspec.implementations.http import HTTPFileSystem

fs = HTTPFileSystem()
http_map = fs.get_mapper('http://0.0.0.0:9000')

# open as a zarr group
zg = zarr.open_consolidated(http_map, mode='r')

# or open as another Xarray Dataset
ds = xr.open_zarr(http_map, consolidated=True)

Why?

Xpublish lets you serve/share/publish Xarray Datasets via a web application.

The data and/or metadata in the Xarray Datasets can be exposed in various forms through pluggable REST API endpoints. Efficient, on-demand delivery of large datasets may be enabled with Dask on the server-side.

We are exploring applications of Xpublish that include:

  • publishing on-demand or derived data products
  • turning Xarray objects into streaming services (e.g. OPeNDAP)

How?

Under the hood, Xpublish uses a FastAPI web app that exposes a REST-like API with built-in and/or user-defined endpoints.

For example, Xpublish provides by default a minimal Zarr compatible REST-like API with the following endpoints:

  • .zmetadata: returns Zarr-formatted metadata keys as JSON strings.
  • var/0.0.0: returns a variable data chunk as a binary string.
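To make the two endpoint shapes concrete, here is a minimal standard-library sketch. The sample metadata document and the variable name `air` are made up for illustration. The layout is assumed to follow the Zarr v2 consolidated-metadata convention (a top-level `metadata` mapping keyed by store paths), and `variable_names`, `chunk_key` and `chunk_grid` are hypothetical helpers, not part of Xpublish.

```python
import json

# Hypothetical .zmetadata payload for a dataset with one variable, "air".
ZMETADATA = json.dumps({
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        ".zattrs": {},
        "air/.zarray": {"shape": [4, 6], "chunks": [2, 3], "dtype": "<f8"},
        "air/.zattrs": {"_ARRAY_DIMENSIONS": ["x", "y"]},
    },
})


def variable_names(zmetadata_text):
    """Return the sorted names of arrays listed in a consolidated metadata document."""
    keys = json.loads(zmetadata_text)["metadata"]
    return sorted(k.rsplit("/", 1)[0] for k in keys if k.endswith("/.zarray"))


def chunk_key(var, chunk_index):
    """Build the store key (and URL path) of one chunk, e.g. 'air/0.1'."""
    return f"{var}/" + ".".join(str(i) for i in chunk_index)


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return [-(-s // c) for s, c in zip(shape, chunks)]
```

A client would request `.zmetadata` once, then fetch individual chunks by appending keys such as `chunk_key("air", (0, 1))` to the server URL.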
Comments
  • Refactor routes

    First step towards addressing #25.

    This moves all path operation functions out of RestAccessor and instead creates fastapi.APIRouter instances in a new routers sub-package. Each module in routers contains an APIRouter instance dedicated to a specific part of the API.

    Each function operates on the served dataset by overriding the get_dataset dependency for RestAccessor.app.

    TODO:

    • [x] move zarr-specific path operation functions after #21
    • ~~maybe refactor tests (if directly testing APIRouter instances is possible and a good idea)~~
    opened by benbovy 11
  • Publishing a collection of datasets

    It would be great if we could publish multiple datasets on the same server.

    I'm thinking of something like this:

    xpublish.serve(
        {'ds1': xarray.Dataset(...), 'ds2': xarray.Dataset(...)},
        host="127.0.0.1",
        port=9000
    )
    

    or

    # will launch the server
    ds1.rest.serve(host="127.0.0.1", port=9000, name="ds1")
    
    # same host/port -> will reuse the server
    ds2.rest.serve(host="127.0.0.1", port=9000, name="ds2")
    

    Would there be any technical challenge in supporting this?

    This will certainly break the current API end points, unless both cases (single dataset vs collection of datasets) are supported (perhaps not on the same running server).

    For the case of multiple datasets, all the current end points could for example have the prefix /datasets/<name>/. Some additional end points may be useful for listing the datasets in the collection.

    opened by benbovy 7
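The `/datasets/<name>/` prefix idea above can be sketched without any web framework. `split_dataset_path` and `list_datasets` are hypothetical helpers, and the dataset names are placeholders. The sketch only illustrates how a collection-aware server might separate the dataset name from the rest of an existing endpoint path.

```python
def split_dataset_path(path, datasets):
    """Split '/datasets/<name>/<rest>' into (dataset, rest).

    Raises KeyError if the path lacks the prefix or names an unknown dataset.
    """
    parts = path.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] != "datasets":
        raise KeyError(f"not a dataset path: {path!r}")
    name = parts[1]
    rest = parts[2] if len(parts) == 3 else ""
    return datasets[name], rest


def list_datasets(datasets):
    """Payload for a listing endpoint such as '/datasets'."""
    return sorted(datasets)
```

With this layout, the existing single-dataset paths (`.zmetadata`, `var/0.0.0`, ...) are unchanged once the prefix has been stripped.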
  • Fix tests with last Xarray versions

    I guess the failing roundtrip tests are related to https://github.com/pydata/xarray/pull/2844 but I'm not sure what to do here to fix it. Any idea @jhamman @andersy005?

    opened by benbovy 6
  • AttributeError: 'Dataset' object has no attribute 'rest'

    Hello,

    xpublish looks very promising and I want to use it for serving a few datasets in an experiment. I've installed xpublish in a conda environment

    I do run into the exception

    AttributeError: 'Dataset' object has no attribute 'rest' when running the simple script:

    #!/opt/anaconda/envs/env_xpublish/bin/python
    
    import click
    import sys
    import pandas as pd
    import numpy as np
    import xarray as xr
    import xpublish
    
    ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 5))},
                     coords={'x': [10, 20, 30, 40],
                            'y': pd.date_range('2000-01-01', periods=5),
                            'z': ('x', list('abcd'))})
    
    
    ds.rest.serve(host='0.0.0.0', port=9000)
    

    Any help or tips would be really appreciated.

    question 
    opened by fabricebrito 6
  • Flexible routes

    Overview

    This PR modifies xpublish to be able to serve multiple datasets, based on @benbovy's prototype.

    This is an attempt to address #23 and #25.

    Notes

    Further analysis is needed to check whether dask and caching work correctly; otherwise, serving multiple datasets seems to work.

    opened by lsetiawan 5
  • Doc fixes, tweaks and improvements

    A couple of comments:

    • The rest accessor API is now documented using sphinx-autosummary-accessors.

    • I replaced the ipython directives with regular python code blocks. I don't think using ipython directives is worth relying on ipython + all xpublish's runtime dependencies for building the docs, given that we don't really leverage the interactive output here. I'm not against reverting this change in case anyone has objections.

    opened by benbovy 4
  • Move this project to a new GitHub organization?

    Recently, @lsetiawan and @benbovy have been making contributions to this repository. Would now be a good time to move the repository to a GitHub organization? I think xarray-contrib is a logical place but Pangeo would also be fine by me.

    opened by jhamman 4
  • Add init app method for custom app config

    Overview

    Adds an init_app method that passes additional configuration to the FastAPI app, allowing more control over the app and making it easier to extend.

    Need this for sub-application to build proxying for multiple datasets: https://fastapi.tiangolo.com/advanced/sub-applications-proxy/

    opened by lsetiawan 4
  • 🐛Do not fail if not a git repo when retrieving system information

    otherwise this may happen on the /versions route:

      File "/home/mah/.local/lib/python3.7/site-packages/xpublish/utils/info.py", line 38, in get_sys_info
        blob.append(('commit', commit))
    UnboundLocalError: local variable 'commit' referenced before assignment
    
    opened by mhaberler 3
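The traceback above shows the usual pattern where a variable is assigned only on a code path that can fail. Below is a minimal sketch of one way to avoid it, assuming the commit hash is obtained by shelling out to `git`. The actual `get_sys_info` implementation is not shown in this thread, so `get_commit` and this simplified `get_sys_info` are hypothetical.

```python
import subprocess


def get_commit(cwd="."):
    """Return the current git commit hash, or None when it cannot be determined."""
    commit = None  # assigned up front so the name is always bound
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or cwd is not usable
        return commit
    if result.returncode == 0:
        commit = result.stdout.strip() or None
    return commit


def get_sys_info(cwd="."):
    """Collect (key, value) pairs; 'commit' is None outside a git repository."""
    return [("commit", get_commit(cwd))]
```

The key point is that `commit` has a value on every path, so appending it to the result can no longer raise `UnboundLocalError`.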
  • Fix single dataset openapi specs (API docs)

    This fixes FastAPI's generated API docs in the case where a single dataset is published. The dataset_id parameter is not shown anymore.

    The fix is based on FastAPI docs: https://fastapi.tiangolo.com/advanced/extending-openapi/. I had to do some tricks to support a bunch of changes in last FastAPI versions, though. Hopefully it won't change too often in the future.

    This fix won't be needed anymore if it is addressed upstream (see https://github.com/tiangolo/fastapi/issues/1594).

    opened by benbovy 3
  • use the released version from pypi

    I just released sphinx-autosummary-accessors 0.1, so that should be preferred over installing from github. This also pins sphinx to sphinx>=3.1 which fixes the incomplete summary for callable accessors.

    opened by keewis 3
  • Extendable entrypoint plugins

    Another variation on #140 with a few of the ideas from the discussion there and #139.

    Plugin routers are now nested under a parent Plugin class which now acts as a way to combine multiple related pieces of functionality together (say db management routes and a CLI). This allows new plugin functionality to be added in other plugins or Xpublish related libraries without requiring the parent Plugin class to define everything.

    Plugins are loaded from the xpublish.plugin entrypoint group. Plugins can be manually configured via the plugins argument to xpublish.Rest. The specifics of plugin loading can be changed by overriding the .setup_plugins() method.

    Some other xpublish.Rest functionality has been refactored out into separate methods to allow easier overriding, for instance making a SingleDatasetRest class that will allow simplifying xpublish.Rest.

    The ds.rest accessor has been moved out into its own file.

    opened by abkfenris 0
  • Entry point plugins

    Builds on top of @benbovy's work in building router factories in https://github.com/xarray-contrib/xpublish/pull/89 to build a plugin system, to try to implement some of my thoughts in https://github.com/xarray-contrib/xpublish/discussions/139

    The plugin system uses entry points, which are most commonly used for console or GUI scripts. The entry_point group is xpublish.plugin. Right now plugins can provide dataset specific and general (app) routes, with default prefixes and tags for both.

    Xpublish will by default load plugins via the entry point. Additionally, plugins can also be loaded directly via the init, as well as being disabled, or configured. The existing dataset router pattern also still works, so that folks aren't forced into using plugins as the only way to extend functionality.

    It runs against the existing test suite, but I haven't implemented any new tests or docs yet.

    Entry point reference:

    • https://setuptools.pypa.io/en/latest/userguide/entry_point.html
    • https://packaging.python.org/en/latest/specifications/entry-points/
    • https://amir.rachum.com/amp/blog/2017/07/28/python-entry-points.html
    opened by abkfenris 5
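Entry-point discovery itself is available in the standard library. As a rough sketch, independent of how Xpublish actually loads its plugins, the `xpublish.plugin` group described above could be enumerated as follows. `discover_plugins` is a hypothetical helper.

```python
from importlib.metadata import entry_points


def discover_plugins(group="xpublish.plugin"):
    """Map entry-point names in `group` to entry-point objects (not yet loaded)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ API
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

Calling `.load()` on one of the returned entry points imports the object it refers to; deferring that step is what lets a host application disable or configure plugins before they are imported.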
  • 2022-12-09 Xpublish & ZarrDAP meeting notes

    On 2022-12-09 we met to discuss various Xarray based data server projects. Discussion post announcing meeting

    Purpose: Meetup to discuss progress and plans for OpenDAP, WMS and other API layers on top of the Xarray/Dask (aka Pangeo) Python stack, such as:

    • Xpublish
      • xreds built on top of Xpublish
    • ZarrDAP
      • Implements OPeNDAP and a custom HTML ZarrDAP Catalog, from which it generates an Intake catalog.

    Attendees:

    • Rich Signell / USGS / @rsignell-usgs
    • Alex Kerney / Gulf of Maine Research Institute & NorthEast Regional Association of Coastal and Ocean Observing Systems / @abkfenris
    • Anthony Aufdenkampe / LimnoTech / @aufdenkampe
      • Helping USGS NHGF to configure pygeoapi-edr (+ZarrDAP or Xpublish) against the same stac to document XYZT zarr data in S3
    • Joe Hamman / Earthmover / @jhamman
      • started Xpublish
    • Filipe Fernandes / IOOS / @ocefpaf
    • Don Setiawan / UW OOI Regional Cabled Array @lsetiawan
    • Jonathan Joyce / RPS Group / @jonmjoyce
    • Matthew Iannucci / RPS Group / @mpiannucci
    • Dave Blodgett / USGS Water /
    • Andrew Buddenberg / NOAA/NCEI
      • thinks he's in charge of ZarrDAP now
    • Shane Mill / NOAA/NWS / @ShaneMill1
    • Steve Olson / NOAA/NWS / @solson-nws
      • Implementing EDR
    • Jon Blower / National Oceanography Centre, UK / @jonblower
    • Chad Whitney / NOAA/NCEI
    • Paul Tomasula / LimnoTech / @ptomasula
    • Sarah Jordan / LimnoTech / @sjordan29
    • Xavier Nogueira / LimnoTech / @xaviernogueira
    • Dave Stuebe
    • Micah Wengren / IOOS / @mwengren
    • Patrick Tripp / RPSgroup / @patrick-tripp

    Agenda & Notes

    • Intros
      • (Go around by order in attendee list, probably 1-3 min each)
      • who are you, where do you work, background in the space.
    • Why are you/org interested in working on Discussion & Python
      • Xpublish (Matt): need a replacement for THREDDS (not cloud-ready) data servers?
      • ZarrDAP
        • Chad: Andrew just open-sourced ZarrDAP, but introduced a bug that they need to fix
        • Andrew: We're tired of THREDDS
          • Mark Capece connected ERDDAP to ZarrDAP and got a fantastic speedup.
          • Alex's experimentation with replacing Xpublish dataset loading with opening any ERDDAP GridDAP dataset: https://xpublish.onrender.com/docs & https://github.com/abkfenris/xpublish-erddap
        • Dave B: THREDDS team is well aware of these issues.
          • THREDDS team is taking it apart to build microservices from all THREDDS functionality
          • Key issue with THREDDS is cost of S3 egress fees
          • We need ...
      • PyGeoAPI-EDR
        • Shane building AWS scaling capabilities, which he wants to contribute to PyGeoAPI-EDR
          • AWS API Gateway + Lambda & Fargate, reaching out to ECS.
      • Xpublish update from Joe.
        • Very open to others working on it, such as Benoit Bovy
        • Could still benefit from more active developers
        • We need example architectures that use Xpublish
        • Perhaps a router plugin interface would be useful
      • (similar round robin)
    • What are folks working on?
      • (we can start round robin, but this can move into more of a discussion, we will want to keep moving so we don't get bogged down in any one avenue of work)
      • Demos?
    • How can we work together, rather than duplicate each other's efforts?
      • Can XPublish & ZarrDAP efforts or codebase be "merged"?
        • Matt: interesting to see that Xpublish & ZarrDAP seem to have almost identical approaches for accessing the data despite being developed totally independently
      • Alex's vision for Xpublish
        • Make it modular. Maybe a core/plugin/distro interface
          • Xpublish becomes the core, similar to the Linux kernel with a standard set of interfaces for routers and data loaders to interact with.
          • Routers are plugins, so that 'router' interfaces are all separate repos, such as:
            • OpenDAP (via https://github.com/gulfofmaine/xpublish-opendap)
            • EDR (via https://github.com/gulfofmaine/xpublish-edr)
            • WMS (via https://github.com/asascience-open/xpublish-wms)
          • Various deployments will assemble different router and data loading plugins for various use cases.
          • I'll post a more full fledged write up of my idea in the discussions
        • Andrew: Are you suggesting that ZarrDAP be rewritten to be plugin to XPublish? Alex: maybe...
          • Alex: I've made a very alpha OpenDAP Xpublish router ( https://github.com/gulfofmaine/xpublish-opendap ), but you've tested it much more. I'm thinking that you refactor onto Xpublish and adapt your data loading into xpublish.get_dataset. It also means that as we create new Xpublish routers, you can get those for free
      • Caching discussion
        • Dave S: demo of real-time Forecast Model Run Collection (FMRC) for HRRR, with caching using fsspec 'simplecache' command
        • Will post PR for adding the core parts of the HRRR aggregation to https://github.com/asascience-open/nextgen-dmac

    Action items

    • Move conversation to the Xpublish repo, which is followed by a bunch of additional people not on this call.
    • Try to get a regular meeting going. Possibly under the Pangeo umbrella?
    opened by abkfenris 4
  • asyncio.run() cannot be called from a running event loop

    Hi, I get this error when executing rest.serve() with

    RuntimeError: asyncio.run() cannot be called from a running event loop
    sys:1: RuntimeWarning: coroutine 'Server.serve' was never awaited
    

    I have xpublish 0.2.0, xarray 2022.6.0, uvicorn 0.18.3. I don't have asyncio in my conda list; is that expected?

    Actually I need some more explanations on how it works. I already have a code using fastAPI and the uvicorn server installed.

    Should I launch uvicorn for rest.serve() to work ? I get this error when server is switched off too (when I run it in spyder in a quite new environment).

    Thank you

    opened by pierreloicq 2
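This error is typical of environments that already run an event loop in the calling thread (IPython-based consoles such as Spyder's usually do), because `asyncio.run()` refuses to start a second loop there. The standard-library sketch below reproduces and detects the situation. `in_running_loop` and `probe` are hypothetical helpers, not part of Xpublish or uvicorn.

```python
import asyncio


def in_running_loop():
    """Return True when the current thread already runs an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def probe():
    """Inside a coroutine a loop is running, so a nested asyncio.run() fails."""
    inside = in_running_loop()
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
        nested_ok = True
    except RuntimeError:
        coro.close()  # avoid the "coroutine was never awaited" warning
        nested_ok = False
    return inside, nested_ok
```

If `in_running_loop()` returns True in the console, running the serving script from a plain terminal (`python script.py`) is one way to sidestep the conflict. As for asyncio being absent from the conda list: it is part of the Python standard library, so it is not installed as a separate package.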
  • OpenDAP endpoint

    I recently learned about zarrdap. ZarrDAP is a FastAPI project that provides access to Zarr and NetCDF data in remote object storage using the Open-source Project for a Network Data Access Protocol (OPeNDAP).

    This has me wondering if we can plug in the xarray opendap handler here. @markccapece, wondering if you have run across xpublish before and if you have thoughts on how the xarray handler in zarrdap could be used outside of zarrdap?

    xref: #50

    opened by jhamman 3
Releases(0.2.0)
Owner
xarray-contrib
xarray compatible projects