Summary statistics of geospatial raster datasets based on vector geometries.

Overview

rasterstats

BuildStatus

rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.

Documentation

For details on installation and usage, visit the documentation at http://pythonhosted.org/rasterstats.

What does it do?

Given a vector layer and a raster band, calculate the summary statistics of each vector geometry. For example, with a polygon vector layer and a digital elevation model (DEM) raster, compute the mean elevation of each polygon.

zones elevation

Command Line Quick Start

The command line interfaces to zonalstats and point_query are rio subcommands which read and write geojson features

$ fio cat polygon.shp | rio zonalstats -r elevation.tif

$ fio cat points.shp | rio pointquery -r elevation.tif

See the CLI Docs. for more detail.

Python Quick Start

For zonal statistics

>>> from rasterstats import zonal_stats
>>> stats = zonal_stats("tests/data/polygons.shp", "tests/data/slope.tif")
>>> stats[0].keys()
dict_keys(['min', 'max', 'mean', 'count'])
>>> [f['mean'] for f in stats]
[14.660084635416666, 56.60576171875]

and for point queries

>>> from rasterstats import point_query
>>> point = {'type': 'Point', 'coordinates': (245309.0, 1000064.0)}
>>> point_query(point, "tests/data/slope.tif")
[74.09817594635244]

Issues

Find a bug? Report it via github issues by providing

  • a link to download the smallest possible raster and vector dataset necessary to reproduce the error
  • python code or command to reproduce the error
  • information on your environment: versions of python, gdal and numpy and system memory
Comments
  • Wrong mean/std stats

    Wrong mean/std stats

    Hi,

    I think there is an error in calculating mean and standard deviation in rasterstats 0.10.3 Here is the statistics calculated in ArcMap for my region:
    Band COUNT MIN MAX MEAN STD MEDIAN 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 51615 3675 52475 10 151057 14939 44531 37989 1722 38133 11 151057 10063 56708 48523 3707 49289 12 151057 13024 46146 34272 5463 34959 13 151057 23169 59005 46253 3292 46804

    And here is the same statistics calculated using rasterstats (have a look at mean and std values for bands 9-13):
    Band count min max mean std percentile_50 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 23182 28669 52475 10 151057 14939 44531 9556 28485 38133 11 151057 10063 56708 20090 28673 49289 12 151057 13024 46146 5839 28953 34959 13 151057 23169 59005 17820 28623 46804

    The code to reproduce the results using rasterstats: for i in range(1, 14): stats = zonal_stats('F:/test.shp', "F:/test_2.dat", stats=['count','min', 'max', 'mean', 'std', 'percentile_50'], all_touched=False, band_num=i, nodata=0)

    Here is the data: https://www.dropbox.com/sh/e8g0jwr1ci0vp2r/AAATlwU2eFjtbtMCO3R5SB1Fa?dl=0

    Can you please let me know what the problem could be?

    opened by yurithefury 17
  • Geopandas and Zonal Statistic error

    Geopandas and Zonal Statistic error

    I run this code

    import geopandas as gpd
    from rasterstats import zonal_stats
    
    geodf = gpd.read_file("SHP/shape.shp")
    zonal_stats(geodf,"raster.tif")
    

    but I get this error:

    ParseException: Unknown type: 'FID'
    ParseException: Unexpected EOF parsing WKB
    Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
    File "<ipython-input-43-3d15c4dddd4f>", line 1, in <module>
    zonal_stats(geodf,"raster.tif",stats='mean')
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 21, in zonal_stats
    return list(gen_zonal_stats(*args, **kwargs))
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 128, in gen_zonal_stats
    for i, feat in enumerate(features_iter):
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 107, in <genexpr>
    features_iter = (parse_feature(x) for x in obj)
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 70, in parse_feature
    raise ValueError("Can't parse %s as a geojson Feature object" % obj)
    ValueError: Can't parse FID as a geojson Feature object
    

    The data frame looks like this:

    In[43]: geodf
    Out[42]: 
         FID                                      geometry
    
    0     0  POLYGON ((7.662965420348624 45.05600185397847,...
    1     1  POLYGON ((7.665210146514162 45.05607590098681,...
    2     2  POLYGON ((7.663353281768154 45.05601113630467,...
    3     3  POLYGON ((7.660229137844505 45.0570570693801, ...
    4     4  POLYGON ((7.660047499767138 45.0572381357898, ...
    ....
    

    Any suggestion how to solve this problem?

    bug 
    opened by LorenzoBottaccioli 14
  • Add groupby arg to gen_zonal_stats

    Add groupby arg to gen_zonal_stats

    This PR adds a groupby argument to gen_zonal_stats inspired by the ArcGIS gp.ZonalStatistics zone_field argument. The effect is that stats are computed by group, instead of by each individual feature.

    The groupby arg takes a string or a function. If the value is a string, it is assumed to be a field present in each feature properties. If the value is a function, the function's return value will be the groupby key.

    Caveats:

    • The major caveat of this feature is that groups containing overlapping geometry will yield unexpected results as all features are rasterized together...not individually. There are currently no warnings given for this use-case.
    • zone_arrays will be larger as the unioned bounds of the grouped features will make potentially large intermediate rasters
    • All zones will need to be able to fit into memory

    Also included in this PR is an example shapefile for use with testing.

    opened by brendancol 11
  • Partial polygon overlap in raster stats

    Partial polygon overlap in raster stats

    I am using the raster_stats module in python to calculate the mean carbon value for different sets of polygons. Here is the code that I am using:

    africa_sat = 'E:/Research_Work/data/processing_data/carbon_satcchi/data/raw_data/africa_carbon_1km.tif'
    name_clipshp = 'data/grid_clipped/' + name_hol + '_grid_cl.shp' # create object to point to clipped grid file
    
    africa_stats = raster_stats(name_clipshp, africa_sat,stats=['mean'])
    

    I have approximately 260 grid surface areas that I am employing. The grid areas have between ~10,000 polygons and ~40,000 polygons in them. Here is the error message I get for some of grids:

    Traceback (most recent call last): File "", line 16, in File "C:\Python27\lib\site-packages\rasterstats\main.py", line 153, in raster_stats rvds.SetGeoTransform(new_gt) AttributeError: 'NoneType' object has no attribute 'SetGeoTransform'

    I do not get this error message, if the polygon overlaps completely with the raster image (tiff file) or if the polygon does not at all overlap with the polygon. However, if only a part of the polygon overlaps with the raster I get the error that is above.

    Here is a link to a raster and shapefile that created the problem for me:

    https://www.dropbox.com/sh/c80v8v576imd09f/7pSGN-p9lS

    opened by engelmannjens 10
  • Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Hi Matthew,

    I was trying to do the basic zonal statistics and I cannot get around this error. I tried the process with different shapefiles (with polygons). The result was the same. I am running it with Jupyter-Lab

    Here is my env configuration with anaconda: python 3.7.7 rasterio 1.1.4 py37h02db82b_0 conda-forge rasterstats 0.14.0 py_0 requests 2.23.0 pyh8c360ce_2 conda-forge rioxarray 0.0.26 py_0 conda-forge xarray 0.15.1

    Here is my code:

    shapefile=r'G:\gadm\version36 simplified\gadm36_2_simplified.shp' temp_tif=r'E:\weather\data\gpm\late_source\temp.tif' from rasterstats import zonal_stats print(shapefile, temp_tif) zonal_stats(shapefile,temp_tif)

    Using those files: shapefile.zip temp.zip

    Returning this error: error.txt

    Objective: I would like to use the gid_2 as the code to output the zonal statistics for each polygon

    Thanks in advance

    opened by marcello-moreira 9
  • AttributeError when running zonal_stats

    AttributeError when running zonal_stats

    I'm trying to run zonal_stats and I'm getting the following error in the _init_ function of io.py (line 242): AttributeError: 'GeneratorContextManager' object has no attribute 'transform'

    I've recreated the error with a variety of raster and vector data on a fresh virtual environment with rasterstats version 0.13.0. I've experimented with using different combinations of rasterstats/rasterio versions but I'm still seeing the issue. Any insight would be greatly appreciated.

    Thanks!

    opened by HenryW95 9
  • Exit code -1073741819 (0xC0000005)

    Exit code -1073741819 (0xC0000005)

    Hi all, In django console, a simple code as :

    from rasterstats import zonal_stats, point_query
    stats = zonal_stats('Good Raster/wetlands.json', 'Good Raster/tempRaster.tif')
    

    I received : Process finished with exit code -1073741819 (0xC0000005) (Python Crash)

    Here is the files, (geojson and raster): https://www.dropbox.com/sh/gacm4iu00aama4v/AADUm0RD0LYD_0fUV6Mxo3QFa?dl=0

    Version : Windows 10 pro 64 bit PyCharm 2016.3.2 Python 2.7.12 Django 1.10.4 gdal 2.1.3 numpy 1.12.0+mkl rasterio 0.36.0 rasterstats 0.12

    Update: The entire Zonal Statistics project, with OGR, OSR, Gdal, numpy, geojson that I code myself, work good but too dam slow. I need a efficient Zonal Statistic with geojson and a raster. This lib could save me but a critical error occurred when I use it without any other code or import.

    Work in a virtual environment created by Pycharm. All in 32 bit

    Update2 error django console

    When ran into Shell from terminal, I got some error details.

    FATAL: CPLMalloc(): Out of memory allocating 516 bytes. Python crash

    error shell terminal

    Alex

    opened by Neavrys 8
  • Geo raster mini rasters

    Geo raster mini rasters

    @perrygeo Matthew I have changed the file so as to allow an option to use GeoRasters as output if georasters is installed on the system . GeoRasters have a function to write GeoTiffs. They should make life in general easier for the user...let me know if you want to incorporate this version.

    opened by ozak 8
  • New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    This will allow users to generate stats using unprojected / WGS84 data while still accounting for cell area in the calculations used for the "mean" stat. It is just a weighted mean where every cell inside a feature is given a weight based on the latitude of the cells centroid using the haversine function.

    opened by sgoodm 7
  • Rasterstats install on Anaconda

    Rasterstats install on Anaconda

    I seem to be having trouble installing rasterstats successfully in the windows 64bit environment (Windows 7 or Server 2012). I am currently using Anaconda (python 2.7,also tried 3.4) to install, with either the rasterio install instructions (pip install GDAL-2.0.2-cp27-none-win32.whl and rasterio-0.34.0-cp27-cp27m-win32.whl) or via conda forge (https://github.com/conda-forge/rasterio-feedstock).

    In both cases gdal/rasterio will run as expected but when I try to install rasterstats (via rasterstats-0.10.3-py2-none-any.whl or pip install rasterstats) I receive an error "GeoVersion" or "GdalVersion" not found, accompanyied with a "python setup.py egg error code 1".

    I've checked that the corresponding package is installed and the version is up to date. Any ideas what may be causing the install issue?

    Thanks for your time. -Bob

    opened by ghost 7
  • memoryerror exception on raster file in specific projection

    memoryerror exception on raster file in specific projection

    I have a very simple test case, a shapefile, and a raster file in a latlon grid projection. When running stats function it generates an exception:

    python(36508,0x7fff7a4cc000) malloc: *** mach_vm_map(size=362320535494656) failed (error code=3)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "./create_stats.py", line 56, in <module>
        main()
      File "./create_stats.py", line 52, in main
        create_stats(r, p, date)
      File "./create_stats.py", line 22, in create_stats
        geojson = zonal_stats(shp_file, tif_file, stats='mean sum', geojson_out=True)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 21, in zonal_stats
        return list(gen_zonal_stats(*args, **kwargs))
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 136, in gen_zonal_stats
        fsrc = rast.read(bounds=geom_bounds)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/io.py", line 288, in read
        new_array = self.src.read(self.band, window=win, boundless=True, masked=masked)
      File "rasterio/_io.pyx", line 793, in rasterio._io.RasterReader.read (rasterio/_io.c:10053)
    MemoryError
    

    The test data are in http://oka.ags.io/scratch/

    failing code is

    stats = zonal_stats("RRD.shp", "RRD_rice_2014157.tif", stats='mean sum', geojson_out=True)
    

    working code on reprojected data works

    stats = zonal_stats("RRD_sinu.shp", "RRD_rice_2014157_sinu.tif", stats='mean sum', geojson_out=True)
    

    version is '0.10.0'

    opened by demiurg 7
  • ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    Welcome to the python-rasterstats issue tracker. Thanks for putting together a bug report! By following the template below, we'll be better able to reproduce the problem on our end.

    If you don't have a bug specifically but a general support question, please visit https://gis.stackexchange.com/

    Describe the bug

    /usr/local/lib/python3.8/dist-packages/rasterstats/main.py:156: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.

    The warning that appears in the provided code refers to a change in the way that Shapely, a geometry processing library, handles the 'type' attribute of geometries. Instead of using the 'type' attribute, it is recommended to use the 'geom_type' attribute. To fix this warning, the code can be modified to use the 'geom_type' attribute instead of the 'type' attribute when necessary.

    To Reproduce Steps to reproduce the behavior:

    1. How did you install rasterstats and its dependencies?
    2. What datasets are necessary to reproduce the bug? Please provide links to example data if necessary.
    3. What code is necessary to reproduce the bug? Provide the code directly below or provide links to it.
    # Code to reproduce the error
    zonal_stats = rasterstats.zonal_stats(
        output_geojson, vi_array, 
        affine=affine, nodata=nodata,
        stats="min max mean ",
        geojson_out=True,
    ) 
    

    Feature Requests

    python-rasterstats is not currently accepting any feature requests via the issue tracker. If you'd like to add a backwards-compatible feature, please open a pull request - it doesn't need to be 100% ready but should include a working proof-of-concept, tests, and should not break the existing API.

    opened by marianobonelli 0
  • segfault when importing rasterio

    segfault when importing rasterio

    I'm not sure what is happening but as soon as I import rasterio I get a segfault (using the test data)

    >>> import rasterio # this will cause the segfault
    >>> from rasterstats import zonal_stats
    >>> 
    >>> stats = zonal_stats(
    ...     "polygons.shp",
    ...     "slope.tif",
    ... )
    [1]    3980 segmentation fault  python3
    

    name = "rasterstats" version = "0.17.0"

    name = "rasterio" version = "1.3.4"

    opened by TimMcCauley 1
  • Performance optimizations

    Performance optimizations

    A small collection of performance enhancements aimed at reducing the number of times the data is traversed.

    Also improved percentile computation by computing them all at once instead of one at a time (and avoid recreating the compressed array each time).

    opened by groutr 1
  • Make int64 casting optional

    Make int64 casting optional

    With 'large' raster datasets of integer datatype, rasterstats wastefully casts the ndarray read from the raster up to int64. In the extreme case, this uses 8 times more memory than necessary if the underlying datatype of the raster dataset is Byte.

    At least for categorical rasters and when using categorical=True, it seems desirable to disable this behaviour to save memory.

    In a not particularly extreme case, this saved us on the order of 20G of memory:

    The raster datasets covers all of Scotland and is of the Byte type. The individual features in the vector dataset are the Scottish country boundaries and therefore cover a decent portion of the raster area (and their bounding box an even larger portion).

    The raster dataset is 46478 * 69365 pixels, 1 byte per pixel = 3.00GiB uncompressed in memory if stored in an ndarray of dtype int8. It would be 24GiB in int64.

    opened by maxfenv 3
  • Resolve shapely deprecation warnings; use pytest.importorskip

    Resolve shapely deprecation warnings; use pytest.importorskip

    This PR resolves a few shapely deprecation warnings:

    • Use shapely.errors.ShapelyError (ReadingError was also only from shapely.errors)
    • Use geom.geom_type instead of geom.type
    • Fix inconsistent dimensionality error with test polygon
    • Use pytest.importorskip feature
    opened by mwtoews 1
  • missing `click` dependency

    missing `click` dependency

    Hi, this is more a proposal than a bug report. Since python-rasterstats directly imports click: https://github.com/perrygeo/python-rasterstats/blob/5da3110230df44e228ba21f16a9d31ab7e84e0d5/src/rasterstats/cli.py#L6

    and since there's a known issue with click<7.1 (#254) , adding a click>=7.1 in requirements.txt would be great.

    The following is a real case scenario in which it would come in handy:

    on a centos8/rocky8 server:

    $ python3 -mvenv --system-site-packages env
    $ env/bin/pip install rasterstats
    # ... various logs ...
    
    $ env/bin/python -c "import rasterstats"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/__init__.py", line 4, in <module>
        from rasterstats import cli
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/cli.py", line 28, in <module>
        @cligj.use_rs_opt
      File "/usr/lib/python3.6/site-packages/click/decorators.py", line 170, in decorator
        _param_memo(f, OptionClass(param_decls, **attrs))
      File "/usr/lib/python3.6/site-packages/click/core.py", line 1510, in __init__
        raise TypeError('Got secondary option for non boolean flag.')
    TypeError: Got secondary option for non boolean flag.
    

    (then after some googling I saw #254 and forced the update of click)

    opened by brancomat 2
Releases(0.17.0)
Owner
Matthew Perry
Matthew Perry
Logging the position of the car on an sdcard

audi-mmi-3g-gps-logging Logging the position of the car on an sdcard, startup script origin not clear to me, logging setup and time change is what I d

2 May 31, 2022
Code and coordinates for Matt's 2021 xmas tree

xmastree2021 Code and coordinates for Matt's 2021 xmas tree This repository contains the code and coordinates used for Matt's 2021 Christmas tree, as

Stand-up Maths 117 Jan 01, 2023
Spatial Interpolation Toolbox is a Python-based GUI that is able to interpolate spatial data in vector format.

Spatial Interpolation Toolbox This is the home to Spatial Interpolation Toolbox, a graphical user interface (GUI) for interpolating geographic vector

Michael Ward 2 Nov 01, 2021
A compilation of several single-beam bathymetry surveys of the Caribbean

Caribbean - Single-beam bathymetry This dataset is a compilation of several single-beam bathymetry surveys of the Caribbean ocean displaying a wide ra

Fatiando a Terra Datasets 0 Jan 20, 2022
Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology 11 Mar 02, 2022
Stitch image tiles into larger composite TIFs

untiler Utility to take a directory of {z}/{x}/{y}.(jpg|png) tiles, and stitch into a scenetiff (tif w/ exact merc tile bounds). Future versions will

Mapbox 38 Dec 16, 2022
geobeam - adds GIS capabilities to your Apache Beam and Dataflow pipelines.

geobeam adds GIS capabilities to your Apache Beam pipelines. What does geobeam do? geobeam enables you to ingest and analyze massive amounts of geospa

Google Cloud Platform 61 Nov 08, 2022
PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Background Activation Suppression for Weakly Supervised Object Localization PyTorch implementation of ''Background Activation Suppression for Weakly S

34 Dec 27, 2022
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.

A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment

Qiusheng Wu 1.4k Jan 02, 2023
Specification for storing geospatial vector data (point, line, polygon) in Parquet

GeoParquet About This repository defines how to store geospatial vector data (point, lines, polygons) in Apache Parquet, a popular columnar storage fo

Open Geospatial Consortium 449 Dec 27, 2022
ESMAC diags - Earth System Model Aerosol-Cloud Diagnostics Package

Earth System Model Aerosol-Cloud Diagnostics Package This Earth System Model (ES

Pacific Northwest National Laboratory 1 Jan 04, 2022
Read and write rasters in parallel using Rasterio and Dask

dask-rasterio dask-rasterio provides some methods for reading and writing rasters in parallel using Rasterio and Dask arrays. Usage Read a multiband r

Dymaxion Labs 85 Aug 30, 2022
This app displays interesting statistical weather records and trends which can be used in climate related research including study of global warming.

This app displays interesting statistical weather records and trends which can be used in climate related research including study of global warming.

0 Dec 27, 2021
Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Apoorva Lal 5 May 18, 2022
A Python tool to display geolocation information in the traceroute.

IP2Trace Python IP2Trace Python is a Python tool allowing user to get IP address information such as country, region, city, latitude, longitude, zip c

IP2Location 22 Jan 08, 2023
Asynchronous Client for the worlds fastest in-memory geo-database Tile38

This is an asynchonous Python client for Tile38 that allows for fast and easy interaction with the worlds fastest in-memory geodatabase Tile38.

Ben 53 Dec 29, 2022
A python package that extends Google Earth Engine.

A python package that extends Google Earth Engine GitHub: https://github.com/davemlz/eemont Documentation: https://eemont.readthedocs.io/ PyPI: https:

David Montero Loaiza 307 Jan 01, 2023
python toolbox for visualizing geographical data and making maps

geoplotlib is a python toolbox for visualizing geographical data and making maps data = read_csv('data/bus.csv') geoplotlib.dot(data) geoplotlib.show(

Andrea Cuttone 976 Dec 11, 2022
Implemented a Google Maps prototype that provides the shortest route in terms of distance

Implemented a Google Maps prototype that provides the shortest route in terms of distance, the fastest route, the route with the fewest turns, and a scenic route that avoids roads when provided a sou

1 Dec 26, 2021
Imperial Valley Geomorphology Map

Roughly maps the extent of basins, basin edges, and mountains in the Imperial Valley by grouping terrain classes from the Iwahashi et al. 2021 California terrian classification model.

0 Dec 13, 2022