Summary statistics of geospatial raster datasets based on vector geometries.

Overview

rasterstats

BuildStatus

rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.

Documentation

For details on installation and usage, visit the documentation at http://pythonhosted.org/rasterstats.

What does it do?

Given a vector layer and a raster band, calculate the summary statistics of each vector geometry. For example, with a polygon vector layer and a digital elevation model (DEM) raster, compute the mean elevation of each polygon.

zones elevation

Command Line Quick Start

The command line interfaces to zonalstats and point_query are rio subcommands which read and write geojson features

$ fio cat polygon.shp | rio zonalstats -r elevation.tif

$ fio cat points.shp | rio pointquery -r elevation.tif

See the CLI Docs. for more detail.

Python Quick Start

For zonal statistics

>>> from rasterstats import zonal_stats
>>> stats = zonal_stats("tests/data/polygons.shp", "tests/data/slope.tif")
>>> stats[0].keys()
dict_keys(['min', 'max', 'mean', 'count'])
>>> [f['mean'] for f in stats]
[14.660084635416666, 56.60576171875]

and for point queries

>>> from rasterstats import point_query
>>> point = {'type': 'Point', 'coordinates': (245309.0, 1000064.0)}
>>> point_query(point, "tests/data/slope.tif")
[74.09817594635244]

Issues

Find a bug? Report it via github issues by providing

  • a link to download the smallest possible raster and vector dataset necessary to reproduce the error
  • python code or command to reproduce the error
  • information on your environment: versions of python, gdal and numpy and system memory
Comments
  • Wrong mean/std stats

    Wrong mean/std stats

    Hi,

    I think there is an error in calculating mean and standard deviation in rasterstats 0.10.3 Here is the statistics calculated in ArcMap for my region:
    Band COUNT MIN MAX MEAN STD MEDIAN 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 51615 3675 52475 10 151057 14939 44531 37989 1722 38133 11 151057 10063 56708 48523 3707 49289 12 151057 13024 46146 34272 5463 34959 13 151057 23169 59005 46253 3292 46804

    And here is the same statistics calculated using rasterstats (have a look at mean and std values for bands 9-13):
    Band count min max mean std percentile_50 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 23182 28669 52475 10 151057 14939 44531 9556 28485 38133 11 151057 10063 56708 20090 28673 49289 12 151057 13024 46146 5839 28953 34959 13 151057 23169 59005 17820 28623 46804

    The code to reproduce the results using rasterstats: for i in range(1, 14): stats = zonal_stats('F:/test.shp', "F:/test_2.dat", stats=['count','min', 'max', 'mean', 'std', 'percentile_50'], all_touched=False, band_num=i, nodata=0)

    Here is the data: https://www.dropbox.com/sh/e8g0jwr1ci0vp2r/AAATlwU2eFjtbtMCO3R5SB1Fa?dl=0

    Can you please let me know what the problem could be?

    opened by yurithefury 17
  • Geopandas and Zonal Statistic error

    Geopandas and Zonal Statistic error

    I run this code

    import geopandas as gpd
    from rasterstats import zonal_stats
    
    geodf = gpd.read_file("SHP/shape.shp")
    zonal_stats(geodf,"raster.tif")
    

    but I get this error:

    ParseException: Unknown type: 'FID'
    ParseException: Unexpected EOF parsing WKB
    Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
    File "<ipython-input-43-3d15c4dddd4f>", line 1, in <module>
    zonal_stats(geodf,"raster.tif",stats='mean')
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 21, in zonal_stats
    return list(gen_zonal_stats(*args, **kwargs))
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 128, in gen_zonal_stats
    for i, feat in enumerate(features_iter):
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 107, in <genexpr>
    features_iter = (parse_feature(x) for x in obj)
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 70, in parse_feature
    raise ValueError("Can't parse %s as a geojson Feature object" % obj)
    ValueError: Can't parse FID as a geojson Feature object
    

    The data frame looks like this:

    In[43]: geodf
    Out[42]: 
         FID                                      geometry
    
    0     0  POLYGON ((7.662965420348624 45.05600185397847,...
    1     1  POLYGON ((7.665210146514162 45.05607590098681,...
    2     2  POLYGON ((7.663353281768154 45.05601113630467,...
    3     3  POLYGON ((7.660229137844505 45.0570570693801, ...
    4     4  POLYGON ((7.660047499767138 45.0572381357898, ...
    ....
    

    Any suggestion how to solve this problem?

    bug 
    opened by LorenzoBottaccioli 14
  • Add groupby arg to gen_zonal_stats

    Add groupby arg to gen_zonal_stats

    This PR adds a groupby argument to gen_zonal_stats inspired by the ArcGIS gp.ZonalStatistics zone_field argument. The effect is that stats are computed by group, instead of by each individual feature.

    The groupby arg takes a string or a function. If the value is a string, it is assumed to be a field present in each feature properties. If the value is a function, the function's return value will be the groupby key.

    Caveats:

    • The major caveat of this feature is that groups containing overlapping geometry will yield unexpected results as all features are rasterized together...not individually. There are currently no warnings given for this use-case.
    • zone_arrays will be larger as the unioned bounds of the grouped features will make potentially large intermediate rasters
    • All zones will need to be able to fit into memory

    Also included in this PR is an example shapefile for use with testing.

    opened by brendancol 11
  • Partial polygon overlap in raster stats

    Partial polygon overlap in raster stats

    I am using the raster_stats module in python to calculate the mean carbon value for different sets of polygons. Here is the code that I am using:

    africa_sat = 'E:/Research_Work/data/processing_data/carbon_satcchi/data/raw_data/africa_carbon_1km.tif'
    name_clipshp = 'data/grid_clipped/' + name_hol + '_grid_cl.shp' # create object to point to clipped grid file
    
    africa_stats = raster_stats(name_clipshp, africa_sat,stats=['mean'])
    

    I have approximately 260 grid surface areas that I am employing. The grid areas have between ~10,000 polygons and ~40,000 polygons in them. Here is the error message I get for some of grids:

    Traceback (most recent call last): File "", line 16, in File "C:\Python27\lib\site-packages\rasterstats\main.py", line 153, in raster_stats rvds.SetGeoTransform(new_gt) AttributeError: 'NoneType' object has no attribute 'SetGeoTransform'

    I do not get this error message, if the polygon overlaps completely with the raster image (tiff file) or if the polygon does not at all overlap with the polygon. However, if only a part of the polygon overlaps with the raster I get the error that is above.

    Here is a link to a raster and shapefile that created the problem for me:

    https://www.dropbox.com/sh/c80v8v576imd09f/7pSGN-p9lS

    opened by engelmannjens 10
  • Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Hi Matthew,

    I was trying to do the basic zonal statistics and I cannot get around this error. I tried the process with different shapefiles (with polygons). The result was the same. I am running it with Jupyter-Lab

    Here is my env configuration with anaconda: python 3.7.7 rasterio 1.1.4 py37h02db82b_0 conda-forge rasterstats 0.14.0 py_0 requests 2.23.0 pyh8c360ce_2 conda-forge rioxarray 0.0.26 py_0 conda-forge xarray 0.15.1

    Here is my code:

    shapefile=r'G:\gadm\version36 simplified\gadm36_2_simplified.shp' temp_tif=r'E:\weather\data\gpm\late_source\temp.tif' from rasterstats import zonal_stats print(shapefile, temp_tif) zonal_stats(shapefile,temp_tif)

    Using those files: shapefile.zip temp.zip

    Returning this error: error.txt

    Objective: I would like to use the gid_2 as the code to output the zonal statistics for each polygon

    Thanks in advance

    opened by marcello-moreira 9
  • AttributeError when running zonal_stats

    AttributeError when running zonal_stats

    I'm trying to run zonal_stats and I'm getting the following error in the _init_ function of io.py (line 242): AttributeError: 'GeneratorContextManager' object has no attribute 'transform'

    I've recreated the error with a variety of raster and vector data on a fresh virtual environment with rasterstats version 0.13.0. I've experimented with using different combinations of rasterstats/rasterio versions but I'm still seeing the issue. Any insight would be greatly appreciated.

    Thanks!

    opened by HenryW95 9
  • Exit code -1073741819 (0xC0000005)

    Exit code -1073741819 (0xC0000005)

    Hi all, In django console, a simple code as :

    from rasterstats import zonal_stats, point_query
    stats = zonal_stats('Good Raster/wetlands.json', 'Good Raster/tempRaster.tif')
    

    I received : Process finished with exit code -1073741819 (0xC0000005) (Python Crash)

    Here is the files, (geojson and raster): https://www.dropbox.com/sh/gacm4iu00aama4v/AADUm0RD0LYD_0fUV6Mxo3QFa?dl=0

    Version : Windows 10 pro 64 bit PyCharm 2016.3.2 Python 2.7.12 Django 1.10.4 gdal 2.1.3 numpy 1.12.0+mkl rasterio 0.36.0 rasterstats 0.12

    Update: The entire Zonal Statistics project, with OGR, OSR, Gdal, numpy, geojson that I code myself, work good but too dam slow. I need a efficient Zonal Statistic with geojson and a raster. This lib could save me but a critical error occurred when I use it without any other code or import.

    Work in a virtual environment created by Pycharm. All in 32 bit

    Update2 error django console

    When ran into Shell from terminal, I got some error details.

    FATAL: CPLMalloc(): Out of memory allocating 516 bytes. Python crash

    error shell terminal

    Alex

    opened by Neavrys 8
  • Geo raster mini rasters

    Geo raster mini rasters

    @perrygeo Matthew I have changed the file so as to allow an option to use GeoRasters as output if georasters is installed on the system . GeoRasters have a function to write GeoTiffs. They should make life in general easier for the user...let me know if you want to incorporate this version.

    opened by ozak 8
  • New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    This will allow users to generate stats using unprojected / WGS84 data while still accounting for cell area in the calculations used for the "mean" stat. It is just a weighted mean where every cell inside a feature is given a weight based on the latitude of the cells centroid using the haversine function.

    opened by sgoodm 7
  • Rasterstats install on Anaconda

    Rasterstats install on Anaconda

    I seem to be having trouble installing rasterstats successfully in the windows 64bit environment (Windows 7 or Server 2012). I am currently using Anaconda (python 2.7,also tried 3.4) to install, with either the rasterio install instructions (pip install GDAL-2.0.2-cp27-none-win32.whl and rasterio-0.34.0-cp27-cp27m-win32.whl) or via conda forge (https://github.com/conda-forge/rasterio-feedstock).

    In both cases gdal/rasterio will run as expected but when I try to install rasterstats (via rasterstats-0.10.3-py2-none-any.whl or pip install rasterstats) I receive an error "GeoVersion" or "GdalVersion" not found, accompanyied with a "python setup.py egg error code 1".

    I've checked that the corresponding package is installed and the version is up to date. Any ideas what may be causing the install issue?

    Thanks for your time. -Bob

    opened by ghost 7
  • memoryerror exception on raster file in specific projection

    memoryerror exception on raster file in specific projection

    I have a very simple test case, a shapefile, and a raster file in a latlon grid projection. When running stats function it generates an exception:

    python(36508,0x7fff7a4cc000) malloc: *** mach_vm_map(size=362320535494656) failed (error code=3)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "./create_stats.py", line 56, in <module>
        main()
      File "./create_stats.py", line 52, in main
        create_stats(r, p, date)
      File "./create_stats.py", line 22, in create_stats
        geojson = zonal_stats(shp_file, tif_file, stats='mean sum', geojson_out=True)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 21, in zonal_stats
        return list(gen_zonal_stats(*args, **kwargs))
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 136, in gen_zonal_stats
        fsrc = rast.read(bounds=geom_bounds)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/io.py", line 288, in read
        new_array = self.src.read(self.band, window=win, boundless=True, masked=masked)
      File "rasterio/_io.pyx", line 793, in rasterio._io.RasterReader.read (rasterio/_io.c:10053)
    MemoryError
    

    The test data are in http://oka.ags.io/scratch/

    failing code is

    stats = zonal_stats("RRD.shp", "RRD_rice_2014157.tif", stats='mean sum', geojson_out=True)
    

    working code on reprojected data works

    stats = zonal_stats("RRD_sinu.shp", "RRD_rice_2014157_sinu.tif", stats='mean sum', geojson_out=True)
    

    version is '0.10.0'

    opened by demiurg 7
  • ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    Welcome to the python-rasterstats issue tracker. Thanks for putting together a bug report! By following the template below, we'll be better able to reproduce the problem on our end.

    If you don't have a bug specifically but a general support question, please visit https://gis.stackexchange.com/

    Describe the bug

    /usr/local/lib/python3.8/dist-packages/rasterstats/main.py:156: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.

    The warning that appears in the provided code refers to a change in the way that Shapely, a geometry processing library, handles the 'type' attribute of geometries. Instead of using the 'type' attribute, it is recommended to use the 'geom_type' attribute. To fix this warning, the code can be modified to use the 'geom_type' attribute instead of the 'type' attribute when necessary.

    To Reproduce Steps to reproduce the behavior:

    1. How did you install rasterstats and its dependencies?
    2. What datasets are necessary to reproduce the bug? Please provide links to example data if necessary.
    3. What code is necessary to reproduce the bug? Provide the code directly below or provide links to it.
    # Code to reproduce the error
    zonal_stats = rasterstats.zonal_stats(
        output_geojson, vi_array, 
        affine=affine, nodata=nodata,
        stats="min max mean ",
        geojson_out=True,
    ) 
    

    Feature Requests

    python-rasterstats is not currently accepting any feature requests via the issue tracker. If you'd like to add a backwards-compatible feature, please open a pull request - it doesn't need to be 100% ready but should include a working proof-of-concept, tests, and should not break the existing API.

    opened by marianobonelli 0
  • segfault when importing rasterio

    segfault when importing rasterio

    I'm not sure what is happening but as soon as I import rasterio I get a segfault (using the test data)

    >>> import rasterio # this will cause the segfault
    >>> from rasterstats import zonal_stats
    >>> 
    >>> stats = zonal_stats(
    ...     "polygons.shp",
    ...     "slope.tif",
    ... )
    [1]    3980 segmentation fault  python3
    

    name = "rasterstats" version = "0.17.0"

    name = "rasterio" version = "1.3.4"

    opened by TimMcCauley 1
  • Performance optimizations

    Performance optimizations

    A small collection of performance enhancements aimed at reducing the number of times the data is traversed.

    Also improved percentile computation by computing them all at once instead of one at a time (and avoid recreating the compressed array each time).

    opened by groutr 1
  • Make int64 casting optional

    Make int64 casting optional

    With 'large' raster datasets of integer datatype, rasterstats wastefully casts the ndarray read from the raster up to int64. In the extreme case, this uses 8 times more memory than necessary if the underlying datatype of the raster dataset is Byte.

    At least for categorical rasters and when using categorical=True, it seems desirable to disable this behaviour to save memory.

    In a not particularly extreme case, this saved us on the order of 20G of memory:

    The raster datasets covers all of Scotland and is of the Byte type. The individual features in the vector dataset are the Scottish country boundaries and therefore cover a decent portion of the raster area (and their bounding box an even larger portion).

    The raster dataset is 46478 * 69365 pixels, 1 byte per pixel = 3.00GiB uncompressed in memory if stored in an ndarray of dtype int8. It would be 24GiB in int64.

    opened by maxfenv 3
  • Resolve shapely deprecation warnings; use pytest.importorskip

    Resolve shapely deprecation warnings; use pytest.importorskip

    This PR resolves a few shapely deprecation warnings:

    • Use shapely.errors.ShapelyError (ReadingError was also only from shapely.errors)
    • Use geom.geom_type instead of geom.type
    • Fix inconsistent dimensionality error with test polygon
    • Use pytest.importorskip feature
    opened by mwtoews 1
  • missing `click` dependency

    missing `click` dependency

    Hi, this is more a proposal than a bug report. Since python-rasterstats directly imports click: https://github.com/perrygeo/python-rasterstats/blob/5da3110230df44e228ba21f16a9d31ab7e84e0d5/src/rasterstats/cli.py#L6

    and since there's a known issue with click<7.1 (#254) , adding a click>=7.1 in requirements.txt would be great.

    The following is a real case scenario in which it would come in handy:

    on a centos8/rocky8 server:

    $ python3 -mvenv --system-site-packages env
    $ env/bin/pip install rasterstats
    # ... various logs ...
    
    $ env/bin/python -c "import rasterstats"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/__init__.py", line 4, in <module>
        from rasterstats import cli
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/cli.py", line 28, in <module>
        @cligj.use_rs_opt
      File "/usr/lib/python3.6/site-packages/click/decorators.py", line 170, in decorator
        _param_memo(f, OptionClass(param_decls, **attrs))
      File "/usr/lib/python3.6/site-packages/click/core.py", line 1510, in __init__
        raise TypeError('Got secondary option for non boolean flag.')
    TypeError: Got secondary option for non boolean flag.
    

    (then after some googling I saw #254 and forced the update of click)

    opened by brancomat 2
Releases(0.17.0)
Owner
Matthew Perry
Matthew Perry
A NASA MEaSUREs project to provide automated, low latency, global glacier flow and elevation change datasets

Notebooks A NASA MEaSUREs project to provide automated, low latency, global glacier flow and elevation change datasets This repository provides tools

NASA Jet Propulsion Laboratory 27 Oct 25, 2022
Django model field that can hold a geoposition, and corresponding widget

django-geoposition A model field that can hold a geoposition (latitude/longitude), and corresponding admin/form widget. Prerequisites Starting with ve

Philipp Bosch 324 Oct 17, 2022
Using Global fishing watch's data to build a machine learning model that can identify illegal fishing and poaching activities through satellite and geo-location data.

Using Global fishing watch's data to build a machine learning model that can identify illegal fishing and poaching activities through satellite and geo-location data.

Ayush Mishra 3 May 06, 2022
Record railway train route profile with GNSS tools

Train route profile recording with GNSS technology based on ARDUINO platform Project target Develop GNSS recording tools based on the ARDUINO platform

tomcom 1 Jan 01, 2022
A public data repository for datasets created from TransLink GTFS data.

TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio

Henry Tang 3 Jan 14, 2022
Python script to locate mobile number

Python script to locate mobile number How to use this script run the command to install the required libraries pip install -r requirements.txt run the

Shekhar Gupta 8 Oct 10, 2022
A multi-page streamlit app for the geospatial community.

A multi-page streamlit app for the geospatial community.

Qiusheng Wu 522 Dec 30, 2022
Zora is a python program that searches for GeoLocation info for given CIDR networks , with options to search with API or without API

Zora Zora is a python program that searches for GeoLocation info for given CIDR networks , with options to search with API or without API Installing a

z3r0day 1 Oct 26, 2021
Advanced raster and geometry manipulations

buzzard In a nutshell, the buzzard library provides powerful abstractions to manipulate together images and geometries that come from different kind o

Earthcube Lab 30 Jun 20, 2022
WhiteboxTools Python Frontend

whitebox-python Important Note This repository is related to the WhiteboxTools Python Frontend only. You can report issues to this repo if you have pr

Qiusheng Wu 304 Dec 15, 2022
prettymaps - A minimal Python library to draw customized maps from OpenStreetMap data.

A small set of Python functions to draw pretty maps from OpenStreetMap data. Based on osmnx, matplotlib and shapely libraries.

Marcelo de Oliveira Rosa Prates 9k Jan 08, 2023
Python package for earth-observing satellite data processing

Satpy The Satpy package is a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data f

PyTroll 882 Dec 27, 2022
:earth_asia: Python Geocoder

Python Geocoder Simple and consistent geocoding library written in Python. Table of content Overview A glimpse at the API Forward Multiple results Rev

Denis 1.5k Jan 02, 2023
Python bindings and utilities for GeoJSON

geojson This Python library contains: Functions for encoding and decoding GeoJSON formatted data Classes for all GeoJSON Objects An implementation of

Jazzband 765 Jan 06, 2023
Tool to suck data from ArcGIS Server and spit it into PostgreSQL

chupaESRI About ChupaESRI is a Python module/command line tool to extract features from ArcGIS Server map services. Name? Think "chupacabra" or "Chupa

John Reiser 34 Dec 04, 2022
Color correction plugin for rasterio

rio-color A rasterio plugin for applying basic color-oriented image operations to geospatial rasters. Goals No heavy dependencies: rio-color is purpos

Mapbox 111 Nov 15, 2022
Python interface to PROJ (cartographic projections and coordinate transformations library)

pyproj Python interface to PROJ (cartographic projections and coordinate transformations library). Documentation Stable: http://pyproj4.github.io/pypr

832 Dec 31, 2022
Summary statistics of geospatial raster datasets based on vector geometries.

rasterstats rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal stat

Matthew Perry 437 Dec 23, 2022
User friendly Rasterio plugin to read raster datasets.

rio-tiler User friendly Rasterio plugin to read raster datasets. Documentation: https://cogeotiff.github.io/rio-tiler/ Source Code: https://github.com

372 Dec 23, 2022
GetOSM is an OpenStreetMap tile downloader written in Python that is agnostic of GUI frameworks.

GetOSM GetOSM is an OpenStreetMap tile downloader written in Python that is agnostic of GUI frameworks. It is used with tkinter by ProjPicker. Require

Huidae Cho 3 May 20, 2022