Specification for storing geospatial vector data (point, line, polygon) in Parquet

Overview

GeoParquet

About

This repository defines how to store geospatial vector data (points, lines, polygons) in Apache Parquet, a popular columnar storage format for tabular data - see this vendor explanation for more on what that means. Our goal is to standardize how geospatial data is represented in Parquet to further geospatial interoperability among tools using Parquet today, and hopefully help push forward what's possible with 'cloud-native geospatial' workflows.

Warning: This is not (yet) a stable specification that can be relied upon. All 0.X releases are made to gather wider feedback, and we anticipate that some things may change. For now we reserve the right to make changes in backwards-incompatible ways (though we will try not to); see the versioning section below for more info. If you are excited about the potential, please collaborate with us by building implementations, chiming in on the issues, and contributing PRs!

Early contributors include developers from GeoPandas, GeoTrellis, OpenLayers, Vis.gl, Voltron Data, Microsoft, Carto, Azavea, Planet & Unfolded. Anyone is welcome to join us, by building implementations, trying it out, giving feedback through issues and contributing to the spec via pull requests. Initial work started in the geo-arrow-spec GeoPandas repository, and that will continue on Arrow work in a compatible way, with this specification focused solely on Parquet.

Goals

There are a few core goals driving the initial development.

  • Establish a great geospatial format for workflows that excel with columnar data - Most data science and 'business intelligence' workflows have been moving towards columnar data, but current geospatial formats cannot be loaded as efficiently as other data. So we aim to bring geospatial data best practices to one of the most popular formats, and hopefully establish a good pattern for how to do so.
  • Introduce columnar data formats to the geospatial world - Most of the geospatial world is not yet benefiting from the breakthroughs in data analysis in the broader IT world, so we are excited to enable interesting geospatial analysis with a wider range of tools.
  • Enable interoperability among cloud data warehouses - BigQuery, Snowflake, Redshift and others all support spatial operations, but importing and exporting data with existing formats can be problematic. All support and often recommend Parquet, so defining a solid GeoParquet can help enable interoperability.
  • Persist geospatial data from Apache Arrow - GeoParquet is developed in parallel with a GeoArrow spec, to enable cross-language in-memory analytics of geospatial information with Arrow. Parquet is already well-supported by Arrow as the key on-disk persistence format.

Our broader goal is to innovate with 'cloud-native vector', providing a stable base to try out new ideas for cloud-native & streaming workflows.

Features

A quick overview of what GeoParquet supports (or at least plans to support).

  • Multiple spatial reference systems - Many tools will use GeoParquet for high-performance analysis, so it's important to be able to use data in its native projection. But we do provide a clear default recommendation to better enable interoperability, giving a clear target for implementations that don't want to worry about projections.
  • Multiple geometry columns - There is a default geometry column, but additional geometry columns can be included.
  • Great compression / small files - Parquet is designed to compress very well, so data benefits by taking up less disk space & being more efficient over the network.
  • Work with both planar and spherical coordinates - Most cloud data warehouses support spherical coordinates, and so GeoParquet aims to help persist those and be clear about what is supported.
  • Great at read-heavy analytic workflows - Columnar formats enable cheap reading of a subset of columns, and Parquet in particular enables efficient filtering of chunks based on column statistics, so the format will perform well in a variety of modern analytic workflows.
  • Support for data partitioning - Parquet has a nice ability to partition data into different files for efficiency, and we aim to enable geospatial partitions.
  • Enable spatial indices - To enable top performance a spatial index is essential. This will be the focus of a future release.

GeoParquet is less well suited to some uses. Most importantly, it is not a good choice for write-heavy workloads: a row-based format will work much better when it backs a system that is constantly updating existing data and adding new data.

Roadmap

Our aim is to get to a 1.0.0 within 'months', not years. The rough plan is:

  • 0.1 - Get the basics established, provide a target for implementations to start building against.
  • 0.2 / 0.3 - Feedback from implementations, 3D coordinates support, geometry types, crs optional.
  • 0.x - Several iterations based on feedback from implementations, spatial index best practices.
  • 1.0.0-RC.1 - Aim for this when there are at least 6 implementations that all work interoperably and all feel good about the spec.
  • 1.0.0 - Once there are 12(?) implementations in diverse languages we will lock in for 1.0

Our detailed roadmap is in the Milestones and we'll aim to keep it up to date.

Versioning

After we reach version 1.0 we will follow SemVer, so at that point any breaking change will require the spec to go to 2.0.0. Currently implementors should expect breaking changes, though at some point, hopefully relatively soon (0.4?), we will declare that we don't think there will be any more potential breaking changes, although the full commitment to that won't be made until 1.0.0.

Current Implementations & Examples

Examples of GeoParquet files following the current spec can be found in the examples/ folder. There is also a larger sample dataset nz-building-outlines.parquet available on Google Cloud Storage.

Currently known libraries that can read and write GeoParquet files:

Comments
  • Define polygon orientation rules

    I think the standard should define polygon orientation.

    1. Spherical edges case

    With spherical edges on a sphere, the polygon definition is ambiguous if the system allows polygons larger than a hemisphere.

    A sequence of vertices that defines a polygon boundary can define either the polygon to the left of that line or the one to the right of it. E.g. the global coastline can define either the continents or the oceans. Systems that support polygons larger than a hemisphere usually use an orientation rule to resolve this ambiguity. E.g. MS SQL and Google BigQuery interpret the side to the left of the line as the content of the ring.

    2. Planar edges case

    The planar case does not have this ambiguity, but it is still a good idea to have a specific rule.

    E.g. the GeoJSON RFC defines a rule consistent with the rule above:

       o  Polygon rings MUST follow the right-hand rule for orientation
          (counterclockwise external rings, clockwise internal rings).
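
For the planar case, ring orientation can be checked with the signed area (shoelace formula). The following is only a minimal sketch, not something defined by the spec: the function names are hypothetical, and rings are assumed to be closed lists of (x, y) tuples. It does not apply to spherical edges, which need a different computation.

```python
def signed_area(ring):
    """Signed planar area of a closed ring; positive means counterclockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def follows_right_hand_rule(polygon):
    """polygon is [exterior, hole1, ...]: exterior must be CCW, holes CW."""
    exterior, *holes = polygon
    return signed_area(exterior) > 0 and all(signed_area(h) < 0 for h in holes)


# Hypothetical sample data: a counterclockwise square with a clockwise hole.
square_ccw = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole_cw = [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1)]
print(follows_right_hand_rule([square_ccw, hole_cw]))
```

Reversing a ring flips the sign of its area, so a writer could normalize orientation by reversing any ring that fails the check.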
    
    opened by mentin 35
  • Script to write nz-building-outlines to geoparquet 0.4.0

    @cholmes asked me to write a script to update the nz-building-outlines file to GeoParquet version 0.4.0. This reads from the GeoPackage version of the data, which you can download from here (1.3GB).

    Part of this is derived from the GeoPandas code here. But this additionally ~~uses pyogrio~~ (reverted because it failed to build on CI) and pygeos to try and speed things up a little bit. It's probably not a bad thing to have a Python script here because the GeoPandas release schedule is likely slower than our release schedule (at least thus far).

    You can install the new dependencies with poetry install and run it with

    poetry run python write_nz_building_outline.py \
        --input nz-building-outlines.gpkg \
        --output nz-building-outlines.parquet \
        --compression SNAPPY
    

    This takes about 5 minutes on my computer. With Snappy compression, the 1.3GB GeoPackage file became 410MB in Parquet. (375MB with ZSTD compression).

    To see the CLI options you can run

    poetry run python write_nz_building_outline.py --help
    

    Closes https://github.com/opengeospatial/geoparquet/issues/42

    opened by kylebarron 21
  • Add validator script for Python based on JSON Schema

    Implements https://github.com/opengeospatial/geoparquet/issues/23.

    This PR adds a draft for a basic validator using Python. It checks the metadata of a geoparquet file using JSON Schema, but it can be extended to include specific custom validations.

    Example

    Try validating this invalid metadata:

    metadata = {
        "version": "0.x.0",
        "primary_column": "geom",
        "columns": {
            "geometry": {
                "encoding": "WKT",
                "edges": "",
                "bbox": [180, 90, 200, -90],
            },
        },
    }
    
    $ python3 validator/validate.py examples/example-wrong.parquet
    Validating file...
    - [ERROR] $.version: '0.x.0' does not match '^0\\.1\\.[0-9]+$'
              INFO: The version of the geoparquet metadata standard used when writing.
    - [ERROR] $.columns.geometry.encoding: 'WKT' is not one of ['WKB']
              INFO: Name of the geometry encoding format. Currently only 'WKB' is supported.
    - [ERROR] $.columns.geometry.edges: '' is not one of ['planar', 'spherical']
              INFO: Name of the coordinate system for the edges. Must be one of 'planar' or 'spherical'. The default value is 'planar'.
    - [ERROR] $.columns.geometry.bbox[2]: 200 is greater than the maximum of 180
              INFO: The eastmost constant longitude line that bounds the rectangle (xmax).
    - [ERROR] $.columns.geometry: 'crs' is a required property
    - [ERROR] $.primary_column: must be in $.columns
    This is an invalid GeoParquet file.
    

    The output for a correct GeoParquet file:

    $ python3 validator/validate.py examples/example.parquet
    Validating file...
    This is a valid GeoParquet file.
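
To make the checks in the sample output concrete, here is a rough standard-library sketch of the same rules. It is not the PR's validator (which uses JSON Schema): `validate_geo_metadata` and `BBOX_RANGES` are hypothetical names, the rules are hand-coded from the error messages shown above, and they reflect the 0.1-era metadata rather than the current spec.

```python
import re

# Assumed value ranges for a 2D bbox [xmin, ymin, xmax, ymax] in degrees.
BBOX_RANGES = [(-180, 180), (-90, 90), (-180, 180), (-90, 90)]


def validate_geo_metadata(metadata):
    """Return a list of error strings; an empty list means no rule was violated."""
    errors = []
    version = str(metadata.get("version", ""))
    if not re.match(r"^0\.1\.[0-9]+$", version):
        errors.append(f"$.version: {version!r} does not match the 0.1.x pattern")

    columns = metadata.get("columns", {})
    for name, column in columns.items():
        path = f"$.columns.{name}"
        if column.get("encoding") != "WKB":
            errors.append(f"{path}.encoding: only 'WKB' is supported")
        # "edges" may be omitted, in which case it defaults to 'planar'.
        if column.get("edges", "planar") not in ("planar", "spherical"):
            errors.append(f"{path}.edges: must be 'planar' or 'spherical'")
        for i, (value, (low, high)) in enumerate(zip(column.get("bbox", []), BBOX_RANGES)):
            if not low <= value <= high:
                errors.append(f"{path}.bbox[{i}]: {value} is outside [{low}, {high}]")
        if "crs" not in column:
            errors.append(f"{path}: 'crs' is a required property")

    if metadata.get("primary_column") not in columns:
        errors.append("$.primary_column: must be in $.columns")
    return errors


# The invalid metadata from the example above.
wrong_metadata = {
    "version": "0.x.0",
    "primary_column": "geom",
    "columns": {
        "geometry": {
            "encoding": "WKT",
            "edges": "",
            "bbox": [180, 90, 200, -90],
        },
    },
}
for error in validate_geo_metadata(wrong_metadata):
    print("- [ERROR]", error)
```

The last check is the kind of custom validation that a plain JSON Schema cannot easily express, since it compares one field against the keys of another.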
    
    opened by Jesus89 17
  • Correct the 'bbox' description for 3D geometries

    In https://github.com/opengeospatial/geoparquet/pull/45 we explicitly allowed 3D geometries, but in that PR I forgot to update the bbox description to reflect this.

    opened by jorisvandenbossche 16
  • Advertising geometry field "type" in column metadata?

    A common use case is for a geometry column to hold a single geometry type (Point, LineString, Polygon, ...) for all its records. It could be good to have an optional "type" field name under https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata to capture that when it is known.

    This would help, for example, with conversion between GeoParquet and GIS formats (shapefiles, GeoPackage) that typically have this information in their metadata.

    Values for type could be the ones accepted by GeoJSON: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection (that would be extended to CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, PolyhedralSurface, TIN if we support ISO WKB, and with Z, M or ZM suffixes for other dimensionalities).

    What should be done when there are mixed geometry types?

    • do not set "type"
    • set "type": "mixed"
    • set "type": array of values, e.g. [ "Polygon", "MultiPolygon" ] (this one would be typical when converting from shapefiles where the polygonal type can hold both polygons and multipolygons )
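
As an illustration of the first and third options, a writer could derive the value from the per-row geometry type names. This is only a sketch of the proposal under discussion, not an adopted rule; `type_field` is a hypothetical helper and the inputs are assumed to be plain type-name strings (with None for null geometries).

```python
def type_field(row_types):
    """Derive a candidate "type" value from per-row geometry type names.

    Returns None when nothing is known (option 1: do not set the field),
    a single string when all rows share one type, and a sorted list of
    names when the column is mixed (option 3: array of values).
    """
    unique = sorted({t for t in row_types if t is not None})
    if not unique:
        return None
    if len(unique) == 1:
        return unique[0]
    return unique


# Typical shapefile-style polygon layer mixing Polygon and MultiPolygon.
print(type_field(["Polygon", "MultiPolygon", "Polygon"]))
```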
    opened by rouault 15
  • geoparquet coordinate reference system default / format / extension

    There are a lot of options for how we approach coordinate reference systems.

    • GeoJSON only allows 4326 data. They started with more options, but then narrowed it down.
    • Simple Features for SQL defines an 'SRID table' where you are supposed to map numeric IDs to CRS well-known text. PostGIS uses the same SRIDs as EPSG, but Oracle doesn't.
    • WKT doesn't include projection info, but that's seen as a weakness, and is one main reason why EWKT came about. I believe EWKT just assumes the EPSG / PostGIS default SRID table, so it's not fully interoperable with Oracle.
    • STAC uses EPSG, but you can set it to null and provide PROJJSON or CRS WKT.
    • OGC API - Features core specifies just WGS-84 (long, lat), using a URI like http://www.opengis.net/def/crs/OGC/1.3/CRS84, see crs info

    And there's obviously more.

    My general take is that we should have a default, and expect most things to use that. But we should specify it in a way that allows extension in the future. So we shouldn't just say 'everything is 4326' in the spec, but should have a field that is always 4326 for the core spec and that could take other values in the future.

    So I think we do the first version with just 4326, and then when people ask for more we can have an extension.

    One thing I'm not sure about is whether we should use 'epsg' as the field. EPSG covers most projections people want, but not all. In GeoPackage they just create a whole SRID table to then refer to, so the SRIDs used are defined. Usually the full EPSG database is included, but then users can add other options.

    One potential option would be to follow OGC API - Features and use URIs. I'm not sure how widely accepted that approach is, e.g. whether the full EPSG database is already referenced online. So instead of 'epsg' as the field we'd have 'crs', and it's a string URI.

    opened by cholmes 14
  • Spherical - orientation required, smaller-of-two-possible, or just recommended?

    It looks like there are a couple options for the case where edges is spherical:

    • If edges is spherical then counterclockwise orientation is required. (Or say that if it is not set then the default is counterclockwise instead of null - effectively the same, but maybe slightly better?)
    • If edges is spherical and orientation is left blank then have implementations use the 'smaller-of-two-possible' rule, as used by BigQuery and SQL Server.

    We could also just 'recommend' its use, and not mention the smaller-of-two-possible rule. Though that seems far from ideal to me, as it doesn't tell implementations what to do if they get spherical data without it set.

    Currently main does say to use the smaller-of-two-possible rule, but it is likely poorly described, as I wrote it and was just trying to capture something I don't 100% understand.

    In #80 @jorisvandenbossche removed the smaller-of-two-possible rule, which I think is totally fine. But I'd like us to make an explicit decision about it.

    Originally posted by @mentin in https://github.com/opengeospatial/geoparquet/issues/46#issuecomment-1105505534

    opened by cholmes 12
  • Add basic valid and invalid tests for the json schema

    Closes https://github.com/opengeospatial/geoparquet/issues/135 (rework of https://github.com/opengeospatial/geoparquet/pull/134 to focus on json schema for now)

    opened by jorisvandenbossche 11
  • Array of geometry types

    This renames the geometry_type property to geometry_types and updates the JSON type to be an array of strings (instead of a string or array of strings). The spec language has been updated to reflect that an empty array indicates that the geometry types are not known. The schema now enforces that the items in the list are one of the expected types and allows the Z suffix ~for everything except GeometryCollection~.
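
For readers holding older files, the rename could be handled with a small upgrade step like the sketch below. `upgrade_column_metadata` is a hypothetical helper, not part of this PR, and treating the string "Unknown" as the old marker for unknown types is an assumption about earlier metadata.

```python
def upgrade_column_metadata(column):
    """Return a copy of a column metadata dict that uses geometry_types (array)."""
    upgraded = {k: v for k, v in column.items() if k != "geometry_type"}
    if "geometry_types" in upgraded:
        return upgraded  # already in the new form

    old = column.get("geometry_type")
    if old is None or old == "Unknown":  # assumed old marker for "not known"
        upgraded["geometry_types"] = []  # empty array: types are not known
    elif isinstance(old, str):
        upgraded["geometry_types"] = [old]
    else:
        upgraded["geometry_types"] = list(old)
    return upgraded


print(upgrade_column_metadata({"encoding": "WKB", "geometry_type": "Polygon"}))
```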

    I updated the example.parquet file to use geometry_types instead of geometry_type. (I followed the readme, but am struggling to run the validator, so admit I haven't yet done that.)

    I bumped the schema version to 0.5.0-dev. Ideally this would happen as part of the process for releasing a tag (create release branch, update version, create tag, generate release, bump version back to X.Y.Z-dev, merge branch).

    Fixes #132. Fixes #133.

    opened by tschaub 11
  • Feature identifiers

    Has there been discussion around including an id_column or something similar in the file metadata? I think it would assist in round-tripping features from other formats if it were known which column represented the feature identifier.

    It looks like GDAL has a FID layer creation option. But I'm assuming that the information about which column was used when writing would be lost when reading from the parquet file (@rouault would need to confirm).

    I grant that this doesn't feel "geo" specific, and there may be existing conventions in Parquet that would be appropriate.

    opened by tschaub 11
  • How to deal with dynamic CRS or CRS with ensemble datums (such as EPSG:4326)?

    From https://github.com/opengeospatial/geoparquet/pull/25#issuecomment-1059016020. The spec has a required crs field that stores a WKT2:2019 string representation of the Coordinate Reference System.

    We currently recommend using EPSG:4326 for the widest interoperability of the written files. However, this is a dynamic CRS, and in addition uses an ensemble datum. See https://gdal.org/user/coordinate_epoch.html for some context. In summary, when using coordinates with a dynamic CRS, you also need to know the point in time of the observation to know the exact location.

    Some discussion topics related to this:

    • How do we deal with a dynamic CRS? We should probably give the option to include the "coordinate epoch" in the metadata (the point in time at which the coordinate is valid)
      • This coordinate epoch is not part of the actual CRS definition, so I think the most straightforward option is to specify an additional (optional) "epoch" field in the column metadata (next to "crs") that holds the epoch value as a decimal year (e.g. 2021.3).
      • This means we would only support a constant epoch per file. This is in line with the initial support for other formats in GDAL, and we can always later think about expanding this (e.g. an additional field in the parquet file that has an epoch per geometry, or per vertex)
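
A decimal-year epoch can be computed from a calendar date as sketched below. This is an illustration of the proposed field only: `decimal_year` is a hypothetical helper, the "crs" value is a placeholder, and rounding to one decimal is just to match the example value above.

```python
from datetime import date


def decimal_year(d):
    """Convert a date to a decimal year (fraction of the year elapsed)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


# Column metadata with the proposed optional "epoch" field next to "crs".
column_metadata = {
    "encoding": "WKB",
    "crs": "<CRS definition goes here>",  # placeholder, not a real CRS
    "epoch": round(decimal_year(date(2021, 4, 20)), 1),
}
print(column_metadata["epoch"])
```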
    opened by jorisvandenbossche 11
  • value not present vs null

    From experience in STAC and openEO, it seems some implementations/programming languages have a hard time distinguishing between "not present" and "null". Specifying different meanings for null and not present, as is done for crs (unknown / CRS:84), might be a bad idea. Therefore, I'm putting out for discussion whether it's a good idea to do this, or whether to use a string such as "unknown" instead of null.
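
To show where the difficulty comes from, here is a small Python sketch of telling the two cases apart. The function name and returned labels are hypothetical; the mapping (null means unknown, absent means the CRS:84 default) follows the description above. A plain `column.get("crs")` would return None in both cases, which is exactly the trap.

```python
import json

_MISSING = object()  # sentinel that can never come out of json.loads


def describe_crs(column_json):
    """Classify the crs of a JSON-encoded column metadata object."""
    column = json.loads(column_json)
    crs = column.get("crs", _MISSING)
    if crs is _MISSING:
        return "default (CRS:84)"  # key not present
    if crs is None:
        return "unknown"  # key present with JSON null
    return "explicit"


print(describe_crs('{"encoding": "WKB"}'))
print(describe_crs('{"encoding": "WKB", "crs": null}'))
```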

    opened by m-mohr 1
  • Add test data covering various options in the spec

    Related to https://github.com/opengeospatial/geoparquet/issues/123

    This is incomplete (more parameters should be added) and still draft (the script should be cleaned-up, ensure to add this to the CI to check the generated files, validate with json schema, etc). But wanted to open a PR already to see where and how we want to go with this.

    This is a script that writes a bunch of .parquet files, and then also saves the metadata as a separate json file (extracted from the .parquet files using the existing script scripts/update_example_schemas.py).

    opened by jorisvandenbossche 0
  • Validator improvements

    For 1.0.0 we should have a validator that:

    1. Tests not just the metadata but also the data itself, to make sure it matches the metadata
    2. Is user-friendly and does not require Python. Ideally a web page and/or an easily installable binary.

    This could be building on the current validator in this repo, or could be a new project we reference, but we want to be sure something exists, so putting this issue in to track it.

    opened by cholmes 4
  • Example data to test implementations

    One idea that @jorisvandenbossche suggested is that we should have a set of data that shows the range of the specification, which implementors can use to make sure they're handling it correctly, and which could be the basis of 'integration' testing.

    This would include geoparquet files that have a variety of projections, geometry types (including multiple geometry types as in #119), plus things like multiple geometry columns, different edges values, etc. It could also be good to have a set of 'bad' files that can also be tested against.

    opened by cholmes 0
  • Consider externalizability of metadata

    When [Geo]Parquet files/sources are used within systems that treat them as tables (like Spark, Trino/Presto, Athena, etc.), basic Parquet metadata is tracked in a "catalog" (e.g., a Hive-compatible catalog like AWS Glue Catalog). The engine being used for querying uses metadata to limit the parts of files (and files themselves) that are scanned, but it only exposes the columnar content that's present, not the metadata. In some cases, metadata can be queried from the catalogs (e.g., from Athena), but the catalogs need additional work to support the metadata specified by GeoParquet (and this largely hasn't been done yet).

    In the meantime, I'm curious if it makes sense to take the same metadata that's contained in the footer and externalize it into an alternate file (which could be serialized as Parquet, Avro, JSON, etc.). This would allow the query engines to register the metadata as a separate "table" (query-able as a standard source vs. requiring catalog support) and surface/take advantage of "table"-level information like CRS at query-time. At the moment, the CRS of a geometry column is something that needs to be determined out of band.
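
One possible shape for such an externalized file is sketched below: one row per (file, geometry column), serialized as JSON lines. This is an assumption-laden illustration, not a proposal from the spec. `externalize` and `to_json_lines` are hypothetical names, the row layout is invented, and reading the "geo" metadata out of each footer is assumed to have been done already with a Parquet library.

```python
import json


def externalize(geo_by_file):
    """Flatten per-file 'geo' metadata into one row per (file, geometry column)."""
    rows = []
    for path, geo in sorted(geo_by_file.items()):
        for name, column in geo.get("columns", {}).items():
            row = {
                "file": path,
                "column": name,
                "is_primary": name == geo.get("primary_column"),
            }
            # Copy only keys that are present, so null and absent stay distinct.
            row.update({k: column[k] for k in ("crs", "bbox") if k in column})
            rows.append(row)
    return rows


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON, a format most engines can read."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical footers from two files that back the same "table".
geo_by_file = {
    "part-1.parquet": {"primary_column": "geometry",
                       "columns": {"geometry": {"encoding": "WKB", "bbox": [0, 0, 1, 1]}}},
    "part-0.parquet": {"primary_column": "geometry",
                       "columns": {"geometry": {"encoding": "WKB", "crs": None}}},
}
print(to_json_lines(externalize(geo_by_file)))
```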

    This is somewhat similar to #79, in that it doesn't look at GeoParquet sources as "files" ("tables" are often backed by many files), and could be seen as another reason to (de-)duplicate data from file footers into something that covers the whole set.

    /cc @jorisvandenbossche and @kylebarron, since we talked a bit about this at FOSS4G.

    opened by mojodna 4
Releases (v1.0.0-beta.1)
  • v1.0.0-beta.1 (Dec 15, 2022)

    We're getting close to the first stable GeoParquet release! We may have one or two more beta releases after gathering feedback from implementors, but we don't have any other planned breaking changes before 1.0.0. Please give it a try and create issues with your feedback.

    The 1.0.0-beta.1 release includes a number of metadata changes since the previous 0.4.0 release. One breaking change is that the previous geometry_type metadata property is now named geometry_types. The value is always an array of string geometry types (instead of sometimes a single string and sometimes an array). In addition, we've clarified a number of things in the specification and tightened up the JSON schema to improve validation. See below for a complete list of changes.

    What's Changed

    • Add GeoParquet.jl, an implementation in Julia by @evetion in https://github.com/opengeospatial/geoparquet/pull/109
    • add R examples and geoarrow by @yeelauren in https://github.com/opengeospatial/geoparquet/pull/108
    • fix lint error by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/125
    • Include suggestion about feature identifiers by @tschaub in https://github.com/opengeospatial/geoparquet/pull/121
    • Consistent spelling of GeoParquet by @tschaub in https://github.com/opengeospatial/geoparquet/pull/142
    • Add link to gpq by @tschaub in https://github.com/opengeospatial/geoparquet/pull/139
    • Correct the 'bbox' description for 3D geometries by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/88
    • Clarify nesting and repetition of geometry columns by @tschaub in https://github.com/opengeospatial/geoparquet/pull/138
    • Clarify that bbox follows column's CRS by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/143
    • Clarify geographic vs non-geographic bbox values by @tschaub in https://github.com/opengeospatial/geoparquet/pull/145
    • Array of geometry types by @tschaub in https://github.com/opengeospatial/geoparquet/pull/140
    • More consistent spelling and punctuation for JSON types by @tschaub in https://github.com/opengeospatial/geoparquet/pull/149
    • Add Apache Sedona to known libraries by @jiayuasu in https://github.com/opengeospatial/geoparquet/pull/150
    • schema.json: update reference to projjson schema to v0.5 by @rouault in https://github.com/opengeospatial/geoparquet/pull/151
    • Require minimum length of 1 for primary_geometry #129 by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/153
    • Clean-up spec and JSON Schema by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/131
    • Clean-up README by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/156
    • The default value of the crs field is required #152 by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/154
    • Require at least one column by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/158
    • Add basic valid and invalid tests for the json schema by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/141
    • Refer to RFC 2119 for definition of requirement levels by @tschaub in https://github.com/opengeospatial/geoparquet/pull/160
    • Read version number from the schema by @tschaub in https://github.com/opengeospatial/geoparquet/pull/159

    New Contributors

    • @evetion made their first contribution in https://github.com/opengeospatial/geoparquet/pull/109
    • @yeelauren made their first contribution in https://github.com/opengeospatial/geoparquet/pull/108
    • @tschaub made their first contribution in https://github.com/opengeospatial/geoparquet/pull/121
    • @jiayuasu made their first contribution in https://github.com/opengeospatial/geoparquet/pull/150
    • @m-mohr made their first contribution in https://github.com/opengeospatial/geoparquet/pull/153

    Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.4.0...v1.0.0-beta.1

  • v0.4.0 (May 26, 2022)

    What's Changed

    • Allow the "crs" to be unknown ("crs": null) by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/94
    • Use PROJJSON instead of WKT2:2019 by @brendan-ward in https://github.com/opengeospatial/geoparquet/pull/96
    • Move JSON Schema definition to format-specs/ by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/93
    • Bump json schema version to 0.4.0 by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/104
    • Updates for 0.4.0 release by @cholmes in https://github.com/opengeospatial/geoparquet/pull/105
    • Script to write nz-building-outlines to geoparquet 0.4.0 by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/87
    • Example: Use total_bounds for finding bounds of GeoDataFrame by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/91
    • README.md: mentions GDAL as implementation by @rouault in https://github.com/opengeospatial/geoparquet/pull/100

    New Contributors

    • @brendan-ward made their first contribution in https://github.com/opengeospatial/geoparquet/pull/96

    Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.3.0...v0.4.0

  • v0.3.0 (Apr 27, 2022)

    What's Changed

    • New orientation field to specify winding order for polygons by @felixpalmer @jorisvandenbossche and @cholmes in https://github.com/opengeospatial/geoparquet/pull/74, https://github.com/opengeospatial/geoparquet/pull/80, and https://github.com/opengeospatial/geoparquet/pull/83.

    Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.2.0...v0.3.0

  • v0.2.0 (Apr 19, 2022)

    What's Changed

    • Add Apache license by @cholmes in https://github.com/opengeospatial/geoparquet/pull/38
    • Expand WKB encoding to ISO WKB to support 3D geometries by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/45
    • CRS field is now optional (with default to OGC:CRS84) by @alasarr in https://github.com/opengeospatial/geoparquet/pull/60
    • Add a "geometry_type" field per column by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/51
    • Add "epoch" field to optionally specify the coordinate epoch for a dynamic CRS by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/49
    • Add section on winding order by @felixpalmer in https://github.com/opengeospatial/geoparquet/pull/59
    • Add validator script for Python based on JSON Schema by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/58
    • Script to store JSON copy of metadata next to example Parquet files by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/68
    • Readme enhancements by @jzb in https://github.com/opengeospatial/geoparquet/pull/53
    • geoparquet.md: refer to OGC spec for WKB instead of ISO by @rouault in https://github.com/opengeospatial/geoparquet/pull/54
    • Update validator with the latest spec changes by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/70

    New Contributors

    • @cayetanobv made their first contribution in https://github.com/opengeospatial/geoparquet/pull/57
    • @rouault made their first contribution in https://github.com/opengeospatial/geoparquet/pull/54
    • @jzb made their first contribution in https://github.com/opengeospatial/geoparquet/pull/53
    • @felixpalmer made their first contribution in https://github.com/opengeospatial/geoparquet/pull/59

    Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.1.0...v0.2.0

  • v0.1.0 (Mar 8, 2022)

    Initial Release of GeoParquet

    This is our first release of the GeoParquet specification. It should provide a clear target for implementations interested in providing support for geospatial vector data with Parquet, as we iterate to a stable 1.0.0 spec.

    Initial work started in the geo-arrow-spec GeoPandas repository, and that will continue on Arrow work in a compatible way, with this specification focused solely on Parquet.

    What's Changed

    • Update geoparquet spec by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/2
    • Attempt to align with geoarrow spec by @cholmes in https://github.com/opengeospatial/geoparquet/pull/4
    • Align "geo" key in metadata and example by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/5
    • Clarify the Parquet FileMetadata value formatting (UTF8 string, JSON-encoded) by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/6
    • Clarify that WKB means "standard" WKB encoding by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/16
    • More explicitly mention the metadata is stored in the parquet FileMetaData by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/20
    • Readme enhancements by @cholmes in https://github.com/opengeospatial/geoparquet/pull/19
    • Optional column metadata field to store bounding box information by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/21
    • Clarify that additional top-level fields in the JSON metadata are allowed by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/28
    • CRS spec definition for version 0.1 by @alasarr in https://github.com/opengeospatial/geoparquet/pull/25
    • Update example parquet file by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/24
    • Clean up TODOs in geoparquet.md by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/31
    • "edges" field spec definition for version 0.1 by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/27
    • Add known libraries that support GeoParquet to README by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/29
    • Updated warning in readme by @cholmes in https://github.com/opengeospatial/geoparquet/pull/33

    New Contributors

    • @TomAugspurger made their first contribution in https://github.com/opengeospatial/geoparquet/pull/2
    • @cholmes made their first contribution in https://github.com/opengeospatial/geoparquet/pull/4
    • @jorisvandenbossche made their first contribution in https://github.com/opengeospatial/geoparquet/pull/5
    • @alasarr made their first contribution in https://github.com/opengeospatial/geoparquet/pull/25
    • @Jesus89 made their first contribution in https://github.com/opengeospatial/geoparquet/pull/27

    Full Changelog: https://github.com/opengeospatial/geoparquet/commits/v0.1.0

Owner
Open Geospatial Consortium
Python Data. Leaflet.js Maps.

folium Python Data, Leaflet.js Maps folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js

6k Jan 02, 2023
Python bindings to libpostal for fast international address parsing/normalization

pypostal These are the official Python bindings to https://github.com/openvenues/libpostal, a fast statistical parser/normalizer for street addresses

openvenues 651 Dec 16, 2022
Code and coordinates for Matt's 2021 xmas tree

xmastree2021 Code and coordinates for Matt's 2021 xmas tree This repository contains the code and coordinates used for Matt's 2021 Christmas tree, as

Stand-up Maths 117 Jan 01, 2023
Use Mapbox GL JS to visualize data in a Python Jupyter notebook

Location Data Visualization library for Jupyter Notebooks Library documentation at https://mapbox-mapboxgl-jupyter.readthedocs-hosted.com/en/latest/.

Mapbox 620 Dec 15, 2022
Get-countries-info - Python code that fetches data for any country

Country-info A python code getting countries information including country's map

CODE 2 Feb 21, 2022
Spectral decomposition for characterizing long-range interaction profiles in Hi-C maps

Inspectral Spectral decomposition for characterizing long-range interaction prof

Nezar Abdennur 6 Dec 13, 2022
Trivia questions about Europe

EUROPE TRIVIA QUIZ IN PYTHON Project Outline Ask user if he / she knows more about Europe. If yes show the Trivia main screen, else show the end Trivi

David Danso 1 Nov 17, 2021
Create Siege configuration files from Cloud Optimized GeoTIFF.

cogeo-siege Documentation: Source Code: https://github.com/developmentseed/cogeo-siege Description Create siege configuration files from Cloud Optimiz

Development Seed 3 Dec 01, 2022
Summary statistics of geospatial raster datasets based on vector geometries.

rasterstats rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal stat

Matthew Perry 437 Dec 23, 2022
Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology 11 Mar 02, 2022
Deal with Bing Maps Tiles and Pixels / WGS 84 coordinate conversions, and generate grid Shapefiles

PyBingTiles This is a small toolkit in order to deal with Bing Tiles, used i.e. by Facebook for their Data for Good datasets. Install Clone this repos

Shoichi 1 Dec 08, 2021
Introduction to Geospatial Analysis in Python

Introduction to Geospatial Analysis in Python This repository is in support of a talk on geospatial data. Data To recreate all of the examples, the da

Dillon Gardner 6 Oct 19, 2022
Global topography (referenced to sea-level) in a 10 arcminute resolution grid

Earth - Topography grid at 10 arc-minute resolution Global 10 arc-minute resolution grids of topography (ETOPO1 ice-surface) referenced to mean sea-le

Fatiando a Terra Datasets 1 Jan 20, 2022
peartree: A library for converting transit data into a directed graph for sketch network analysis.

peartree 🍐 🌳 peartree is a library for converting GTFS feed schedules into a representative directed network graph. The tool uses Partridge to conve

Kuan Butts 183 Dec 29, 2022
Asynchronous client for the world's fastest in-memory geo-database, Tile38

This is an asynchronous Python client for Tile38 that allows for fast and easy interaction with the world's fastest in-memory geodatabase, Tile38.

Ben 53 Dec 29, 2022
A public data repository for datasets created from TransLink GTFS data.

TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio

Henry Tang 3 Jan 14, 2022
Download and process satellite imagery in Python using Sentinel Hub services.

Description The sentinelhub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Py

Sentinel Hub 659 Dec 23, 2022
QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios

QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios, that I wrote in Python 3 for my own research and visualisation. It is currently unfinis

Samuel Low 9 Aug 23, 2022
A Jupyter - Leaflet.js bridge

ipyleaflet A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook. Usage Selecting a basemap for a leaflet map: Loading a geojso

Jupyter Widgets 1.3k Dec 27, 2022
EOReader is a multi-satellite reader allowing you to open optical and SAR data.

Remote-sensing open-source Python library reading optical and SAR sensors, loading and stacking bands, clouds, DEM and index.

ICube-SERTIT 152 Dec 30, 2022