Functions¶
Interoperability with other Python geospatial libraries (Shapely, GeoPandas) and in-memory geospatial formats (WKB, WKT).
geoarrow.rust.core ¶
read_pyogrio ¶
read_pyogrio(
path_or_buffer: Path | str | bytes,
/,
layer: int | str | None = None,
encoding: str | None = None,
columns: Sequence[str] | None = None,
read_geometry: bool = True,
skip_features: int = 0,
max_features: int | None = None,
where: str | None = None,
bbox: Tuple[float, float, float, float] | Sequence[float] | None = None,
mask=None,
fids=None,
sql: str | None = None,
sql_dialect: str | None = None,
return_fids=False,
batch_size=65536,
**kwargs,
) -> Table
Read from an OGR data source to an Arrow Table
Parameters:
-
path_or_buffer
(Path | str | bytes
) –A dataset path or URI, or raw buffer.
-
layer
(int | str | None
, default:None
) –If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to the first layer in the data source.
-
encoding
(str | None
, default:None
) –If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.
-
columns
(Sequence[str] | None
, default:None
) –List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty list-like.
-
read_geometry
(bool
, default:True
) –If True, the geometry column is read; if False, only the attribute columns are returned. Default: True.
-
skip_features
(int
, default:0
) –Number of features to skip from the beginning of the file before returning features. If greater than the number of available features, an empty Table will be returned. Using this parameter may incur significant overhead if the driver cannot seek directly to a specific feature, because it will need to iterate over all prior features.
-
max_features
(int | None
, default:None
) –Number of features to read from the file. Default: None.
-
where
(str | None
, default:None
) –Where clause to filter features in the layer by attribute values. If the data source natively supports SQL, its specific SQL dialect should be used (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If it doesn't, the OGRSQL WHERE syntax should be used. Note that the SQL dialect cannot be overridden here; that is only possible with the sql parameter.
Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
-
bbox
(Tuple[float, float, float, float] | Sequence[float] | None
, default:None
) –If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned. Cannot be combined with the mask keyword.
-
mask
–Shapely geometry, optional (default: None). If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with the bbox keyword.
-
fids
–array-like, optional (default: None). Array of integer feature id (FID) values to select. Cannot be combined with other keywords that select a subset (skip_features, max_features, where, bbox, mask, or sql). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but it can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific.
-
sql
(str | None
, default:None
) –The SQL statement to execute. See the sql_dialect parameter for more information on the syntax to use for the query. When combined with other keywords like columns, skip_features, max_features, where, bbox, or mask, those are applied after the SQL query. Be aware that this can have an impact on performance (e.g. filtering with the bbox or mask keywords may not use spatial indexes). Cannot be combined with the layer or fids keywords.
-
sql_dialect
–str, optional (default: None). The SQL dialect the SQL statement is written in. Possible values:
- None: if the data source natively supports SQL, its specific SQL dialect will be used by default (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If the data source doesn't natively support SQL, the OGRSQL dialect is the default.
- 'OGRSQL': can be used on any data source. Performance can suffer when used on data sources with native support for SQL.
- 'SQLITE': can be used on any data source. All SpatiaLite functions can be used. Performance can suffer on data sources with native support for SQL, except for GeoPackage and SQLite, as this is their native SQL dialect.
Returns:
-
Table
–Table
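The parameter descriptions above rule out several keyword combinations (bbox with mask, fids with any other subset keyword, sql with layer or fids). The sketch below collects those rules in one place. `check_read_options` is a hypothetical helper written for illustration only; it is not part of geoarrow.rust.core, and the library may report these conflicts differently.

```python
from typing import Any, Optional, Sequence


def check_read_options(
    *,
    layer: Any = None,
    skip_features: int = 0,
    max_features: Optional[int] = None,
    where: Optional[str] = None,
    bbox: Optional[Sequence[float]] = None,
    mask: Any = None,
    fids: Optional[Sequence[int]] = None,
    sql: Optional[str] = None,
) -> None:
    """Hypothetical helper: reject keyword combinations the docs call invalid."""
    # bbox and mask are alternative spatial filters.
    if bbox is not None and mask is not None:
        raise ValueError("bbox cannot be combined with mask")

    # fids selects features directly, so it excludes every other subset keyword.
    if fids is not None:
        subset_keywords = {
            "skip_features": skip_features != 0,
            "max_features": max_features is not None,
            "where": where is not None,
            "bbox": bbox is not None,
            "mask": mask is not None,
            "sql": sql is not None,
        }
        clashing = [name for name, used in subset_keywords.items() if used]
        if clashing:
            raise ValueError("fids cannot be combined with: " + ", ".join(clashing))

    # A SQL statement names its own source, so layer is not allowed with it.
    if sql is not None and layer is not None:
        raise ValueError("sql cannot be combined with layer")
```

A caller could run such a check on a dictionary of options before forwarding the same options to read_pyogrio, so that conflicting filters are caught before any file is opened.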
from_geopandas ¶
from_geopandas(input: GeoDataFrame) -> Table
Create a GeoArrow Table from a GeoPandas GeoDataFrame.
Notes:¶
- Currently this will always generate a non-chunked GeoArrow array. This is partly because pyarrow.Table.from_pandas always creates a single batch.
Parameters:
-
input
(GeoDataFrame
) –A GeoPandas GeoDataFrame.
Returns:
-
Table
–A GeoArrow Table
from_shapely ¶
from_shapely(input, *, crs: CRSInput | None = None) -> NativeArray
Create a GeoArrow array from an array of Shapely geometries.
Notes:¶
- Currently this will always generate a non-chunked GeoArrow array.
- Under the hood, this will first call shapely.to_ragged_array, falling back to shapely.to_wkb if necessary. This is because to_ragged_array is the fastest approach but fails on mixed-type geometries. It supports combining Multi-* geometries with non-multi geometries in the same array, so you can combine e.g. Point and MultiPoint geometries in the same array, but to_ragged_array doesn't work if you have Point and Polygon geometries in the same array.
Parameters:
-
input
–Any array object accepted by Shapely, including numpy object arrays and geopandas.GeoSeries.
Returns:
A GeoArrow array
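The note about mixed-type geometries can be made concrete with a small sketch of the rule it describes: a type and its Multi-* variant may share one ragged array, while different families (for example Point and Polygon) force the WKB fallback. The function works on geometry type names rather than Shapely objects, and both `can_use_ragged_array` and the family table are hypothetical illustrations, not Shapely's or geoarrow's actual check. Treating any other type name as unsupported is an assumption of this sketch.

```python
from typing import Iterable

# Each single-part type shares a family with its Multi-* variant.
_FAMILY = {
    "Point": "point",
    "MultiPoint": "point",
    "LineString": "linestring",
    "MultiLineString": "linestring",
    "Polygon": "polygon",
    "MultiPolygon": "polygon",
}


def can_use_ragged_array(type_names: Iterable[str]) -> bool:
    """Return True if all type names belong to one geometry family."""
    families = set()
    for name in type_names:
        family = _FAMILY.get(name)
        if family is None:
            # Assumption: types outside the table need the WKB fallback.
            return False
        families.add(family)
    return len(families) == 1


def choose_conversion(type_names: Iterable[str]) -> str:
    """Name the conversion path the note describes for these types."""
    return "to_ragged_array" if can_use_ragged_array(type_names) else "to_wkb"
```

For instance, `choose_conversion(["Point", "MultiPoint"])` picks the ragged-array path, while `choose_conversion(["Point", "Polygon"])` picks the WKB fallback.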
from_wkb ¶
from_wkb(
input: ArrowArrayExportable | ArrowStreamExportable,
*,
coord_type: CoordType | CoordTypeT = CoordType.Interleaved
) -> NativeArray | ChunkedNativeArray
Parse an Arrow BinaryArray from WKB to its GeoArrow-native counterpart.
This will handle both ISO and EWKB flavors of WKB. Any embedded SRID in EWKB-flavored WKB will be ignored.
Parameters:
-
input
(ArrowArrayExportable | ArrowStreamExportable
) –An Arrow array of Binary type holding WKB-formatted geometries.
Other Parameters:
-
coord_type
(CoordType | CoordTypeT
) –Specify the coordinate type of the generated GeoArrow data.
Returns:
-
NativeArray | ChunkedNativeArray
–A GeoArrow-native geometry array
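To illustrate what "ISO and EWKB flavors" means for a 2D point, the following standard-library sketch builds both encodings and parses them while skipping the embedded SRID, which mirrors the documented behavior of ignoring it. The helpers `point_wkb` and `parse_point` are hypothetical names for this example; they handle 2D points only and say nothing about how from_wkb is implemented internally.

```python
import struct

# EWKB marks an embedded SRID by setting this bit in the geometry type code.
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1


def point_wkb(x: float, y: float, srid=None) -> bytes:
    """Encode a little-endian 2D point as ISO WKB, or as EWKB if srid is given."""
    if srid is None:
        return struct.pack("<BIdd", 1, WKB_POINT, x, y)
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


def parse_point(buf: bytes):
    """Return (x, y) from ISO WKB or EWKB, ignoring any embedded SRID."""
    endian = "<" if buf[0] == 1 else ">"
    (type_code,) = struct.unpack_from(endian + "I", buf, 1)
    offset = 5
    if type_code & EWKB_SRID_FLAG:
        offset += 4  # skip the 4-byte SRID without reading it
    if type_code & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError("this sketch only handles 2D points")
    return struct.unpack_from(endian + "dd", buf, offset)
```

Both encodings of the same point parse to the same coordinates; the EWKB form is simply four bytes longer because of the SRID field.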
from_wkt ¶
from_wkt(
input: ArrowArrayExportable | ArrowStreamExportable,
*,
coord_type: CoordType | CoordTypeT = CoordType.Interleaved
) -> NativeArray | ChunkedNativeArray
Parse an Arrow StringArray from WKT to its GeoArrow-native counterpart.
Parameters:
-
input
(ArrowArrayExportable | ArrowStreamExportable
) –An Arrow array of string type holding WKT-formatted geometries.
Other Parameters:
-
coord_type
(CoordType | CoordTypeT
) –Specify the coordinate type of the generated GeoArrow data.
Returns:
-
NativeArray | ChunkedNativeArray
–A GeoArrow-native geometry array
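The coord_type parameter selects how coordinates are laid out in memory. As a rough picture of the default interleaved layout, the sketch below stores the same points in two ways using plain Python lists. The separated layout is included for contrast and is an assumption drawn from the GeoArrow specification rather than from this page; `interleave` and `separate` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interleave(points: Sequence[Point]) -> List[float]:
    """Interleaved layout: one buffer holding x0, y0, x1, y1, ..."""
    flat: List[float] = []
    for x, y in points:
        flat.extend((x, y))
    return flat


def separate(points: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Separated layout: one buffer per dimension (all x values, then all y values)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return xs, ys
```

With `points = [(1.0, 2.0), (3.0, 4.0)]`, the interleaved form is a single list of four values, while the separated form is a pair of two-element lists.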
to_geopandas ¶
to_geopandas(input: ArrowStreamExportable) -> GeoDataFrame
Convert a GeoArrow Table to a GeoPandas GeoDataFrame.
Notes:¶
- This is an alias to GeoDataFrame.from_arrow.
Parameters:
-
input
(ArrowStreamExportable
) –A GeoArrow Table.
Returns:
-
GeoDataFrame
–the converted GeoDataFrame
to_shapely ¶
to_shapely(
input: ArrowArrayExportable | ArrowStreamExportable,
) -> NDArray[object_]
Convert a GeoArrow array to a numpy array of Shapely objects
Parameters:
-
input
(ArrowArrayExportable | ArrowStreamExportable
) –input geometry array
Returns:
-
NDArray[object_]
–A numpy array of Shapely objects.
to_wkb ¶
to_wkb(input: ArrowArrayExportable) -> NativeArray
Encode a GeoArrow-native geometry array to a WKBArray, holding ISO-formatted WKB geometries.
Parameters:
-
input
(ArrowArrayExportable
) –A GeoArrow-native geometry array
Returns:
-
NativeArray
–An array with WKB-formatted geometries
to_wkt ¶
to_wkt(
input: ArrowArrayExportable | ArrowStreamExportable,
) -> Array | ChunkedArray
Encode a geometry array to WKT.
Parameters:
-
input
(ArrowArrayExportable | ArrowStreamExportable
) –A GeoArrow-native geometry array.
Returns:
-
Array | ChunkedArray
–An Arrow array of string type holding WKT-formatted geometries.
Table functions¶
geoarrow.rust.core ¶
geometry_col ¶
geometry_col(
input: ArrowArrayExportable | ArrowStreamExportable,
) -> NativeArray | ChunkedNativeArray
Access the geometry column of a Table or RecordBatch
Parameters:
-
input
(ArrowArrayExportable | ArrowStreamExportable
) –The Arrow RecordBatch or Table to extract the geometry column from.
Returns:
-
NativeArray | ChunkedNativeArray
–A geometry array or chunked array.
CRS Access¶
geoarrow.rust.core.get_crs ¶
get_crs(
data: ArrowArrayExportable | ArrowStreamExportable | ArrowSchemaExportable,
/,
column: str | None = None,
) -> CRS | None
Get the CRS from a GeoArrow object.
Parameters:
-
data
(ArrowArrayExportable | ArrowStreamExportable | ArrowSchemaExportable
) –A GeoArrow object. This can be an Array, ChunkedArray, ArrayReader, RecordBatchReader, Table, Field, or Schema.
-
column
(str | None
, default:None
) –The name of the geometry column to retrieve, if there's more than one. For Schema, Table, and RecordBatchReader inputs, there may be more than one geometry column included. If there are multiple geometry columns, you must pass this column parameter. If there is only one geometry column, it will be inferred. Defaults to None.
Raises:
-
ValueError
–If no geometry column could be found.
Returns:
-
CRS | None
–a pyproj CRS object.
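The column inference rule for get_crs can be summarized as a small decision procedure. The sketch below operates on a plain list of geometry column names; `resolve_geometry_column` is a hypothetical helper for illustration. Only the ValueError for a missing geometry column is documented above; raising ValueError in the other two failure cases is an assumption of this sketch.

```python
from typing import Optional, Sequence


def resolve_geometry_column(
    geometry_columns: Sequence[str],
    column: Optional[str] = None,
) -> str:
    """Pick the geometry column following the documented inference rule."""
    if not geometry_columns:
        # Documented: no geometry column could be found.
        raise ValueError("no geometry column could be found")
    if column is not None:
        if column not in geometry_columns:
            # Assumption: an unknown name is also reported as ValueError.
            raise ValueError(f"{column!r} is not a geometry column")
        return column
    if len(geometry_columns) == 1:
        # A single geometry column is inferred.
        return geometry_columns[0]
    # Assumption: ambiguity without an explicit column is an error.
    raise ValueError("multiple geometry columns found; pass column=")
```

For a table with one geometry column, the name is inferred; for a table with several, the caller has to name the one whose CRS is wanted.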