I/O ¶
Read from and write to files on disk and databases like PostGIS.
geoarrow.rust.core ¶
ObjectStore ¶
ObjectStore(root: str, options: Optional[Dict[str, str]] = None)
A generic object store interface for uniformly interacting with AWS S3, Google Cloud Storage, and Azure Blob Storage.
To create, pass a bucket path plus authentication options into the constructor. Currently, authentication credentials are not found automatically.
Examples:
Reading a FlatGeobuf file from an S3 bucket:
from geoarrow.rust.core import ObjectStore, read_flatgeobuf
options = {
"aws_access_key_id": "...",
"aws_secret_access_key": "...",
"aws_region": "..."
}
fs = ObjectStore('s3://bucket', options=options)
table = read_flatgeobuf("path/in/bucket.fgb", fs=fs)
read_csv builtin ¶
read_csv(file: str | Path | BinaryIO, geometry_column_name: str, *, batch_size: int = 65536) -> GeoTable
Read a CSV file from a path on disk into a GeoTable.
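Example:
Reading from a local path (a minimal sketch; the file path and the "geometry" column name are placeholders for your own data):
from geoarrow.rust.core import read_csv
table = read_csv("path/to/file.csv", geometry_column_name="geometry")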
Parameters:
- file (str | Path | BinaryIO) – the path to the file or a Python file object in binary read mode.
- geometry_column_name (str) – the name of the geometry column within the CSV.
- batch_size (int, default: 65536) – the number of rows to include in each internal batch of the table.
Returns:
- GeoTable – Table from CSV file.
read_flatgeobuf builtin ¶
read_flatgeobuf(file: Union[str, Path, BinaryIO], *, fs: Optional[ObjectStore] = None, batch_size: int = 65536, bbox: Tuple[float, float, float, float] | None = None) -> GeoTable
Read a FlatGeobuf file from a path on disk or a remote location into a GeoTable.
Example:
Reading from a local path:
from geoarrow.rust.core import read_flatgeobuf
table = read_flatgeobuf("path/to/file.fgb")
Reading from a Python file object:
from geoarrow.rust.core import read_flatgeobuf
with open("path/to/file.fgb", "rb") as file:
table = read_flatgeobuf(file)
Reading from an HTTP(S) url:
from geoarrow.rust.core import read_flatgeobuf
url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = read_flatgeobuf(url)
Reading from a remote file on an S3 bucket:
from geoarrow.rust.core import ObjectStore, read_flatgeobuf
options = {
"aws_access_key_id": "...",
"aws_secret_access_key": "...",
"aws_region": "..."
}
fs = ObjectStore('s3://bucket', options=options)
table = read_flatgeobuf("path/in/bucket.fgb", fs=fs)
Parameters:
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary read mode.
Other Parameters:
- fs (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required only if the file is at a remote location.
- batch_size (int) – the number of rows to include in each internal batch of the table.
- bbox (Tuple[float, float, float, float] | None) – A spatial filter for reading rows, of the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.
Returns:
- GeoTable – Table from FlatGeobuf file.
read_flatgeobuf_async builtin ¶
read_flatgeobuf_async(path: str, *, fs: Optional[ObjectStore] = None, batch_size: int = 65536, bbox: Tuple[float, float, float, float] | None = None) -> GeoTable
Read a FlatGeobuf file from a url into a GeoTable.
Example:
Reading from an HTTP(S) url:
from geoarrow.rust.core import read_flatgeobuf_async
url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = await read_flatgeobuf_async(url)
Reading from an S3 bucket:
from geoarrow.rust.core import ObjectStore, read_flatgeobuf_async
options = {
"aws_access_key_id": "...",
"aws_secret_access_key": "...",
"aws_region": "..."
}
fs = ObjectStore('s3://bucket', options=options)
table = await read_flatgeobuf_async("path/in/bucket.fgb", fs=fs)
Parameters:
- path (str) – the url or relative path to a remote FlatGeobuf file. If an argument is passed for fs, this should be a path fragment relative to the root passed to the ObjectStore constructor.
Other Parameters:
- fs (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required for non-HTTP urls.
- batch_size (int) – the number of rows to include in each internal batch of the table.
- bbox (Tuple[float, float, float, float] | None) – A spatial filter for reading rows, of the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.
Returns:
- GeoTable – Table from FlatGeobuf file.
read_geojson builtin ¶
read_geojson(file: Union[str, Path, BinaryIO], *, batch_size: int = 65536) -> GeoTable
Read a GeoJSON file from a path on disk into a GeoTable.
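Example:
Reading from a local path or from a Python file object (a minimal sketch; the file path is a placeholder):
from geoarrow.rust.core import read_geojson
table = read_geojson("path/to/file.geojson")
with open("path/to/file.geojson", "rb") as file:
    table = read_geojson(file)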
Parameters:
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary read mode.
- batch_size (int, default: 65536) – the number of rows to include in each internal batch of the table.
Returns:
- GeoTable – Table from GeoJSON file.
read_geojson_lines builtin ¶
read_geojson_lines(file: Union[str, Path, BinaryIO], *, batch_size: int = 65536) -> GeoTable
Read a newline-delimited GeoJSON file from a path on disk into a GeoTable.
This expects a GeoJSON Feature on each line of a text file, with a newline character separating each Feature.
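Example:
Reading from a local path (a minimal sketch; the file path and its .geojsonl extension are placeholders):
from geoarrow.rust.core import read_geojson_lines
table = read_geojson_lines("path/to/file.geojsonl")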
Parameters:
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary read mode.
- batch_size (int, default: 65536) – the number of rows to include in each internal batch of the table.
Returns:
- GeoTable – Table from GeoJSON file.
read_ipc builtin ¶
read_ipc(file: Union[str, Path, BinaryIO]) -> GeoTable
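Read an Arrow IPC (Feather v2) file from a path on disk into a GeoTable.
Example:
Reading from a local path (a minimal sketch; the file path is a placeholder):
from geoarrow.rust.core import read_ipc
table = read_ipc("path/to/file.arrow")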
read_ipc_stream builtin ¶
read_ipc_stream(file: Union[str, Path, BinaryIO]) -> GeoTable
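Read an Arrow IPC record batch stream from a path on disk into a GeoTable.
Example:
Reading from a local path (a minimal sketch; the file path is a placeholder):
from geoarrow.rust.core import read_ipc_stream
table = read_ipc_stream("path/to/file.arrows")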
read_parquet builtin ¶
read_parquet(path: str, *, fs: Optional[ObjectStore] = None, batch_size: int = 65536) -> GeoTable
Read a GeoParquet file from a path on disk or a remote location into a GeoTable.
Example:
Reading from a local path:
from geoarrow.rust.core import read_parquet
table = read_parquet("path/to/file.parquet")
Reading from an HTTP(S) url:
from geoarrow.rust.core import read_parquet
url = "https://raw.githubusercontent.com/opengeospatial/geoparquet/v1.0.0/examples/example.parquet"
table = read_parquet(url)
Reading from a remote file on an S3 bucket:
from geoarrow.rust.core import ObjectStore, read_parquet
options = {
"aws_access_key_id": "...",
"aws_secret_access_key": "...",
"aws_region": "..."
}
fs = ObjectStore('s3://bucket', options=options)
table = read_parquet("path/in/bucket.parquet", fs=fs)
Parameters:
- path (str) – the path to the file.
- fs (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required only if the file is at a remote location.
- batch_size (int, default: 65536) – the number of rows to include in each internal batch of the table.
Returns:
- GeoTable – Table from GeoParquet file.
read_parquet_async builtin ¶
read_parquet_async(path: str, *, fs: Optional[ObjectStore] = None, batch_size: int = 65536) -> GeoTable
Read a GeoParquet file from a url into a GeoTable.
Examples:
Reading from an HTTP(S) url:
from geoarrow.rust.core import read_parquet_async
url = "https://raw.githubusercontent.com/opengeospatial/geoparquet/v1.0.0/examples/example.parquet"
table = await read_parquet_async(url)
Reading from a remote file on an S3 bucket:
from geoarrow.rust.core import ObjectStore, read_parquet_async
options = {
"aws_access_key_id": "...",
"aws_secret_access_key": "...",
"aws_region": "..."
}
fs = ObjectStore('s3://bucket', options=options)
table = await read_parquet_async("path/in/bucket.parquet", fs=fs)
Parameters:
- path (str) – the path to the file.
- fs (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required only if the file is at a remote location.
- batch_size (int, default: 65536) – the number of rows to include in each internal batch of the table.
Returns:
- GeoTable – Table from GeoParquet file.
read_postgis builtin ¶
read_postgis(connection_url: str, sql: str) -> Optional[GeoTable]
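Read the results of a SQL query against a PostGIS database into a GeoTable.
Example:
Reading from a local database (a minimal sketch; the connection string and table name are placeholders for your own database):
from geoarrow.rust.core import read_postgis
connection_url = "postgresql://user:password@localhost:5432/dbname"
table = read_postgis(connection_url, "SELECT * FROM my_spatial_table")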
read_postgis_async builtin ¶
read_postgis_async(connection_url: str, sql: str) -> Optional[GeoTable]
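Read the results of a SQL query against a PostGIS database into a GeoTable. This is the async counterpart of read_postgis.
Example:
Reading from a local database (a minimal sketch; the connection string and table name are placeholders for your own database):
from geoarrow.rust.core import read_postgis_async
connection_url = "postgresql://user:password@localhost:5432/dbname"
table = await read_postgis_async(connection_url, "SELECT * FROM my_spatial_table")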
read_pyogrio builtin ¶
read_pyogrio(path_or_buffer: Path | str | bytes, /, layer: int | str | None = None, encoding: str | None = None, columns: Sequence[str] | None = None, read_geometry: bool = True, skip_features: int = 0, max_features: int | None = None, where: str | None = Ellipsis, bbox: Tuple[float, float, float, float] | Sequence[float] | None = None, mask=None, fids=None, sql: str | None = None, sql_dialect: str | None = None, return_fids=False, batch_size=65536, **kwargs) -> GeoTable
Read from an OGR data source into a GeoTable.
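Example:
Reading a Shapefile through pyogrio/GDAL (a minimal sketch; the file path is a placeholder for any OGR-supported data source):
from geoarrow.rust.core import read_pyogrio
table = read_pyogrio("path/to/file.shp")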
Parameters:
- path_or_buffer (Path | str | bytes) – A dataset path or URI, or raw buffer.
- layer (int | str | None, default: None) – If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to the first layer in the data source.
- encoding (str | None, default: None) – If present, will be used as the encoding for reading string values from the data source, unless the encoding can be inferred directly from the data source.
- columns (Sequence[str] | None, default: None) – List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty list-like.
- read_geometry (bool, default: True) – If True, will read geometry into a GeoSeries. If False, a Pandas DataFrame will be returned instead.
- skip_features (int, default: 0) – Number of features to skip from the beginning of the file before returning features. If greater than the available number of features, an empty DataFrame will be returned. Using this parameter may incur significant overhead if the driver does not support the capability to randomly seek to a specific feature, because it will need to iterate over all prior features.
- max_features (int | None, default: None) – Number of features to read from the file.
- where (str | None, default: Ellipsis) – Where clause to filter features in the layer by attribute values. If the data source natively supports SQL, its specific SQL dialect should be used (eg. SQLite and GeoPackage: SQLITE, PostgreSQL). If it doesn't, the OGRSQL WHERE syntax should be used. Note that it is not possible to overrule the SQL dialect; this is only possible when you use the sql parameter. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
- bbox (Tuple[float, float, float, float] | Sequence[float] | None, default: None) – If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned. Cannot be combined with the mask keyword.
- mask – Shapely geometry, optional (default: None). If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with the bbox keyword.
- fids – array-like, optional (default: None). Array of integer feature id (FID) values to select. Cannot be combined with other keywords to select a subset (skip_features, max_features, where, bbox, mask, or sql). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific.
- sql (str | None, default: None) – The SQL statement to execute. Look at the sql_dialect parameter for more information on the syntax to use for the query. When combined with other keywords like columns, skip_features, max_features, where, bbox, or mask, those are applied after the SQL query. Be aware that this can have an impact on performance (e.g. filtering with the bbox or mask keywords may not use spatial indexes). Cannot be combined with the layer or fids keywords.
- sql_dialect – str, optional (default: None). The SQL dialect the SQL statement is written in. Possible values:
  - None: if the data source natively supports SQL, its specific SQL dialect will be used by default (eg. SQLite and GeoPackage: SQLITE, PostgreSQL). If the data source doesn't natively support SQL, the OGRSQL dialect is the default.
  - 'OGRSQL': can be used on any data source. Performance can suffer when used on data sources with native support for SQL.
  - 'SQLITE': can be used on any data source. All SpatiaLite functions can be used. Performance can suffer on data sources with native support for SQL, except for GeoPackage and SQLite as this is their native SQL dialect.
Returns:
- GeoTable – Table read from the OGR data source.
write_csv builtin ¶
write_csv(table: ArrowStreamExportable, file: str | Path | BinaryIO) -> None
Write a GeoTable to a CSV file on disk.
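Example:
Writing out a table read from GeoJSON (a minimal sketch; both file paths are placeholders):
from geoarrow.rust.core import read_geojson, write_csv
table = read_geojson("path/to/input.geojson")
write_csv(table, "path/to/output.csv")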
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (str | Path | BinaryIO) – the path to the file or a Python file object in binary write mode.
Returns:
- None
write_flatgeobuf builtin ¶
write_flatgeobuf(table: ArrowStreamExportable, file: str | Path | BinaryIO, *, write_index: bool = True) -> None
Write a GeoTable to a FlatGeobuf file on disk.
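Example:
Writing out a table read from GeoJSON (a minimal sketch; both file paths are placeholders):
from geoarrow.rust.core import read_geojson, write_flatgeobuf
table = read_geojson("path/to/input.geojson")
write_flatgeobuf(table, "path/to/output.fgb", write_index=True)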
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (str | Path | BinaryIO) – the path to the file or a Python file object in binary write mode.
- write_index (bool, default: True) – whether to write a spatial index into the FlatGeobuf file.
Returns:
- None
write_geojson builtin ¶
write_geojson(table: ArrowStreamExportable, file: Union[str, Path, BinaryIO]) -> None
Write a GeoTable to a GeoJSON file on disk.
Note that the GeoJSON specification mandates coordinates to be in the WGS84 (EPSG:4326) coordinate system, but this function will not automatically reproject into WGS84 for you.
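Example:
Writing out a table read from FlatGeobuf (a minimal sketch; both file paths are placeholders, and the input data is assumed to already be in WGS84):
from geoarrow.rust.core import read_flatgeobuf, write_geojson
table = read_flatgeobuf("path/to/input.fgb")
write_geojson(table, "path/to/output.geojson")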
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary write mode.
Returns:
- None
write_geojson_lines builtin ¶
write_geojson_lines(table: ArrowStreamExportable, file: Union[str, Path, BinaryIO]) -> None
Write a GeoTable to a newline-delimited GeoJSON file on disk.
Note that the GeoJSON specification mandates coordinates to be in the WGS84 (EPSG:4326) coordinate system, but this function will not automatically reproject into WGS84 for you.
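Example:
Writing out a table read from FlatGeobuf (a minimal sketch; both file paths and the .geojsonl extension are placeholders, and the input data is assumed to already be in WGS84):
from geoarrow.rust.core import read_flatgeobuf, write_geojson_lines
table = read_flatgeobuf("path/to/input.fgb")
write_geojson_lines(table, "path/to/output.geojsonl")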
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary write mode.
Returns:
- None
write_ipc builtin ¶
write_ipc(table: ArrowStreamExportable, file: Union[str, Path, BinaryIO]) -> None
Write a GeoTable to an Arrow IPC (Feather v2) file on disk.
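Example:
Writing out a table read from GeoJSON (a minimal sketch; both file paths are placeholders):
from geoarrow.rust.core import read_geojson, write_ipc
table = read_geojson("path/to/input.geojson")
write_ipc(table, "path/to/output.arrow")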
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary write mode.
Returns:
- None
write_ipc_stream builtin ¶
write_ipc_stream(table: ArrowStreamExportable, file: Union[str, Path, BinaryIO]) -> None
Write a GeoTable to an Arrow IPC stream.
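Example:
Writing out a table read from GeoJSON (a minimal sketch; both file paths and the .arrows extension are placeholders):
from geoarrow.rust.core import read_geojson, write_ipc_stream
table = read_geojson("path/to/input.geojson")
write_ipc_stream(table, "path/to/output.arrows")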
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary write mode.
Returns:
- None
write_parquet builtin ¶
write_parquet(table: ArrowStreamExportable, file: str) -> None
Write a GeoTable to a GeoParquet file on disk.
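Example:
Writing out a table read from FlatGeobuf (a minimal sketch; both file paths are placeholders):
from geoarrow.rust.core import read_flatgeobuf, write_parquet
table = read_flatgeobuf("path/to/input.fgb")
write_parquet(table, "path/to/output.parquet")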
Parameters:
- table (ArrowStreamExportable) – the table to write.
- file (str) – the path to the output file.
Returns:
- None