FlatGeobuf¶
Read and write FlatGeobuf files.
geoarrow.rust.io.read_flatgeobuf ¶
read_flatgeobuf(
path: Union[str, Path, BinaryIO],
*,
store: Optional[ObjectStore] = None,
batch_size: int = 65536,
bbox: (
Tuple[int | float, int | float, int | float, int | float] | None
) = None,
coord_type: CoordTypeInput | None = None,
use_view_types: bool = True,
max_scan_records: int | None = 1000,
read_geometry: bool = True,
columns: Optional[Sequence[str]] = None
) -> Table
Read a FlatGeobuf file from a path on disk or a remote location into an Arrow Table.
Parameters:

- path (Union[str, Path, BinaryIO]) – the path to the file or a Python file object in binary read mode.
Other Parameters:

- store (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required only if the file is at a remote location and the store cannot be inferred.
- batch_size (int) – the number of rows to include in each internal batch of the table.
- bbox (Tuple[int | float, int | float, int | float, int | float] | None) – a spatial filter for reading rows, in the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.
- coord_type (CoordTypeInput | None) – the GeoArrow coordinate type to use for generated geometries. The default is to use "separated" coordinates.
- use_view_types (bool) – if True, load string and binary columns into Arrow string view and binary view data types. These are more efficient but less widely supported than the older string and binary data types. Defaults to True.
- max_scan_records (int | None) – the maximum number of records to scan for schema inference. If set to None, all records will be scanned. Defaults to 1000. Most FlatGeobuf files have a schema defined in the header metadata, but for files that do not have a known schema, some initial records must be scanned to infer one. Reading will fail if a property with an unknown name is found that was not in the inferred schema, so scanning fewer records is faster but may fail later if the inferred schema was not complete.
- read_geometry (bool) – if True, read the geometry column. If False, the geometry column will be omitted from the result. Defaults to True.
- columns (Optional[Sequence[str]]) – an optional list of property column names to include in the result. This is separate from the geometry column, which you can turn on/off with read_geometry. If None, all columns will be included. Defaults to None.
Examples:
Reading from a local path:
from geoarrow.rust.io import read_flatgeobuf
table = read_flatgeobuf("path/to/file.fgb")
Reading from a Python file object:
from geoarrow.rust.io import read_flatgeobuf
with open("path/to/file.fgb", "rb") as file:
table = read_flatgeobuf(file)
Reading from an HTTP(S) url:
from geoarrow.rust.io import read_flatgeobuf
url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = read_flatgeobuf(url)
Reading from a remote file with specified credentials. You can pass any store constructed from obstore, including S3Store, GCSStore, AzureStore, HTTPStore, or LocalStore.
from geoarrow.rust.io import read_flatgeobuf
from obstore.store import S3Store
store = S3Store(
"bucket-name",
access_key_id="...",
secret_access_key="...",
region="..."
)
table = read_flatgeobuf("path/in/bucket.fgb", store=store)
Returns:

- Table – Table from the FlatGeobuf file.
geoarrow.rust.io.read_flatgeobuf_async async ¶
read_flatgeobuf_async(
path: str,
*,
store: Optional[ObjectStore] = None,
batch_size: int = 65536,
bbox: (
Tuple[int | float, int | float, int | float, int | float] | None
) = None,
coord_type: CoordTypeInput | None = None,
use_view_types: bool = True,
max_scan_records: int | None = 1000,
read_geometry: bool = True,
columns: Optional[Sequence[str]] = None
) -> Table
Read a FlatGeobuf file from a url into an Arrow Table.
Parameters:

- path (str) – the url or relative path to a remote FlatGeobuf file. If an argument is passed for store, this should be a path fragment relative to the prefix of the store.
Other Parameters:

- store (Optional[ObjectStore]) – an ObjectStore instance for this url. This is required only if the file is at a remote location and the store cannot be inferred.
- batch_size (int) – the number of rows to include in each internal batch of the table.
- bbox (Tuple[int | float, int | float, int | float, int | float] | None) – a spatial filter for reading rows, in the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.
- coord_type (CoordTypeInput | None) – the GeoArrow coordinate type to use for generated geometries. The default is to use "separated" coordinates.
- use_view_types (bool) – if True, load string and binary columns into Arrow string view and binary view data types. These are more efficient but less widely supported than the older string and binary data types. Defaults to True.
- max_scan_records (int | None) – the maximum number of records to scan for schema inference. If set to None, all records will be scanned. Defaults to 1000. Most FlatGeobuf files have a schema defined in the header metadata, but for files that do not have a known schema, some initial records must be scanned to infer one. Reading will fail if a property with an unknown name is found that was not in the inferred schema, so scanning fewer records is faster but may fail later if the inferred schema was not complete.
- read_geometry (bool) – if True, read the geometry column. If False, the geometry column will be omitted from the result. Defaults to True.
- columns (Optional[Sequence[str]]) – an optional list of property column names to include in the result. This is separate from the geometry column, which you can turn on/off with read_geometry. If None, all columns will be included. Defaults to None.
Examples:
Reading from an HTTP(S) url:
from geoarrow.rust.io import read_flatgeobuf_async
url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = await read_flatgeobuf_async(url)
Reading from an S3 bucket:
from geoarrow.rust.io import read_flatgeobuf_async
from obstore.store import S3Store
store = S3Store(
"bucket-name",
access_key_id="...",
secret_access_key="...",
region="..."
)
table = await read_flatgeobuf_async("path/in/bucket.fgb", store=store)
Returns:

- Table – Table from the FlatGeobuf file.
geoarrow.rust.io.write_flatgeobuf ¶
write_flatgeobuf(
table: ArrowStreamExportable,
file: str | Path | BinaryIO,
*,
write_index: bool = True,
promote_to_multi: bool = True,
title: str | None = None,
description: str | None = None,
metadata: str | None = None,
name: str | None = None
) -> None
Write to a FlatGeobuf file on disk.
Parameters:

- table (ArrowStreamExportable) – the Arrow RecordBatch, Table, or RecordBatchReader to write.
- file (str | Path | BinaryIO) – the path to the file or a Python file object in binary write mode.
Other Parameters:

- write_index (bool) – whether to write a spatial index in the FlatGeobuf file. Defaults to True.
- promote_to_multi (bool) – whether to promote single-part geometries to their multi-part equivalents (e.g. Point to MultiPoint) so the file has a uniform geometry type. Defaults to True.
- title (str | None) – dataset title. Defaults to None.
- description (str | None) – dataset description (intended for free-form long text).
- metadata (str | None) – dataset metadata (intended to be application specific).
- name (str | None) – the string passed to FgbWriter::create, which OGR observes as the layer name of the file. By default, this will try to use the file name, but it can be overridden.