FlatGeobuf

Read and write FlatGeobuf files.

geoarrow.rust.io.read_flatgeobuf

read_flatgeobuf(
    path: Union[str, Path, BinaryIO],
    *,
    store: Optional[ObjectStore] = None,
    batch_size: int = 65536,
    bbox: (
        Tuple[int | float, int | float, int | float, int | float] | None
    ) = None,
    coord_type: CoordTypeInput | None = None,
    use_view_types: bool = True,
    max_scan_records: int | None = 1000,
    read_geometry: bool = True,
    columns: Optional[Sequence[str]] = None
) -> Table

Read a FlatGeobuf file from a path on disk or a remote location into an Arrow Table.

Parameters:

  • path (Union[str, Path, BinaryIO]) –

    the path to the file or a Python file object in binary read mode.

Other Parameters:

  • store (Optional[ObjectStore]) –

    an ObjectStore instance for this url. This is required only if the file is at a remote location and if the store cannot be inferred.

  • batch_size (int) –

    the number of rows to include in each internal batch of the table.

  • bbox (Tuple[int | float, int | float, int | float, int | float] | None) –

    A spatial filter for reading rows, of the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.

  • coord_type (CoordTypeInput | None) –

    The GeoArrow coordinate type to use for generated geometries. The default is to use "separated" coordinates.

  • use_view_types (bool) –

    If True, load string and binary columns into Arrow string view and binary view data types. These are more efficient but less widely supported than the older string and binary data types. Defaults to True.

  • max_scan_records (int | None) –

    The maximum number of records to scan for schema inference. If set to None, all records will be scanned. Defaults to 1000.

    Most FlatGeobuf files define their schema in the header metadata, but for files without a known schema some initial records must be scanned to infer one. Reading fails if a property is later encountered that is not in the inferred schema, so scanning fewer records is faster but risks failing partway through the read if the inferred schema was incomplete.

  • read_geometry (bool) –

    If True, read the geometry column. If False, the geometry column will be omitted from the result. Defaults to True.

  • columns (Optional[Sequence[str]]) –

    An optional list of property column names to include in the result. This is separate from the geometry column, which you can turn on/off with read_geometry. If None, all columns will be included. Defaults to None.

Examples:

Reading from a local path:

from geoarrow.rust.io import read_flatgeobuf
table = read_flatgeobuf("path/to/file.fgb")

Reading from a Python file object:

from geoarrow.rust.io import read_flatgeobuf

with open("path/to/file.fgb", "rb") as file:
    table = read_flatgeobuf(file)

Reading from an HTTP(S) url:

from geoarrow.rust.io import read_flatgeobuf

url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = read_flatgeobuf(url)

Reading from a remote file with explicit credentials. You can pass any store constructed with obstore, including S3Store, GCSStore, AzureStore, HTTPStore, or LocalStore.

from geoarrow.rust.io import read_flatgeobuf
from obstore.store import S3Store

store = S3Store(
    "bucket-name",
    access_key_id="...",
    secret_access_key="...",
    region="..."
)
table = read_flatgeobuf("path/in/bucket.fgb", store=store)

Returns:

  • Table

    Table from FlatGeobuf file.

geoarrow.rust.io.read_flatgeobuf_async async

read_flatgeobuf_async(
    path: str,
    *,
    store: Optional[ObjectStore] = None,
    batch_size: int = 65536,
    bbox: (
        Tuple[int | float, int | float, int | float, int | float] | None
    ) = None,
    coord_type: CoordTypeInput | None = None,
    use_view_types: bool = True,
    max_scan_records: int | None = 1000,
    read_geometry: bool = True,
    columns: Optional[Sequence[str]] = None
) -> Table

Read a FlatGeobuf file from a url into an Arrow Table.

Parameters:

  • path (str) –

    the url or relative path to a remote FlatGeobuf file. If an argument is passed for store, this should be a path fragment relative to the prefix of the store.

Other Parameters:

  • store (Optional[ObjectStore]) –

    an ObjectStore instance for this url. This is required only if the file is at a remote location and if the store cannot be inferred.

  • batch_size (int) –

    the number of rows to include in each internal batch of the table.

  • bbox (Tuple[int | float, int | float, int | float, int | float] | None) –

    A spatial filter for reading rows, of the format (minx, miny, maxx, maxy). If set to None, no spatial filtering will be performed.

  • coord_type (CoordTypeInput | None) –

    The GeoArrow coordinate type to use for generated geometries. The default is to use "separated" coordinates.

  • use_view_types (bool) –

    If True, load string and binary columns into Arrow string view and binary view data types. These are more efficient but less widely supported than the older string and binary data types. Defaults to True.

  • max_scan_records (int | None) –

    The maximum number of records to scan for schema inference. If set to None, all records will be scanned. Defaults to 1000.

    Most FlatGeobuf files define their schema in the header metadata, but for files without a known schema some initial records must be scanned to infer one. Reading fails if a property is later encountered that is not in the inferred schema, so scanning fewer records is faster but risks failing partway through the read if the inferred schema was incomplete.

  • read_geometry (bool) –

    If True, read the geometry column. If False, the geometry column will be omitted from the result. Defaults to True.

  • columns (Optional[Sequence[str]]) –

    An optional list of property column names to include in the result. This is separate from the geometry column, which you can turn on/off with read_geometry. If None, all columns will be included. Defaults to None.

Examples:

Reading from an HTTP(S) url:

from geoarrow.rust.io import read_flatgeobuf_async

url = "http://flatgeobuf.org/test/data/UScounties.fgb"
table = await read_flatgeobuf_async(url)

Reading from an S3 bucket:

from geoarrow.rust.io import read_flatgeobuf_async
from obstore.store import S3Store

store = S3Store(
    "bucket-name",
    access_key_id="...",
    secret_access_key="...",
    region="..."
)
table = await read_flatgeobuf_async("path/in/bucket.fgb", store=store)

Returns:

  • Table

    Table from FlatGeobuf file.

geoarrow.rust.io.write_flatgeobuf

write_flatgeobuf(
    table: ArrowStreamExportable,
    file: str | Path | BinaryIO,
    *,
    write_index: bool = True,
    promote_to_multi: bool = True,
    title: str | None = None,
    description: str | None = None,
    metadata: str | None = None,
    name: str | None = None
) -> None

Write to a FlatGeobuf file on disk.

Parameters:

  • table (ArrowStreamExportable) –

    the Arrow RecordBatch, Table, or RecordBatchReader to write.

  • file (str | Path | BinaryIO) –

    the path to the file or a Python file object in binary write mode.

Other Parameters:

  • write_index (bool) –

    whether to write a spatial index in the FlatGeobuf file. Defaults to True.

  • promote_to_multi (bool) –

    whether to promote single geometries (e.g. Point) to their multi counterparts (e.g. MultiPoint), so that tables mixing single and multi geometries can be written with one geometry type. Defaults to True.

  • title (str | None) –

    Dataset title. Defaults to None.

  • description (str | None) –

    Dataset description (intended for free form long text).

  • metadata (str | None) –

    Dataset metadata (intended to be application specific).

  • name (str | None) –

    the string passed to FgbWriter::create, which OGR reports as the layer name of the file. By default this is derived from the file name, but it can be overridden here.