(no title)
kylebarron | 1 year ago
A quick benchmark [0] shows that saving to GeoPackage, FlatGeobuf, and GeoParquet are roughly 10x faster than saving to CSV. Additionally, the CSV is much larger than any other format.
[0]: https://gist.github.com/kylebarron/f632bbf95dbb81c571e4e64cd...
culebron21 | 1 year ago
I guess your dataset is country borders, isn't it? Something that 1) has few records and makes a small R-tree, and 2) contains linestrings/polygons that can be densified, similar to the Google Polyline algorithm.
But a lot of geospatial data is just sets of points. For instance: housing for an entire country (a couple of million points). An address database (IIRC 20+M points). Or GPS logs from multiple users, pulled from a logging database, ordered by time, not assembled into tracks -- several million points per day.
For such datasets, use CSV, don't abuse indexed formats. (Unless you store it for a long time and actually use the index for spatial search, multiple times.)
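For the point-only case above, a plain CSV written with pandas carries everything the data has, with no spatial index to build; a minimal sketch (column names are illustrative):

```python
# Plain point data (e.g. GPS logs) as a flat CSV: lon/lat/timestamp
# columns, no geometry objects and no spatial index to pay for.
import io
import pandas as pd

points = pd.DataFrame({
    "lon": [37.62, 30.31, 131.89],
    "lat": [55.75, 59.94, 43.12],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
})

buf = io.StringIO()  # in-memory stand-in for a file on disk
points.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

If you later do need spatial queries, you can always convert once with `geopandas.points_from_xy` and write an indexed format at that point.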
kylebarron | 1 year ago
You should use pyogrio [1], the vectorized counterpart to Fiona [0], instead. Make sure you pass `engine="pyogrio"` when calling `to_file` [2]. Fiona loops over features in Python, while pyogrio is fully compiled, so pyogrio is usually about 10-15x faster than Fiona. The upcoming pyogrio 0.8 will be another ~2-4x faster still [3].
[0]: https://github.com/Toblerity/Fiona
[1]: https://github.com/geopandas/pyogrio
[2]: https://geopandas.org/en/stable/docs/reference/api/geopandas...
[3]: https://github.com/geopandas/pyogrio/pull/346