4 years ago one of my clients wanted to "donate" a system for the local fire department to help them do a quick proximity search to find the fire hydrants and quickly choose the healthy one near the fire.
And since it was charity and had a bunch of private data Google was not an option ($$$$), so I (just a full-stack developer back then) was like "listen I have no idea what is this GIS stuff, but I'll give it a try", after a quick research boom PostGIS, reading the docs and testing things, plus QGIS to visualize and to help understand it better 10/10!
That was my unintentional "career" shift, thanks to PostGIS I'm now a senior dev at the largest food delivery company (local), specialized in GIS and realtime data driven systems using PostGIS everyday lol
PostGIS is one of those rare examples of highly specialised software that is both OSS and best-in-class. Usually OSS is relegated for highly commoditized software but GIS is anything but even after all these years.
I can't really think of many others? Maybe OptaPlanner would be another candidate.
Yes, among off-the-shelf GIS databases, PostGIS is generally as good or better than anything else on the market. More capable systems exist, particularly in terms of scale and performance, but those are all bespoke.
Thanks for sharing this! I've been thinking about alternatives to PostGIS to handle larger datasets (millions) and nothing seems to come close to the level of functionality, performance and community support of Postgres
I've been using PostGIS a bit for a toy project with Elite: Dangerous star system data. It's been a hoot, but I do worry that I'm going to start having trouble optimizing my queries. Finding all the systems within say 20 Ly of our sun can take upwards of a few seconds, and I've already added a GIST index on the positions.
I also find it a bit strange how 3D feels kinda tacked on, but it makes sense when you realize most maps are in fact 2D.
Anyway, my 2-cents of experience. If anyone has some good advise for > million row, 3D spacial optimizations for PostGIS, please let me know.
The only thing I can recommend off hand is to try and simplify your geometries a bit for searching if they’re complex. Searching along the border of a complicated shape is much harder than searching around a square, it has to do a lot more calculations.
Is it possible you haven't added the bounding box for your geometries? The GIST index will use your bounding box to optimise the queries, but only if it finds one. Also make sure your query is actually using the index.
What kind of front end do you use. Usually the spatial db is only a part of the solution, and the GIS information needs to displayed and functionality for interaction with maps and objects needs to provided. What framework would pair well with PostGis in the open source world?
My suggestion might be to a precalculated step to split "the universe" into grid areas divided by geometry complexity (i.e. population density gridding) then process these reference areas in parallel. How you do the latter is probably the interesting part!
Hmm, I think you are looking for SP-GiST (unbalanced space partitioned GiST), unless you meant that implicitly. It can be much faster for...space partitioning.
PostGIS is loads of fun with sports data. Being able to find similar passages of play just using ST_MakeLine and ST_FrechetDistance feels like magic when you first do it in all of five lines of code.
Spatial tools are a bit like logic programming in that they’re very slightly esoteric. But once you know they’re the right hammer for some nails, they’ll save you lots of time and effort over your life, and let you express some ideas you might otherwise struggle with.
FWIW, even though PostGIS is pretty great, if your use-case is primarily offline analysis and you don't need the data to be permanently accessible or writable, consider not using a database at all. You can do a lot, a lot faster with e.g. https://shapely.readthedocs.io/en/stable/manual.html and/or https://geopandas.org/.
Ultimately for spatial functions both PostGIS and Shapely/Geopandas are using GEOS as the base library, so neither are 'a lot faster' for analysis at a fundamental level. Lots of queries to a database brings some speed loss but this isn't necessary for the vast majority of use cases. OTOH PostGIS is in many cases much more finely tuned than shapely/geopandas for speed, scale and parallelisation (it is a much more highly maintained library) so can actually be faster.
Functionality is a different thing. PostGIS has a lot more baked in functionality than shapely. But shapely/geopandas exposes the depth of python and its libraries which allow much more extensive customisation of how to work with data.
Lots of tradeoffs and overlapping use cases - I just wanted to add some depth to this discussion.
Yes, even for on-disk read-only use cases, shapefiles are almost always much faster. Overhead from serialization, network, etc often dominates the performance of GIS.
I'd argue that being able to use sql to more easily combine ans filter datasets, including non-geospatial ones is still very useful in the circumstances you described.
I'm not sure what 'a lot faster' really means in this context.
Honestly, I've found using Spatialite queries to be orders of magnitude faster for analysis than shapely or geopandas. The latter typically imply row-by-row selection and manipulation for starters.
If you can wrangle the data into a geopackage first it's super easy to run queries over the data and extract what you need.
It's good software (I've used it more than a decade), however I found GEOS to be a sticking point. When using it on very large polygons, e.g. 10k to 1 million vertices, memory leaks are not uncommon and performance drops off considerably. Debugging SQL -> C -> C++ is not fun and hacking C++ geometry code when it's not part of your normal work is nigh on impossible. I've found the ESRI geometry API for Java to be by far the best geometry API out there. Harder to use initially and obviously JVM specific but faster and more reliable. It's a very good fit for Hadoop / Spark or other JVM applications. Ignore the brand name, I'm not affiliated and it's FOSS with an Apache license.
https://getgeodb.com is a hosted CartoDB which is an open source stack built on PostGIS but its notoriously hard to run and configure. So we decided to run it as a service for others... we find it very useful to build apps on top of PostGIS since running them requires configuring middleware like tileservices, data APIs and importing data.
I've toyed with PostGIS to make a poster from OSM data by selecting streets that intersect with one of Tokyo's special wards[0]. Sadly I haven't found any excuses to play with it more since.
I am writing an open-source line simplification algorithm[1] with PostGIS. So far it's a few hundred lines of SQL-like code[2], but it will grow.
I have never written anything serious in plpgsql, so this is my first non-trivial project. Writing in PostGIS is a strange mix of SQL (fully declarative) and a "real" programming language with variables, arrays, functions and loops. The mix is very interesting and requires getting used to. But doable.
Things I like so far:
- geometric functions are very robust and well documented. Reference manual[3] is amazing. Creating/editing/iterating over lines and polygons, detecting their features, is a breeze.
- good integration with QGIS -- I can see the result graphically on my desktop system very easily.
- I can even write tests[4].
Things I am worried about or don't like as much:
- The SQL/procedural mix requires a lot of getting used to, and sometimes "simple" things in a "real" programming language becomes complex here. E.g. I would love to have mutable linked-lists. Or moving data between "sql-world" (rows) to "plpgsql-world" (arrays and data structures) is non-obvious, and best references are, unfortunately, stackoverflow and scripts of others'.
- My program will run on a large data set (e.g. all rivers in a country or a continent). I will want to optimize it. There is an out-of-tree profiler[5], but "unofficial, out of tree" always adds risk.
- Biggest one: deep copies everywhere. Since plpgsql does not do any memory management, it is deep-copying everything. Sometimes (often in this algorithm!) I want to add a vertex to a line (=river bend); that always requires a full copy of the whole bend. When I know it will not be used and will be safe, I would like to say "I am mutating this geometry and I don't want a deep copy".
In general, I like it for algorithms. Though the moment it needs to hit performance-sensitive production, I believe I will re-do it in C (which, looking at the postgis source code, is quite write-able too).
Relatedly: the appendix section "SQL Primer" in Obe and Hsu's 'PostGIS in Action' is my favorite quick intro to SQL and the first stop I give anyone looking to learn about the language.
I helped build a relatively successful SaaS offering for finance atop PostGIS with MapServer and Mapproxy+Redis. The stack ran circles around ArcGis. Not even a close comparison.
[+] [-] Ahmed90|5 years ago|reply
And since it was charity and had a bunch of private data Google was not an option ($$$$), so I (just a full-stack developer back then) was like "listen I have no idea what is this GIS stuff, but I'll give it a try", after a quick research boom PostGIS, reading the docs and testing things, plus QGIS to visualize and to help understand it better 10/10!
That was my unintentional "career" shift, thanks to PostGIS I'm now a senior dev at the largest food delivery company (local), specialized in GIS and realtime data driven systems using PostGIS everyday lol
[+] [-] tomrod|5 years ago|reply
[+] [-] ntechau|5 years ago|reply
[deleted]
[+] [-] jpgvm|5 years ago|reply
I can't really think of many others? Maybe OptaPlanner would be another candidate.
[+] [-] jandrewrogers|5 years ago|reply
[+] [-] leetrout|5 years ago|reply
[+] [-] gardaani|5 years ago|reply
I've decided to use PostgreSQL in my projects.
[+] [-] tcbasche|5 years ago|reply
[+] [-] dunefox|5 years ago|reply
[+] [-] nixpulvis|5 years ago|reply
I also find it a bit strange how 3D feels kinda tacked on, but it makes sense when you realize most maps are in fact 2D.
Anyway, my 2-cents of experience. If anyone has some good advise for > million row, 3D spacial optimizations for PostGIS, please let me know.
[+] [-] estsauver|5 years ago|reply
[+] [-] nicois|5 years ago|reply
[+] [-] gfiorav|5 years ago|reply
Intersections with large polygons will perform faster if subdivided.
[+] [-] billfruit|5 years ago|reply
[+] [-] tcbasche|5 years ago|reply
[+] [-] jhgb|5 years ago|reply
Doesn't that sound a bit pathological? It should be nowhere near that slow. Try implementing a VP tree over that data.
[+] [-] fdr|5 years ago|reply
[+] [-] 7952|5 years ago|reply
[+] [-] yxhuvud|5 years ago|reply
[+] [-] cwmma|5 years ago|reply
CREATE INDEX [indexname] ON [tablename] USING GIST ([geometryfield] gist_geometry_ops_nd);
by default postgis just creates 2d ones
[+] [-] thom|5 years ago|reply
Spatial tools are a bit like logic programming in that they’re very slightly esoteric. But once you know they’re the right hammer for some nails, they’ll save you lots of time and effort over your life, and let you express some ideas you might otherwise struggle with.
[+] [-] another-dave|5 years ago|reply
Do you have any recommended sources for sample datasets to play around with?
[+] [-] stdbrouw|5 years ago|reply
[+] [-] twelvechairs|5 years ago|reply
Functionality is a different thing. PostGIS has a lot more baked in functionality than shapely. But shapely/geopandas exposes the depth of python and its libraries which allow much more extensive customisation of how to work with data.
Lots of tradeoffs and overlapping use cases - I just wanted to add some depth to this discussion.
[+] [-] rbranson|5 years ago|reply
[+] [-] jimktrains2|5 years ago|reply
[+] [-] tcbasche|5 years ago|reply
Honestly, I've found using Spatialite queries to be orders of magnitude faster for analysis than shapely or geopandas. The latter typically imply row-by-row selection and manipulation for starters.
If you can wrangle the data into a geopackage first it's super easy to run queries over the data and extract what you need.
[+] [-] jononor|5 years ago|reply
[+] [-] twelvechairs|5 years ago|reply
Talks and writing by PostGIS co-creator Paul Ramsey (who is incredible): http://blog.cleverelephant.ca/writings
The mapscaping podcast - a lot of intros to the complex and overlapping worlds of GIS: https://mapscaping.com/blogs/the-mapscaping-podcast
[+] [-] craigkerstiens|5 years ago|reply
[+] [-] cwmma|5 years ago|reply
[+] [-] willtemperley|5 years ago|reply
https://github.com/Esri/geometry-api-java
[+] [-] sabman|5 years ago|reply
would love any feedback :D ... you can also hit me up on twitter https://twitter.com/sabman
[+] [-] ascorbic|5 years ago|reply
[+] [-] zeeZ|5 years ago|reply
[0]: https://christian.rinjes.me/posts/2020-11-25-tokyo-street-ar...
[+] [-] mooneater|5 years ago|reply
http://www.refractions.net/
[+] [-] motiejus|5 years ago|reply
I have never written anything serious in plpgsql, so this is my first non-trivial project. Writing in PostGIS is a strange mix of SQL (fully declarative) and a "real" programming language with variables, arrays, functions and loops. The mix is very interesting and requires getting used to. But doable.
Things I like so far:
- geometric functions are very robust and well documented. Reference manual[3] is amazing. Creating/editing/iterating over lines and polygons, detecting their features, is a breeze.
- good integration with QGIS -- I can see the result graphically on my desktop system very easily.
- I can even write tests[4].
Things I am worried about or don't like as much:
- The SQL/procedural mix requires a lot of getting used to, and sometimes "simple" things in a "real" programming language becomes complex here. E.g. I would love to have mutable linked-lists. Or moving data between "sql-world" (rows) to "plpgsql-world" (arrays and data structures) is non-obvious, and best references are, unfortunately, stackoverflow and scripts of others'.
- My program will run on a large data set (e.g. all rivers in a country or a continent). I will want to optimize it. There is an out-of-tree profiler[5], but "unofficial, out of tree" always adds risk.
- Biggest one: deep copies everywhere. Since plpgsql does not do any memory management, it is deep-copying everything. Sometimes (often in this algorithm!) I want to add a vertex to a line (=river bend); that always requires a full copy of the whole bend. When I know it will not be used and will be safe, I would like to say "I am mutating this geometry and I don't want a deep copy".
In general, I like it for algorithms. Though the moment it needs to hit performance-sensitive production, I believe I will re-do it in C (which, looking at the postgis source code, is quite write-able too).
[1]: https://www.tandfonline.com/doi/abs/10.1559/1523040987824417...
[2]: https://github.com/motiejus/stud/blob/master/IV/wm.sql
[3]: https://postgis.net/docs/reference.html
[4]: https://github.com/motiejus/stud/blob/master/IV/tests.sql
[5]: https://github.com/okbob/plpgsql_check
Edit: formatting
[+] [-] atonse|5 years ago|reply
https://github.com/zombodb/pgx
[+] [-] pella|5 years ago|reply
- https://planet.postgresql.org/ ( postgresql related posts )
- https://github.com/postgis/docker-postgis ( docker images for postgis ; alpine+debian)
[+] [-] dbreunig|5 years ago|reply
https://www.manning.com/books/postgis-in-action-second-editi...
[+] [-] brianzelip|5 years ago|reply
1. https://talkpython.fm/episodes/show/295/gis-python
2. https://changelog.com/podcast/417 (postgres)
[+] [-] billfruit|5 years ago|reply
The only viable open source route I am seeing is through some kind of QGIS extensions. Is there something better for that purpose?
[+] [-] samstave|5 years ago|reply
[+] [-] hnedeotes|5 years ago|reply
[+] [-] ruffrey|5 years ago|reply
[+] [-] poxrud|5 years ago|reply