Getting started with PostGIS
Bringing support for geographic objects to PostgreSQL, PostGIS has several advantages over similar spatial databases like SQL Server and Oracle Locator/Spatial. Adam Wright shows us the extender’s most popular functions, data types and how to get started.
PostGIS has been around for around fifteen years; it was first released by Refractions Research in 2001 under the GNU license. It is open-source software which means you can install it anywhere you like, no charge. You can get support from a number of companies; including EnterpriseDB (EDB), as part of the Postgres support package, and most platform-as-a-service offerings include PostGIS as an extension. For example, if you’re on Amazon Web Services and using EDB Postgres Cloud Database (PPCD), you can easily turn on PostGIS.
PostGIS is built on top of Postgres, which is a fully ACID compliant database server. If you issue a commit in order for it to be successful, those changes must persist on disk. You can pull the power on your system, and Postgres will come back in auto recovery mode and recover itself to a consistent state. Postgres supports a wide range of index types, triggers, views, material views, table partitioning via inheritance, support for schema-less data types like JSON and Hstore.
For GIS developers, Postgres includes Windowing functions, recursive Common Table Expressions (CTEs), object type support, and the ability to execute code where your data resides with a wide range of Procedural Languages such as PL/Java, PL/R, PL/Python, and more. Also important for GIS developers, are operators in EDB Postgres like grouping sets, cube, rollup, and many more. Those particular operators will also be available in community Postgres beginning with 9.5, along with new features like UPSERT and Row Level Security (RLS). Another powerful design feature of Postgres is its extensibility as it was designed from the very beginning to allow runtime extensions that are treated as first class citizens in the database.
You can also easily scale out open source PostGIS on top of the native features available in open source Postgres. While other spatial extensions can cost up to $40,000 a core, you can turn on PostGIS for no additional cost. You can get support for a fraction of the cost of other systems with no loss in functionality or quality. We can take our spatial data in one database, replicate it to another, and then another. The other database servers can be on the same rack or be in a different regional office. We can use a replication-aware connection pooler like pgpool and route all writes to the primary node and reads to the standby servers.
Doing this helps scale out read workloads and it also helps improve writes on the primary by offloading workloads such as reporting and backups. We can also add a tool like EDB Failover Manager (EFM) to the mix to monitor the health of your Postgres cluster and automate failover in ways that are transparent to your application and end users.
PostGIS data types
PostGIS has four main spatial data types. There is geometry for flat-earth, geography for round-earth, raster type where each cell can contain values, and topology types for modelling real world rules of your vector features. Topologies allow you to apply relationship rules such as expanding one boundary automatically shrinks the adjoining boundary. All of the classic sub-types you would expect are available like points, lines, polygons along with multipoint, multi-line, and multi-polygon. Do not forget to add a spatial index which is just a matter of simply issuing standard Create Index syntax – „CREATE INDEX myindexname ON mytable usings gist(geom)“.
The first and most commonly used data type is geometry which uses a Cartesian grid and can be used for storing simple points, line and polygons but can also store multipart shapes, a Z coordinate (elevation) value, and an M (measure) value. Relationship checks can oftentimes be faster with the Geometry type, and it currently has more support from third-party desktop applications. It’s useful when the area you’re working with regional data where the curvature of the earth does not come into play.
If you are dealing with distances on a global scale and your data covers the poles and crosses the dateline, you will want to evaluate the geography type, which was more recently added in PostGIS 1.5. Geography data is treated like it’s on an ellipsoid. This is often useful for multinational corporations and federal agencies that collect data from all over the globe, such as flight paths and global logistics. If you have data that spans the globe or don’t want to think about projections, you can store data using the geography type.
There are currently fewer functions available with the geography type than geometry, but you can cast to and from geography and geometry. But keep in mind doing so isn’t free and reprojecting your data on the fly comes with a performance penalty. This is true in any spatial database or desktop GIS system. There are a number of different considerations for choosing the right spatial reference system. The PostGIS docs are a good resource, and the “PostGIS in Action” book is another good resource for helping decide the right way to store your data.
Topology is a powerful tool for expressing spatial relationships with different vector features. The classic example is of a metro subway map. If you need to travel from one end of the city to the other, you start at your current station, a point. You find your destination on a map, another point. You must fine the tracks, lines that connect. But, there will also be metadata for which stations connect to different train lines.
Raster is made up of a matrix of pixels, or cells, with each pixel containing a value that represents a condition for that area covered by the cell. Raster data can be used as a base layer or a backdrop behind vector layers. This can provide some additional context to your vector data, like locating your point in Google Maps and then switching to Google Earth to view the world around you. Raster data can also be used for analysis by storing meaningful values in each of the cells like crop yield and moisture levels in precision agriculture.
Common functions in PostGIS
PostGIS provides a number of functions for dealing with common GIS concepts. You can easily process and analyse data with simple SQL queries. A common spatial definition is a boundary, which can be a political boundary, rule-based, or something you generate for the purpose of a single event. An item can fall inside or outside the boundary. Law enforcement may decide appropriate charges based on whether an offence occurred inside of a school zone.
Another common spatial definition is a buffer. You can use the ST_Buffer function and expand the buffer by a specified number of units. You can provide a negative radius and shrink the boundary rather than extending it. If you are performing a radius search, you can use a powerful function like ST_DWithin. You can call one function to provide city planners with the number of parks or bars within a one-mile radius of downtown. If you are developing a mobile application, one function call can return all ATMs within a mile of the users’ current location. The spatial definition „contains“ is when one spatial object contains another spatial object. The interior includes all points of the other object. If an object such as a point or linestring only exists along a polygon’s boundary, the polygon does not contain it.
If you are coming from the relational database world, you are familiar with equals. In that case, you would return true or return a set of rows if a condition is met. The ST_Equals functions can return true for a simple check if two features are spatially equal. A commute to and from work is spatially equal. If you need to verify the ordering of waypoints, use St_OrderingEquals to verify the points are consistent. A spatial intersection is when geometries have two points in common. For this you can start with the ST_Intersects function and return true if any parts of those geometries or any two geometries are shared. An example of this is a GPS on a Hazardous Materials (Hazmat) truck to determine if the truck crossed into an urban area boundary where Hazmat trucks may not be allowed.
Getting your data into PostGIS
Once you are ready to get started, there are a number of options for getting data in and out of PostGIS. If you are coming from desktop GIS and have a collection of shapefiles, these can be imported at the command with shp2pgsql. There is also an shp2pgsql GUI available if you prefer. You can have the tool create a table based on your shapefile or you can load to an existing table. The open source GDAL library includes OGR Simple Feature Library; it provides access to a number of additional data formats including other database systems. This is a powerful tool for converting from one data source to another.
Once operational, maintaining Postgres with PostGIS doesn’t have to be complicated. You can backup up data in the same you always have with pg_basebackup or pg_dump. Be sure to use the same Postgres best practices like performing a VACUUM ANALYZE after data loads, build indexes on your spatial columns, use EXPLAIN plans as a basis for query tuning and parameter modifications, always apply minor updates. And don’t forget you can leverage the power of the underlying database system with rules, triggers, partial indexes, materialised views, partitioning, multiple table spaces, multi layer security and much, much more.