PostgreSQL 14 is here – what should you look out for?
PostgreSQL 14 has more than 200 improvements and new features, but broadly they fall into four areas: Performance, Observability, Security and Convenience. For developers, what is there to look forward to in this new version? Let’s look at them in some more detail.
PostgreSQL is the most popular open source database according to StackOverflow, with more developers wanting to work with it compared to any other database. It has now been updated to version 14. For developers, what is there to look forward to in this new version?
PostgreSQL 14 has more than 200 improvements and new features, but broadly they fall into four areas: Performance, Observability, Security and Convenience. Let’s look at them in some more detail.
1. Performance Enhancements
Enterprises have to process more and more data, and the velocity, volume, & variety (the three Vs of Big Data) show no signs of slowing down. Being able to scale up and out to support these workloads on PostgreSQL 14 is a big area of focus. This includes looking at better handling of concurrent connections and more features for query parallelism and distributed workloads.
PostgreSQL has improvements for those developers that want to use this database to scale horizontally. PostgreSQL has a foreign data wrapper (FDW), postgres_fdw, that has been instrumental in easing the burdens of handling distributed workloads. With PostgreSQL 14, there are two major improvements in the FDW to improve performance for such transactions – it now supports parallel table scans on foreign tables, and bulk insert of data is now allowed on foreign tables. Both these improvements further cement the increasing ability of PostgreSQL to scale horizontally and natively handle distributed databases.
Logical replication has been expanded to allow streaming in-progress transactions to subscribers, marking another enhancement for distributed workloads. Large transactions were previously written to disk till the transaction was completed, before replicating that transaction to the subscriber. By allowing in-progress transactions to be streamed, users gain significant performance benefits along with more confidence in their distributed workloads.
On the Server side, reducing B-tree index bloat was one of the biggest areas of investment. Frequently updated indexes can bloat over time due to dead tuples, as these tuples are removed only when a vacuum is run. Between vacuums, the page can get filled up and lead to an update or insert causing a page split. This split is then not reversible.
PostgreSQL 14 has improved this process so dead tuples can be detected and removed even between vacuums, reducing the number of page splits and thereby reducing index bloat. The vacuum system has also been enhanced to eagerly remove deleted pages. Previously, it took two vacuum cycles to do so, with the first one marking the page as deleted and the second one actually freeing up that space.
High latency connections with frequent write operations can slow down client performance. Libpq is the interface used by programmers that work in C, and provides a good example of this issue as libpq waits for each transaction to be successful before sending the next one. With PostgreSQL 14, ‘pipeline mode’ has been introduced to libpq allowing the client to send multiple transactions at the same time. This can potentially provide a tremendous boost to performance. What’s more, because this is a client-side feature, PostgreSQL 14’s libpq can even be used with older versions of the PostgreSQL server.
PostgreSQL’s support for query parallelism allows the system to engage multiple CPU cores in multiple threads to execute queries in parallel, thereby drastically improving performance. PostgreSQL 14 brings more refinement to this system by adding parallelism support to RETURN QUERY and REFRESH MATERIALIZED VIEW. Improvements have also been rolled out to the performance of parallel sequential scans and nested loop joins.
PostgreSQL 14 also adds support for LZ4 compression for TOAST, a system used to efficiently store large data. LZ4 is a lossless compression algorithm that focuses on speed of compression and decompression. LZ4 can be configured at the column as well as the system level. Previously, the only option was pglz compression, which is fast, but has a small compression ratio.
SEE ALSO: Cloud-Driven Data Modernization
Observability is one of the biggest buzzwords of 2021, as developers want more insight into how their applications are performing over time. PostgreSQL 14 adds more features to help with monitoring, with one of the biggest changes being the move of query hash system from pg_stat_statement to the core database. This allows for monitoring a query using a single ID across several PostgreSQL systems and logging functions.
This version also adds new features to track progress of COPY, WAL activity, and replication slots statistics.
All developers should consider database security as part of their set-up, and those involved in DevOps programs will also be responsible for those implementations over time. Helping developers improve their security was an element in PostgreSQL 14 too.
The biggest overall change was SCRAM as the default authentication for PostgreSQL going forward, after SCRAM-SHA-256 authentication was first introduced in PostgreSQL 10. The previous default MD5 authentication has had some weaknesses that have been exploited in the past. SCRAM is much more powerful, and it makes regulatory compliance for data security easier.
Alongside improved authentication, two predefined roles have been added in PostgreSQL 14 – pg_read_all_data and pg_write_all_data. The former makes it convenient to grant read-only access for a user to all tables, views, and schemas in the database. This role will have read access by default to any new tables that are created. The latter makes it convenient to create super-user styled privileges. While this can be convenient for access and activities, this kind of account has to be used sparingly.
4. Convenience for Application Developers
As part of making it easier to use PostgreSQL, the new version has improved access to JSON using subscripts. From an application development perspective, I have always found support for JSON in PostgreSQL very interesting. PostgreSQL has supported this unstructured data form since version 9.2, but it has had a unique syntax for retrieving data. In version 14, support for subscripts has been added, making it easier for developers to retrieve JSON data using a commonly recognized syntax.
Alongside this, PostgreSQL has had Range types since version 9.2. PostgreSQL 14 now introduces ‘multirange’ support which allows for non-contiguous ranges, helping developers write simpler queries for complex sequences. A simple example of this in action would be specifying the ranges of time a meeting room is booked through the day. By making it easier to manage different data queries in one request, it helps developers manage how their applications use data.
Stored procedures were added in PostgreSQL 11, giving developers transactional control in a block of code. PostgreSQL 14 implements the OUT parameter, allowing developers to return data using multiple parameters in their stored procedures. This feature will be familiar to Oracle developers and a welcome addition for folks trying to migrate from Oracle to PostgreSQL.
PostgreSQL is distinct from many other open source databases as it has a truly independent community and a permissive license that allows many companies and providers to build on it. For me, the important area to focus on is what the community has done to improve performance for heavy transactional workloads and additional support for distributed data. As more companies look at migrating away from Oracle, or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.