Behind the PostgreSQL renaissance
Whats shaking up the core of this hipster DBMS, and why should you pick it over NoSQL?
Ahead of Postgres Day 2014 on 9th July, JAXenter caught up with Dave Page, Chief Architect of EnterpriseDB, who take the lead in supplying enterprise-class Postgres products to database crunchers around the world.
JAX: Can you tell us about your work/experience with PostgreSQL? What attracted you to the technology?
Page: I started working with PostgreSQL around 1997, when I was building some in-house applications at the housing association where I was working at the time. The proprietary databases we were using for other apps were going to be very expensive to use, due to the restrictive licensing, so I was looking for a good Open Source option. PostgreSQL fitted the bill nicely with its wide feature set and active and helpful community.
After a short while using the database, I found myself missing a good GUI management tool, so I started working on pgAdmin – the current version of which is still the most popular tool used with Postgres. That quickly led to me working on other areas of the project, such as the ODBC driver, when I found issues or needed additional features. Some years later, we produced the native Windows port of PostgreSQL, following which I took on responsibility for the installers which has become a big part of my work in the community and in my day job at EnterpriseDB.
Nowadays, I’m still active in the community, serving on the core team and web/sysadmin teams, as well as on the boards of PostgreSQL Europe and the PostgreSQL Community Association of Canada – both non-profit organisations supporting the community. This work is supported by EnterpriseDB (but not directed by them), who I work for as Chief Architect responsible for tools, cloud products and builds/packaging of all products.
What led you to create pgAdmin?
I came from a Microsoft background, so I was very used to working with GUI tools. When I started working with PostgreSQL, the only option was a tool called pgAccess which was largely unmaintained and was written in TCL/TK which wasn’t easy to run on Windows back then. I, and others on the mailing lists, discussed the need for a good quality GUI tool. I wrote an initial proof of concept in Visual Basic 4 called pgManager, which was quickly abandoned and replaced by pgAdmin, which was written using Visual Basic 5. That proved very popular, but wasn’t a great design (I was an IT Manager back then, not a developer). When we realised that pgAdmin had reached the end of its life, I decided to start on pgAdmin II in VB 6.
I locked myself away for some weeks and wrote the code from the ground up, making sure I didn’t repeat past mistakes. I was particularly proud of both the design of the application, the cleanliness of the code and its structure. Some years later the big issues we couldn’t resolve in pgAdmin II were character set, internationalisation and multi-platform support. pgAdmin III was a rewrite in C++ with wxWidgets that followed a similar design to pgAdmin II, but ran on Linux, Mac OS X, FreeBSD and Solaris as well as Windows, and could support users in different countries who wanted an internationalised user interface and the ability to use alternative character sets.
It’s now 12 years later and we’re currently in the early stages of planning for pgAdmin 4, in which we expect to attract more developers through the use of Python instead of C++, and produce an application that will include all the useful features of past versions of pgAdmin in a code base that can be run either on the desktop or on a web server.
PostgreSQL is enjoying a bit of a resurgence in popularity lately – what do you think is driving this?
I think there are two reasons; following the Oracle purchase of Sun (who had just purchased MySQL) people have been looking for alternatives as they’ve become less happy with their choice of DBMS. This has led to people realising that PostgreSQL is fast, stable and has a much wider range of features to offer – which in turn has led to further adoption as word spreads. This effect has been quite visible if you follow related discussions on sites like Slashdot – more and more people over the years are talking about Postgres as their database of choice.
What have been the most notable advances in Postgres in the past two years, in your opinion?
Two features immediately spring to mind and I think we’re about to see a third. The first was the Windows port in v8.0. That required a huge amount of work as Postgres was originally designed to run on Unix systems, but it’s led to PostgreSQL being much more widely used both because developers can run it on their Windows laptops and because software vendors have been embedding it in their desktop applications – we see a lot of users running a couple of different packages for tracking data in poker games for example.
I think the next major advancement was the addition of streaming replication and hot standby servers in v9.0. These related technologies allowed users to stream the binary write ahead log files to standby servers, which is extremely efficient compared to traditional trigger based replication solutions. In addition, the standby servers can be made “hot”, meaning they can be used to run read-only workloads. For example, you can run your backups or reporting workloads against your standby server to reduce the load on the master.
I think the next big feature we’ll see will be bi-directional (or multi-master) replication in the core server. The infrastructure for this will be in the upcoming 9.4 release, and allows a stream of data changes to be decoded from the binary logs, again, without the use of triggers. This change data can then be utilised in many ways, including with a replication engine which is being worked on at the moment.
What are the pros and cons of PostgreSQL vs NoSQL?
PostgreSQL is a fully fledged relational database system which gives users the ability to model their data and tightly control the constraints under which it’s stored, as well as to manage complex atomic transactions. For many types of application, this is crucial to ensure quality of data (for example, there is no 31st February as some databases might have you believe)! NoSQL databases don’t typically implement this strict level of control, which can result in the user having a database containing documents (or data, depending on the database in question) in a myriad of different formats, perhaps containing invalid values. For the vast majority of users I work with, that’s a real problem – they need to be sure their data is clean, accurate, correct and consistent.
The downside of this is that the system is naturally more complex – as a user you have to properly model your data and design your schema, and as a sysadmin you have a harder job scaling out horizontally and managing clusters of servers (which I should note, is not actually an issue for the majority of users – most applications do not need to be “web scale”). This is what makes NoSQL databases attractive to some people – it’s quick to get up and running and you don’t have to think about what your data needs to look like. In my experience though, that can be a short sighted view; I’ve seen numerous users getting bitten post-deployment as a result.
That said – there were times when users needed a NoSQL database, such as for storing document or key value pairs. This is one of the reasons why we do have XML and JSON support in PostgreSQL, as well as HStore, a key-value storage data type. In fact, in PostgreSQL 9.4 – due to be released around September time – we’ll be adding a new JSONB data type which initial benchmarks run by community members have shown to out-perform MongoDB. We also have a number of foreign data wrappers which allow you to query external data sources including Redis and MongoDB as if they were tables in PostgreSQL, allowing users to write complex SQL to combine their relational and non-relational data in one query.
Do you think the lustre is starting to wear off NoSQL as people become more aware of its limitations for certain applications?
Definitely – as I mentioned above, I’ve seen a number of users run into serious problems and migrate back to relational databases from NoSQL after they’ve hit serious problems.
Which enterprises do you see as the best fit for PostgreSQL, and are they and areas that you see as ripe for penetration?
We see PostgreSQL in use in all manner of environments, from home users playing poker and the scientific community storing research data right through to some of the world’s largest banks, insurance companies and government agencies. I don’t think there is any one area that’s particularly ripe for penetration – we’re seeing it across the board.
What do you hope to see added/changed in PostgreSQL along the line?
One of the big things for me is monitoring. PostgreSQL can be something of a black box when it comes to server monitoring, and one of things we’re discussing at the moment is how to improve that by giving DBAs the information they need to really understand what their systems are doing. We can give the user a lot of that information with tools like Postgres Enterprise Manager, which I’m responsible for at EnterpriseDB, but we have to combine data from various sources to give the user what they need. There are improvements we could make to PostgreSQL itself to make it easier to get at that information and so forth.
The other important topic for me at the moment is vertical scaling. As hardware with greater numbers of sockets, cores and threads becomes more common (and those numbers continue to rise), we need to ensure that PostgreSQL can scale appropriately. We’re actively working on that with support from some of the major hardware vendors who have kindly given us access to some very large machines.
Where do you think Oracle went wrong with MySQL?
I think they went wrong in much the same way as they did with other projects such as OpenOffice, which forked into LibreOffice and Hudson which forked into Jenkins – they tried to take control and drive the projects the way they wanted, rather than the community wanted. PostgreSQL doesn’t have that problem – and cannot have that problem – as there is no single legal entity that owns the copyright to the PostgreSQL source code and no single company controlling the project.