Multi-mastering

Behind the PostgreSQL renaissance

Lucy Carey
PostgreSQl

What’s shaking up the core of this ‘hipster’ DBMS, and why should you pick it over NoSQL?

Ahead of Postgres
Day 2014 on 9
th
July, JAXenter caught up with Dave
Page, Chief Architect of
EnterpriseDB, who take the lead in supplying
enterprise-class Postgres products to database crunchers around the
world.

JAX: Can you tell us about your
work/experience with PostgreSQL? What attracted you to the
technology?

Page: I started working with
PostgreSQL around 1997, when I was building some in-house
applications at the housing association where I was working at the
time. The proprietary databases we were using for other apps were
going to be very expensive to use, due to the restrictive
licensing, so I was looking for a good Open Source option.
PostgreSQL fitted the bill nicely with its wide feature set and
active and helpful community.

After a short while using the database, I found
myself missing a good GUI management tool, so I started working on
pgAdmin – the current version of which is still the most popular
tool used with Postgres. That quickly led to me working on other
areas of the project, such as the ODBC driver, when I found issues
or needed additional features. Some years later, we produced the
native Windows port of PostgreSQL, following which I took on
responsibility for the installers which has become a big part of my
work in the community and in my day job at EnterpriseDB.

Nowadays, I’m still active in the community,
serving on the core team and web/sysadmin teams, as well as on the
boards of PostgreSQL Europe and the PostgreSQL Community
Association of Canada – both non-profit organisations supporting
the community. This work is supported by EnterpriseDB (but not
directed by them), who I work for as Chief Architect responsible
for tools, cloud products and builds/packaging of all
products.

What led you to create pgAdmin?

I came from a Microsoft background, so I was very used
to working with GUI tools. When I started working with PostgreSQL,
the only option was a tool called pgAccess which was largely
unmaintained and was written in TCL/TK which wasn’t easy to run on
Windows back then. I, and others on the mailing lists, discussed
the need for a good quality GUI tool. I wrote an initial proof of
concept in Visual Basic 4 called pgManager, which was quickly
abandoned and replaced by pgAdmin, which was written using Visual
Basic 5. That proved very popular, but wasn’t a great design (I was
an IT Manager back then, not a developer). When we realised that
pgAdmin had reached the end of its life, I decided to start on
pgAdmin II in VB 6.

I locked myself away for some weeks and wrote
the code from the ground up, making sure I didn’t repeat past
mistakes. I was particularly proud of both the design of the
application, the cleanliness of the code and its structure. Some
years later the big issues we couldn’t resolve in pgAdmin II were
character set, internationalisation and multi-platform support.
pgAdmin III was a rewrite in C++ with wxWidgets that followed a
similar design to pgAdmin II, but ran on Linux, Mac OS X, FreeBSD
and Solaris as well as Windows, and could support users in
different countries who wanted an internationalised user interface
and the ability to use alternative character sets.

It’s now 12 years later and we’re currently in the
early stages of planning for pgAdmin 4, in which we expect to
attract more developers through the use of Python instead of C++,
and produce an application that will include all the useful
features of past versions of pgAdmin in a code base that can be run
either on the desktop or on a web server.

PostgreSQL is enjoying a bit of a resurgence
in popularity lately – what do you think is driving
this?

I think there are two reasons; following the
Oracle purchase of Sun (who had just purchased MySQL) people have
been looking for alternatives as they’ve become less happy with
their choice of DBMS. This has led to people realising that
PostgreSQL is fast, stable and has a much wider range of features
to offer – which in turn has led to further adoption as word
spreads. This effect has been quite visible if you follow related
discussions on sites like Slashdot – more and more people over the
years are talking about Postgres as their database of
choice.

What have been the most notable advances in
Postgres in the past two years, in your opinion?

Two features immediately spring to mind and I
think we’re about to see a third. The first was the Windows port in
v8.0. That required a huge amount of work as Postgres was
originally designed to run on Unix systems, but it’s led to
PostgreSQL being much more widely used both because developers can
run it on their Windows laptops and because software vendors have
been embedding it in their desktop applications – we see a lot of
users running a couple of different packages for tracking data in
poker games for example.

I think the next major advancement was the
addition of streaming replication and hot standby servers in v9.0.
These related technologies allowed users to stream the binary write
ahead log files to standby servers, which is extremely efficient
compared to traditional trigger based replication solutions. In
addition, the standby servers can be made “hot”, meaning they can
be used to run read-only workloads. For example, you can run your
backups or reporting workloads against your standby server to
reduce the load on the master.

I think the next big feature we’ll see will be
bi-directional (or multi-master) replication in the core server.
The infrastructure for this will be in the upcoming 9.4 release,
and allows a stream of data changes to be decoded from the binary
logs, again, without the use of triggers. This change data can then
be utilised in many ways, including with a replication engine which
is being worked on at the moment.

What are the pros and cons of PostgreSQL vs
NoSQL?

PostgreSQL is a fully fledged relational
database system which gives users the ability to model their data
and tightly control the constraints under which it’s stored, as
well as to manage complex atomic transactions. For many types of
application, this is crucial to ensure quality of data (for
example, there is no 31
st February as
some databases might have you believe)! NoSQL databases don’t
typically implement this strict level of control, which can result
in the user having a database containing documents (or data,
depending on the database in question) in a myriad of different
formats, perhaps containing invalid values. For the vast majority
of users I work with, that’s a real problem – they need to be sure
their data is clean, accurate, correct and consistent.

The downside of this is that the system is
naturally more complex – as a user you have to properly model your
data and design your schema, and as a sysadmin you have a harder
job scaling out horizontally and managing clusters of servers
(which I should note, is not actually an issue for the majority of
users – most applications do not need to be “web scale”). This is
what makes NoSQL databases attractive to some people – it’s quick
to get up and running and you don’t have to think about what your
data needs to look like. In my experience though, that can be a
short sighted view; I’ve seen numerous users getting bitten
post-deployment as a result.

That said – there were times when users needed a
NoSQL database, such as for storing document or key value pairs.
This is one of the reasons why we do have XML and JSON support in
PostgreSQL, as well as HStore, a key-value storage data type. In
fact, in PostgreSQL 9.4 – due to be released around September time
– we’ll be adding a new JSONB data type which initial benchmarks
run by community members have shown to out-perform MongoDB. We also
have a number of foreign data wrappers which allow you to query
external data sources including Redis and MongoDB as if they were
tables in PostgreSQL, allowing users to write complex SQL to
combine their relational and non-relational data in one
query.

Do you think the lustre is starting to wear
off NoSQL as people become more aware of its limitations for
certain applications?

Definitely – as I mentioned above, I’ve seen a
number of users run into serious problems and migrate back to
relational databases from NoSQL after they’ve hit serious
problems.

Which enterprises do you see as the best fit
for PostgreSQL, and are they and areas that you see as ripe for
penetration?

We see PostgreSQL in use in all manner of
environments, from home users playing poker and the scientific
community storing research data right through to some of the
world’s largest banks, insurance companies and government agencies.
I don’t think there is any one area that’s particularly ripe for
penetration – we’re seeing it across the board.

What do you hope to see added/changed in
PostgreSQL along the line?

One of the big things for me is monitoring.
PostgreSQL can be something of a black box when it comes to server
monitoring, and one of things we’re discussing at the moment is how
to improve that by giving DBAs the information they need to really
understand what their systems are doing. We can give the user a lot
of that information with tools like Postgres Enterprise Manager,
which I’m responsible for at EnterpriseDB, but we have to combine
data from various sources to give the user what they need. There
are improvements we could make to PostgreSQL itself to make it
easier to get at that information and so forth.

The other important topic for me at the moment
is vertical scaling. As hardware with greater numbers of sockets,
cores and threads becomes more common (and those numbers continue
to rise), we need to ensure that PostgreSQL can scale
appropriately. We’re actively working on that with support from
some of the major hardware vendors who have kindly given us access
to some very large machines.

Where do you think Oracle went wrong with
MySQL?

I think they went wrong in much the same way as
they did with other projects such as OpenOffice, which forked into
LibreOffice and Hudson which forked into Jenkins – they tried to
take control and drive the projects the way they wanted, rather
than the community wanted. PostgreSQL doesn’t have that problem –
and cannot have that problem – as there is no single legal entity
that owns the copyright to the PostgreSQL source code and no single
company controlling the project.

 

Author
Comments
comments powered by Disqus