Interview with Robert Treat, Postgres Technical Fellow at Instaclustr

“PostgreSQL has reached every part of industry and every part of government”


We had a chat with Robert Treat, Postgres Technical Fellow at the open source data platform company Instaclustr, about PostgreSQL’s popularity, its use cases, and which open source technologies it pairs with. Learn how IT teams and organizations can overcome hurdles when scaling PostgreSQL.

JAXenter: PostgreSQL has been growing in popularity as a SQL alternative to Oracle, SQL Server, and others; what use cases is the open source database best suited for, and when isn’t PostgreSQL going to be a fit?

Robert Treat: What Postgres does best is provide you with an enterprise-level database with all the advanced features at an unbeatable price, combined with extremely flexible open source licensing. Because it supports so many different query and schema options, it is really great for prototyping new services and applications where you don’t fully know what functionality you need from your datastore (but you do know you’ll want to be able to do basic analysis of the data you have, to determine how it all fits together). On the operations side, the lightweight install and simple licensing make it a great option for building (or migrating) more traditional RDBMS-based applications and services into a DevOps-oriented deployment and production system.

That said, one of the great things about the current state of open source databases is that you can find a lot of purpose-built solutions that make trade-offs to optimize for specific use cases. As an example, Postgres has some of the best full-text search capabilities you will find in an RDBMS, and we often suggest that folks building new applications should “Just Use Postgres” for full-text search until they understand whether that’s going to be a mission-critical service. If that turns out to be the case, then you can bring in Elasticsearch and run Open Distro in all its glory. Similarly, for something like CDC pipelines or message brokers, you can build them in Postgres, but it might be a lot easier to turn to something like Apache Kafka.


JAXenter: For teams building up an open source data stack, what other open source technologies does PostgreSQL pair well with?

Robert Treat: There’s been growing popularity, and rightfully so, for building open source data layers around PostgreSQL, Apache Cassandra, and Apache Kafka. As an overall data strategy, this open source triad can create a sturdy three-legged stool at a fraction of the cost of proprietary solutions. PostgreSQL obviously brings particularly advanced relational database capabilities for structured data.

Apache Cassandra continues to prove its power as an enterprise-trusted open source NoSQL database (and expect adoption to only increase as Cassandra 4.0 reaches general availability relatively soon). Apache Kafka then delivers open source data stream processing capabilities that are second to none. Combined, these three open source technologies play complementary roles in the data backbone they provide to IT teams. And, importantly, all three work exceptionally well in their pure open source versions without the need to pony up for open core or proprietary alternatives.

JAXenter: PostgreSQL can be quick to set up but a challenge at scale. Specifically, what are the biggest hurdles to scale and optimization that IT teams need to understand, and how can they overcome them?

Robert Treat: First, that’s exactly right. PostgreSQL is among the easiest databases to spin up – you can install it and have it up and running in minutes. You can throw it in Docker containers if you want, do virtualization, and don’t have to deal with licensing. You don’t need a DBA to get started and that, combined with being open source, has certainly contributed to PostgreSQL’s rising popularity.

But inevitably at some point, as a business starts to scale, someone says “hey, our data volumes are a lot higher than when we started, we really ought to look at whether we’ve got our PostgreSQL environment under control.” They start asking questions like “What do we need to start tuning to improve throughput?”, “What should we be monitoring?”, and “We can’t just do simple backups anymore; we need to care more about failover.” Maybe you find yourself on a legacy version of PostgreSQL that’s worked fine for a couple of years, but there are major version upgrades now available that would provide more features or better usability. The question then becomes: can you confidently make that migration now that your database environments are more mature and mission-critical than they were when you started?

Continually optimizing PostgreSQL performance (and, relatedly, ensuring disaster recovery) can also be a tricky hurdle at scale. By way of comparison, Cassandra has a pretty straightforward model for how you scale up. In the PostgreSQL world, there are multiple types of replication, a myriad of reasons you might want to replicate, and different ways you might want to spread queries around, so there’s definitely a more complicated set of trade-offs that you have to navigate as data volumes grow.

JAXenter: What are IT teams’ biggest lingering misconceptions around PostgreSQL?

Robert Treat: Particularly with organizations that might be newer to open source data-layer technologies, I sometimes hear the question “but is PostgreSQL ready for primetime, is this something that can work at enterprise scale and for mission-critical environments?” The answer is a clear and resounding “yes”: banks, national military systems, the largest retailers, etc., are all using PostgreSQL. Sure, they’ll use it with commercialized software depending on the use case, but PostgreSQL has reached every part of industry and every part of government.

I think the other misconception that’s still pervasive is around the PostgreSQL replication feature set. PostgreSQL was relatively late to expand its replication features, and there just isn’t as much third-party replication tooling as there is for some other open source database systems. So that can sometimes feed into questions from users asking “can we use PostgreSQL for high availability where we need 24/7 uptime, and what are the considerations that have to be made to ensure that?” Again, in a multi-node database like Cassandra, updates can be a little more streamlined than with PostgreSQL. So that’s something you definitely have to account for, and ideally understand going into your PostgreSQL strategy so it’s not a surprise later on.

JAXenter: As an open source project, is security a concern for larger enterprises utilizing the database?

Robert Treat: While many people these days are paid to work on Postgres full-time, at its core Postgres is still a community-driven project that organizes around the idea of volunteerism. This makes it difficult for the project to accomplish tasks that are less about software development and more about paperwork, for example, certifications and compliance. Of course, it does get used in even the strictest environments, but the way that typically happens is that a company wanting to use PostgreSQL in an environment requiring specific certifications will reach out to a PostgreSQL consultancy and leverage its expertise to get through the needed certification process. Because that’s done at a corporate level rather than through the community, it’s not necessarily obvious, especially if the client company considers its use of Postgres a trade secret. If you go to the PostgreSQL site, where the open source code is and where the community works, there aren’t banners proclaiming “PCI-compliant database” or “HIPAA-compliant database.”

But are there thousands of companies running PCI-compliant, HIPAA-compliant, etc. workloads on PostgreSQL? Absolutely. The difference is that proprietary database vendors usually will have gone through those processes themselves and market that fact very publicly. But with organizations like NASA and the Department of Defense using PostgreSQL, there’s more than enough circumstantial evidence to support PostgreSQL’s security and compliance hardiness.


JAXenter: Stepping away from PostgreSQL specifically, the tug-of-war between open source projects and open core products has intensified this year, with several open source technologies changing to more closed licenses. What does the future of open source look like? Are closed licenses a necessary way for monetization, or is there a path for full-featured open source projects to thrive?

Robert Treat: It has definitely been a busy year for open source license changes and, unfortunately, the needle on most of those changes has moved toward more restrictive usage. For some organizations, perhaps there’s still a comfort in using proprietary or open core technologies versus pure open source alternatives. But open source communities have never been stronger, and the longevity of mature technologies like PostgreSQL is clear. Combine that with the data portability and drastic cost advantages inherent to pure open source, and it will be harder and harder for even the largest enterprises to justify a proprietary technology stack when more-than-capable open source technologies are ready for the task at hand.
