“Hadoop was madness” – Interview on how Apache Spark is changing the banking sector
Spark has made some improvements over Hadoop, but where are we now with this mess? John Davies sheds some light on the issue and points out Hadoop’s continuing importance even as time goes by and Spark challenges its reign.
We asked John Davies, speaker at the upcoming JAX Finance and CTO of C24, a London-based fast data company, about the impact of Apache Spark on the banking sector and why the financial industry is at the forefront of dealing with massive, unstructured data. This is what he said:
JAXenter: John, in one of your talks at JAX Finance you are giving an overview of how the IT industry has dealt with data over the last decades, right up to now. As data is at the heart of almost all IT activities, especially in finance, it seems crucial to understand the specific paradigms of how to retrieve, access, store and analyze data. After “Big Data” comes “Fast Data”, with Apache Spark being everybody’s darling and outperforming Hadoop, the former new kid on the block. How did that happen?
John Davies: Twenty years ago very few banks were linked up electronically, other than for price feeds. Risk was calculated at the local level and the databases could handle the load; back then the database was at the centre of the banking universe. Ten years later, everything was linked from front to middle to back office and everything revolved around messaging. While this drove a move to NoSQL to better handle the hierarchical messages, it also overwhelmed resources, meaning we lost track of what was going on. The result was the crash of 2008. Since then, legislation has demanded better visibility of risk, and that requires huge analytics across distributed data sources.
To me, Hadoop was madness: we went from relational to NoSQL to handle the complex hierarchy, and then we had to put it back into relational again to run a distributed query.
Apache Spark is part of the way back to common sense, but much of the big data we have today exists because we’re making the data bigger than it needs to be. We’ve been lazy. By making the data smaller, leaner and faster (Fast Data), we can run Spark several orders of magnitude faster than Hadoop, with a fraction of the work and complexity to get there.
JAXenter: Finance is, by definition, all about highly structured and usually purely transaction-oriented data. Do you see this industry being at the forefront of dealing with massive, unstructured data, or are others leading the data science game?
John Davies: While the message formats are usually very well defined in the finance industry, you’d be surprised how much chaos is hidden inside the well-formatted messages. Corporate Actions, for example, are a nightmare to define electronically, but even something as simple as a payment can, in itself, hide a lot of information that is not inherently in the message. The time, origin, previous payments etc. can be key to identifying money laundering (AML, Anti-Money Laundering). We are already using AI (or the latest buzzword, Machine Learning) on huge repositories to identify some seriously interesting, and illegal, transactions.
JAXenter: Speaking of Fast Data and analytics: it seems that the specific challenge always lies in the fact that, on the one hand, we need a robust, secure and fast infrastructure, but on the other hand everything must make sense to the business, meaning that business folks should be able to use data analytics as “their own tool.”
John Davies: Actually, infrastructure is not the only solution. As I mentioned above, people get very lazy and tend to just throw hardware at problems and forget about engaging their brains.
You are right, though, that there remains a careful balance between technology and business. It’s a very symbiotic relationship these days; one can’t survive without the other.
JAXenter: In your session you are raising the question “What’s next?” Can you give us a hint on what your answer will be at the conference?
John Davies: There is a bright and interesting future, but I can tell you for free that I doubt we’ll be refactoring everything to use blockchain. I will cover it, though.
John Davies will be delivering three talks at JAX Finance which will focus on exploring the data aggregation and analytics scene: