The rise of logical data fabrics – knit a virtual view of all enterprise data
It is time to stop “collecting” the data into a central repository and start “connecting” to the data at the sources. A new architecture—logical data fabric—facilitates this approach by gaining a virtual view of the data.
We are all familiar with fabrics – cloth produced by weaving or knitting textile fibers. Why not apply that idea to data as well? Like individual textile fibers, which come in different materials, colors, and textures, data comes in different formats and levels of quality, and from disparate locations. This variety creates problems for data consumers, forcing them to spend an inordinate amount of time collating and formatting the data. As a result, these business users have less time to infer intelligence from the data or use it for strategic business purposes.
Companies have fought this data separation by physically consolidating the information together into a central repository, but such efforts have largely failed since new data keeps sprouting in other places. Now a new data management paradigm called logical data fabric has emerged and is gaining popularity. Logical data fabrics aim to leave the data in place but gain a unified view for the entire enterprise through a virtual approach.
This data fabric employs familiar data virtualization as a core technology, but automates many of the capabilities using AI and ML.
Centralization and Decentralization – How Data Comes Together Only to Move Apart Quickly
If you look at the evolution of data dating back to the 1970s, information was stored largely on spindles and punch cards. When users needed access to the data, they had to load the appropriate data store and then read from it. Working with data in this manner was slow, time consuming, and frustrating, to say the least.
The need for more structured storage gave rise to relational databases in the 80s. Arranging data in rows and columns made it easy not only to store but also to retrieve quickly. SQL became a popular language. This relational concept led to an exponential growth of databases, with many applications, such as Financials, Supply Chain, and ERP, built to use the database for storage. Quickly, the databases multiplied and became so siloed that there was no way to gain a unified view of the business across all of the databases and applications.
The need to once again unify the data, this time for analytical purposes, gave rise to data warehouses in the 90s. A new storage format was invented to unify the data, and so were data movement tools like ETL, which extract the data from the numerous databases, transform it from the relational format to a star schema, and then load it into a data warehouse. Yet SQL remained the key language for querying these new repositories. As data warehouses grew in size, small data marts started springing up to provide a departmental view of the data for sales, marketing, or finance. It was not uncommon for a single organization to have multiple data warehouses and data marts. This multiplicity once again siloed the data across the different lines of business, inhibiting an enterprise view of the data and giving rise to an uber “enterprise” data warehouse that housed all of the data in a single database instead of splitting it into smaller ones.
Then came the millennium and the rise of unstructured data – data from social media, streaming data from devices such as wellheads, etc. Quickly, the volume of unstructured data began exploding to the point that more than 80 percent of the data within an organization became unstructured. Now data warehouses lost their appeal as the “single source of the truth,” because unstructured data could no longer be stored in data warehouses. A different repository was needed to store the vast amounts of structured and unstructured data available.
Thus, in the 2010s, big data systems such as Hadoop gained popularity as a way to store all of an enterprise’s structured and unstructured data. They made it easy to store the data in whatever format it arrived. Known as the data lake, this kind of system promised to replace data warehouses as the single source of the truth. However, such expectations never materialized. Given the volume, variety, and velocity of data, moving all of it into such an enormous repository became impractical. Finding the data became another problem, like searching for the proverbial needle in a haystack. For these reasons, many companies started splitting their data lakes into smaller ones, separate for the marketing, sales, or finance lines of business. Sound familiar?
Every decade, computer scientists have been inventing a centralized repository for storing the data in a single place, only to find that data quickly moves apart into decentralized components.
Data Gravity: Why Centralizing Data in a Central Physical Repository Doesn’t Work
Storing the data in a centralized repository is, no doubt, extremely beneficial. It is easy for business users to find the data, identify the relationships and associations among the data, and govern it. However, data gravity pulls the data towards the sources where it originates and away from the central repository. Computer scientists have been fighting data gravity by centralizing the data into a single physical repository: first databases, then data warehouses, and then data lakes. These repositories work for some period of time, much like a ball thrown up in the air, which rises only until gravity has used up its upward velocity. After that, the ball falls back towards the earth. In the same way, data moves away from the central repository and back to the sources where it naturally belongs.
It is time to give up the fight to forcefully consolidate the data into a central physical repository and yield to data gravity by leaving the data in the sources. Then how can organizations gain a single view of the data?
Stop Collecting and Start Connecting: Why Logical Data Fabric is a Better Solution
It is time to stop “collecting” the data into a central repository and start “connecting” to the data at the sources. A new architecture, the logical data fabric, facilitates this approach by providing a virtual view of the data. It yields to data gravity by leaving the data at the sources where it is created, but knits it together into a unified view of all enterprise data, irrespective of the data’s location (on-premises or in the cloud), format (structured or unstructured), or latency (data in motion or data at rest). This approach leaves the data free to evolve at the sources while bringing it together in a virtual fashion for the benefits of data discovery, management, and governance.
By using data virtualization as its core technology, the logical data fabric overcomes the limitations of the physical repository. No longer do IT teams have to program ETL scripts to move data from its source into a temporary repository for the purposes of transforming the data before loading it into the target systems. Data virtualization can perform transformations on the fly, which saves on storage costs. In addition, data virtualization’s low-code/no-code approach significantly reduces the number of developers and the amount of effort required to develop unified views.
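To make the contrast with ETL concrete, here is a minimal sketch of a virtual view, written in Python with only the standard library. Everything in it is an assumption for illustration: the `orders` table, the `CUSTOMERS_CSV` export, and the `customer_revenue_view` function are hypothetical stand-ins, not part of any data virtualization product. The point is only that the join and the cents-to-dollars transformation happen at read time, with no staging copy.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): an operational database that stays where it is.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount_cents INTEGER)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1250), (2, 4000), (1, 750)],
)

# Source 2 (hypothetical): a CSV export owned by another team.
CUSTOMERS_CSV = "customer_id,name\n1,Acme\n2,Globex\n"


def customer_revenue_view():
    """Virtual view: read both sources at query time and yield unified rows.

    Nothing is copied into a staging area; the join and the
    cents-to-dollars conversion run each time the view is read.
    """
    names = {
        int(row["customer_id"]): row["name"]
        for row in csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    }
    query = (
        "SELECT customer_id, SUM(amount_cents) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    for customer_id, total_cents in orders_db.execute(query):
        yield {
            "customer": names.get(customer_id, "unknown"),
            "revenue_usd": total_cents / 100,
        }


rows = list(customer_revenue_view())
print(rows)
```

Because the view is computed on each read, a new row written to the source shows up the next time the view is consumed, without rerunning any load job. A real data fabric would add query pushdown, caching, and security on top of this idea.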
What else can logical data fabrics do? By becoming the enterprise data layer, data fabrics catalog all of the data assets within the enterprise, including the sources from which the data originates, its format, and its relationships and associations with other data assets. With this virtual data catalog, business users can perform data discovery, document business definitions for the purposes of data governance, and learn the history and lineage of how the data has evolved, all in one place. Because they no longer have to go to different systems to perform these actions, the logical data fabric makes business users more efficient.
In addition, the logical data fabric has powerful data preparation capabilities that shape the data into a normalized format ready for business consumption. Business users can access the data within their favorite analytical, operational, web, or mobile applications.
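The catalog described above can be pictured as a small data structure. The following Python sketch is illustrative only; the `CatalogEntry` and `Catalog` classes, their fields, and the sample asset names are all hypothetical and do not reflect any vendor's catalog. It records each asset's source, format, business definition, and upstream assets, then supports keyword discovery and a lineage lookup.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str                # asset name as business users see it
    source: str              # system where the data physically lives
    data_format: str         # e.g. "relational", "json"
    definition: str = ""     # business definition, for governance
    derived_from: list = field(default_factory=list)  # direct upstream assets


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Data discovery: match the keyword against names and definitions."""
        keyword = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if keyword in e.name.lower() or keyword in e.definition.lower()
        )

    def lineage(self, name):
        """Return all upstream assets of `name`, nearest first (breadth-first)."""
        upstream, queue = [], deque(self._entries[name].derived_from)
        while queue:
            parent = queue.popleft()
            if parent in upstream:
                continue
            upstream.append(parent)
            if parent in self._entries:
                queue.extend(self._entries[parent].derived_from)
        return upstream


catalog = Catalog()
catalog.register(CatalogEntry("crm.customers", "CRM database", "relational",
                              "Master customer records"))
catalog.register(CatalogEntry("web.clickstream", "object storage", "json",
                              "Raw page-view events"))
catalog.register(CatalogEntry("customer_360", "virtual view", "relational",
                              "Unified profile per customer",
                              ["crm.customers", "web.clickstream"]))
catalog.register(CatalogEntry("churn_scores", "virtual view", "relational",
                              "Predicted likelihood of churn",
                              ["customer_360"]))

print(catalog.search("customer"))
print(catalog.lineage("churn_scores"))
```

Keeping definitions and lineage next to the source and format metadata is what lets a user answer "what is this, where does it live, and what was it built from" in one place.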
The Future of Logical Data Fabrics? Automating with AI / ML
The next step for logical data fabrics is to embrace AI and ML to automate some of the routine tasks. The data is constantly changing: new data sources are added, legacy ones are sunset, and new forms of data emerge. Data fabrics can use AI/ML to learn these changes, automatically adapt the integration of the new data into unified views, and deliver those views in the appropriate format to the business users. They can also encourage better collaboration by learning the usage behavior of certain users and suggesting new data sets to others for exploration and use.
Not only do logical data fabrics use AI and ML to improve their own function, they also enable data scientists working on data science, AI, and ML projects to be more effective. By giving data scientists access to all enterprise data, along with the ability to build multiple logical models on top of it, these data fabrics help them optimize their models and arrive at questions and insights that improve the organization’s performance.
Data Fabrics are Already in Use with More to Come
Data fabrics are no longer just a concept. They are already gaining adoption, thanks to the tremendous benefits outlined above. By adhering to the principles of data gravity, providing business-friendly views of all of the enterprise data, and automating using AI and ML, data fabrics will be one of the hottest trends to track in 2020 and beyond.