Community Data License Agreement — Open source your datasets

Linux Foundation wants you to share data as easily as you share open source software

Jane Elizabeth

How can we share large datasets without getting the lawyers involved? No one wants a nasty intellectual property rights fight on their hands. And so, the Linux Foundation has come up with the Community Data License Agreement, an open source licensing framework for collaborative communities to share “open” data. No lawyer needed.

A surprising amount of cutting-edge tech requires lots and lots of data. Look at neural networks, machine learning, and big data analytics: they all need vast amounts of data. The same goes for technologies like self-driving cars. But how can organizations and developers share data openly? The Linux Foundation has the answer: the Community Data License Agreement.

As the Linux Foundation points out, open source communities have shown the power of open collaboration. Some of the most important software assets were built collaboratively by programmers all over the world. And now, the era of big data has emphasized the need to make the datasets these technologies train on more openly available.

But therein lies the problem: intellectual property law treats data differently than software. The commonly used OSI-approved licenses do not work well when applied to data.

No one is going to share their training data if they think they’re going to get sued. Especially not cash-strapped universities or lean start-ups. (Lawyers are expensive, yo.)

Enter the Linux Foundation.


The Community Data License Agreement

The Community Data License Agreement (CDLA for short) is an effort to define a licensing framework to support collaborative communities built around curating and sharing “open” data. The CDLA licenses are designed to allow individuals and organizations of all types to share data as easily as they share open source software.

The Linux Foundation explicitly designed these CDLA licenses with the needs of companies, organizations, and communities in mind. The licenses are intended to serve both contributors and consumers of open datasets.

There are two types of CDLA licenses, a Sharing license and a Permissive license:

  • The Sharing license encourages contributions of data back into the community.
  • The Permissive license has no requirements on recipients or contributors of open data.

Both types of licenses are designed to encourage and facilitate the productive use of this data. The differences between the licenses let data producers stipulate what users can do with the information. In choosing between the two licenses, these producers can decide which type is best aligned with their interests.

The existence of known, equal terms, with disclaimers and warranties, means that data producers and data users alike have a clear understanding and a standard framework for approaching this information. Making rights and responsibilities clear for everyone is fair and allows everyone involved to make informed decisions.

Given that privacy concerns are all the rage right now, it’s important to note that the CDLA is explicitly data privacy agnostic. In practice, this means that both the publishers and curators of the data have to handle privacy concerns on their own and develop their own governance structure. The CDLA is emphatically not involved in deciding how a community handles data privacy.

So, maybe you will need that lawyer after all.


The future of open source

It’s not a secret that tech’s love of open source isn’t always copacetic with Silicon Valley’s love of making money. Whether it’s Facebook’s licensing drama over React or Larry Ellison’s ongoing grudge match against Google that’s put APIs on the spot, the ethos of open source isn’t without its challenges.

Despite the lawyers, tech has made great strides towards a more open future. After all, we’re not the only ones who think that open source is the reason behind the greatest innovations and developments of the past decades. Companies that have long held themselves apart from open source, including big names like IBM and Microsoft, are now coming into the fold.

In fact, we talked about this a lot in the latest JAX Mag – All eyes on Open Source. We chatted with people from the Eclipse Foundation, The Apache Software Foundation, Cloud Foundry, Red Hat, Hyperledger, and more about the future of open source. Download your copy here!

Jane Elizabeth
Jane Elizabeth is an assistant editor for
