“Setting up an archive of publicly available source code is urgent”
Software Heritage’s goal is to collect, preserve and share the entire Software Commons, that is, all publicly available software in source code form. We talked to Stefano Zacchiroli, co-founder and CTO of Software Heritage, about the mission of this nonprofit initiative, the fragile state of software and the support they receive from countless well-known IT organizations.
JAXenter: What is Software Heritage and what is its mission?
Stefano Zacchiroli: Software Heritage is a recently unveiled nonprofit initiative to collect, preserve over the very long term, and share the entire Software Commons, that is, all publicly available software in source code form. Our mission encompasses preserving all past and present Free/Open Source Software (FOSS) as well as other bits of publicly available software.
JAXenter: How did you come up with this idea? Did we need an initiative to preserve all software that is publicly available in source code form?
Stefano Zacchiroli: As computer scientists by trade, Free Software activists by vocation, and frequent collaborators with industry on R&D projects, a couple of years ago we noticed three seemingly unrelated needs: preserving the vast human cultural heritage that is embedded in source code; improving the sad state of scientific reproducibility when it comes to software; and providing the equivalent of a “serial number” for the FOSS products that are nowadays ubiquitous in the IT industry. From there we quickly realized that a comprehensive, curated, independently run archive of all publicly available source code would go a long way toward addressing all those needs.
JAXenter: What is the endgame of this initiative?
Stefano Zacchiroli: Having a single logical place where developers can go to find source code that used to be publicly available but has disappeared from its previous hosting place. The ambition of Software Heritage is to be that single (logical) place, with massive (physical) geographic replication worldwide to minimize the chances of ever losing a single bit of source code. Realizing this vision requires good out-of-the-box coverage of source code hosting sites, as well as UIs that allow one-off deposits of source code archives.
JAXenter: Why do you say that software is fragile and we are starting to lose it?
Stefano Zacchiroli: It is the curse and blessing of all digital information. On the one hand, it is trivial and cheap to copy around. On the other hand, losing something forever is often just one “rm -rf” command away (and I’m ignoring here the physical decay of digital media, which is another increasingly common cause of permanent loss of digital information). Source code is no exception, and at the same time it is becoming more and more important to society.
Losing a picture that is dear to our hearts might be very hurtful. Losing the source code that contains the knowledge required to interpret a (non-standard) graphics file format might imply, over the course of centuries, losing access to significant shares of our collective knowledge as human beings. Looking at shorter time frames, losing the source code of software that is in use by the industry is something that has unfortunately already happened in the past (e.g., during the infamous “year 2000 problem”) and cost hundreds of billions of dollars to fix. It’s pretty evident that setting up an archive of publicly available source code is, at this point, not only needed but also urgent. Once-popular third-party source code hosting platforms have either already disappeared (e.g., Gitorious) or are in the process of shutting down (e.g., Google Code). Nobody knows who will be next.
JAXenter: You have the support of countless well-known companies and international organizations such as GitHub, Microsoft, the Eclipse Foundation, the Linux Foundation and more. What attracted them to offer their support?
Stefano Zacchiroli: Their reasons for supporting us are distributed along the lines of the initial needs that we set out to address: human heritage, science, and industry. Cultural and Free Software activist institutions are well aware of the cultural value of FOSS and care about reducing the risk of losing it. Scientific institutions see the potential for research, both on the front of scientific reproducibility and on that of large-scale “big code” software analysis. IT companies want both an independent, nonprofit actor active in the field of public software preservation, and a neutral place where they can cooperate on building a comprehensive open database of FOSS provenance and licensing information.
JAXenter: What does it take to protect humanity’s software legacy?
Stefano Zacchiroli: Lots of help! We’re a very dedicated but also very small team and we welcome help from all interested parties:
- Companies that are aligned with our mission can join our sponsorship program to support our work.
- Developers can participate as they usually do in FOSS projects: join our development mailing list or IRC channel, and dive into our code to submit bug reports or patches.
- Users can contribute by curating content on our wiki, which most notably hosts a suggestion box of endangered source code that we should archive. Raising awareness of the project by promoting it among peers and on social media is very welcome too.
Thank you very much!