On June 4, Microsoft announced it would be acquiring GitHub for $7.5 billion US. See the official statements: Microsoft announcement, Microsoft blog post, GitHub blog post. For some perspective, the market cap of STEEM is currently estimated to be just under $0.5 billion US.
This deal got a lot of people thinking. GitHub has been a huge force in the rise of open source software, and is now the development venue for many of the most important software projects. Microsoft, on the other hand, has a dirty history of actively aiming to undermine open standards and software. As such, Microsoft's acquisition of GitHub concerned many Steemians (such as @mepatriot) as well as many scientists / @steemstem followers, since GitHub has become a core part of the open science workflow.
Nature News article
Nature News is one of the premier venues for science related news. Andrew Silver interviewed me and several other researchers for a Nature News story titled Microsoft’s purchase of GitHub leaves some scientists uneasy.
It's a good summary of the different viewpoints within the scientific community on Microsoft's acquisition. Here's the portion discussing my opinion:
Daniel Himmelstein, a data scientist at the University of Pennsylvania in Philadelphia, says that GitHub is problematic for researchers, but that this has nothing to do with the Microsoft acquisition.
GitHub hosts repositories of code or data created by the open-source Git, which can be distributed among users, so the repositories themselves can still have backups if a server dies. However, certain information, such as comments on projects and requests to add code, are stored on GitHub’s website. Some of these data are an important part of the scientific record, says Himmelstein, but they are at risk from outages, surveillance or censorship. “Regardless of the Microsoft acquisition, GitHub, as a centralized and closed company, possesses a dangerous level of control over the open-source ecosystem,” he says.
Scientists face fewer threats, says Himmelstein, if they put their work on decentralized hosting systems, such as the git-ssb project, which don’t have a single point of failure. “To the extent that the Microsoft acquisition makes people aware of the centralized nature of GitHub,” he says, “that’s a positive thing.”
As with most interviews, only a small portion of one's commentary makes it into the final story. As such, I like to release my full comments once the story is published. This practice has several benefits:
- Lets interested readers explore the interview in more depth. Responding to questions can take considerable time, so I want to maximize the impact of my time investment.
- Adds transparency to the media process and helps keep quotations honest. Note that Andrew Silver and other Nature News journalists (such as @simoxenham) I've interacted with have all been excellent and have never misrepresented my comments. However, this is not true throughout all the media and hence more transparency is better.
- Posting my responses publicly makes it easier for me to find them in the future!
In general, I let the journalist interviewing me know that I plan to post my responses. I haven't encountered any journalists who have voiced displeasure. In fact, it can be beneficial to the journalist by bringing more visibility to their article and process.
So without any further ado, here are my responses to Andrew's questions (quoted). Our correspondence was by email.
Initial responses from June 4
So with the Microsoft acquisition of GitHub--is that actually going to change how science is done?
I don't think there will be much change in the short-term. It will likely be several years before the GitHub platform changes as a result of this acquisition (for better or for worse). There's no pressing need for scientists to stop using GitHub. Most academic researchers use GitHub's open source or educational plans, which provide free repository hosting. Therefore, there's not much incentive to switch providers, at least until GitHub introduces changes that degrade the user experience.
Do you plan on running any experiments different, moving any projects to another platform? Or do you know of any others actually going to do so? And do you know of any lobbying efforts, perhaps?
While I'm a huge fan of the service GitHub provides, it's important to remember that GitHub itself is a closed source website. While the underlying git version control is open, the GitHub website and infrastructure are closed. Therefore, regardless of the Microsoft acquisition, GitHub, as a centralized & closed company, possesses a dangerous level of control over the open source ecosystem. Fortunately, GitHub does have a good API and the most crucial content is usually part of the underlying git repositories, so migrating to alternatives is not difficult.
Over the past year, I've taken several steps to avoid becoming overly reliant on GitHub. I created a repository on GitLab and feel confident that I could migrate my 80 GitHub repositories to GitLab (whose Community Edition is open source) in just a few days. I've also started using full URLs when linking to issues or pull requests in commit messages to help prevent any ambiguity in the case we switch providers.
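To illustrate the full-URL habit, here is a minimal Python sketch that rewrites short references such as `#12` in a commit message into full issue URLs. The function name `expand_issue_refs`, the example repository URL, and the choice of the `/issues/` path are all assumptions made for illustration; a reference to a pull request would need `/pull/` instead.

```python
import re

# Matches short references like "#12" that stand alone, i.e. are not
# preceded by a word character or a slash (as in "owner/repo#12" or a URL).
SHORT_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def expand_issue_refs(message: str, repo_url: str) -> str:
    """Replace short '#N' references with full issue URLs.

    `repo_url` is the web address of the repository. Every reference is
    assumed to point at an issue, which is a simplification.
    """
    base = repo_url.rstrip("/")
    return SHORT_REF.sub(lambda m: f"{base}/issues/{m.group(1)}", message)


if __name__ == "__main__":
    # Hypothetical repository used only for demonstration.
    print(expand_issue_refs("Fix parser, closes #12",
                            "https://github.com/example/project"))
```

A full URL stays meaningful wherever the commit ends up, whereas a bare `#12` only makes sense on the platform that assigned the number.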
Nonetheless, when GitHub goes down, I do find myself twiddling my thumbs, as I can't do any work. However, in the long term, should major problems arise with GitHub, scientists should be able to migrate platforms without too much hassle. To the extent that the Microsoft acquisition makes people aware of the centralized nature of GitHub, that's a positive thing... as they probably should have considered this before.
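As a rough sketch of what migrating the git portion of a repository can look like, the Python snippet below wraps the standard `git clone --mirror` and `git push --mirror` commands. The helper names and the example URLs are hypothetical, the target repository is assumed to already exist and be empty, and this has not been tested against any particular hosting service (a host may reject some of the mirrored refs).

```python
import subprocess
import tempfile
from pathlib import Path


def mirror_commands(source_url: str, target_url: str, workdir: str) -> list:
    """Build the git commands that copy all refs from source to target."""
    mirror_dir = str(Path(workdir) / "mirror.git")
    return [
        # Bare clone containing every branch, tag, and other ref.
        ["git", "clone", "--mirror", source_url, mirror_dir],
        # Push all of those refs to the new host.
        ["git", "-C", mirror_dir, "push", "--mirror", target_url],
    ]


def mirror_repository(source_url: str, target_url: str) -> None:
    """Run the mirror commands in a temporary directory (requires git)."""
    with tempfile.TemporaryDirectory() as workdir:
        for command in mirror_commands(source_url, target_url, workdir):
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical URLs; this only prints the commands instead of running them.
    for command in mirror_commands("https://github.com/example/project.git",
                                   "https://gitlab.com/example/project.git",
                                   "/tmp/migration"):
        print(" ".join(command))
```

This moves only what lives in git itself. Issues, pull request discussions, and other content stored on the website are not included.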
Follow up responses from June 12
Why is making people aware of the centralized nature of GitHub a positive thing?
The centralization is a weakness in the resiliency of the scientific software infrastructure. Git is a distributed protocol, so at least the code and its history are safer from issues of centralization. However, in science (and open source software more generally), peer review, e.g. comments on issues and pull requests, is an essential part of a project and should be preserved as part of the scientific record. So naturally, it's good to get researchers thinking about whether their peer review of repositories is hosted in a reliable way: smart decentralization being the best guarantor of reliability.
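One way to keep a local copy of that review record is to download issue and pull request comments through the GitHub REST API. The sketch below is a minimal Python example using the "list issue comments for a repository" endpoint. The function names, the output file name, and the example owner and repository are assumptions for illustration, requests are unauthenticated (so rate limits apply), and inline review comments on pull request diffs live behind a separate endpoint that is not covered here.

```python
import json
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"


def fetch_json(url: str) -> list:
    """Fetch a URL and decode its JSON body (no authentication)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issue_comments(owner: str, repo: str,
                        fetch: Callable[[str], list] = fetch_json,
                        per_page: int = 100) -> Iterator[dict]:
    """Yield every issue and pull request comment, one page at a time."""
    page = 1
    while True:
        url = (f"{API_ROOT}/repos/{owner}/{repo}/issues/comments"
               f"?per_page={per_page}&page={page}")
        batch = fetch(url)
        if not batch:  # an empty page means there is nothing left
            return
        yield from batch
        page += 1


def archive_comments(owner: str, repo: str, path: str,
                     fetch: Callable[[str], list] = fetch_json) -> int:
    """Write all comments to a JSON file and return how many were saved."""
    comments = list(iter_issue_comments(owner, repo, fetch))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(comments, handle, indent=2)
    return len(comments)


if __name__ == "__main__":
    # Hypothetical repository; running this performs real network requests.
    saved = archive_comments("example", "project", "issue-comments.json")
    print(f"Saved {saved} comments")
```

The `fetch` parameter is injectable so the paging logic can be exercised without network access. The resulting JSON file could be committed to the repository itself, which would let the comments travel with the distributed git history.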
Also, aren’t GitLab and Bitbucket also centralized? Couldn’t they also be acquired?
Bitbucket yes. GitLab yes, but the Community Edition of its software is open source (see this blog post). Thus, GitLab users have the ability to host GitLab CE themselves. If GitLab made changes the community disliked, others could host GitLab clones. One could imagine that universities would be a natural fit for hosting GitLab instances for their faculty's repositories.
There are more decentralized designs than GitLab (which is centralized but open source and hence forkable). On Twitter, I saw someone mention git-ssb. I have not tried out these alternatives, but I think they would be the best long-term solution.
More follow up responses from June 12
What’s not reliable about centralization? What are some things that could happen to the record if it’s centralized that wouldn’t happen if it’s decentralized?
Centralized designs have single points of failure. Failure in this sense could take multiple forms:
- GitHub experiences a service outage (major outages have happened a few times a year).
- GitHub does not provide the features users want.
- GitHub performs surveillance or advertising (features users don't want).
- GitHub starts to charge users.
- GitHub censors content for legal, ethical, or political reasons.
- GitHub becomes owned by a company that is at odds with users' free software ethics.
Quote from here: "Decentralized architectures tend to be much more resilient for most threat models, since there's no single points of failure and any component is swappable."