My full comments to Nature News on Microsoft's acquisition of GitHub

in #science · 3 years ago (edited)

On June 4, Microsoft announced it would be acquiring GitHub for $7.5 billion US. See the official statements: Microsoft announcement, Microsoft blog post, GitHub blog post. For some perspective, the market cap of STEEM is currently estimated to be just under $0.5 billion US.

This deal got a lot of people thinking. GitHub has been a huge force in the rise of open source software, and is now the development venue for many of the most important software projects. Microsoft, on the other hand, has a dirty history of actively aiming to undermine open standards and software. As such, Microsoft's acquisition of GitHub concerned many Steemians (such as @mepatriot) as well as many scientists / @steemstem followers, since GitHub has become a core part of the open science workflow.

Nature News article

Nature News is one of the premier venues for science-related news. Andrew Silver interviewed me and several other researchers for a Nature News story titled Microsoft’s purchase of GitHub leaves some scientists uneasy.


It's a good summary of the different viewpoints within the scientific community on Microsoft's acquisition. Here's the portion discussing my opinion:

Daniel Himmelstein, a data scientist at the University of Pennsylvania in Philadelphia, says that GitHub is problematic for researchers, but that this has nothing to do with the Microsoft acquisition.

GitHub hosts repositories of code or data created by the open-source Git, which can be distributed among users, so the repositories themselves can still have backups if a server dies. However, certain information, such as comments on projects and requests to add code, are stored on GitHub’s website. Some of these data are an important part of the scientific record, says Himmelstein, but they are at risk from outages, surveillance or censorship. “Regardless of the Microsoft acquisition, GitHub, as a centralized and closed company, possesses a dangerous level of control over the open-source ecosystem,” he says.

Scientists face fewer threats, says Himmelstein, if they put their work on decentralized hosting systems, such as the git-ssb project, which don’t have a single point of failure. “To the extent that the Microsoft acquisition makes people aware of the centralized nature of GitHub,” he says, “that’s a positive thing.”

Full comments

As with most interviews, only a small portion of one's commentary makes it into the final story. As such, I like to release my full comments once the story is published. This practice has several benefits:

  1. Lets interested readers explore the interview in more depth. Responding to questions can take considerable time, so I want to maximize the impact of my time investment.
  2. Adds transparency to the media process and helps keep quotations honest. Note that Andrew Silver and other Nature News journalists (such as @simoxenham) I've interacted with have all been excellent and have never misrepresented my comments. However, that is not true of all media outlets, so more transparency is better.
  3. Posting my responses publicly makes it easier for me to find them in the future!

In general, I let the journalist interviewing me know that I plan to post my responses. I haven't encountered any journalists who have voiced displeasure. In fact, it can benefit the journalist by bringing more visibility to their article and process.

So without any further ado, here are my responses to Andrew's questions (quoted). Our correspondence was by email.

Initial responses from June 4

So with the Microsoft acquisition of GitHub, is that actually going to change how science is done?

I don't think there will be much change in the short term. It will likely be several years before the GitHub platform changes as a result of this acquisition (for better or for worse). There's no pressing need for scientists to stop using GitHub. Most academic researchers use GitHub's open source or educational plans, which provide free repository hosting. Therefore, there's not much incentive to switch providers, at least until GitHub introduces changes that degrade the user experience.

Do you plan on running any experiments differently, or moving any projects to another platform? Or do you know of anyone else who is actually going to do so? And do you know of any lobbying efforts, perhaps?

While I'm a huge fan of the service GitHub provides, it's important to remember that GitHub itself is a closed-source website. Although the underlying git version control system is open, the GitHub website and infrastructure are closed. Therefore, regardless of the Microsoft acquisition, GitHub, as a centralized & closed company, possesses a dangerous level of control over the open source ecosystem. Fortunately, GitHub does have a good API and the most crucial content is usually part of the underlying git repositories, so migrating to alternatives is not difficult.
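To make the migration point concrete, here is a minimal Python sketch that copies every branch and tag of a repository from one host to another using git's own `clone --mirror` and `push --mirror` commands. The URLs and the `mirror_commands` / `migrate` helper names are hypothetical. Note that this only moves what lives inside the git repository; issues and pull request discussions would still need to be exported separately through the API.

```python
import subprocess
from pathlib import Path


def mirror_commands(source_url: str, target_url: str, workdir: str) -> list:
    """Build the git commands that copy all refs from source_url to target_url.

    `git clone --mirror` fetches every branch and tag into a bare repository;
    `git push --mirror` then pushes all of those refs to the new remote.
    """
    # Derive a local directory name such as "project.git" from the URL.
    name = source_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    local = str(Path(workdir) / name)
    return [
        ["git", "clone", "--mirror", source_url, local],
        ["git", "-C", local, "push", "--mirror", target_url],
    ]


def migrate(source_url: str, target_url: str, workdir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or actually execute them, stopping on the first error."""
    for cmd in mirror_commands(source_url, target_url, workdir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running `migrate(...)` with the default `dry_run=True` only prints the commands, which is a cheap way to review a batch migration before touching any remote.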

Over the past year, I've taken several steps to avoid becoming overly reliant on GitHub. I created a repository on GitLab and feel confident that I could migrate my 80 GitHub repositories to GitLab (whose Community Edition is open source) in just a few days. I've also started using full URLs when linking to issues or pull requests in commit messages, to help prevent any ambiguity in case we switch providers.
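As a rough illustration of the full-URL habit, the sketch below rewrites bare `#123` references in a commit message into absolute GitHub links, so the message stays unambiguous if the project later moves to another host. The function name and regular expression are my own assumptions, not a GitHub feature. It points every number at the `/issues/` path; GitHub generally redirects such a link to the pull request page when the number belongs to a pull request, but that is worth double-checking for your own repositories.

```python
import re

# A bare "#123" that is not preceded by a word character, "/" or "#",
# so "owner/repo#123" and "page.html#12" are left untouched.
SHORT_REF = re.compile(r"(?<![\w/#])#(\d+)\b")


def expand_issue_refs(message: str, owner: str, repo: str) -> str:
    """Replace short issue references with full GitHub URLs for owner/repo."""
    base = f"https://github.com/{owner}/{repo}/issues/"
    return SHORT_REF.sub(lambda match: base + match.group(1), message)
```

For example, `expand_issue_refs("Fixes #12", "owner", "repo")` returns `"Fixes https://github.com/owner/repo/issues/12"`.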

Nonetheless, when GitHub goes down, I do find myself twiddling my thumbs, as I can't do any work. However, in the long term, should major problems arise with GitHub, scientists should be able to migrate platforms without too much hassle. To the extent that the Microsoft acquisition makes people aware of the centralized nature of GitHub, that's a positive thing, since they probably should have considered this before.

Follow up responses from June 12

Why is making people aware of the centralized nature of GitHub a positive thing?

The centralization is a weakness in the resiliency of the scientific software infrastructure. Git is a distributed protocol, so at least the code and its history are safer from issues of centralization. However, in science (and open source software more generally), peer review, e.g. comments on issues and pull requests, is an essential part of a project and should be preserved as part of the scientific record. So naturally, it's good to get researchers thinking about whether their peer review of repositories is hosted in a reliable way: smart decentralization being the best guarantor of reliability.
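For a sense of why the code and its history are comparatively safe, here is a small sketch of how git names file contents: a blob's ID is the SHA-1 hash of a short header (the word `blob`, the content length in bytes, and a NUL byte) followed by the contents. Because every clone can recompute these IDs, any copy can verify that it holds exactly the same data. Issue and pull request comments are not git objects, so they get none of this protection. This assumes git's default SHA-1 object format; repositories created with the newer SHA-256 format hash differently.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a file's contents (SHA-1 object format)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()
```

`git_blob_id(b"test content\n")` should match what `echo 'test content' | git hash-object --stdin` prints.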

Also, aren’t GitLab and Bitbucket also centralized? Couldn’t they also be acquired?

Bitbucket yes. GitLab yes, but the community edition of its software is open source (see this blog post). Thus, GitLab users have the ability to host GitLab CE themselves. If GitLab made changes the community disliked, others could host GitLab clones. One could imagine that universities would be a natural fit for hosting GitLab instances for their faculty's repositories.

There are more decentralized designs than GitLab (which is centralized but open source and hence forkable). On Twitter, I saw someone mention git-ssb. I have not tried out these alternatives, but I think they would be the best long-term solution.

More follow up responses from June 12

What’s not reliable about centralization? What are some things that could happen to the record if it’s centralized that wouldn’t happen if it’s decentralized?

Centralized designs have single points of failure. Failure in this sense could take multiple forms:

  • GitHub experiences a service outage (major outages have happened a few times a year)
  • GitHub does not provide the features users want
  • GitHub performs surveillance or advertising (features users don't want)
  • GitHub starts to charge users
  • GitHub censors content for legal, ethical, or political reasons
  • GitHub becomes owned by a company that is at odds with users' free software ethics
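One way to quantify the single point of failure: if n copies are hosted independently and each is reachable with probability p, the chance that at least one is reachable is 1 - (1 - p)^n. The sketch below computes this. The 99% figure in the usage note is purely illustrative, and independence is a strong assumption: copies that share an owner or jurisdiction fail together under censorship or a policy change, which is exactly the situation decentralization guards against.

```python
def availability(per_host: float, hosts: int) -> float:
    """Probability that at least one of `hosts` independent copies is reachable."""
    if not 0.0 <= per_host <= 1.0:
        raise ValueError("per_host must be a probability between 0 and 1")
    if hosts < 1:
        raise ValueError("need at least one host")
    # Every copy must fail at once for the content to be unreachable.
    return 1.0 - (1.0 - per_host) ** hosts
```

With p = 0.99, a single host gives 0.99, while two independent hosts give 0.9999.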

Quote from here: "Decentralized architectures tend to be much more resilient for most threat models, since there's no single points of failure and any component is swappable."


You didn't want to submit to Nature as a student, but now they come to you for commentary. Cool and ironic. (I watched the PhD dissertation on your blog :-p )

Thanks! Yeah I was stoked the live stream / recording worked. It was from my iPhone, so the fact that it didn't go offline or break was a miracle in its own right.

You didn't want to submit to Nature as a student but now they come to you for commentary

For those who don't know my view, it's that scholarly publications should be freely available and openly licensed, since they're usually publicly funded. Nature is the most prestigious journal, but still applies the outdated subscription model to most of its articles. It's sad that much of the most impactful science gets locked behind paywalls with licenses that prohibit sharing and reusing the content. Yuck!

Thanks for sharing this @dhimmel, I've passed it on to some friends in the medical field who I expect would be very curious to know more about what you are doing. Great work!

“To the extent that the Microsoft acquisition makes people aware of the centralized nature of GitHub,”

That is a very valuable point. I had never thought of it myself. Nor was anyone in my circle talking about the centralized nature of GitHub. Instead, everyone is showing fanboyism by joining GitLab.


For those searching for the git-ssb link, it is here.

Great point @dexterdev. GitLab Community Edition is an improvement over GitHub because its source code is open. However, switching from GitHub to the proprietary aspects of GitLab isn't a real advance. As we know, the point is not whether Microsoft owns GitHub but whether anyone can have centralized control over the tools required for software development. In cryptocurrency, most veterans understand this point well. Good projects are not those that are controlled by a single benevolent entity. Good projects are decentralized and incentivized such that the benevolence of any individual participant is irrelevant.

Good projects are not those that are controlled by a single benevolent entity. Good projects are decentralized and incentivized such that the benevolence of any individual participant is irrelevant.

Exactly.

Another decentralized alternative is SIT (Serverless Information Tracker). Ironically, the site is basically a link to a GitHub repo, but the issues for the repo are tracked only in SIT itself.

Some people say the censorship already started because a troll was removed from the Trending list.

Interesting article @dhimmel. I was unaware that Microsoft was planning to acquire GitHub. While my brain is still trying to process a lot of the jargon used, what do you think this would mean for the likes of Steemit and how it uses GitHub for source code? If I am not asking this properly please let me know and I will try to rephrase the question.

Much of the source code related to Steem is stored on GitHub. This includes the main protocol implementation, the steemit dot com source, and the busy dot com source. In fact, I'd wager that most open source tools from http://steemtools.com are hosted on GitHub.

Now, since GitHub stores code using git, which is an open source distributed protocol, most of the source code for most projects is stored in a redundant way. Each contributor likely has a copy, although perhaps it's a bit outdated. The problem is that GitHub provides many important tools that are not part of the git repository. This includes issues (comments and discussions) and pull requests (requests to update the codebase). These are crucial aspects of a repository's history. They include all of the peer review details.
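Those issues and pull requests can at least be backed up through GitHub's REST API. The following minimal sketch, using only Python's standard library, pages through the `/repos/{owner}/{repo}/issues` endpoint (which returns pull requests alongside issues) by following the `rel="next"` URL in each response's `Link` header. The owner, repo, and optional token are placeholders. The sketch does not fetch the comment threads themselves, which live under separate endpoints, and unauthenticated requests are heavily rate limited.

```python
import json
import re
import urllib.request
from typing import Optional

API = "https://api.github.com"
# GitHub paginates list endpoints and advertises the following page in the Link header.
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def fetch_all_issues(owner: str, repo: str, token: Optional[str] = None) -> list:
    """Download every open and closed issue (including pull requests) for owner/repo."""
    url = f"{API}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    items = []
    while url:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            items.extend(json.load(response))
            url = next_page(response.headers.get("Link"))
    return items
```

The returned list can then be written to a local JSON file with `json.dump`, giving a snapshot that survives an outage or a change of ownership.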

In the short term, Steem projects don't have a ton to worry about. GitHub is not going to change overnight from this acquisition. However, I think it underscores the importance of trying to move issues and pull requests to decentralized platforms. @utopian-io has started to move some discussion and peer review of code to the Steem blockchain. But there's a long way to go until anything replaces the utility and adoption of GitHub.

Got it! Thanks for that explanation in layman's terms. It made it much easier for me to follow! Sometimes when I wake up and see technical stuff right away, my brain doesn't always follow it at once! :) How is your brother doing in his mission in the Peace Corps?

@jhimmel is doing well last I spoke to him! We went over how to convert SBD into STEEM into VESTS (Power).

Nice! I've been doing that lately as well! I looked for a post from him a few days ago. Will check in a few minutes to see if there are any posts. Thanks for the update!

I hope GitHub won't develop into a more rigid, closed platform.

Congratulations! Your post has been selected as a daily Steemit truffle! It is listed on rank 10 of all contributions awarded today. You can find the TOP DAILY TRUFFLE PICKS HERE.

I upvoted your contribution because to my mind your post is at least 18 SBD worth and should receive 99 votes. It's now up to the lovely Steemit community to make this come true.

I am TrufflePig, an Artificial Intelligence Bot that helps minnows and content curators using Machine Learning. If you are curious how I select content, you can find an explanation here!

Have a nice day and sincerely yours,
TrufflePig

Very unusual acquisition by Microsoft. Unless they are going into more decentralized waters themselves or have some hidden agenda, I don't get it.

Congratulations @dhimmel! You have completed some achievement on Steemit and have been rewarded with new badge(s) :

Award for the number of upvotes received


May I call it a giant marrying another giant? If the genetic process goes well, they will have baby giant(s). LoL

I think that from June 18 all cryptocurrencies will start to rise. Cryptocurrency has already been printed. Is my forecast justified???