It Takes a Community: The Open Source Challenge

[Illustration: code icon and other figures around a display showing three people. Credit: MarcoVector]

Communications of the ACM, May 2022, Vol. 65 No. 5, Pages 48-55
Practice
By Reynold Xin, Wes McKinney, Alan Gates, Chris McCubbin

REYNOLD XIN: Open source is not necessarily about free software. Instead, it has more to do with the inherent interest companies have in building ecosystems and communities that will help them lower their cost of hiring new employees and then ramping them up.

 

Of the many challenges faced by open source developers, among the most daunting are some that other programmers scarcely ever think about. And that’s because most programmers work in settings where “other people” attend to such matters—people who work in the legal department or human resources, for example. But when there are not any people like that to turn to, what then?

 

Building a successful open source community depends on many different elements, some of which are familiar to any developer—a clear and present market opportunity, an intelligent approach, efficient coding, and so forth. Just as important are the skills to recruit, to inspire, to mentor, to manage, and to mediate disputes—all without the use of various forms of compensation to reward and provide incentives to contributors.

 

What exactly does it take to pull all that off? We will let people with track records as leaders of some of the most successful open source projects yet mounted address that from their own experience. Participating in the discussion that follows are Reynold Xin, chief architect of Databricks, best known for his work on Apache Spark; Alan Gates, co-founder of Hortonworks, who helped develop Hadoop, Pig, HCatalog, and Hive while at Yahoo Labs; and Wes McKinney, founder of Ursa Labs, responsible for creating pandas (Python Data Analysis Library), and currently charged with leading the Apache Arrow effort.

 

CHRIS MCCUBBIN: Linux was released as open source in 1992. Then came a second wave of open source offerings that emerged throughout the dot-com era. What’s it like at this point to launch an open source project?

 

REYNOLD XIN: One big difference is that the whole foundation concept took hold. Linux was basically just a hobby project early on, and, in that respect, it was similar to a lot of the other open source projects started back in the 1990s. Now you have the Linux Foundation, which has a multimillion-dollar annual operating budget. And while the Apache Software Foundation, which is run by volunteers, doesn’t have an operating budget anything like that, it has managed to create a significant brand for itself.

 

One of the reasons a lot of open source projects, from the late 1990s through 2010 in particular, started out as foundations was so they would have a better way to deal with the communities that grew up around those projects. Over the past few years, that trend has reversed a little. Largely thanks to the rise of GitHub, more and more open source projects now launch simply by putting a repository there. Many of the projects that have started out in this way have managed to achieve a fair amount of success without any sort of help from a foundation. I definitely see that as being one of the more important current trends.

 

MCCUBBIN: Certainly, the field has gotten to be a lot more congested of late. Which is to say, for every problem, it now seems there are at least a few projects offering potential solutions. But it can be hard sometimes to figure out which of those are being actively maintained.

 

XIN: Exactly. But even being associated with a foundation doesn’t necessarily mean a project is going to be actively maintained. Project communities can come and go. In truth, many open source projects—especially the smaller ones—depend on just one or two key contributors. As soon as those contributors move on, there’s not much that remains to stand behind that code. Also, even with medium-sized projects that are backed by successful foundations, you cannot be completely confident the code is going to be well maintained.

 

ALAN GATES: To take another spin on this, I’d say that over the past 20 years we have also witnessed a growing corporate presence. Even 15 years ago, when Hadoop was launched, there were companies that would get behind certain projects and offer various types of support. By then, lots of people were already using Linux.

 

Companies also started letting it be known what projects they were getting behind so they could promote that as part of their identity. Red Hat was one of the first to be really successful at that. Then some others started to get behind Linux as well. At this point, corporate involvement in open source projects has expanded far beyond that—both in terms of how they use open source and how they organize their development efforts.

 

XIN: In a way, open source has already won, to quote a friend who shall remain anonymous.

 

MCCUBBIN: I definitely think that’s the case. My own experience is that, with a startup I helped launch in 2012, we basically went entirely with open source for our framework. That represented a huge shift from anything I’d ever done before, which had all been pretty much DIY.

 

Is that a trend you still see? Or is there now a bit more pushback on open source, owing to maintainability issues and things like that?

 

GATES: There’s some pushback now. Some companies are starting to say, “We really want to be involved with open source, but what’s the right way to go about that now?” You see different companies trying out different license models, so it sure feels like they want to continue being involved. As Reynold says, open source has won. But the question is, What’s the right form of engagement?

 

WES MCKINNEY: The general trend is that corporations increasingly want the core platforms they depend upon to be entirely open source. But some unnerving security-related issues have come up over the years in places like the npm/JavaScript ecosystem with projects that were not supervised by foundations or maybe just didn’t have the benefit of large, centralized development teams. Increasingly, corporations have come around to deciding that, while they want all their core platform software to be open source, they are also willing to pay for the development of premium enterprise features, as well as for support and indemnification—or, at minimum, for priority-one and priority-two bug fixes. [Emphasis added]

 

Even going back to before 2010, there was a push away from proprietary products and vendor lock-in, and yet it took some time before people recognized just how important it was to work with open source software that was well-maintained and well-supported. In fact, the vendors that emerged as part of the Apache Hadoop ecosystem during that period—companies such as Cloudera and Hortonworks, in particular—were those created specifically to provide the peace of mind and level of security, as well as the support required for very large companies, financial institutions, and insurance companies, to have sufficient confidence to bet their businesses on open source software.

 

So, what I think has happened is that companies are still paying for software but in a different way than before. It used to be that they paid for software licenses, but now they pay for assurances against faults and potential loss. Which is to say indemnification has become a much bigger issue as organizations have started putting billions of dollars on the line. And yet, we still have problems like the Equifax hack that came about as a consequence of that organization’s failure to apply security patches that had been made readily available in the open source ecosystem. That has now become a classic example of what can happen whenever users fail to maintain their open source software properly. [Emphasis added]
 

MCCUBBIN: This is where commercial software offers an interesting contrast. Vendors such as Microsoft, for example, have forced users to make updates. Do you think the open source world ought to start moving more in that direction?


About the Authors:

  • Reynold Xin, chief architect of Databricks, best known for his work on Apache Spark.
  • Alan Gates, co-founder of Hortonworks, who helped develop Hadoop, Pig, HCatalog, and Hive while at Yahoo Labs.
  • Wes McKinney, founder of Ursa Labs, responsible for creating pandas (Python Data Analysis Library), and currently charged with leading the Apache Arrow effort.
  • Chris McCubbin, senior applied scientist with Amazon Web Services, who helps steer the discussion.