In collaboration with researchers from academia, industry, and the community, GitHub designed a survey to gather high-quality, novel data on open source software development practices and communities.
We collected responses from 5,500 randomly sampled respondents sourced from over 3,800 open source repositories on GitHub.com, and over 500 responses from a non-random sample of communities that work on other platforms.
The results are an open data set about the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software.
With over 50 questions, the 2017 survey covers a wide range of topics. Below, we highlight some of the most actionable and important insights about the community.
The data described below covers only the random sample sourced from open source repositories on GitHub.com. Percentages are rounded and may not always sum to 100.
- Documentation is highly valued, frequently overlooked, and a means for establishing inclusive and accessible communities.
- Negative interactions are infrequent but highly visible, with consequences for project activity.
- Open source is used by the whole world, but its contributors don't yet reflect its broad audience.
- Using and contributing to open source often happens on the job.
- Open source is the default when choosing software.
Documentation is highly valued, but often overlooked
- Incomplete or outdated documentation is a pervasive problem, observed by 93% of respondents, yet 60% of contributors say they rarely or never contribute to documentation. When you run into documentation issues, help a maintainer out and open a pull request that improves them.
- Documentation helps create inclusive communities. Documentation that clearly explains a project's processes, such as contributing guides and codes of conduct, is valued more by groups that are underrepresented in open source, like women.
- Nearly a quarter of the open source community reads and writes English less than ‘very well.’ When communicating on a project, use clear and accessible language for people who didn’t grow up speaking English or who read it less than fluently.
Negative interactions are infrequent but highly visible, with consequences for project activity
Open source brings together people from all over the world, which can lead to conflicts. While serious incidents are rare, the public nature of open source makes negative interactions highly visible.
As a result, discouraging effects can extend far beyond the individuals directly involved. Setting positive expectations of behavior, and addressing negative incidents quickly, can improve contributor retention and collaboration.
- 18% of respondents have personally experienced a negative interaction with another user in open source, but 50% have witnessed one between other people. It's not possible to know from this data whether the gap is due to people who experienced such interactions leaving open source, or broad visibility of incidents. Either way, negative interactions impact many more than the immediate participants, so address problematic behavior swiftly, politely, and publicly, to send a signal to potential contributors that such behavior isn’t typical or tolerated.
- By far, the most frequently encountered bad behavior is rudeness (45% witnessed, 16% experienced), followed by name calling (20% witnessed, 5% experienced) and stereotyping (11% witnessed, 3% experienced). More serious incidents, such as sexual advances, stalking, or doxxing, are each encountered by less than 5% of respondents and experienced by less than 2% (but cumulatively witnessed by 14% and experienced by 3%).
- Negative experiences have real consequences for project health. 21% of people who experienced or witnessed a negative behavior said they stopped contributing to a project because of it, and 8% started working in private channels more often.
- Tooling that allows people to address problematic behavior directly is the most effective way of addressing harassing behavior. Blocking a user was reported to be more effective than enforcement from third parties like maintainers, ISPs/hosting services, or even legal resources. Give people tools to protect themselves.
Open source contributors don't yet reflect its broad audience of users
Open source provides the basis for technology that serves the entire world. In some ways, the diversity of the user base is reflected or even exceeded among open source contributors, but in other ways there are still huge gaps in representation.
Improving project accessibility could help unlock many more contributions, ensure that technology serves a comprehensive set of use cases and needs, and contribute to better representation in technology jobs.
- The gender imbalance in open source remains profound: 95% of respondents are men; just 3% are women and 1% are non-binary. Women are about as likely as men (68% vs 73%) to say they are very interested in making future contributions, but less likely to say they are very likely to actually do so (45% vs 61%).
- Along other dimensions, representation is stronger: 1% of respondents identify as transgender (including 9% of women in open source), and 7% identify as lesbian, gay, bisexual, asexual, or another minority sexual orientation. 26% are immigrants (from and to anywhere in the world) and 16% are members of ethnic or national minorities in the country where they currently live.
- Women are more likely than men to encounter language or content that makes them feel unwelcome (25% vs 15%) as well as stereotyping (12% vs 2%) and unsolicited sexual advances (6% vs 3%). Unsurprisingly, women are also more likely than men to seek out help directly (29% vs 13%) from people they already know well (22% vs 6%), rather than ask for help from strangers in a public forum or channel. Collaboration between strangers is one of open source's most remarkable aspects: strive to build a community where everyone feels welcome to participate.
- Half of contributors say that their open source work was somewhat or very important in getting their current role. Open source work helps people build their professional reputation. Improving contributor representation can help create a more representative tech sector overall.
Using and contributing to open source often happens on the job
Open source is widely used in professional contexts. The majority of employed respondents use and contribute to open source at work, and many people cite their open source work as important to getting their current job.
However, a significant number say that their employers’ official policies and IP agreements are unclear regarding what is allowed, and under what terms. Businesses play a key role in open source by subsidizing open source work from employees, so creating and communicating clear policies can encourage more frequent, regular contributions.
- 70% of respondents are employed full- or part-time, and 85% of those contribute in some way to software development (e.g. developers, designers, other roles in the software industry) frequently or occasionally in their main job.
- Virtually all (94%) of those who are employed use open source at least sometimes in their professional work (81% use it frequently), and 65% of those who contribute back do so as part of their work duties.
- Most report that their employers accept or encourage use of open source applications (82%) and dependencies in their code base (84%), but some said their employers’ policies on use of open source are unclear (applications: 13%, dependencies: 11%).
- Nearly half say their employer’s IP policy allows them to contribute to open source without permission (47%), and another 12% can do so with permission. However, 28% say their IP policy is unclear, and another 9% are not sure about how their IP agreement treats open source contributions.
Open source is the default when choosing software
Security matters when choosing new software, and most users believe that open source is more secure, on average, than proprietary software. When it comes to stability or user experience, users are less convinced of the superiority of open source. Even so, most are committed to open source, and always seek out open source options.
- Open source’s comparative advantage is in security: security is among the most important features when using any kind of software (86% extremely or very important). Security is the only dimension we asked about where a majority of users believe that open source software is usually better than proprietary software (58%).
- Users also care about stability and user experience (88% and 75% extremely or very important, respectively) when it comes to choosing software, but on these dimensions fewer were convinced of open source’s superiority: only 36% said user experience tends to be better, and 30% said that open source software is generally more stable than proprietary options.
- Despite these tradeoffs, users still prefer open source. 72% say that they always seek out open source options when evaluating new tools.
The Open Source Survey is an open data project. Download the data and explore it yourself.
The data and questionnaire are released under CC0-1.0. See the repository for important information about privacy, citation, and trademarks.
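If you recompute shares from the downloaded responses, keep in mind that rounded percentages need not sum to 100. The sketch below illustrates this with made-up answers; the `rounded_percentages` helper and the sample values are assumptions for illustration only and are not taken from the survey data or its column names.

```python
from collections import Counter


def rounded_percentages(responses):
    """Return {answer: whole-number percentage} for a list of answers."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}


# Hypothetical answers to a three-option question (not real survey data).
responses = ["yes", "no", "unsure"]

shares = rounded_percentages(responses)
total_share = sum(shares.values())

# Each answer is one third of the responses, which rounds to 33,
# so the rounded shares add up to 99 rather than 100.
print(shares, total_share)
```

When comparing your own tabulations with the figures reported here, a difference of a point or so in a total is more likely a rounding effect than an error.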
About the survey
In today's digital world, open source software powers nearly all of our modern society and economy. Understanding the people who build, maintain, and use these projects is important to anyone concerned about the sustainability of open source, and the critical network of services and technologies that depend on it.
This survey was designed to provide high-quality data on a range of topics that improve understanding of open source communities, and inform future research:
- Provide high-quality data that helps inform decisions about open source work, tooling, and community.
- Help users, contributors, maintainers, and other stakeholders understand each other in terms of motivations, experiences, and needs.
- Contribute to greater public knowledge and understanding of a uniquely organized system of public goods provision on which the modern global economy relies.
The survey was designed in collaboration with researchers and stakeholders from academia, industry, and the open source community, according to design principles that emphasize scientific rigor, respondents’ privacy, and open source/data values. We focused on selecting topics that provide actionable insights and open new avenues of research, including:
- Behaviors and preferences around consumption and contribution
- Attitudes and practices around privacy in online spaces
- Seeking and providing technical help
- Negative experiences, and their consequences
- Employer policies on using and contributing to open source
- Demographics of open source participants, and their history with technology
Where possible, we adapted questions from other studies, to allow for comparisons with other populations.
Respondents were sampled randomly from traffic and qualifying activity to licensed open source repositories on GitHub.com and invited to complete the survey through a dialog box. A smaller sample was recruited from open source communities sourced outside of GitHub, through invitations posted to mailing lists or similar communication points.
The survey was fielded in English, Spanish, Chinese, Japanese, and Russian. Topics, drafts of the questionnaire, translations, and design and sampling plans were all posted in a public GitHub repository for public visibility and feedback.
More detailed information on methodology is included in the data download.
Partners & Acknowledgments
This survey was designed by GitHub with valuable input from the research and open source communities. We especially thank: Anna Filippova (Carnegie Mellon University), Andrea Forte (Drexel University), Edward Galvez (Wikimedia Foundation), Rebecca Weiss (Mozilla), and Laura Dabbish (Carnegie Mellon University) for conversations, research questions, and prior art that informed the questionnaire design; the Open Source Initiative for offsite sampling recruitment; the many members of the community who assisted with translations and suggestions for improving questions; and everyone who participated in the survey.