Various governance models exist for free and open source software projects. Most of those happen naturally, some of them are chosen… Which one is the best ? Is there a best ? How could we judge the best ? I’d postulate that F/OSS project communities, like any ecosystem, should have long-term survival as their main goal: the ability to continue operation as the same community over time, without fracture or fork.
The “benevolent dictator for life” model usually happens naturally. The project is often originally the brainchild of a single, talented individual, who retains final say over everything that happens to their project. This usually works very well: the person is naturally respected in the community. It also naturally allows for opinionated design, and people who sign up to the project can’t ignore what they sign up for.
The main issue with that setup is that it’s not replicable: it can’t be dictated. It either happens naturally, or it will never happen. You don’t choose someone to become your dictator-for-life after the fact. Any attempt to do so would fail to get enough legitimacy and natural respect to make it last. The second issue with that setup is that it’s not durable. If the dictator stops being active in the community, their opinion is no longer respected as much (especially by new contributors), which usually triggers a painful fork or governance model switch (that’s what happened in Gentoo). Even in the rare cases where the original dictator manages to retain interest and respect in the project, it’s inherently brittle: the “natural” dictator can’t really be replaced in case something bad happens. Succession is always dirty. So from a long-term survival standpoint, this model is not that great.
Aristocracy is used to solve the perceived drawbacks of the dictator-for-life model. Instead of focusing on one person, let’s have a group of people in control of the project, and let that group self-select successors in the wider pool of contributors. That’s the role of “committers” in certain projects, and it’s also how Apache project management committees (PMCs) usually work. It also works quite well, with self-selection usually ensuring that the members share enough common culture to reach consensus on most decisions.
The drawback here is obviously the self-selection bias. Aristocracies all fall after getting more and more disconnected from the people they control, and revolution happens. Open source aristocracies are no different: they fall after gradually growing disconnected from their project’s contributor base. Whenever contributors to an open source project feel like their leaders are no longer representative of the contributors or relevant to the present of the project, this disconnect happens. In mild cases, people just go contribute somewhere else, and in difficult cases this usually triggers a fork.
Direct democracy / Anarchy
The obvious way to solve that disconnect is to give the power directly to the contributors. Direct democracy projects give ultimate power to all the contributors. Anarchy projects let contributors do whatever they want. Debian is an interesting mix of the two: developers vote on general resolutions, but maintainers also have a lot of control on their packages.
While these models have a certain appeal, those projects usually have a hard time making necessary decisions that affect the whole project, so they tend to linger without making any critical decision. It’s also a model that is difficult to evolve: when you try to add new layers on top of it, they are never really accepted by the contributor base.
That leaves us with representative democracy. You regularly designate a small group of people and trust them to make the right decisions for the governance of the project. It can happen in cases where there was no natural dictator at the beginning of the project. It’s different from aristocracy in that they are chosen by the contributor base and regularly renewed, which ensures that they are always seen as a fair representation of the contributors to the project. It’s more efficient than direct democracy or anarchy in making clear and opinionated decisions.
Now it’s far from perfect. As Churchill famously said, it’s the worst form of government, except all those other forms that have been tried from time to time. It also only works if the elected people are seen as legitimate and representative, so it requires good participation levels in elections. So here is my plea: the OpenStack Technical Committee, which oversees the development of the OpenStack open source project as a whole, is being partially renewed this week. If you’re an OpenStack contributor, please vote: this will ensure that elected people have the legitimacy necessary for making the decisions that need to be made, and increase the health of the project.
Yesterday we entered the Icehouse development cycle Feature Freeze. But with the incredible growth of the OpenStack development community (508 different contributors over the last 30 days, including 101 new ones !), I hear a lot of questions about it. I’ve explained it on various forums in the past, but I figured it couldn’t hurt to write something a bit more definitive about it.
Those are valid questions. Why freeze features ? That sounds very anti-agile. Isn’t our test-centric development model supposed to protect us from regressions anyway ? Let’s start with what feature freeze is not. Feature freeze should only affect the integrated OpenStack release. If you don’t release (i.e. if you don’t special-case certain moments in the development), then feature freezing makes little sense. It’s also not a way to punish people who failed to meet a deadline. There are multiple reasons that a given feature will miss a deadline, and most of those are not the fault of the original author of the feature. We do time-based releases, so some features and some developers will necessarily be caught on the wrong side of the fence at some point and need to wait for the next boat. It’s an artifact of open innovation projects.
Feature freeze (also known as “FF”) is, obviously, about stopping adding new features. You may think of it as artificially blocking your progress, but this has a different effect on other people:
- As was evidenced by the Icehouse cycle, good code reviewers are a scarce resource. The first effect of feature freeze is that it limits the quantity of code reviews and makes them all about bugfixes. This lets reviewers concentrate on getting as many bugfixes in as possible before the “release”. It also helps developers spend time on bugfixes. As long as they can work on features, their natural inclination (or their employer’s orders) might conflict with the project interest at this time in the cycle, which is to make that point in time we call the “release” as bug-free as possible.
- From a QA perspective, stopping the addition of features means you can spend useful time testing “in real life” how OpenStack behaves. There is only so much our automated testing will catch. And it’s highly frustrating to spend time testing software that constantly changes under you.
- QA is not the only group that needs to catch up. For the documentation team and the I18N team, feature freeze is essential. It’s difficult to write documentation if you don’t know what will be in the end product. It’s frustrating to translate strings that are removed or changed the next day.
- And then you have all the downstream consumers of the release, who can use the time to prepare for it. Packagers need software that doesn’t constantly change and add dependencies, so that they can prepare packages for OpenStack projects that are released as close to our release date as possible. The marketing team needs time to look into what was produced over the cycle and arrange it in key messages to communicate to the outside world at release time.
- Finally, for release management, feature freeze is a tool to reduce risk. The end goal is to avoid introducing an embarrassing regression just before release. By gradually limiting the impact of what we accept in the release branch (using feature freeze, but also using the RC dance that comes next), we try our best to prevent that.
For all these groups, it’s critical that we stop adding features, changing behavior, adding new configuration options, or changing translatable strings as early as possible. Of course, it’s a trade-off. There might be things that are essential to the success of the release, or things that are obviously risk-limited. That’s why we have an exception process: the Feature Freeze exceptions (“FFEs”).
Feature freeze exceptions may be granted by the PTL (with the friendly but strong advice from the release management team). The idea is to weigh the raw benefit of having that feature in the release, against the complexity of the code that is being brought in, its risk of causing a regression, and how deep we are in feature freeze already. A self-contained change that is ready to merge a few days after feature freeze is a lot more likely to get an exception than a refactoring of a key layer that still needs some significant work to land. It also depends on how many exceptions were already granted on that project, because at some point adding anything more just causes too much disruption.
It’s a difficult call to make, and the release management team is here to help the PTLs make it. If your feature gets denied, don’t take it personally. As you saw, there are a large number of factors involved. Our common goal is to raise the quality of the end release, and every feature freeze exception we grant is a step away from that. We just can’t take that many steps back and still guarantee we’ll win the race.
Open innovation vs. proprietary innovation
For companies, there are two ways to develop open source projects. The first one is to keep design and innovation inside your corporate borders, and only accept peripheral contributions. In that case you produce open source software, but everything else resembles traditional software development: you set the goals and roadmap for your product, and organize your development activity to meet those goals, using Agile or waterfall methodologies.
The second one is what we call open innovation: build a common and level playing field for contributions from anywhere, under the auspices of an independent body (foundation or other). In that case you don’t really have a roadmap: what ends up in the software is what the contributors manage to push through a maintainers trust tree (think: the Linux kernel) or a drastic code review / CI gate (think: OpenStack). Products or services are generally built on top of those projects and let the various participants differentiate on top of the common platform.
Now, while I heavily prefer the second option (which I find much closer to the ideals of free software), I recognize that both options are valid and both are open source. The first one ends up attracting far fewer contributions, but it works quite well for niche, specialized products that require some specific know-how and where focused product design gives you an edge. But the second works better to reach universal adoption and total world domination.
A tragedy of the commons
The dilemma of open innovation is that it’s a natural tragedy of the commons. You need strategic contributions to keep the project afloat: people working on project infrastructure, QA, security response, documentation, bugfixing, release management. Those do not contribute to your employer’s bottom line as directly as a tactical contribution (like a driver to interface with your hardware) would. Some companies contribute those necessary resources, while some others just get the benefits of monetizing products or services on top of the platform without contributing their fair share. The risk, of course, is that the strategic contributor gets tired of paying for the free rider.
Open innovation is a living ecosystem, a society. Like all societies, it has its parasites, its defectors, those who don’t live by the rules. And like all societies, it actually needs a certain number of defectors, as it makes the society stronger and more able to evolve. The trick is to keep the number of parasites down to a tolerable level. In our world, this is usually done by increasing the difficulty or the cost of defecting, while reducing the drawbacks or the cost of cooperating.
Keeping our society healthy
In his book Liars and Outliers, Bruce Schneier details the various knobs a society can play with to adjust the number of defectors. There are moral pressures, reputational pressures, institutional pressures and security pressures. In open innovation projects, moral pressures and security pressures don’t work that well, so we usually use a combination of institutional pressures (licensing, trademark rules) and reputational pressures (praising contributors, shaming free riders) to keep defectors to an acceptable level.
Those are challenges that are fully understood and regularly addressed in the Linux kernel project. For OpenStack, the meteoric growth of the project (and the expertise land-grab that came with it) has protected us from the effects of the open innovation dilemma so far. But the Technical Committee shall keep an eye on this dilemma and be ready to adjust the knobs if it starts becoming more of a problem. Because at some point, it will.
Over the last 3 years, the technical governance of the OpenStack open source project evolved a lot, and most recently last Tuesday. As an elected member of that governance body since April 2011, I witnessed that evolution from within and helped in drafting the various models over time. Now seems like a good time to look back in history, and clear a few misconceptions about the OpenStack project governance along the way.
The project was originally created by Rackspace in July 2010 and seeded with code from NASA (Nova) and Rackspace (Swift). At that point an initial project governance was set up. There was an Advisory Board (which was never really created), the OpenStack Architecture Board, and technical committees for each subproject, each led by a Technical Lead. The OpenStack Architecture Board had 5 members appointed by Rackspace and 4 elected by the community, with 1-year to 3-year (!) terms. The technical leads for the subprojects were appointed by Rackspace.
By the end of the year 2010 the Architecture Board was renamed Project Oversight Committee (POC), but its structure didn’t change. While it left room for community input, the POC was rightfully seen as fully controlled by Rackspace, which was a blocker to deeper involvement for a lot of the big players in the industry.
It was a danger for the open source project as well, as the number of contributors external to Rackspace grew. As countless examples prove, when the leadership of an open source project is not seen as representative of its contributors, you face the risk of a revolt, a fork of the code and seeing your contributors leave for a more meritocratic and representative alternative.
In March 2011, a significant change was introduced to address this perceived risk. Technical leads for the 3 projects (Nova, Swift, and Glance at that point) would from now on be directly elected by their contributors and called Project Technical Leads (PTLs). The POC was replaced by the Project Policy Board (PPB), which had 4 seats appointed by Rackspace, 3 seats for the above PTLs, and 5 seats directly-elected by all the contributors of the project. By spring 2012 we grew to 6 projects and therefore the PPB had 15 members.
This was definitely an improvement, but it was not perfect. Most importantly, the governance model itself was still owned by Rackspace, which could potentially change it and displace the PPB if it was ever unhappy with it. This concern was still preventing OpenStack from reaching the next adoption step. In October 2011, Rackspace therefore announced that they would set up an independent Foundation. By the summer of 2012 that move was completed and Rackspace had transferred the control over the governance of the OpenStack project to the OpenStack Foundation.
At that point the governance was split into two bodies. The first one is the Board of Directors for the Foundation itself, which is responsible for promoting OpenStack, protecting its trademark, and deciding where to best spend the money from the Foundation’s sponsors to empower future development of OpenStack.
The second body was the successor to the PPB, the entity that would govern the open source project itself. A critical piece in the transition was the need to preserve and improve the independence of the technical meritocracy. The bylaws of the Foundation therefore instituted the Technical Committee, a successor for the PPB that would be self-governed, and would no longer have appointed members (or any pay-to-play members). The Technical Committee would be completely elected by the active technical contributors: a seat for each elected PTL, plus 5 directly-elected seats.
The TC started out in September 2012 as an 11-member committee, but with the addition of 3 new projects (and the creation of a special seat for Oslo), it grew to 15 members in April 2013, with the prospect of growing to 18 members in Fall 2013 if all the projects that recently applied for incubation were eventually accepted. With the introduction of the “integrated” project concept (separate from the “core” project concept), we faced the addition of even more projects in the future, and committee bloat would inevitably ensue. That created a potential for resistance to the addition of “small” projects or the splitting of existing projects (changes which make sense technically but should not be worth adding yet another TC seat).
Another issue was the ever-increasing representation of “vertical” functions (project-specific PTLs elected by each project’s contributors) vs. general people elected by all contributors. In the original PPB mix, there were 3 “vertical” seats for 5 general seats, which was a nice mix: it brought in specific expertise while keeping an overall cross-project view. With the growth in the number of projects, in the current TC we had 10 “vertical” seats for 5 general seats. Time was ripe for a reboot.
Various models were considered and discussed, and while everyone agreed on the need to change, no model was unanimously seen as perfect. In the end, simplicity won and we picked a model with 13 directly-elected members, which will be put in place at the Fall 2013 elections.
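To make the seat arithmetic above easier to follow, here is a small sketch that recomputes the committee sizes quoted in this post. The function name and the stage labels are mine, for illustration only. The 6 “vertical” seats for September 2012 are an assumption derived from the numbers above (11 members minus 5 directly-elected seats), and the 10 “vertical” seats for April 2013 count the Oslo seat alongside the PTLs.

```python
def committee_size(appointed, vertical, general):
    """Total number of seats for a given committee composition."""
    return appointed + vertical + general


# (label, appointed seats, "vertical" seats, directly-elected general seats)
# Labels are illustrative; the numbers come from the text above, except the
# September 2012 vertical count, which is derived (11 members - 5 general).
STAGES = [
    ("PPB, March 2011 (3 projects)", 4, 3, 5),
    ("PPB, spring 2012 (6 projects)", 4, 6, 5),
    ("TC, September 2012", 0, 6, 5),
    ("TC, April 2013 (9 PTLs + Oslo seat)", 0, 10, 5),
    ("TC, Fall 2013 model", 0, 0, 13),
]

sizes = [committee_size(a, v, g) for _, a, v, g in STAGES]

for (label, _, vertical, general), size in zip(STAGES, sizes):
    print(f"{label}: {size} seats ({vertical} vertical, {general} general)")
```

The point of the exercise: under the old formula every new project added a seat, while the 13-member model stays the same size however many projects join.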
Power to the active contributors
This new model is a direct, representative model, where if you recently authored a change for an OpenStack project, you get one vote, and a chance every 6 months to choose new people to represent you. This model is pretty flexible and should allow for further growth of the project.
Few open source projects use such a direct governance model. In Apache projects for example (often cited as a model of openness and meritocracy), the oversight committee equivalent to OpenStack’s TC would be the PMC. In most cases, PMC membership is self-sustaining: existing PMC members ultimately decide, through discussions and votes on the private PMC list, who the new PMC members should be. In contrast, in OpenStack the recently-active contributors end up being in direct control of who their leaders are, and can replace the Technical Committee members if they feel like they are not relevant or representing them anymore. Oh, and the TC doesn’t use a private list: all our meetings are public and our discussions are archived.
As far as open source project governance models go, this is as open, meritocratic, transparent and direct as it gets.
The beginning of a new release cycle is as good as any moment to question why we actually go through the hassle of producing OpenStack releases. Twice per year, on a precise date we announce 6 months in advance, we bless and publish source code tarballs of the various integrated projects in OpenStack. Every week we have a meeting that tracks our progress toward this common goal. Why ?
Releases vs. Continuous deployment
The question is particularly valid if you take into account the type of software that we produce. We don’t really expect cloud infrastructure providers to religiously download our source code tarballs every 6 months and run from that. For the largest installations, running much closer to the master branch and continuously deploying the latest changes is a really sound strategy. We invested a lot of effort in our gating systems and QA automated testing to make sure the master branch is always runnable. We’ll discuss at the OpenStack Summit next week how to improve CD support in OpenStack. We backport bugfixes to the stable branches post-release. So why do we continue to single out a few commits and publish them as “the release” ?
The need for cycles
The value is not really in releases. It is in release cycles.
Producing OpenStack involves the work of a large number of people. While most of those people are paid to participate in OpenStack development, as far as the OpenStack project goes, we don’t manage them. We can’t ask them to work on a specific area, or to respect a given deadline, or to spend that extra hour to finalize something. The main trick we use to align everyone and make us all part of the same community is to have a cycle. We have regular milestones that we ask contributors to target their features to. We have a feature freeze to encourage people to switch their mindset to bugfixing. We have weekly meetings to track progress, communicate where we are and motivate us to go that extra mile. The common rhythm is what makes us all play in the same team. The “release” itself is just the natural conclusion of that common effort.
A reference point in time
Singling out a point in time has a number of other benefits. It’s easier to work on documentation if you group your features into a coherent set (we actually considered shortening our cycles in the past, and the main blocker was our capacity to produce good documentation often enough). It’s easier to communicate about OpenStack progress and new features if you do it periodically rather than continuously. It’s easier to have Design Summits every 6 months if you create a common brainstorm / implementation / integration cycle. The releases also serve as reference points for API deprecation rules, for stable release maintenance, for security backports.
If you’re purely focused on the software consumption part, it’s easy to underestimate the value of release cycles. They actually are one of the main reasons for the pace of development and success of OpenStack so far.
The path forward
We need release cycles… do we need release deliverables ? Do we actually need to bless and publish a set of source code tarballs ? My personal view on that is: if there is no additional cost in producing releases, why not continue to do them ? With the release tooling we have today, blessing and publishing a few tarballs is as simple as pushing a tag, running a script and sending an email. And I like how this formally concludes the development cycle to start the stable maintenance period.
But what about Continuous Deployment ? Well, the fact that we produce releases shouldn’t at all affect our ability to continuously deploy OpenStack. The master branch should always be in good shape, and we definitely should have the necessary features in place to fully support CD. We can have both. So we should have both.
Yesterday’s OpenStack Grizzly release officially closes the Grizzly development cycle. But while I try to celebrate and relax, I can’t help feeling worried and depressed in the hours following the release, as we discover bugs that we could have (should have ?) caught before release. It’s a kind of postpartum depression for release managers; please consider this post as part of my therapy.
We’d naturally like to release when the software is “ready”, “good”, or “bug-free”. Reality is, with software of the complexity of OpenStack, onto which we constantly add new features, there will always be bugs. So, rather than releasing when the software is bug-free, we “release” when waiting more would not really change the quality of the result. We release when it’s time.
In OpenStack, we invest a lot in automated testing, and each proposed commit goes through an extensive set of unit and integration tests. But with so many combinations of deployment options, there are still dark corners that will only be explored by users as they apply the new code to their specific use case. We encourage users to try new code before release, by publishing and making noise about milestones, release candidates… But there will always be a significant number of users who will not try new code until the point in time we call “release”. So there will always be significant bugs that are discovered (and fixed) after release day.
The best point in time
What we need to do is pick the right moment to “release”: when all known release-critical issues are fixed. When the benefits of waiting more are not worth the drawbacks of distracting developers from working on the next development cycle, or of abandoning the benefits of a predictable time-based common release.
That’s the role of the Release Candidates that we produce in the weeks before the release day. When we have fixed all known release-critical bugs, we create an RC. If we find new ones before the release day, we fix them and generate a new release candidate. On release day, we consider the current release candidates as “final” and publish them.
The trick, then, is to pick the right length for this feature-frozen period leading to release, one that gives enough time for each of the projects in OpenStack to reach its first release candidate (meaning, “all known release-critical bugs fixed”), and publish this RC1 to early testers. For Grizzly, it looked like this:
This graph shows the number of release-critical bugs in various projects over time. We can see that the length of the pre-release period is about right: waiting more would not have resulted in a lot more bugs to be fixed. We basically needed to release to get more users to test and report the next bugs.
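The release candidate dance described above can be sketched in a few lines of code. This is only an illustration of the rule (“cut an RC once all known release-critical bugs are fixed, cut a new one if more get fixed, publish the latest one on release day”), not our actual release tooling: the `Project` class, its method names and the bug identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of one project going through the RC period."""
    name: str
    open_bugs: set = field(default_factory=set)  # known release-critical bugs
    rc_number: int = 0                           # 0 means no RC cut yet
    fixed_since_rc: bool = False

    def report_bug(self, bug_id):
        """A new release-critical bug is found."""
        self.open_bugs.add(bug_id)

    def fix_bug(self, bug_id):
        """A release-critical bug gets fixed."""
        if bug_id in self.open_bugs:
            self.open_bugs.remove(bug_id)
            self.fixed_since_rc = True

    def maybe_cut_rc(self):
        """Cut an RC only when no known release-critical bug is left."""
        needs_rc = self.rc_number == 0 or self.fixed_since_rc
        if not self.open_bugs and needs_rc:
            self.rc_number += 1
            self.fixed_since_rc = False
            return True
        return False

    def release(self):
        """On release day, the current RC is published as final."""
        if self.rc_number == 0:
            raise RuntimeError(f"{self.name} has no release candidate yet")
        return f"{self.name} final = RC{self.rc_number}"


nova = Project("nova", open_bugs={"bug-1", "bug-2"})
nova.fix_bug("bug-1")
first_try = nova.maybe_cut_rc()   # bug-2 is still open, so no RC yet
nova.fix_bug("bug-2")
second_try = nova.maybe_cut_rc()  # all known bugs fixed: RC1
nova.report_bug("bug-3")          # a tester finds a new critical bug
nova.fix_bug("bug-3")
third_try = nova.maybe_cut_rc()   # fixed again: RC2
final = nova.release()
print(final)
```

Note that `release()` does not check for open bugs: the date is fixed, so whatever release candidate is current on release day becomes the release.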
The Grizzly is still alive
The other thing we need to have is a process to continue to fix bugs after the “release”. We document the most obvious regressions in the constantly-updated Release Notes. And we handle the Grizzly bugs using the stable release update process.
After release, we maintain a branch where important bugfixes are backported and from which we’ll publish point releases. This stable/grizzly branch is maintained by the OpenStack stable maintenance team. If you see a bugfix that should definitely be backported, you can tag the corresponding bug in Launchpad with the grizzly-backport-potential tag to bring it to the team’s attention. For more information on the stable branches, I invite you to read this wiki page.
Being pumped up again
The post-release depression usually lasts a few days, until I realize that not so many bugs were reported. The quality of the new release is actually always an order of magnitude better than the previous releases, due to 6 months’ worth of improvements in our amazing continuous integration system ! We actually did an incredible job, and it will only get better !
The final stage of recovery is when our fantastic community gets all together at the OpenStack Summit. 4 days to witness and celebrate our success. 4 days to recharge the motivation batteries, brainstorm and discuss what we’ll do over the next 6 months. We are living awesome times. See you there.
Back from an (almost) entirely offline week of vacation, I found a lot of news waiting for me. A full book was written. OpenStack projects graduated. An Ubuntu rolling release model was considered. But what grabbed my attention was the announcement of UDS moving to a virtual event. And every 3 months. And over two days. And next week.
As someone who attended all UDSes (but one) since Prague in May 2008, as a Canonical employee then as an upstream developer, that was quite a shock. We all have fond memories and anecdotes of stuff that happened during those Ubuntu developer summits.
What those summits do
For those who never attended one, UDS (and the OpenStack Design Summits that were modeled after them) achieve a lot of goals for a community of open source developers:
1. Celebrate recent release, motivate all your developer community for the next 6 months
2. Brainstorm early ideas on complex topics, identify key stakeholders to include in further design discussion
3. Present an implementation plan for a proposed feature and get feedback from the rest of the community before starting to work on it
4. Reduce duplication of effort by getting everyone working on the same type of issues in the same room and around the same beers for a few days
5. Meet in informal settings people you usually only interact with online, to get to know them and reduce friction that can build up after too many heated threads
This all sounds very valuable. So why did Canonical decide to suppress UDSes as we knew them, while they were arguably part of their successful community development model ?
Who killed UDS
The reason is that UDS is a very costly event, and it was becoming more and more useless. A lot of Ubuntu development happens within Canonical these days, and UDS sessions gradually shifted from being brainstorming sessions between equal community members to being a formal communication of upcoming features/plans to gather immediate feedback (point 3 above). There were not so many brainstorming design sessions anymore (point 2 above, very difficult to do in a virtual setting), with design happening more and more behind Canonical curtains. There is less need to reduce duplication of effort (point 4 above), with fewer non-Canonical people starting to implement new things.
Therefore it makes sense to replace it with a less costly, purely virtual communication exercise that still perfectly fills point 3, with the added benefits of running it more often (updating everyone else on status more often), and improving accessibility for remote participants. If you add to the mix a move to rolling releases, it almost makes perfect sense. The problem is, they also get rid of points 1 and 5. This will result in an even less motivated developer community, with more tension between Canonical employees and non-Canonical community members.
I’m not convinced that’s the right move. I for one will certainly miss them. But I think I understand the move in light of Canonical’s recent strategy.
What about OpenStack Design Summits ?
Some people have been asking me if OpenStack should move to a similar model. My answer is definitely not.
When Rick Clark imported the UDS model from Ubuntu to OpenStack, it was to fulfill one of the 4 Opens we pledged: Open Design. In OpenStack Design Summits, we openly debate how features should be designed, and empower the developers in the room to make those design decisions. Point 2 above is therefore essential. In OpenStack we also have a lot of different development groups working in parallel, and making sure we don’t duplicate effort is key to limit friction and make the best use of our resources. So we can’t just pass on point 4. With more than 200 different developers authoring changes every month, the OpenStack development community is way past Dunbar’s number. Thread after thread, some resentment can build up over time between opposed developers. Get them to informally talk in person over a coffee or a beer, and most issues will be settled. Point 5 therefore lets us keep a healthy developer community. And finally, with about 20k changes committed per year, OpenStack developers are pretty busy. Having a week to celebrate and recharge motivation batteries every 6 months doesn’t sound like a bad thing. So we’d like to keep point 1.
So for OpenStack it definitely makes sense to keep our Design Summits the way they are. Running them as a track within the OpenStack Summit allows us to fund them, since there is so much momentum around OpenStack and so many people interested in attending. We need to keep improving the remote participation options to include developers who unfortunately cannot join us. We need to keep holding them in different locations around the world to foster local participation. But meeting in person every 6 months is an integral part of our success, and we’ll keep doing it.
Next stop is in Portland, from April 15 to April 18. Join us !