In this series, I’ll discuss how to strengthen the privilege escalation model for OpenStack Compute (Nova). Due to the way networking, virtualization and volume management work, some Nova nodes need to be able to run some commands as root. To reduce the effects of a potential compromise (attacker being able to run arbitrary code as the Nova user), we want to limit the commands that Nova can run as root on a given node to the strict necessary. Today we’ll explain how the current model works, its limitations, and the groundwork already implemented during the Diablo cycle to improve that.
Current model: sudo and sudoers
Currently, in a typical Nova deployment, the nodes run under an account with limited rights (usually called “nova”). When Nova needs to run a command as root, it prepends “sudo” to the command. The nova packages of your distribution of choice are supposed to ship a sudoers file that contains all the commands that nova is allowed to run as root without providing a password. This is a privilege escalation security model which is pretty well-known and easy to audit.
Limitations of the current model
That said, in the context of Nova, this model is very limited. The sudoers file does not allow to efficiently filter arguments, so you can basically pass any argument to the allowed command… and some of the commands that nova wants to use are rather open-ended. As an example, the current nova_sudoers file contains commands like chown, kill, dd or tee, which are more than enough to compromise a target system completely.
There are a couple other limitations. The sudoers file belongs to the distributions packaging, so it’s difficult to keep it in sync with the rest of Nova code when someone wants to add a privileged command. Last but not least, the same nova_sudoers file is used for any type of Nova node. A Nova API server, which does not need to run any command as root, is still allowed to run all the commands that a compute node requires, for example. Those other limitations could be fixed while still using sudo and sudoers files, but the first limitation would remain. Can we do better ?
Substitute a wrapper to sudo
To be able to propose alternative privilege escalation security models, we first needed to be able to change all the “sudo” calls in the code and make them potentially use something else. That’s what I worked on late during the Diablo timeframe: creating a run_as_root option in nova.utils.execute that would use a configurable root_helper command (by default, “sudo”), and force all the existing calls to go through that (rather than blindly calling “sudo” themselves).
Thanks to the default root_helper, everything still behaves the same, but now we have the possibility to use something else, if we can be smarter than sudoers files. Like call a wrapper that will do advanced filtering of the command that nova wants to use. In part 2 of this series, we’ll look into a proposed, alternative Python-based root_helper and open discussion on its security model.
Last week saw the delivery of the first milestone of the Essex development cycle for Keystone, Glance, Horizon and Nova. This early milestone collected about two months of post-Diablo work… but it’s not as busy in new features as most would think, since a big part of those last two months was spent releasing OpenStack 2011.3 and brainstorming Essex features.
Keystone delivered their first milestone as a core project, with a few new features like support for additional credentials, service registration and using certificate-based SSL client authentication to authenticate services. It should be easier to upgrade from now on, with support for database migrations.
Glance developers were busy preparing significant changes that will land in the next milestone. Several bugfixes and a few features made it to essex-1 though, including the long-awaited SSL client connections. It also moved to UUID image identifiers.
The Nova essex-1 effort was mostly spent on bugfixing, with 129 bugs fixed. New features include a new XenAPI SM volume driver, DHCP support in the Quantum network manager, and optional deferred deletion of instances. Under the hood, the volume code was significantly cleaned up and XML templates were added to simplify serialization in extensions.
Essex-1 was also the first official OpenStack milestone for Horizon, also known as the Dashboard. New features include a instance details page, support for managing Nova volumes and a new extensible modular architecture. The rest of the effort was spent on catching up with the best of core projects in internationalization, developer documentation, and QA (frontend testing and JS unit tests).
Now, keep your seatbelt fastened, as we are one month away from essex-2, where lots of new development work is expected to land !
The 200 open seats for the Essex Design Summit were all registered in less than 9 days ! If you missed the boat, you can still register on the waiting list at http://summit.openstack.org.
For the last seats we need to give priority to existing OpenStack developers and upstream/downstream community members, so the waiting list will be reviewed manually. You will receive an email if you get cleared and get one of the very last seats for the summit.
Sometime next week, the website should allow registered attendees (as well as attendees on the waiting list) to propose sessions for the summit, so stay tuned !
August was very busy for OpenStack Nova and Glance developers, and the culmination of those efforts is the delivery of the final feature milestone of the Diablo development cycle: diablo-4.
Glance gained final integration with the Keystone common authentication system, support for sharing images between groups of tenants, a new notification system and i18n. Twelve feature blueprints were completed in Nova, including final Keystone integration, the long-awaited capacity to boot from volumes, a configuration drive to pass information to instances, integration points for Quantum, KVM block migration support, as well as several improvements to the OpenStack API.
Diablo-4 is mostly feature-complete: a few blueprints for standalone features were granted an exception and will land post-diablo-4, like volume types or virtual storage arrays in Nova, or like SSL support in Glance.
Now we race towards the release branch point (September 8th) which is when the Diablo release branch will start to diverge from a newly-open Essex development branch. The focus is on testing, bug fixing and consistency… up until September 22, the Diablo release day.
How to control what gets into your open source project code ? The classic model, inherited from pre-DVCS days, is to have a set of “committers” that are trusted with direct access while the vast majority of project “contributors” must kindly ask them to sponsor their patches. You can find that model in a lot of projects, including most Linux distributions. This model doesn’t scale that well — even trusted individuals are error-prone, nobody should escape peer review. But the main issue is the binary nature of the committer power: it divides your community (us vs. them) and does not really encourage contribution.
The solution to this is to implement a gated trunk with a code review system like GitHub pull requests or Launchpad branch merge proposals. Your “committers” become “core developers” that have a casting vote on whether the proposal should be merged. Everyone goes through the peer review process, and the peer review process is open for everyone: your “contributors” become “developers” that can comment too. You reduce the risk of human error and the community is much healthier, but some issues remain: your core developers can still (wittingly or unwittingly) evade peer review, and the final merge process is human and error-prone.
The solution is to add more automation, and not trust humans with direct repository access anymore. An “automated gated trunk” bot can watch for reviews and when a set of pre-defined rules are met (human approvals, testsuites passed, etc.), trigger the trunk merge automatically. This removes human error from the process, and effectively turns your “core developers” into “reviewers”. This last aspect makes for a very healthy development community: there is no elite group anymore, just a developer subgroup with additional review duties.
In OpenStack, we used Tarmac in conjunction with Launchpad/bzr code review in our first attempt to implement this. As we considered migration to git, the lack of support for tracking formal approvals in GitHub code review prevented the implementation of a complex automated gated trunk on top of GitHub, so we deployed Gerrit. I was a bit resisting the addition of a new tool to our toolset mix, but the incredible Monty Taylor and Jim Blair did a great integration job, and I realize now that this gives us a lot more flexibility and room for future evolution. For example I like that some tests can be run when the change is proposed, rather than only after the change is approved (which results in superfluous review roundtrips).
At the end of the day, gated trunk automation helps in having a welcoming, non-elitist (and lazy) developer community. I wish more projects, especially distributions, would adopt it.
No rest for the OpenStack developers, today saw the release of the July development efforts for Nova and Glance: the Diablo-3 milestone.
With a bit more than 100 trunk commits over the month, Nova gained support for multiple NICs, FlatDHCP network mode now support a high-availability option (read more about it here), instances can be migrated and system usage notifications were added to the notification framework. Network code was also refactored in order to facilitate integration with the new networking projects, and countless fixes were made in OpenStack API 1.1 support.
We have one more milestone left (diablo-4) before the final 2011.3 release… still a lot to do !
About a month ago I commented on the features delivered in the diablo-1 milestone. Last week we released the diablo-2 milestone for your testing and feature evaluation pleasure.
Most of the changes to Glance were made under the hood. In particular the new WSGI code from Nova was ported to Glance, and images collections can now be sorted by a subset of the image model attributes. Most of the groundwork to support Keystone authentication was done, but that should only be available in diablo-3 !
Those same initial Keystone integration steps were also done for Nova, along with plenty of other features. We now support distributed scheduling for complex deployments, together with a new instance referencing model. Was also added during this timeframe: support for floating IPs (in OpenStack API), a basic mechanism for pushing notifications out to interested parties, global firewall rules, and an instance type extra specs table that can be used in a capabilities-aware scheduler. More invisible to the user, we completed efforts to standardize error codes and refactored the OpenStack API serialization mechanism.
And there is plenty more coming up in diablo-3… scheduled for release on July 28th.