In order to meet the performance needs of our customers while maintaining a stable infrastructure, CloudSigma needs a way to control the resource usage of each virtual machine. We cannot allow anyone to use more resources than they have specified, because that would affect every other customer’s performance. The first step was to implement a smart way to allocate virtual machines to specific bare-metal hosts, i.e. to decide which virtual machine should run on which hypervisor/compute node. With this we tried to prevent over-commit scenarios, which can lead to performance issues or even host crashes. That helped us up to a point; however, without a way to limit usage on the hypervisor machines themselves, overload problems persisted.
A running virtual machine is a process like any other and, in the Linux world, resource management for processes is currently done via the amazing work behind CGroups. Libvirt, as an API for QEMU and KVM, provides some decent functionality to manage virtual machine resources (via CGroups, of course). We have used features like vCPU pinning, which helps with virtual machine memory allocation and boosts performance, but libvirt takes a very black-box approach to CGroups: you can tweak some parameters directly, but if you want fine-grained control over which process uses what resources, you need to roll up your sleeves and do it yourself. Libvirt is getting better, but it is not yet enough for us to manage our cloud effectively.
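Doing it yourself mostly comes down to writing to files in the cgroup filesystem. The following is a minimal Python sketch, assuming a cgroup v1 layout with one hierarchy per subsystem mounted under /sys/fs/cgroup. The helper name `place_in_cgroup` and the example group name are hypothetical; they are not part of libvirt or cgroupspy.

```python
import os

# Assumption: cgroup v1, one hierarchy per subsystem under /sys/fs/cgroup.
CGROUP_ROOT = "/sys/fs/cgroup"


def place_in_cgroup(pid, subsystem, group, settings, root=CGROUP_ROOT):
    """Hypothetical helper: create a child cgroup, apply settings, attach pid.

    `settings` maps control-file names to values, e.g. {"cpuset.cpus": "4-7"}.
    Returns the path of the cgroup directory.
    """
    path = os.path.join(root, subsystem, group)
    if not os.path.isdir(path):
        # On a real cgroupfs, mkdir makes the kernel create the control files.
        # Missing parents are created too; for cpuset they would need their
        # own cpuset.cpus/cpuset.mems configured before this group can use CPUs.
        os.makedirs(path)
    # Apply limits before attaching the process: a cpuset cgroup, for
    # example, refuses tasks until cpuset.cpus and cpuset.mems are set.
    for name, value in settings.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(str(value))
    # Writing a PID to cgroup.procs moves every thread of that process.
    with open(os.path.join(path, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return path
```

For example, running `place_in_cgroup(qemu_pid, "cpuset", "grey", {"cpuset.cpus": "4-7", "cpuset.mems": "0"})` as root would confine a (hypothetical) QEMU process to CPUs 4 to 7 and NUMA node 0.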
One feature we needed that libvirt does not provide is re-scaling of CGroups. Imagine two sets of virtual machines. The first set is performance-dependent (read “client virtual machines”): these virtual machines need all the memory and CPU cycles promised to them. In the second set, we have a pool of special virtual machines (read “internal”) that we call “grey”, which just need to work in the background and do not really care how many resources they get; as long as they are running, they are happy. They are started when there are free system resources and, logically, they are stopped when client demand for resources increases. We use a lot of Python to manage our infrastructure, and when we searched, we did not find any library that provides a decent pythonic way to modify CGroups. That is what we aimed for when developing cgroupspy.
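To make the re-scaling idea concrete, here is a minimal Python sketch of one possible policy: the grey pool gets whatever CPUs and memory the client virtual machines leave free, and is stopped entirely below a floor. The function names, parameters and the policy itself are hypothetical illustrations, not CloudSigma's actual logic; only the control-file names (`cpuset.cpus`, `memory.limit_in_bytes`) come from the cgroup v1 interface.

```python
def format_cpu_list(cpus):
    """Render CPU ids in cgroup list format, e.g. {0, 1, 2, 5} -> "0-2,5"."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current contiguous run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(
        str(lo) if lo == hi else "%d-%d" % (lo, hi) for lo, hi in ranges
    )


def rescale_grey_pool(host_cpus, client_cpus, host_mem, client_mem,
                      reserve_mem, grey_min_mem):
    """Hypothetical policy: give the grey pool whatever clients leave free.

    Returns new settings for the grey parent cgroup, or None when there is
    not enough left and the grey VMs should be stopped instead.
    """
    free_cpus = set(host_cpus) - set(client_cpus)
    free_mem = host_mem - client_mem - reserve_mem
    if not free_cpus or free_mem < grey_min_mem:
        return None
    return {
        "cpuset.cpus": format_cpu_list(free_cpus),
        "memory.limit_in_bytes": free_mem,
    }
```

With 8 host CPUs, clients pinned to CPUs 0 to 3 and 6, 64 GiB of RAM, 48 GiB committed to clients, a 2 GiB reserve and a 4 GiB floor, the grey pool would get `cpuset.cpus` = "4-5,7" and a 14 GiB memory limit; if client memory grew to 60 GiB, the function would return None. In cgroup v1, lowering `memory.limit_in_bytes` below current usage forces the kernel to reclaim memory and the write can fail with EBUSY, which is one reason to stop grey VMs below a floor rather than keep squeezing them.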
Implementing this is fairly straightforward. In CGroups, each file has its own format: some files contain a single integer, others a list of integers, a list of integers separated by newlines, and so on. What we essentially did was provide a proper model for each cgroup subsystem, be it memory, cpuset, cpu or any other. This not only provides validation and sanitization of input values, but also gives you type conversion when getting and setting parameters.
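To illustrate what per-format models can look like, here is a minimal Python sketch. It is not cgroupspy's actual code: the parser functions, the `CgroupFile` descriptor and the two controller classes are hypothetical names chosen for this example, and the file names assume the cgroup v1 interface.

```python
import os


def parse_int(text):
    """Single integer, e.g. memory.limit_in_bytes or cpu.shares."""
    return int(text.strip())


def parse_cpu_list(text):
    """Comma-separated ids and ranges, e.g. cpuset.cpus "0-2,5" -> {0, 1, 2, 5}."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty set
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_int_lines(text):
    """One integer per line, e.g. tasks or cgroup.procs."""
    return [int(line) for line in text.splitlines() if line.strip()]


def parse_key_values(text):
    """Lines of the form 'key value', e.g. memory.stat or cpuacct.stat."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            result[key] = int(value)
    return result


def render_cpu_list(cpus):
    """Write CPU ids as a plain comma-separated list, which cpuset.cpus accepts."""
    return ",".join(str(cpu) for cpu in sorted(cpus))


def validate_limit(value):
    # -1 means "unlimited" for memory.limit_in_bytes in cgroup v1.
    if not isinstance(value, int) or value < -1:
        raise ValueError("memory limit must be an int >= -1, got %r" % (value,))


class CgroupFile(object):
    """Descriptor exposing one control file as a typed, validated attribute."""

    def __init__(self, filename, parse, render=str, validate=None):
        self.filename = filename
        self.parse = parse
        self.render = render
        self.validate = validate

    def __get__(self, controller, owner=None):
        if controller is None:
            return self
        with open(os.path.join(controller.path, self.filename)) as f:
            return self.parse(f.read())

    def __set__(self, controller, value):
        if self.validate is not None:
            self.validate(value)  # raises ValueError before touching the file
        with open(os.path.join(controller.path, self.filename), "w") as f:
            f.write(self.render(value))


class MemoryController(object):
    limit_in_bytes = CgroupFile("memory.limit_in_bytes", parse_int,
                                validate=validate_limit)
    stat = CgroupFile("memory.stat", parse_key_values)  # read-only in the kernel

    def __init__(self, path):
        self.path = path


class CpusetController(object):
    cpus = CgroupFile("cpuset.cpus", parse_cpu_list, render=render_cpu_list)

    def __init__(self, path):
        self.path = path
```

Because `CgroupFile` is a descriptor, assigning `{0, 1}` to `CpusetController(path).cpus` writes "0,1" to cpuset.cpus, and reading the attribute back returns a Python set whether the kernel reports the CPUs as "0,1" or as the range "0-1". Invalid values such as a memory limit of -5 are rejected before anything is written.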
What is still missing:
We currently support only the memory, cpu, cpuset and cpuacct subsystems. Others, like freezer, blkio and devices, are fairly straightforward to add and will soon be implemented for complete support of all cgroup subsystems.
Since the code is still new, it only supports Python 2.7 and recent versions of the cgroups API. The code has no external dependencies, so support for 2.6 as well as 3.x will come soon.
CGroups provides a notification API, which is not yet supported. Once it is, one could react to events like “out of memory” in a cgroup and give it more memory or kill some unneeded process.
Reacting to real-time changes in the cgroups tree, i.e. cgroups appearing and disappearing.
cgroupspy provides a simple, pythonic way to manage cgroups. You can use it to programmatically manage the resource usage of your libvirt guests, LXC containers or any plain old Linux process. It can also be used to gather per-cgroup usage statistics and react to events, so your system runs as smoothly as possible.
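As a rough illustration of gathering per-cgroup statistics, here is a minimal Python sketch that walks a cgroup hierarchy and reads one counter per cgroup. It assumes a cgroup v1 cpuacct hierarchy (where `cpuacct.usage` holds total CPU time in nanoseconds); the functions `collect_usage` and `cpu_usage` are hypothetical and do not use cgroupspy's API.

```python
import os


def collect_usage(root, filename="cpuacct.usage"):
    """Walk a cgroup hierarchy and map each cgroup path to an integer counter.

    cpuacct.usage holds the total CPU time consumed by the cgroup, in
    nanoseconds; other single-integer files such as memory.usage_in_bytes
    can be collected the same way by passing a different filename.
    """
    stats = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if filename in filenames:
            with open(os.path.join(dirpath, filename)) as f:
                value = int(f.read().strip())
            rel = os.path.relpath(dirpath, root)
            stats["/" if rel == "." else "/" + rel] = value
    return stats


def cpu_usage(before, after, interval_s):
    """CPU time used between two samples, in CPUs (1.0 = one full core)."""
    return {
        path: (after[path] - before[path]) / (interval_s * 1e9)
        for path in after
        if path in before  # skip cgroups that appeared between samples
    }
```

Sampling a hierarchy (for instance one assumed to be mounted at /sys/fs/cgroup/cpuacct), waiting a few seconds, sampling again and passing both results to `cpu_usage` gives a per-cgroup load figure that a monitoring script could compare against each virtual machine's allowance.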
Go check out the example usage in the README at https://github.com/cloudsigma/cgroupspy. Any contributions are more than welcome, and we have released the code under the flexible new BSD license, so feel free to go ahead and have fun with it!
Enjoy your cloud computing,