From f3b6556eedcc1f8278e45ed809b06f243fe99ced Mon Sep 17 00:00:00 2001
Message-Id: <f3b6556eedcc1f8278e45ed809b06f243fe99ced.1386348946.git.jdenemar@redhat.com>
From: "Daniel P. Berrange" <berrange@redhat.com>
Date: Mon, 2 Dec 2013 13:36:29 +0000
Subject: [PATCH] Improve cgroups docs to cover systemd integration

For

  https://bugzilla.redhat.com/show_bug.cgi?id=1004340

As of libvirt 1.1.1 and systemd 205, the cgroups layout used by
libvirt has some changes. Update the 'cgroups.html' file from
the website to describe how it works in a systemd world.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 7f2b173febaefda73b486337b6c53f5c2127070f)
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
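The naming rules documented by this patch can be sketched in Python as
below. This is only a minimal illustration under stated assumptions, not
libvirt's implementation: the function names are hypothetical, the escaping
is a simplified form of systemd's unit name escaping (any byte outside
letters, digits, ':', '_' and '.' becomes \xNN), and the non-systemd helper
assumes the first path component is one of the default top level partitions.

```python
import string

# Characters assumed safe in a unit name; everything else is escaped.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_unit_name(name):
    """Simplified systemd-style escaping: \\xNN for each unsafe byte."""
    return "".join(
        chr(b) if chr(b) in _SAFE else "\\x%02x" % b
        for b in name.encode("utf-8")
    )


def scope_name(driver, guest):
    """Scope unit for a guest: machine-$NAME.scope, $NAME = driver-guest."""
    return "machine-%s.scope" % escape_unit_name("%s-%s" % (driver, guest))


def systemd_cgroup_path(partition):
    """Each slice name carries the names of all its parents, joined by '-'."""
    parts = [p for p in partition.split("/") if p]
    slices = ["-".join(parts[:i + 1]) + ".slice" for i in range(len(parts))]
    return "/" + "/".join(slices)


def generic_cgroup_path(partition):
    """Non-systemd layout: nested partitions get a '.partition' suffix."""
    parts = [p for p in partition.split("/") if p]
    # Assumption: parts[0] is a default top level partition (no suffix).
    return "/" + "/".join(parts[:1] + [p + ".partition" for p in parts[1:]])
```

With these helpers, the generic name /machine/production used in the guest
XML translates to /machine.slice/machine-production.slice on a systemd host
and to /machine/production.partition elsewhere, matching the examples in
the updated docs.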
 docs/cgroups.html.in | 212 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 172 insertions(+), 40 deletions(-)

diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
index 77656b2..f7c2450 100644
--- a/docs/cgroups.html.in
+++ b/docs/cgroups.html.in
@@ -47,17 +47,121 @@
     <p>
       As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
       simplified, in order to facilitate the setup of resource control policies by
-      administrators / management applications. The layout is based on the concepts of
-      "partitions" and "consumers". Each virtual machine or container is a consumer,
-      and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>.
-      Each consumer is associated with exactly one partition, which also have a
-      corresponding cgroup usually named <code>$PARTNAME.partition</code>. The
-      exceptions to this naming rule are the three top level default partitions,
-      named <code>/system</code> (for system services), <code>/user</code> (for
-      user login sessions) and <code>/machine</code> (for virtual machines and
-      containers). By default every consumer will of course be associated with
-      the <code>/machine</code> partition. This leads to a hierarchy that looks
-      like
+      administrators / management applications. The new layout is based on the concepts
+      of "partitions" and "consumers". A "consumer" is a cgroup which holds the
+      processes for a single virtual machine or container. A "partition" is a cgroup
+      which does not contain any processes, but can have resource controls applied.
+      A "partition" will have zero or more child directories which may be either
+      "consumer" or "partition".
+    </p>
+
+    <p>
+      As of libvirt 1.1.1 or later, the cgroups layout will have some slight
+      differences when running on a host with systemd 205 or later. The overall
+      tree structure is the same, but there are some differences in the naming
+      conventions for the cgroup directories. Thus the following docs are split
+      in two, one part describing systemd hosts and the other non-systemd hosts.
+    </p>
+
+    <h3><a name="currentLayoutSystemd">Systemd cgroups integration</a></h3>
+
+    <p>
+      On hosts which use systemd, each consumer maps to a systemd scope unit,
+      while partitions map to a systemd slice unit.
+    </p>
+
+    <h4><a name="systemdScope">Systemd scope naming</a></h4>
+
+    <p>
+      The systemd convention is for the scope name of virtual machines / containers
+      to be of the general format <code>machine-$NAME.scope</code>. Libvirt forms the
+      <code>$NAME</code> part of this by concatenating the driver type with the name
+      of the guest, and then escaping any systemd reserved characters.
+      So for a guest <code>demo</code> running under the <code>lxc</code> driver,
+      we get a <code>$NAME</code> of <code>lxc-demo</code> which when escaped is
+      <code>lxc\x2ddemo</code>. So the complete scope name is <code>machine-lxc\x2ddemo.scope</code>.
+      The scope names map directly to the cgroup directory names.
+    </p>
+
+    <h4><a name="systemdSlice">Systemd slice naming</a></h4>
+
+    <p>
+      The systemd convention for slice naming is that a slice should include the
+      names of all of its parents prepended to its own name. So for a libvirt
+      partition <code>/machine/engineering/testing</code>, the slice name will
+      be <code>machine-engineering-testing.slice</code>. Again the slice names
+      map directly to the cgroup directory names. Systemd creates three top level
+      slices by default, <code>system.slice</code>, <code>user.slice</code> and
+      <code>machine.slice</code>. All virtual machines or containers created
+      by libvirt will be associated with <code>machine.slice</code> by default.
+    </p>
+
+    <h4><a name="systemdLayout">Systemd cgroup layout</a></h4>
+
+    <p>
+      Given this, a possible systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 3 custom child slices, would be:
+    </p>
+
+    <pre>
+$ROOT
+  |
+  +- system.slice
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine.slice
+      |
+      +- machine-qemu\x2dvm1.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm2.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm3.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-engineering.slice
+      |   |
+      |   +- machine-engineering-testing.slice
+      |   |   |
+      |   |   +- machine-lxc\x2dcontainer1.scope
+      |   |
+      |   +- machine-engineering-production.slice
+      |       |
+      |       +- machine-lxc\x2dcontainer2.scope
+      |
+      +- machine-marketing.slice
+          |
+          +- machine-lxc\x2dcontainer3.scope
+    </pre>
+
+    <h3><a name="currentLayoutGeneric">Non-systemd cgroups layout</a></h3>
+
+    <p>
+      On hosts which do not use systemd, each consumer has a corresponding cgroup
+      named <code>$VMNAME.libvirt-{qemu,lxc}</code>. Each consumer is associated
+      with exactly one partition, which also has a corresponding cgroup usually
+      named <code>$PARTNAME.partition</code>. The exceptions to this naming rule
+      are the three top level default partitions, named <code>/system</code> (for
+      system services), <code>/user</code> (for user login sessions) and
+      <code>/machine</code> (for virtual machines and containers). By default
+      every consumer will of course be associated with the <code>/machine</code>
+      partition.
+    </p>
+
+    <p>
+      Given this, a possible non-systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 2 custom child partitions, would be:
     </p>
 
     <pre>
@@ -87,23 +191,21 @@ $ROOT
       |   +- vcpu0
       |   +- vcpu1
       |
-      +- container1.libvirt-lxc
-      |
-      +- container2.libvirt-lxc
+      +- engineering.partition
+      |   |
+      |   +- testing.partition
+      |   |   |
+      |   |   +- container1.libvirt-lxc
+      |   |
+      |   +- production.partition
+      |       |
+      |       +- container2.libvirt-lxc
       |
-      +- container3.libvirt-lxc
+      +- marketing.partition
+          |
+          +- container3.libvirt-lxc
     </pre>
 
-    <p>
-      The default cgroups layout ensures that, when there is contention for
-      CPU time, it is shared equally between system services, user sessions
-      and virtual machines / containers. This prevents virtual machines from
-      locking the administrator out of the host, or impacting execution of
-      system services. Conversely, when there is no contention from
-      system services / user sessions, it is possible for virtual machines
-      to fully utilize the host CPUs.
-    </p>
-
     <h2><a name="customPartiton">Using custom partitions</a></h2>
 
     <p>
@@ -127,12 +229,54 @@ $ROOT
     </pre>
 
     <p>
+      Note that the partition names in the guest XML are using a
+      generic naming format, not the low level naming convention
+      required by the underlying host OS. That is, you should not include
+      any of the <code>.partition</code> or <code>.slice</code>
+      suffixes in the XML config. Given a partition name
+      <code>/machine/production</code>, libvirt will automatically
+      apply the platform specific translation required to get
+      <code>/machine/production.partition</code> (non-systemd)
+      or <code>/machine.slice/machine-production.slice</code>
+      (systemd) as the underlying cgroup name.
+    </p>
+
+    <p>
       Libvirt will not auto-create the cgroups directory to back
       this partition. In the future, libvirt / virsh will provide
       APIs / commands to create custom partitions, but currently
-      this is left as an exercise for the administrator. For
-      example, given the XML config above, the admin would need
-      to create a cgroup named '/machine/production.partition'
+      this is left as an exercise for the administrator.
+    </p>
+
+    <p>
+      <strong>Note:</strong> the ability to place guests in custom
+      partitions is only available with libvirt &gt;= 1.0.5, using
+      the new cgroup layout. The legacy cgroups layout described
+      later in this document did not support customization per guest.
+    </p>
+
+    <h3><a name="createSystemd">Creating custom partitions (systemd)</a></h3>
+
+    <p>
+      Given the XML config above, the admin on a systemd based host would
+      need to create a unit file <code>/etc/systemd/system/machine-production.slice</code>:
+    </p>
+
+    <pre>
+# cat &gt; /etc/systemd/system/machine-production.slice &lt;&lt;EOF
+[Unit]
+Description=VM production slice
+Before=slices.target
+Wants=machine.slice
+EOF
+# systemctl start machine-production.slice
+    </pre>
+
+    <h3><a name="createNonSystemd">Creating custom partitions (non-systemd)</a></h3>
+
+    <p>
+      Given the XML config above, the admin on a non-systemd based host
+      would need to create a cgroup named '/machine/production.partition':
     </p>
 
     <pre>
@@ -147,18 +291,6 @@ $ROOT
   done
 </pre>
 
-    <p>
-      <strong>Note:</strong> the cgroups directory created as a ".partition"
-      suffix, but the XML config does not require this suffix.
-    </p>
-
-    <p>
-      <strong>Note:</strong> the ability to place guests in custom
-      partitions is only available with libvirt &gt;= 1.0.5, using
-      the new cgroup layout. The legacy cgroups layout described
-      later did not support customization per guest.
-    </p>
-
     <h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>
 
     <p>
-- 
1.8.4.5