dcavalca / rpms / dnf

Forked from rpms/dnf 2 years ago
Clone

Blame SOURCES/0001-Base.reset-plug-temporary-leak-of-libsolv-s-page-fil.patch

321a78
From 5ce5ed1ea08ad6e198c1c1642c4d9ea2db6eab86 Mon Sep 17 00:00:00 2001
321a78
From: Laszlo Ersek <lersek@redhat.com>
321a78
Date: Sun, 24 Apr 2022 09:08:28 +0200
321a78
Subject: [PATCH] Base.reset: plug (temporary) leak of libsolv's page file
321a78
 descriptors
321a78
321a78
Consider the following call paths (mixed Python and C), extending from
321a78
livecd-creator down to libsolv:
321a78
321a78
  main                                                [livecd-tools/tools/livecd-creator]
321a78
    install()                                         [livecd-tools/imgcreate/creator.py]
321a78
      fill_sack()                                     [dnf/dnf/base.py]
321a78
        _add_repo_to_sack()                           [dnf/dnf/base.py]
321a78
          load_repo()                                 [libdnf/python/hawkey/sack-py.cpp]
321a78
            dnf_sack_load_repo()                      [libdnf/libdnf/dnf-sack.cpp]
321a78
              write_main()                            [libdnf/libdnf/dnf-sack.cpp]
321a78
                repo_add_solv()                       [libsolv/src/repo_solv.c]
321a78
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
321a78
                    dup()
321a78
              write_ext()                             [libdnf/libdnf/dnf-sack.cpp]
321a78
                repo_add_solv()                       [libsolv/src/repo_solv.c]
321a78
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
321a78
                    dup()
321a78
321a78
The dup() calls create the following file descriptors (output from
321a78
"lsof"):
321a78
321a78
> COMMAND  PID USER FD  TYPE DEVICE SIZE/OFF   NODE NAME
321a78
> python3 6500 root  7r  REG    8,1 25320727 395438 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora.solv (deleted)
321a78
> python3 6500 root  8r  REG    8,1 52531426 395450 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora-filenames.solvx
321a78
321a78
These file descriptors are *owned* by the DnfSack object (which is derived
321a78
from GObject), as follows:
321a78
321a78
  sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
321a78
  sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
321a78
  ^     ^     ^     ^         ^            ^      ^
321a78
  |     |     |     |         |            |      |
321a78
  |     |     |     |         |            |      int
321a78
  |     |     |     |         |            Repopagestore [libsolv/src/repopage.h]
321a78
  |     |     |     |         Repodata                   [libsolv/src/repodata.h]
321a78
  |     |     |     struct s_Repo                        [libsolv/src/repo.h]
321a78
  |     |     struct s_Pool (aka Pool)                   [libsolv/src/pool.h]
321a78
  |     DnfSackPrivate                                   [libdnf/libdnf/dnf-sack.cpp]
321a78
  DnfSack                                                [libdnf/libdnf/dnf-sack.h]
321a78
321a78
The file descriptors are *supposed* to be closed on the following call
321a78
path:
321a78
321a78
  main                                         [livecd-tools/tools/livecd-creator]
321a78
    install()                                  [livecd-tools/imgcreate/creator.py]
321a78
      close()                                  [livecd-tools/imgcreate/dnfinst.py]
321a78
        close()                                [dnf/dnf/base.py]
321a78
          reset()                              [dnf/dnf/base.py]
321a78
            _sack = None
321a78
            _goal = None
321a78
            _transaction = None
321a78
              ...
321a78
                dnf_sack_finalize()            [libdnf/libdnf/dnf-sack.cpp]
321a78
                  pool_free()                  [libsolv/src/pool.c]
321a78
                    pool_freeallrepos()        [libsolv/src/pool.c]
321a78
                      repo_freedata()          [libsolv/src/repo.c]
321a78
                        repodata_freedata()    [libsolv/src/repodata.c]
321a78
                          repopagestore_free() [libsolv/src/repopage.c]
321a78
                            close()
321a78
321a78
Namely, when dnf.Base.reset() [dnf/dnf/base.py] is called with (sack=True,
321a78
goal=True), the reference counts of the objects pointed to by the "_sack",
321a78
"_goal" and "_transaction" fields are supposed to reach zero, and then, as
321a78
part of the DnfSack object's finalization, the libsolv file descriptors
321a78
are supposed to be closed.
321a78
321a78
Now, while this *may* happen immediately in dnf.Base.reset(), it may as
321a78
well not. The reason is that there is a multitude of *circular references*
321a78
between DnfSack and the packages that it contains. When dnf.Base.reset()
321a78
is entered, we have the following picture:
321a78
321a78
     _sack                   _goal
321a78
        |                      |
321a78
        v                      v
321a78
   +----------------+      +-------------+
321a78
   | DnfSack object | <--- | Goal object |
321a78
   +----------------+      +-------------+
321a78
     |^    |^    |^
321a78
     ||    ||    ||
321a78
     ||    ||    ||
321a78
  +--||----||----||---+
321a78
  |  v|    v|    v|   | <-- _transaction
321a78
  | Pkg1  Pkg2  PkgN  |
321a78
  |                   |
321a78
  | Transaction oject |
321a78
  +-------------------+
321a78
321a78
That is, the reference count of the DnfSack object is (1 + 1 + N), where N
321a78
is the number of packages in the transaction. Details:
321a78
321a78
(a) The first reference comes from the "_sack" field, established like
321a78
    this:
321a78
321a78
      main                       [livecd-tools/tools/livecd-creator]
321a78
        install()                [livecd-tools/imgcreate/creator.py]
321a78
          fill_sack()            [dnf/dnf/base.py]
321a78
            _build_sack()        [dnf/dnf/sack.py]
321a78
              Sack()
321a78
                sack_init()      [libdnf/python/hawkey/sack-py.cpp]
321a78
                  dnf_sack_new() [libdnf/libdnf/dnf-sack.cpp]
321a78
321a78
(b) The second reference on the DnfSack object comes from "_goal":
321a78
321a78
      main                        [livecd-tools/tools/livecd-creator]
321a78
        install()                 [livecd-tools/imgcreate/creator.py]
321a78
          fill_sack()             [dnf/dnf/base.py]
321a78
             _goal = Goal(_sack)
321a78
               goal_init()        [libdnf/python/hawkey/goal-py.cpp]
321a78
                 Py_INCREF(_sack)
321a78
321a78
(c) Then there is one reference to "_sack" *per package* in the
321a78
    transaction:
321a78
321a78
      main                                  [livecd-tools/tools/livecd-creator]
321a78
        install()                           [livecd-tools/imgcreate/creator.py]
321a78
          runInstall()                      [livecd-tools/imgcreate/dnfinst.py]
321a78
            resolve()                       [dnf/dnf/base.py]
321a78
              _goal2transaction()           [dnf/dnf/base.py]
321a78
                list_installs()             [libdnf/python/hawkey/goal-py.cpp]
321a78
                  list_generic()            [libdnf/python/hawkey/goal-py.cpp]
321a78
                    packagelist_to_pylist() [libdnf/python/hawkey/iutil-py.cpp]
321a78
                      new_package()         [libdnf/python/hawkey/sack-py.cpp]
321a78
                        Py_BuildValue()
321a78
                ts.add_install()
321a78
321a78
    list_installs() creates a list of packages that need to be installed
321a78
    by DNF. Inside the loop in packagelist_to_pylist(), which constructs
321a78
    the elements of that list, Py_BuildValue() is called with the "O"
321a78
    format specifier, and that increases the reference count on "_sack".
321a78
321a78
    Subsequently, in the _goal2transaction() method, we iterate over the
321a78
    package list created by list_installs(), and add each package to the
321a78
    transaction (ts.add_install()). After _goal2transaction() returns,
321a78
    this transaction is assigned to "self._transaction" in resolve(). This
321a78
    is where the last N (back-)references on the DnfSack object come from.
321a78
321a78
(d) Now, to quote the defintion of the DnfSack object
321a78
    ("libdnf/docs/hawkey/tutorial-py.rst"):
321a78
321a78
> *Sack* is an abstraction for a collection of packages.
321a78
321a78
    That's why the DnfSack object references all the Pkg1 through PkgN
321a78
    packages.
321a78
321a78
So, when the dnf.Base.reset() method completes, the picture changes like
321a78
this:
321a78
321a78
     _sack                     _goal
321a78
        |                        |
321a78
   -- [CUT] --              -- [CUT] --
321a78
        |                        |
321a78
        v                |       v
321a78
   +----------------+   [C]  +-------------+
321a78
   | DnfSack object | <-[U]- | Goal object |
321a78
   +----------------+   [T]  +-------------+
321a78
     |^    |^    |^      |
321a78
     ||    ||    ||
321a78
     ||    ||    ||         |
321a78
  +--||----||----||---+    [C]
321a78
  |  v|    v|    v|   | <--[U]-- _transaction
321a78
  | Pkg1  Pkg2  PkgN  |    [T]
321a78
  |                   |     |
321a78
  | Transaction oject |
321a78
  +-------------------+
321a78
321a78
and we are left with N reference cycles (one between each package and the
321a78
same DnfSack object).
321a78
321a78
This set of cycles can only be cleaned up by Python's generational garbage
321a78
collector <https://stackify.com/python-garbage-collection/>. The GC will
321a78
collect the DnfSack object, and consequently close the libsolv page file
321a78
descriptors via dnf_sack_finalize() -- but garbage collection will happen
321a78
*only eventually*, unpredictably.
321a78
321a78
This means that the dnf.Base.reset() method breaks its interface contract:
321a78
321a78
> Make the Base object forget about various things.
321a78
321a78
because the libsolv file descriptors can (and frequently do, in practice)
321a78
survive dnf.Base.reset().
321a78
321a78
In general, as long as the garbage collector only tracks process-private
321a78
memory blocks, there's nothing wrong; however, file descriptors are
321a78
visible to the kernel. When dnf.Base.reset() *temporarily* leaks file
321a78
descriptors as explained above, then immediately subsequent operations
321a78
that depend on those file descriptors having been closed, can fail.
321a78
321a78
An example is livecd-creator's unmounting of:
321a78
321a78
  /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
321a78
321a78
which the kernel refuses, due to libsolv's still open file descriptors
321a78
pointing into that filesystem:
321a78
321a78
> umount: /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf: target
321a78
> is busy.
321a78
> Unable to unmount /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
321a78
> normally, using lazy unmount
321a78
321a78
(Unfortunately, the whole lazy umount idea is misguided in livecd-tools;
321a78
it's a misfeature that should be removed, as it permits the corruption of
321a78
the loop-backed filesystem. Now that the real bug is being fixed in DNF,
321a78
lazy umount is not needed as a (broken) workaround in livecd-tools. But
321a78
that's a separate patch for livecd-tools:
321a78
<https://github.com/livecd-tools/livecd-tools/pull/227>.)
321a78
321a78
Plug the fd leak by forcing a garbage collection in dnf.Base.reset()
321a78
whenever we cut the "_sack", "_goal" and "_transaction" links -- that is,
321a78
when the "sack" and "goal" parameters are True.
321a78
321a78
Note that precisely due to the unpredictable behavior of the garbage
321a78
collector, reproducing the bug may prove elusive. In order to reproduce it
321a78
deterministically, through usage with livecd-creator, disabling automatic
321a78
garbage collection with the following patch (for livecd-tools) is
321a78
sufficient:
321a78
321a78
> diff --git a/tools/livecd-creator b/tools/livecd-creator
321a78
> index 291de10cbbf9..8d2c740c238b 100755
321a78
> --- a/tools/livecd-creator
321a78
> +++ b/tools/livecd-creator
321a78
> @@ -31,6 +31,8 @@ from dnf.exceptions import Error as DnfBaseError
321a78
>  import imgcreate
321a78
>  from imgcreate.errors import KickstartError
321a78
>
321a78
> +import gc
321a78
> +
321a78
>  class Usage(Exception):
321a78
>      def __init__(self, msg = None, no_error = False):
321a78
>          Exception.__init__(self, msg, no_error)
321a78
> @@ -261,5 +263,6 @@ def do_nss_libs_hack():
321a78
>      return hack
321a78
>
321a78
>  if __name__ == "__main__":
321a78
> +    gc.disable()
321a78
>      hack = do_nss_libs_hack()
321a78
>      sys.exit(main())
321a78
321a78
Also note that you need to use livecd-tools at git commit 4afde9352e82 or
321a78
later, for this fix to make any difference: said commit fixes a different
321a78
(independent) bug in livecd-tools that produces identical symptoms, but
321a78
from a different origin. In other words, if you don't have commit
321a78
4afde9352e82 in your livecd-tools install, then said bug in livecd-tools
321a78
will mask this DNF fix.
321a78
321a78
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
321a78
---
321a78
 dnf/base.py | 41 +++++++++++++++++++++++++++++++++++++++++
321a78
 1 file changed, 41 insertions(+)
321a78
321a78
diff --git a/dnf/base.py b/dnf/base.py
321a78
index caace028..520574b4 100644
321a78
--- a/dnf/base.py
321a78
+++ b/dnf/base.py
321a78
@@ -72,6 +72,7 @@ import dnf.transaction
321a78
 import dnf.util
321a78
 import dnf.yum.rpmtrans
321a78
 import functools
321a78
+import gc
321a78
 import hawkey
321a78
 import itertools
321a78
 import logging
321a78
@@ -569,6 +570,46 @@ class Base(object):
321a78
             self._comps_trans = dnf.comps.TransactionBunch()
321a78
             self._transaction = None
321a78
         self._update_security_filters = []
321a78
+        if sack and goal:
321a78
+            # We've just done this, above:
321a78
+            #
321a78
+            #      _sack                     _goal
321a78
+            #         |                        |
321a78
+            #    -- [CUT] --              -- [CUT] --
321a78
+            #         |                        |
321a78
+            #         v                |       v
321a78
+            #    +----------------+   [C]  +-------------+
321a78
+            #    | DnfSack object | <-[U]- | Goal object |
321a78
+            #    +----------------+   [T]  +-------------+
321a78
+            #      |^    |^    |^      |
321a78
+            #      ||    ||    ||
321a78
+            #      ||    ||    ||         |
321a78
+            #   +--||----||----||---+    [C]
321a78
+            #   |  v|    v|    v|   | <--[U]-- _transaction
321a78
+            #   | Pkg1  Pkg2  PkgN  |    [T]
321a78
+            #   |                   |     |
321a78
+            #   | Transaction oject |
321a78
+            #   +-------------------+
321a78
+            #
321a78
+            # At this point, the DnfSack object would be released only
321a78
+            # eventually, by Python's generational garbage collector, due to the
321a78
+            # cyclic references DnfSack<->Pkg1 ... DnfSack<->PkgN.
321a78
+            #
321a78
+            # The delayed release is a problem: the DnfSack object may
321a78
+            # (indirectly) own "page file" file descriptors in libsolv, via
321a78
+            # libdnf. For example,
321a78
+            #
321a78
+            #   sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
321a78
+            #   sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
321a78
+            #
321a78
+            # These file descriptors are closed when the DnfSack object is
321a78
+            # eventually released, that is, when dnf_sack_finalize() (in libdnf)
321a78
+            # calls pool_free() (in libsolv).
321a78
+            #
321a78
+            # We need that to happen right now, as callers may want to unmount
321a78
+            # the filesystems which those file descriptors refer to immediately
321a78
+            # after reset() returns. Therefore, force a garbage collection here.
321a78
+            gc.collect()
321a78
 
321a78
     def _closeRpmDB(self):
321a78
         """Closes down the instances of rpmdb that could be open."""
321a78
-- 
321a78
2.35.1
321a78