From 6baaf82a7742a1de9160146b08ba0cc86b3d4e79 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 10 Jan 2018 17:02:21 +0100
Subject: [PATCH 2/2] main-loop: Acquire main_context lock around
 os_host_main_loop_wait.

RH-Author: Paolo Bonzini <pbonzini@redhat.com>
Message-id: <20180110170221.28975-1-pbonzini@redhat.com>
Patchwork-id: 78541
O-Subject: [RHEL7.5 qemu-kvm PATCH] main-loop: Acquire main_context lock around os_host_main_loop_wait.
Bugzilla: 1473536
RH-Acked-by: Jeffrey Cody <jcody@redhat.com>
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Bugzilla: 1473536

Brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14912977

When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle.  With virt-rescue >= 1.37 you may be able to observe the
hang by doing:

  $ virt-rescue -e ^] --scratch
  ><rescue> while true; do ls -l /usr/bin; done

The hang in virt-rescue can be resolved by pressing a key on the
serial console.

Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console.  Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:

  $ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done

(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)

During investigation of this bug, Paolo Bonzini wrote:

> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext.  However QEMU is not doing that, instead it is taking its
> own mutex.  So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.

This patch exactly implements Paolo's suggestion in that paragraph.
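
A minimal sketch of the locking shape this patch applies, written against
a toy stand-in rather than glib itself so that it builds without glib
headers.  The names toy_context, toy_acquire, toy_release and wait_once
are hypothetical and are not QEMU or glib API; they only mirror where
g_main_context_acquire and g_main_context_release sit in
os_host_main_loop_wait.  What it tries to show is that every exit path,
including an early return, has to release the context again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GMainContext: it only tracks ownership. */
struct toy_context {
    int owner_depth;   /* > 0 while the caller "owns" the context */
    int polls;         /* how many times the poll step actually ran */
};

/* Stand-in for g_main_context_acquire().  The real glib call returns a
 * gboolean and can fail when another thread owns the context; this toy
 * version always succeeds. */
static bool toy_acquire(struct toy_context *ctx)
{
    ctx->owner_depth++;
    return true;
}

/* Stand-in for g_main_context_release(). */
static void toy_release(struct toy_context *ctx)
{
    assert(ctx->owner_depth > 0);   /* releasing without owning is a bug */
    ctx->owner_depth--;
}

/* Same shape as os_host_main_loop_wait after the patch: acquire first,
 * release on the early-return path and again on the normal path.
 * early_ret plays the role of the polling-entry result checked before
 * the real wait. */
static int wait_once(struct toy_context *ctx, int early_ret)
{
    int ret;

    toy_acquire(ctx);

    if (early_ret != 0) {
        toy_release(ctx);           /* do not leak ownership on early exit */
        return early_ret;
    }

    ctx->polls++;                   /* placeholder for fill / poll / dispatch */
    ret = 0;

    toy_release(ctx);
    return ret;
}
```

With a zero-initialised struct toy_context, calling wait_once(&ctx, 0) and
then wait_once(&ctx, 3) leaves ctx.owner_depth at 0 both times, which is
the balance the second implementation (the one with the win32 events
comment) keeps by releasing before its early return.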

This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing.  I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170331205133.23906-1-rjones@redhat.com>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended].
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ecbddbb106114f90008024b4e6c3ba1c38d7ca0e)

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 main-loop.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/main-loop.c b/main-loop.c
index cf36645..a93d37b 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -192,9 +192,12 @@ static void glib_pollfds_poll(void)
 
 static int os_host_main_loop_wait(uint32_t timeout)
 {
+    GMainContext *context = g_main_context_default();
     int ret;
     static int spin_counter;
 
+    g_main_context_acquire(context);
+
     glib_pollfds_fill(&timeout);
 
     /* If the I/O thread is very busy or we are incorrectly busy waiting in
@@ -230,6 +233,9 @@ static int os_host_main_loop_wait(uint32_t timeout)
     }
 
     glib_pollfds_poll();
+
+    g_main_context_release(context);
+
     return ret;
 }
 #else
@@ -385,12 +391,15 @@ static int os_host_main_loop_wait(uint32_t timeout)
     fd_set rfds, wfds, xfds;
     int nfds;
 
+    g_main_context_acquire(context);
+
     /* XXX: need to suppress polling by better using win32 events */
     ret = 0;
     for (pe = first_polling_entry; pe != NULL; pe = pe->next) {
         ret |= pe->func(pe->opaque);
     }
     if (ret != 0) {
+        g_main_context_release(context);
         return ret;
     }
 
@@ -440,6 +449,8 @@ static int os_host_main_loop_wait(uint32_t timeout)
         g_main_context_dispatch(context);
     }
 
+    g_main_context_release(context);
+
     return select_ret || g_poll_ret;
 }
 #endif
-- 
1.8.3.1