diff --git a/SOURCES/edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch b/SOURCES/edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch
new file mode 100644
index 0000000..a1700de
--- /dev/null
+++ b/SOURCES/edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch
@@ -0,0 +1,105 @@
+From 70c9d989107c6ac964bb437c5a4ea6ffe3214e45 Mon Sep 17 00:00:00 2001
+From: Miroslav Rezanina
+Date: Mon, 10 Aug 2020 07:52:28 +0200
+Subject: [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before
+ re-fetch
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+RH-Author: Laszlo Ersek
+Message-id: <20200731141037.1941-2-lersek@redhat.com>
+Patchwork-id: 98121
+O-Subject: [RHEL-8.3.0 edk2 PATCH 1/1] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
+Bugzilla: 1861718
+RH-Acked-by: Vitaly Kuznetsov
+RH-Acked-by: Eduardo Habkost
+
+Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
+already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
+APHandler(), and SmiRendezvous(). However, the "main wait" within
+APHandler():
+
+>     //
+>     // Wait for something to happen
+>     //
+>     WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);
+
+doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
+without pausing.
+
+The performance impact is especially notable in QEMU/KVM + OVMF
+virtualization with CPU overcommit (that is, when the guest has
+significantly more VCPUs than the host has physical CPUs). The guest BSP
+is working heavily in:
+
+  BSPHandler()                  [MpService.c]
+    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
+      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]
+
+while the many guest APs are spinning in the "Wait for something to
+happen" semaphore acquisition, in APHandler(). The guest APs are
+generating useless memory traffic and saturating host CPUs, hindering the
+guest BSP's progress in SetUefiMemMapAttributes().
+
+Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
+after the first check fails. Due to Pause Loop Exiting (known as Pause
+Filter on AMD), the host scheduler can favor the guest BSP over the guest
+APs.
+
+Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
+reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
+less than 4 minutes.
+
+The patch should benefit physical machines as well -- according to the
+Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
+PAUSE to the generic WaitForSemaphore() function is considered a general
+improvement.
+
+Cc: Eric Dong
+Cc: Philippe Mathieu-Daudé
+Cc: Rahul Kumar
+Cc: Ray Ni
+Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
+Signed-off-by: Laszlo Ersek
+Message-Id: <20200729185217.10084-1-lersek@redhat.com>
+Reviewed-by: Eric Dong
+(cherry picked from commit 9001b750df64b25b14ec45a2efa1361a7b96c00a)
+Signed-off-by: Miroslav Rezanina
+---
+ UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
+ 1 file changed, 11 insertions(+), 7 deletions(-)
+
+diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+index 57e788c..4bcd217 100644
+--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
++++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+@@ -40,14 +40,18 @@ WaitForSemaphore (
+ {
+   UINT32                            Value;
+ 
+-  do {
++  for (;;) {
+     Value = *Sem;
+-  } while (Value == 0 ||
+-           InterlockedCompareExchange32 (
+-             (UINT32*)Sem,
+-             Value,
+-             Value - 1
+-             ) != Value);
++    if (Value != 0 &&
++        InterlockedCompareExchange32 (
++          (UINT32*)Sem,
++          Value,
++          Value - 1
++          ) == Value) {
++      break;
++    }
++    CpuPause ();
++  }
+   return Value - 1;
+ }
+ 
+-- 
+1.8.3.1
+
diff --git a/SPECS/edk2.spec b/SPECS/edk2.spec
index af7baac..b64298f 100644
--- a/SPECS/edk2.spec
+++ b/SPECS/edk2.spec
@@ -7,7 +7,7 @@ ExclusiveArch: x86_64 aarch64
 
 Name: edk2
 Version: %{GITDATE}git%{GITCOMMIT}
-Release: 2%{?dist}
+Release: 3%{?dist}
 Summary: UEFI firmware for 64-bit virtual machines
 Group: Applications/Emulators
 License: BSD-2-Clause-Patent and OpenSSL and MIT
@@ -56,6 +56,8 @@ Patch28: edk2-OvmfPkg-QemuKernelLoaderFsDxe-suppress-error-on-no-k.patch
 Patch29: edk2-OvmfPkg-GenericQemuLoadImageLib-log-Not-Found-at-INF.patch
 # For bz#1844682 - silent build of edk2-aarch64 logs DEBUG_ERROR messages that don't actually report serious errors
 Patch30: edk2-SecurityPkg-Tcg2Dxe-suppress-error-on-no-swtpm-in-si.patch
+# For bz#1861718 - Very slow boot when overcommitting CPU
+Patch31: edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch
 
 
 # python3-devel and libuuid-devel are required for building tools.
@@ -505,6 +507,11 @@ true
 %endif
 
 %changelog
+* Mon Aug 10 2020 Miroslav Rezanina - 20200602gitca407c7246bf-3.el8
+- edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch [bz#1861718]
+- Resolves: bz#1861718
+  (Very slow boot when overcommitting CPU)
+
 * Wed Jun 24 2020 Miroslav Rezanina - 20200602gitca407c7246bf-2.el8
 - edk2-OvmfPkg-QemuKernelLoaderFsDxe-suppress-error-on-no-k.patch [bz#1844682]
 - edk2-OvmfPkg-GenericQemuLoadImageLib-log-Not-Found-at-INF.patch [bz#1844682]