From a09020ea2e2e645b95ed603e075938d413f1114f Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 12 Oct 2018 07:58:38 +0100
Subject: [PATCH 08/17] intel-iommu: send PSI always even if across PDEs

RH-Author: Peter Xu <peterx@redhat.com>
Message-id: <20181012075846.25449-2-peterx@redhat.com>
Patchwork-id: 82674
O-Subject: [RHEL-8 qemu-kvm PATCH 1/9] intel-iommu: send PSI always even if across PDEs
Bugzilla: 1450712
RH-Acked-by: Auger Eric <eric.auger@redhat.com>
RH-Acked-by: Xiao Wang <jasowang@redhat.com>
RH-Acked-by: Michael S. Tsirkin <mst@redhat.com>

SECURITY IMPLICATION: without this patch, any guest with both an
assigned device and a vIOMMU might encounter stale IO page mappings
even after the guest has unmapped the page, which may lead to guest
memory corruption.  The stale mappings are limited to the guest's own
memory range, so they should not affect host memory or other guests on
the host.

During IOVA page table walking, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more.  Previously, we skipped that entry and did not
notify the IOMMU notifiers.  This is not correct.  We should send an
UNMAP notification to registered UNMAP notifiers in this case.

For UNMAP-only notifiers, skipping the entry might leave IOTLB entries
cached in the devices even after they have become invalid.  For
MAP/UNMAP notifiers like vfio-pci, it will cause stale page mappings.

This special case doesn't trigger often, but it is very easy to trigger
with nested device assignment, since in that case we'll possibly map
the whole L2 guest RAM region into the device's IOVA address space
(several GBs at least), which is far bigger than normal kernel driver
usage of the device (tens of MBs normally).

Without this patch applied to L1 QEMU, nested device assignment to L2
guests will dump errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)

CC: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Jason Wang <jasowang@redhat.com>
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 36d2d52bdb45f5b753a61fdaf0fe7891f1f5b61d)
Signed-off-by: Peter Xu <peterx@redhat.com>

Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9..b359efd 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+                             vtd_page_walk_hook hook_fn, void *private)
+{
+    assert(hook_fn);
+    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+                            entry->addr_mask, entry->perm);
+    return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
+        entry.target_as = &address_space_memory;
+        entry.iova = iova & subpage_mask;
+        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+        entry.addr_mask = ~subpage_mask;
+
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            entry.addr_mask = ~subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             if (!entry_valid && !notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-                                    entry.addr_mask, entry.perm);
-            if (hook_fn) {
-                ret = hook_fn(&entry, private);
-                if (ret < 0) {
-                    return ret;
-                }
+            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            if (ret < 0) {
+                return ret;
             }
         } else {
             if (!entry_valid) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                if (notify_unmap) {
+                    /*
+                     * The whole entry is invalid; unmap it all.
+                     * Translated address is meaningless, zero it.
+                     */
+                    entry.translated_addr = 0x0;
+                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                } else {
+                    trace_vtd_page_walk_skip_perm(iova, iova_next);
+                }
                 goto next;
             }
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
1.8.3.1