Pablo Greco e6a3ae
From 5935958fc4eb9934b1493486a69f0f571e7da112 Mon Sep 17 00:00:00 2001
Pablo Greco e6a3ae
From: Thomas Huth <thuth@redhat.com>
Pablo Greco e6a3ae
Date: Fri, 30 Aug 2019 12:56:24 +0100
Pablo Greco e6a3ae
Subject: [PATCH 06/10] file-posix: Handle undetectable alignment
Pablo Greco e6a3ae
Pablo Greco e6a3ae
RH-Author: Thomas Huth <thuth@redhat.com>
Pablo Greco e6a3ae
Message-id: <20190830125628.23668-2-thuth@redhat.com>
Pablo Greco e6a3ae
Patchwork-id: 90209
Pablo Greco e6a3ae
O-Subject: [RHEL-8.1.0 qemu-kvm PATCH v2 1/5] file-posix: Handle undetectable alignment
Pablo Greco e6a3ae
Bugzilla: 1738839
Pablo Greco e6a3ae
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
Pablo Greco e6a3ae
RH-Acked-by: Max Reitz <mreitz@redhat.com>
Pablo Greco e6a3ae
RH-Acked-by: David Hildenbrand <david@redhat.com>
Pablo Greco e6a3ae
Pablo Greco e6a3ae
In some cases buf_align or request_alignment cannot be detected:
Pablo Greco e6a3ae
Pablo Greco e6a3ae
1. With Gluster, buf_align cannot be detected since the actual I/O is
Pablo Greco e6a3ae
   done on Gluster server, and qemu buffer alignment does not matter.
Pablo Greco e6a3ae
   Since we don't have alignment requirement, buf_align=1 is the best
Pablo Greco e6a3ae
   value.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
2. With local XFS filesystem, buf_align cannot be detected if reading
Pablo Greco e6a3ae
   from unallocated area. In this we must align the buffer, but we don't
Pablo Greco e6a3ae
   know what is the correct size. Using the wrong alignment results in
Pablo Greco e6a3ae
   I/O error.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
3. With Gluster backed by XFS, request_alignment cannot be detected if
Pablo Greco e6a3ae
   reading from unallocated area. In this case we need to use the
Pablo Greco e6a3ae
   correct alignment, and failing to do so results in I/O errors.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
4. With NFS, the server does not use direct I/O, so both buf_align cannot
Pablo Greco e6a3ae
   be detected. In this case we don't need any alignment so we can use
Pablo Greco e6a3ae
   buf_align=1 and request_alignment=1.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
These cases seems to work when storage sector size is 512 bytes, because
Pablo Greco e6a3ae
the current code starts checking align=512. If the check succeeds
Pablo Greco e6a3ae
because alignment cannot be detected we use 512. But this does not work
Pablo Greco e6a3ae
for storage with 4k sector size.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
To determine if we can detect the alignment, we probe first with
Pablo Greco e6a3ae
align=1. If probing succeeds, maybe there are no alignment requirement
Pablo Greco e6a3ae
(cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we
Pablo Greco e6a3ae
don't have any way to tell, we treat this as undetectable alignment. If
Pablo Greco e6a3ae
probing with align=1 fails with EINVAL, but probing with one of the
Pablo Greco e6a3ae
expected alignments succeeds, we know that we found a working alignment.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
Practically the alignment requirements are the same for buffer
Pablo Greco e6a3ae
alignment, buffer length, and offset in file. So in case we cannot
Pablo Greco e6a3ae
detect buf_align, we can use request alignment. If we cannot detect
Pablo Greco e6a3ae
request alignment, we can fallback to a safe value. To use this logic,
Pablo Greco e6a3ae
we probe first request alignment instead of buf_align.
Pablo Greco e6a3ae
Pablo Greco e6a3ae
Here is a table showing the behaviour with current code (the value in
Pablo Greco e6a3ae
parenthesis is the optimal value).
Pablo Greco e6a3ae
Pablo Greco e6a3ae
Case    Sector    buf_align (opt)   request_alignment (opt)     result
Pablo Greco e6a3ae
Pablo Greco e6a3ae
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
Pablo Greco e6a3ae
---
Pablo Greco e6a3ae
 block/file-posix.c | 36 +++++++++++++++++++++++++-----------
Pablo Greco e6a3ae
 1 file changed, 25 insertions(+), 11 deletions(-)
Pablo Greco e6a3ae
Pablo Greco e6a3ae
diff --git a/block/file-posix.c b/block/file-posix.c
Pablo Greco e6a3ae
index 4b404e4..84c5a31 100644
Pablo Greco e6a3ae
--- a/block/file-posix.c
Pablo Greco e6a3ae
+++ b/block/file-posix.c
Pablo Greco e6a3ae
@@ -324,6 +324,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
Pablo Greco e6a3ae
     BDRVRawState *s = bs->opaque;
Pablo Greco e6a3ae
     char *buf;
Pablo Greco e6a3ae
     size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
Pablo Greco e6a3ae
+    size_t alignments[] = {1, 512, 1024, 2048, 4096};
Pablo Greco e6a3ae
 
Pablo Greco e6a3ae
     /* For SCSI generic devices the alignment is not really used.
Pablo Greco e6a3ae
        With buffered I/O, we don't have any restrictions. */
Pablo Greco e6a3ae
@@ -350,25 +351,38 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
Pablo Greco e6a3ae
     }
Pablo Greco e6a3ae
 #endif
Pablo Greco e6a3ae
 
Pablo Greco e6a3ae
-    /* If we could not get the sizes so far, we can only guess them */
Pablo Greco e6a3ae
-    if (!s->buf_align) {
Pablo Greco e6a3ae
+    /*
Pablo Greco e6a3ae
+     * If we could not get the sizes so far, we can only guess them. First try
Pablo Greco e6a3ae
+     * to detect request alignment, since it is more likely to succeed. Then
Pablo Greco e6a3ae
+     * try to detect buf_align, which cannot be detected in some cases (e.g.
Pablo Greco e6a3ae
+     * Gluster). If buf_align cannot be detected, we fallback to the value of
Pablo Greco e6a3ae
+     * request_alignment.
Pablo Greco e6a3ae
+     */
Pablo Greco e6a3ae
+
Pablo Greco e6a3ae
+    if (!bs->bl.request_alignment) {
Pablo Greco e6a3ae
+        int i;
Pablo Greco e6a3ae
         size_t align;
Pablo Greco e6a3ae
-        buf = qemu_memalign(max_align, 2 * max_align);
Pablo Greco e6a3ae
-        for (align = 512; align <= max_align; align <<= 1) {
Pablo Greco e6a3ae
-            if (raw_is_io_aligned(fd, buf + align, max_align)) {
Pablo Greco e6a3ae
-                s->buf_align = align;
Pablo Greco e6a3ae
+        buf = qemu_memalign(max_align, max_align);
Pablo Greco e6a3ae
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
Pablo Greco e6a3ae
+            align = alignments[i];
Pablo Greco e6a3ae
+            if (raw_is_io_aligned(fd, buf, align)) {
Pablo Greco e6a3ae
+                /* Fallback to safe value. */
Pablo Greco e6a3ae
+                bs->bl.request_alignment = (align != 1) ? align : max_align;
Pablo Greco e6a3ae
                 break;
Pablo Greco e6a3ae
             }
Pablo Greco e6a3ae
         }
Pablo Greco e6a3ae
         qemu_vfree(buf);
Pablo Greco e6a3ae
     }
Pablo Greco e6a3ae
 
Pablo Greco e6a3ae
-    if (!bs->bl.request_alignment) {
Pablo Greco e6a3ae
+    if (!s->buf_align) {
Pablo Greco e6a3ae
+        int i;
Pablo Greco e6a3ae
         size_t align;
Pablo Greco e6a3ae
-        buf = qemu_memalign(s->buf_align, max_align);
Pablo Greco e6a3ae
-        for (align = 512; align <= max_align; align <<= 1) {
Pablo Greco e6a3ae
-            if (raw_is_io_aligned(fd, buf, align)) {
Pablo Greco e6a3ae
-                bs->bl.request_alignment = align;
Pablo Greco e6a3ae
+        buf = qemu_memalign(max_align, 2 * max_align);
Pablo Greco e6a3ae
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
Pablo Greco e6a3ae
+            align = alignments[i];
Pablo Greco e6a3ae
+            if (raw_is_io_aligned(fd, buf + align, max_align)) {
Pablo Greco e6a3ae
+                /* Fallback to request_aligment. */
Pablo Greco e6a3ae
+                s->buf_align = (align != 1) ? align : bs->bl.request_alignment;
Pablo Greco e6a3ae
                 break;
Pablo Greco e6a3ae
             }
Pablo Greco e6a3ae
         }
Pablo Greco e6a3ae
-- 
Pablo Greco e6a3ae
1.8.3.1
Pablo Greco e6a3ae