|
Pablo Greco |
e6a3ae |
From 569674a3b855f516a8bec22ca365fc7614639ce6 Mon Sep 17 00:00:00 2001
|
|
Pablo Greco |
e6a3ae |
From: Max Reitz <mreitz@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Date: Tue, 23 Jul 2019 14:45:42 +0100
|
|
Pablo Greco |
e6a3ae |
Subject: [PATCH 04/14] nbd/client: Lower min_block for block-status, unaligned
|
|
Pablo Greco |
e6a3ae |
size
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
RH-Author: Max Reitz <mreitz@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Message-id: <20190723144546.23701-4-mreitz@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Patchwork-id: 89650
|
|
Pablo Greco |
e6a3ae |
O-Subject: [RHEL-8.1.0 qemu-kvm PATCH 3/7] nbd/client: Lower min_block for block-status, unaligned size
|
|
Pablo Greco |
e6a3ae |
Bugzilla: 1678979
|
|
Pablo Greco |
e6a3ae |
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
|
|
Pablo Greco |
e6a3ae |
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
|
|
Pablo Greco |
e6a3ae |
RH-Acked-by: John Snow <jsnow@redhat.com>
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
From: Eric Blake <eblake@redhat.com>
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
We have a latent bug in our NBD client code, tickled by the brand new
|
|
Pablo Greco |
e6a3ae |
nbdkit 1.11.10 block status support:
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
$ nbdkit --filter=log --filter=truncate -U - \
|
|
Pablo Greco |
e6a3ae |
data data="1" size=511 truncate=64K logfile=/dev/stdout \
|
|
Pablo Greco |
e6a3ae |
--run 'qemu-img convert $nbd /var/tmp/out'
|
|
Pablo Greco |
e6a3ae |
...
|
|
Pablo Greco |
e6a3ae |
qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
The culprit? Our implementation of .bdrv_co_block_status can return
|
|
Pablo Greco |
e6a3ae |
unaligned block status for any server that operates with a lower
|
|
Pablo Greco |
e6a3ae |
actual alignment than what we tell the block layer in
|
|
Pablo Greco |
e6a3ae |
request_alignment, in violation of the block layer's constraints. To
|
|
Pablo Greco |
e6a3ae |
date, we've been unable to trip the bug, because qemu as NBD server
|
|
Pablo Greco |
e6a3ae |
always advertises block sizing (at which point it is a server bug if
|
|
Pablo Greco |
e6a3ae |
the server sends unaligned status - although qemu 3.1 is such a server
|
|
Pablo Greco |
e6a3ae |
and I've sent separate patches for 4.0 both to get the server to obey
|
|
Pablo Greco |
e6a3ae |
the spec, and to let the client to tolerate server oddities at EOF).
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
But nbdkit does not (yet) advertise block sizing, and therefore is not
|
|
Pablo Greco |
e6a3ae |
in violation of the spec for returning block status at whatever
|
|
Pablo Greco |
e6a3ae |
boundaries it wants, and those unaligned results can occur anywhere
|
|
Pablo Greco |
e6a3ae |
rather than just at EOF. While we are still wise to avoid sending
|
|
Pablo Greco |
e6a3ae |
sub-sector read/write requests to a server of unknown origin, we MUST
|
|
Pablo Greco |
e6a3ae |
consider that a server telling us block status without an advertised
|
|
Pablo Greco |
e6a3ae |
block size is correct. So, we either have to munge unaligned answers
|
|
Pablo Greco |
e6a3ae |
from the server into aligned ones that we hand back to the block
|
|
Pablo Greco |
e6a3ae |
layer, or we have to tell the block layer about a smaller alignment.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
Similarly, if the server advertises an image size that is not
|
|
Pablo Greco |
e6a3ae |
sector-aligned, we might as well assume that the server intends to let
|
|
Pablo Greco |
e6a3ae |
us access those tail bytes, and therefore supports a minimum block
|
|
Pablo Greco |
e6a3ae |
size of 1, regardless of whether the server supports block status
|
|
Pablo Greco |
e6a3ae |
(although we still need more patches to fix the problem that with an
|
|
Pablo Greco |
e6a3ae |
unaligned image, we can send read or block status requests that exceed
|
|
Pablo Greco |
e6a3ae |
EOF to the server). Again, qemu as server cannot trip this problem
|
|
Pablo Greco |
e6a3ae |
(because it rounds images to sector alignment), but nbdkit advertised
|
|
Pablo Greco |
e6a3ae |
unaligned size even before it gained block status support.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
Solve both alignment problems at once by using better heuristics on
|
|
Pablo Greco |
e6a3ae |
what alignment to report to the block layer when the server did not
|
|
Pablo Greco |
e6a3ae |
give us something to work with. Note that very few NBD servers
|
|
Pablo Greco |
e6a3ae |
implement block status (to date, only qemu and nbdkit are known to do
|
|
Pablo Greco |
e6a3ae |
so); and as the NBD spec mentioned block sizing constraints prior to
|
|
Pablo Greco |
e6a3ae |
documenting block status, it can be assumed that any future
|
|
Pablo Greco |
e6a3ae |
implementations of block status are aware that they must advertise
|
|
Pablo Greco |
e6a3ae |
block size if they want a minimum size other than 1.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
We've had a long history of struggles with picking the right alignment
|
|
Pablo Greco |
e6a3ae |
to use in the block layer, as evidenced by the commit message of
|
|
Pablo Greco |
e6a3ae |
fd8d372d (v2.12) that introduced the current choice of forced 512-byte
|
|
Pablo Greco |
e6a3ae |
alignment.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
There is no iotest coverage for this fix, because qemu can't provoke
|
|
Pablo Greco |
e6a3ae |
it, and I didn't want to make test 241 dependent on nbdkit.
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
Fixes: fd8d372d
|
|
Pablo Greco |
e6a3ae |
Reported-by: Richard W.M. Jones <rjones@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Signed-off-by: Eric Blake <eblake@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Message-Id: <20190329042750.14704-3-eblake@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
|
|
Pablo Greco |
e6a3ae |
Tested-by: Richard W.M. Jones <rjones@redhat.com>
|
|
Pablo Greco |
e6a3ae |
(cherry picked from commit 7da537f70d929800ba9c657b8a47a7b827695ccc)
|
|
Pablo Greco |
e6a3ae |
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
Pablo Greco |
e6a3ae |
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
|
|
Pablo Greco |
e6a3ae |
---
|
|
Pablo Greco |
e6a3ae |
block/nbd.c | 19 ++++++++++++++++++-
|
|
Pablo Greco |
e6a3ae |
1 file changed, 18 insertions(+), 1 deletion(-)
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
diff --git a/block/nbd.c b/block/nbd.c
|
|
Pablo Greco |
e6a3ae |
index f29c10f..3d642cd 100644
|
|
Pablo Greco |
e6a3ae |
--- a/block/nbd.c
|
|
Pablo Greco |
e6a3ae |
+++ b/block/nbd.c
|
|
Pablo Greco |
e6a3ae |
@@ -473,7 +473,24 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
|
|
Pablo Greco |
e6a3ae |
uint32_t min = s->info.min_block;
|
|
Pablo Greco |
e6a3ae |
uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
|
|
Pablo Greco |
e6a3ae |
|
|
Pablo Greco |
e6a3ae |
- bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
|
|
Pablo Greco |
e6a3ae |
+ /*
|
|
Pablo Greco |
e6a3ae |
+ * If the server did not advertise an alignment:
|
|
Pablo Greco |
e6a3ae |
+ * - a size that is not sector-aligned implies that an alignment
|
|
Pablo Greco |
e6a3ae |
+ * of 1 can be used to access those tail bytes
|
|
Pablo Greco |
e6a3ae |
+ * - advertisement of block status requires an alignment of 1, so
|
|
Pablo Greco |
e6a3ae |
+ * that we don't violate block layer constraints that block
|
|
Pablo Greco |
e6a3ae |
+ * status is always aligned (as we can't control whether the
|
|
Pablo Greco |
e6a3ae |
+ * server will report sub-sector extents, such as a hole at EOF
|
|
Pablo Greco |
e6a3ae |
+ * on an unaligned POSIX file)
|
|
Pablo Greco |
e6a3ae |
+ * - otherwise, assume the server is so old that we are safer avoiding
|
|
Pablo Greco |
e6a3ae |
+ * sub-sector requests
|
|
Pablo Greco |
e6a3ae |
+ */
|
|
Pablo Greco |
e6a3ae |
+ if (!min) {
|
|
Pablo Greco |
e6a3ae |
+ min = (!QEMU_IS_ALIGNED(s->info.size, BDRV_SECTOR_SIZE) ||
|
|
Pablo Greco |
e6a3ae |
+ s->info.base_allocation) ? 1 : BDRV_SECTOR_SIZE;
|
|
Pablo Greco |
e6a3ae |
+ }
|
|
Pablo Greco |
e6a3ae |
+
|
|
Pablo Greco |
e6a3ae |
+ bs->bl.request_alignment = min;
|
|
Pablo Greco |
e6a3ae |
bs->bl.max_pdiscard = max;
|
|
Pablo Greco |
e6a3ae |
bs->bl.max_pwrite_zeroes = max;
|
|
Pablo Greco |
e6a3ae |
bs->bl.max_transfer = max;
|
|
Pablo Greco |
e6a3ae |
--
|
|
Pablo Greco |
e6a3ae |
1.8.3.1
|
|
Pablo Greco |
e6a3ae |
|