Blame SOURCES/kvm-s390-bios-Support-booting-from-real-dasd-device.patch

Pablo Greco e6a3ae
From 2267eadd85126ea711cc8314c7df45a70486651c Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 14 Oct 2019 10:06:44 +0100
Subject: [PATCH 19/21] s390-bios: Support booting from real dasd device

RH-Author: Thomas Huth <thuth@redhat.com>
Message-id: <20191014100645.22862-17-thuth@redhat.com>
Patchwork-id: 91791
O-Subject: [RHEL-8.2.0 qemu-kvm PATCH v2 16/17] s390-bios: Support booting from real dasd device
Bugzilla: 1664376
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Jens Freimann <jfreimann@redhat.com>

From: "Jason J. Herne" <jjherne@linux.ibm.com>

Allows guest to boot from a vfio configured real dasd device.

Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <1554388475-18329-16-git-send-email-jjherne@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit efa47d36da89f4b23c315a7cc085fab0d15eb47c)
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>

Conflicts:
	MAINTAINERS
	(simple contextual conflict due to missing downstream commits)

Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
 MAINTAINERS                  |   3 +-
 docs/devel/s390-dasd-ipl.txt | 133 ++++++++++++++++++++++++
 pc-bios/s390-ccw/Makefile    |   2 +-
 pc-bios/s390-ccw/dasd-ipl.c  | 235 +++++++++++++++++++++++++++++++++++++++++++
 pc-bios/s390-ccw/dasd-ipl.h  |  16 +++
 pc-bios/s390-ccw/main.c      |   5 +
 pc-bios/s390-ccw/s390-arch.h |  13 +++
 7 files changed, 405 insertions(+), 2 deletions(-)
 create mode 100644 docs/devel/s390-dasd-ipl.txt
 create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
 create mode 100644 pc-bios/s390-ccw/dasd-ipl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9b74756..770885a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -896,7 +896,8 @@ M: Thomas Huth <thuth@redhat.com>
 S: Supported
 F: pc-bios/s390-ccw/
 F: pc-bios/s390-ccw.img
-T: git git://github.com/borntraeger/qemu.git s390-next
+F: docs/devel/s390-dasd-ipl.txt
+T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s390x@nongnu.org
 
 UniCore32 Machines
diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
new file mode 100644
index 0000000..9107e04
--- /dev/null
+++ b/docs/devel/s390-dasd-ipl.txt
@@ -0,0 +1,133 @@
+*****************************
+***** s390 hardware IPL *****
+*****************************
+
+The s390 hardware IPL process consists of the following steps.
+
+1. A READ IPL ccw is constructed in memory location 0x0.
+    This ccw, by definition, reads the IPL1 record which is located on the disk
+    at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
+    so when it is complete another ccw will be fetched and executed from memory
+    location 0x08.
+
+2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
+    IPL1 data is 24 bytes in length and consists of the following pieces of
+    information: [psw][read ccw][tic ccw]. When the machine executes the Read
+    IPL ccw, it reads the 24 bytes of IPL1 into memory starting at location
+    0x0. Then the ccw program at 0x08, which consists of a read ccw and a tic
+    ccw, is automatically executed because of the chain flag from
+    the original READ IPL ccw. The read ccw will read the IPL2 data into memory
+    and the TIC (Transfer In Channel) will transfer control to the channel
+    program contained in the IPL2 data. The TIC channel command is the
+    equivalent of a branch/jump/goto instruction for channel programs.
+    NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
+
+3. Execute IPL2.
+    The TIC ccw instruction at the end of the IPL1 channel program will begin
+    the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
+    process and will contain a larger channel program than IPL1. The point of
+    IPL2 is to find and load either the operating system or a small program that
+    loads the operating system from disk. At the end of this step all or some of
+    the real operating system is loaded into memory and we are ready to hand
+    control over to the guest operating system. At this point the guest
+    operating system is entirely responsible for loading any more data it might
+    need to function. NOTE: The IPL2 channel program might read data into memory
+    location 0 thereby overwriting the IPL1 psw and channel program. This is ok
+    as long as the data placed in location 0 contains a psw whose instruction
+    address points to the guest operating system code to execute at the end of
+    the IPL/boot process.
+    NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
+
+4. Start executing the guest operating system.
+    The psw that was loaded into memory location 0 as part of the IPL process
+    should contain the needed flags for the operating system we have loaded. The
+    psw's instruction address will point to the location in memory where we want
+    to start executing the operating system. This psw is loaded (via LPSW
+    instruction) causing control to be passed to the operating system code.
+
+In a non-virtualized environment this process, handled entirely by the hardware,
+is kicked off by the user initiating a "Load" procedure from the hardware
+management console. This "Load" procedure crafts a special "Read IPL" ccw in
+memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
+off the reading of IPL1 data. Since the channel program from IPL1 will be
+written immediately after the special "Read IPL" ccw, the IPL1 channel program
+will be executed immediately (the special read ccw has the chaining bit turned
+on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
+program to be executed automatically. After this sequence completes the "Load"
+procedure then loads the psw from 0x0.
+
+**********************************************************
+***** How this all pertains to QEMU (and the kernel) *****
+**********************************************************
+
+In theory we should merely have to do the following to IPL/boot a guest
+operating system from a DASD device:
+
+1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
+2. Execute channel program at 0x0.
+3. LPSW 0x0.
+
+However, our emulation of the machine's channel program logic within the kernel
+is missing one key feature that is required for this process to work:
+non-prefetch of ccw data.
+
+When we start a channel program we pass the channel subsystem parameters via an
+ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
+bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
+program from guest memory before it starts executing it. This means that any
+channel commands that read additional channel commands will not work as expected
+because the newly read commands will only exist in guest memory and NOT within
+the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
+requires this bit to be on for all channel programs. This is a problem because
+the IPL process consists of transferring control from the "Read IPL" ccw
+immediately to the IPL1 channel program that was read by "Read IPL".
+
+Not being able to turn off prefetch will also prevent the TIC at the end of the
+IPL1 channel program from transferring control to the IPL2 channel program.
+
+Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
+transfers control to another channel program segment immediately after reading
+it from the disk. So we need to be able to handle this case.
+
+**************************
+***** What QEMU does *****
+**************************
+
+Since we are forced to live with prefetch we cannot use the very simple IPL
+procedure we defined in the preceding section. So we compensate by doing the
+following.
+
+1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
+2. Execute "Read IPL" at 0x0.
+
+   So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
+
+3. Write a custom channel program that will seek to the IPL2 record and then
+   execute the READ and TIC ccws from IPL1. Normally the seek is not required
+   because after reading the IPL1 record the disk is automatically positioned
+   to read the very next record which will be IPL2. But since we are not reading
+   both IPL1 and IPL2 as part of the same channel program we must manually set
+   the position.
+
+4. Grab the target address of the TIC instruction from the IPL1 channel program.
+   This address is where the IPL2 channel program starts.
+
+   Now IPL2 is loaded into memory somewhere, and we know the address.
+
+5. Execute the IPL2 channel program at the address obtained in step #4.
+
+   Because this channel program can be dynamic, we must use a special algorithm
+   that detects a READ immediately followed by a TIC and breaks the ccw chain
+   by turning off the chain bit in the READ ccw. When control is returned from
+   the kernel/hardware to the QEMU bios code we immediately issue another start
+   subchannel to execute the remaining TIC instruction. This causes the entire
+   channel program (starting from the TIC) and all needed data to be refetched
+   thereby stepping around the limitation that would otherwise prevent this
+   channel program from executing properly.
+
+   Now the operating system code is loaded somewhere in guest memory and the psw
+   in memory location 0x0 will point to entry code for the guest operating
+   system.
+
+6. LPSW 0x0.
+   LPSW transfers control to the guest operating system and we're done.
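The IPL1 layout described in the text above ([psw][read ccw][tic ccw], 24 bytes, format-0 ccws) can be illustrated with a small user-space decoder. This is a minimal sketch and not part of the patch: the names ccw0_fields, decode_ccw0, ipl2_addr_from_ipl1 and sample_ipl1 are hypothetical, the sample record is made up, and the numeric command codes 0x06 and 0x08 are assumed values for what the patch calls CCW_CMD_DASD_READ and CCW_CMD_TIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CCW0_FLAG_CC  0x40  /* command-chaining flag in a format-0 ccw */
#define CMD_DASD_READ 0x06  /* assumed value of CCW_CMD_DASD_READ */
#define CMD_TIC       0x08  /* assumed value of CCW_CMD_TIC */

/* Fields of one format-0 ccw, decoded from its 8-byte big-endian form. */
struct ccw0_fields {
    uint8_t  cmd_code;  /* byte 0 */
    uint32_t cda;       /* bytes 1-3: 24-bit data address */
    uint8_t  flags;     /* byte 4 */
    uint16_t count;     /* bytes 6-7 */
};

struct ccw0_fields decode_ccw0(const uint8_t *p)
{
    struct ccw0_fields c;

    assert(p != NULL);
    c.cmd_code = p[0];
    c.cda = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    c.flags = p[4];
    c.count = (uint16_t)(((unsigned)p[6] << 8) | p[7]);
    return c;
}

/*
 * Mirror the check_ipl1()/read_ipl2_addr() idea: the read ccw sits at
 * offset 0x08, the tic ccw at offset 0x10, and the tic's address is where
 * the IPL2 channel program starts. Returns 0 if IPL1 looks invalid.
 */
uint32_t ipl2_addr_from_ipl1(const uint8_t *ipl1)
{
    struct ccw0_fields read_ccw = decode_ccw0(ipl1 + 0x08);
    struct ccw0_fields tic_ccw = decode_ccw0(ipl1 + 0x10);

    if (read_ccw.cmd_code != CMD_DASD_READ || tic_ccw.cmd_code != CMD_TIC) {
        return 0;
    }
    return tic_ccw.cda;
}

/* Made-up IPL1 record: 8 arbitrary psw bytes, READ 0x50 bytes to 0x2000
 * with chaining on, then TIC to 0x2000. */
const uint8_t sample_ipl1[24] = {
    0x00, 0x08, 0x00, 0x00, 0x80, 0x00, 0x30, 0x00,
    0x06, 0x00, 0x20, 0x00, 0x40, 0x00, 0x00, 0x50,
    0x08, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

Calling ipl2_addr_from_ipl1(sample_ipl1) returns 0x2000 for this sample record, which corresponds to the "grab the target address of the TIC" step described above.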
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index acca961..d6a6e18 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -10,7 +10,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
 .PHONY : all clean build-all
 
 OBJECTS = start.o main.o bootmap.o jump2ipl.o sclp.o menu.o \
-	  virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o
+	  virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o dasd-ipl.o
 
 QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
 QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
new file mode 100644
index 0000000..0fc879b
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.c
@@ -0,0 +1,235 @@
+/*
+ * S390 IPL (boot) from a real DASD device via vfio framework.
+ *
+ * Copyright (c) 2019 Jason J. Herne <jjherne@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "libc.h"
+#include "s390-ccw.h"
+#include "s390-arch.h"
+#include "dasd-ipl.h"
+#include "helper.h"
+
+static char prefix_page[PAGE_SIZE * 2]
+            __attribute__((__aligned__(PAGE_SIZE * 2)));
+
+static void enable_prefixing(void)
+{
+    memcpy(&prefix_page, lowcore, 4096);
+    set_prefix(ptr2u32(&prefix_page));
+}
+
+static void disable_prefixing(void)
+{
+    set_prefix(0);
+    /* Copy io interrupt info back to low core */
+    memcpy((void *)&lowcore->subchannel_id, prefix_page + 0xB8, 12);
+}
+
+static bool is_read_tic_ccw_chain(Ccw0 *ccw)
+{
+    Ccw0 *next_ccw = ccw + 1;
+
+    return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
+            ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
+            ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
+}
+
+static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t *next_cpa)
+{
+    Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
+    Ccw0 *tic_ccw;
+
+    while (true) {
+        /* Skip over inline TIC (it might not have the chain bit on) */
+        if (cur_ccw->cmd_code == CCW_CMD_TIC &&
+            cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
+            cur_ccw += 1;
+            continue;
+        }
+
+        if (!cur_ccw->chain) {
+            break;
+        }
+        if (is_read_tic_ccw_chain(cur_ccw)) {
+            /*
+             * Breaking a chain of CCWs may alter the semantics or even the
+             * validity of a channel program. The heuristic implemented below
+             * seems to work well in practice for the channel programs
+             * generated by zipl.
+             */
+            tic_ccw = cur_ccw + 1;
+            *next_cpa = tic_ccw->cda;
+            cur_ccw->chain = 0;
+            return true;
+        }
+        cur_ccw += 1;
+    }
+    return false;
+}
+
+static int run_dynamic_ccw_program(SubChannelId schid, uint16_t cutype,
+                                   uint32_t cpa)
+{
+    bool has_next;
+    uint32_t next_cpa = 0;
+    int rc;
+
+    do {
+        has_next = dynamic_cp_fixup(cpa, &next_cpa);
+
+        print_int("executing ccw chain at ", cpa);
+        enable_prefixing();
+        rc = do_cio(schid, cutype, cpa, CCW_FMT0);
+        disable_prefixing();
+
+        if (rc) {
+            break;
+        }
+        cpa = next_cpa;
+    } while (has_next);
+
+    return rc;
+}
+
+static void make_readipl(void)
+{
+    Ccw0 *ccwIplRead = (Ccw0 *)0x00;
+
+    /* Create Read IPL ccw at address 0 */
+    ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
+    ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
+    ccwIplRead->chain = 0; /* Chain flag */
+    ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
+}
+
+static void run_readipl(SubChannelId schid, uint16_t cutype)
+{
+    if (do_cio(schid, cutype, 0x00, CCW_FMT0)) {
+        panic("dasd-ipl: Failed to run Read IPL channel program\n");
+    }
+}
+
+/*
+ * The architecture states that IPL1 data should consist of a psw followed by
+ * format-0 READ and TIC CCWs. Let's sanity check.
+ */
+static void check_ipl1(void)
+{
+    Ccw0 *ccwread = (Ccw0 *)0x08;
+    Ccw0 *ccwtic = (Ccw0 *)0x10;
+
+    if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
+        ccwtic->cmd_code != CCW_CMD_TIC) {
+        panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
+    }
+}
+
+static void check_ipl2(uint32_t ipl2_addr)
+{
+    Ccw0 *ccw = u32toptr(ipl2_addr);
+
+    if (ipl2_addr == 0x00) {
+        panic("IPL2 address invalid. Is this disk really bootable?\n");
+    }
+    if (ccw->cmd_code == 0x00) {
+        panic("IPL2 ccw data invalid. Is this disk really bootable?\n");
+    }
+}
+
+static uint32_t read_ipl2_addr(void)
+{
+    Ccw0 *ccwtic = (Ccw0 *)0x10;
+
+    return ccwtic->cda;
+}
+
+static void ipl1_fixup(void)
+{
+    Ccw0 *ccwSeek = (Ccw0 *) 0x08;
+    Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
+    Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
+    Ccw0 *ccwRead = (Ccw0 *) 0x20;
+    CcwSeekData *seekData = (CcwSeekData *) 0x30;
+    CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
+
+    /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
+    memcpy(ccwRead, (void *)0x08, 16);
+
+    /* Disable chaining so we don't TIC to IPL2 channel program */
+    ccwRead->chain = 0x00;
+
+    ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
+    ccwSeek->cda = ptr2u32(seekData);
+    ccwSeek->chain = 1;
+    ccwSeek->count = sizeof(*seekData);
+    seekData->reserved = 0x00;
+    seekData->cyl = 0x00;
+    seekData->head = 0x00;
+
+    ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
+    ccwSearchID->cda = ptr2u32(searchData);
+    ccwSearchID->chain = 1;
+    ccwSearchID->count = sizeof(*searchData);
+    searchData->cyl = 0;
+    searchData->head = 0;
+    searchData->record = 2;
+
+    /* Go back to Search CCW if correct record not yet found */
+    ccwSearchTic->cmd_code = CCW_CMD_TIC;
+    ccwSearchTic->cda = ptr2u32(ccwSearchID);
+}
+
+static void run_ipl1(SubChannelId schid, uint16_t cutype)
+{
+    uint32_t startAddr = 0x08;
+
+    if (do_cio(schid, cutype, startAddr, CCW_FMT0)) {
+        panic("dasd-ipl: Failed to run IPL1 channel program\n");
+    }
+}
+
+static void run_ipl2(SubChannelId schid, uint16_t cutype, uint32_t addr)
+{
+    if (run_dynamic_ccw_program(schid, cutype, addr)) {
+        panic("dasd-ipl: Failed to run IPL2 channel program\n");
+    }
+}
+
+/*
+ * Limitations in vfio-ccw support complicate the IPL process. Details can
+ * be found in docs/devel/s390-dasd-ipl.txt
+ */
+void dasd_ipl(SubChannelId schid, uint16_t cutype)
+{
+    PSWLegacy *pswl = (PSWLegacy *) 0x00;
+    uint32_t ipl2_addr;
+
+    /* Construct Read IPL CCW and run it to read IPL1 from boot disk */
+    make_readipl();
+    run_readipl(schid, cutype);
+    ipl2_addr = read_ipl2_addr();
+    check_ipl1();
+
+    /*
+     * Fixup IPL1 channel program to account for vfio-ccw limitations, then run
+     * it to read IPL2 channel program from boot disk.
+     */
+    ipl1_fixup();
+    run_ipl1(schid, cutype);
+    check_ipl2(ipl2_addr);
+
+    /*
+     * Run IPL2 channel program to read operating system code from boot disk
+     */
+    run_ipl2(schid, cutype, ipl2_addr);
+
+    /* Transfer control to the guest operating system */
+    pswl->mask |= PSW_MASK_EAMODE;   /* Force z-mode */
+    pswl->addr |= PSW_MASK_BAMODE;   /* ...          */
+    jump_to_low_kernel();
+}
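The chain-breaking loop in dynamic_cp_fixup() and run_dynamic_ccw_program() can be tried out in user space with a simplified model. This is a minimal sketch under stated assumptions, not the BIOS code: struct sim_ccw, split_at_read_tic, count_segments and sample_prog are hypothetical names, cda holds an array index rather than a guest address, the special case for an inline TIC that points back at the previous ccw is left out, and 0x06, 0x07 and 0x08 are assumed command code values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_READ 0x06  /* assumed value of CCW_CMD_DASD_READ */
#define CMD_SEEK 0x07  /* assumed value of CCW_CMD_DASD_SEEK */
#define CMD_TIC  0x08  /* assumed value of CCW_CMD_TIC */

/* Simplified stand-in for Ccw0; cda is an index into the program array. */
struct sim_ccw {
    uint8_t  cmd_code;
    bool     chain;
    uint32_t cda;
};

/*
 * Walk the chain starting at prog[start]. If a chained READ is directly
 * followed by a TIC, clear the READ's chain bit, report the TIC target in
 * *next and return true. Return false if the chain ends without such a pair.
 */
bool split_at_read_tic(struct sim_ccw *prog, size_t len, size_t start,
                       size_t *next)
{
    assert(prog != NULL && next != NULL);
    for (size_t i = start; i < len; i++) {
        if (!prog[i].chain) {
            break;
        }
        if (prog[i].cmd_code == CMD_READ && i + 1 < len &&
            prog[i + 1].cmd_code == CMD_TIC) {
            *next = prog[i + 1].cda;
            prog[i].chain = false;
            return true;
        }
    }
    return false;
}

/* Count how many separate start-subchannel requests the program needs. */
size_t count_segments(struct sim_ccw *prog, size_t len, size_t start)
{
    size_t segments = 0;
    size_t next = start;
    bool has_next;

    do {
        size_t cur = next;

        has_next = split_at_read_tic(prog, len, cur, &next);
        segments++;  /* the BIOS would issue do_cio() for this segment here */
    } while (has_next);
    return segments;
}

/* Made-up program: SEEK, READ+TIC, READ+TIC, final unchained READ. */
struct sim_ccw sample_prog[6] = {
    { CMD_SEEK, true,  0 },
    { CMD_READ, true,  0 },
    { CMD_TIC,  false, 3 },
    { CMD_READ, true,  0 },
    { CMD_TIC,  false, 5 },
    { CMD_READ, false, 0 },
};
```

For sample_prog, count_segments(sample_prog, 6, 0) returns 3: each READ that is followed by a TIC ends one segment, and the channel program is restarted at the TIC target so that the freshly read ccws are fetched again.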
diff --git a/pc-bios/s390-ccw/dasd-ipl.h b/pc-bios/s390-ccw/dasd-ipl.h
new file mode 100644
index 0000000..c394828
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.h
@@ -0,0 +1,16 @@
+/*
+ * S390 IPL (boot) from a real DASD device via vfio framework.
+ *
+ * Copyright (c) 2019 Jason J. Herne <jjherne@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef DASD_IPL_H
+#define DASD_IPL_H
+
+void dasd_ipl(SubChannelId schid, uint16_t cutype);
+
+#endif /* DASD_IPL_H */
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 57a1013..3c449ad 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -13,6 +13,7 @@
 #include "s390-ccw.h"
 #include "cio.h"
 #include "virtio.h"
+#include "dasd-ipl.h"
 
 char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
 static SubChannelId blk_schid = { .one = 1 };
@@ -209,6 +210,10 @@ int main(void)
 
     cutype = cu_type(blk_schid);
     switch (cutype) {
+    case CU_TYPE_DASD_3990:
+    case CU_TYPE_DASD_2107:
+        dasd_ipl(blk_schid, cutype); /* no return */
+        break;
     case CU_TYPE_VIRTIO:
         virtio_setup();
         zipl_load(); /* no return */
diff --git a/pc-bios/s390-ccw/s390-arch.h b/pc-bios/s390-ccw/s390-arch.h
index 5e92c7a..504fc7c 100644
--- a/pc-bios/s390-ccw/s390-arch.h
+++ b/pc-bios/s390-ccw/s390-arch.h
@@ -87,4 +87,17 @@ typedef struct LowCore {
 
 extern LowCore const *lowcore;
 
+static inline void set_prefix(uint32_t address)
+{
+    asm volatile("spx %0" : : "m" (address) : "memory");
+}
+
+static inline uint32_t store_prefix(void)
+{
+    uint32_t address;
+
+    asm volatile("stpx %0" : "=m" (address));
+    return address;
+}
+
 #endif
-- 
1.8.3.1