path: root/Documentation/virtual/kvm
diff options
authorAnton Arapov <>2012-06-08 12:58:00 +0200
committerAnton Arapov <>2012-06-08 12:58:00 +0200
commit6792a3f47a2e42d7164292bf7f1a55cfc4c91652 (patch)
treeb90c002bfbbeaec92f5d8a2383dcabf6524016f7 /Documentation/virtual/kvm
parentfe2895d3d55146cac65b273c0f83e2c7e543cd0e (diff)
fedora kernel: b920e9b748c595f970bf80ede7832d39f8d567dav3.4.1-2
Signed-off-by: Anton Arapov <>
Diffstat (limited to 'Documentation/virtual/kvm')
3 files changed, 262 insertions, 25 deletions
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e1d94bf4056..6386f8c0482 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -95,7 +95,7 @@ described as 'basic' will be available.
Capability: basic
Architectures: all
Type: system ioctl
-Parameters: none
+Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.
The new VM has no virtual cpus and no memory. An mmap() of a VM fd
@@ -103,6 +103,11 @@ will access the virtual machine's physical address space; offset zero
corresponds to guest physical address zero. Use of mmap() on a VM fd
is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
+You most certainly want to use 0 as machine type.
+In order to create user controlled virtual machines on S390, check
+KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
+privileged user (CAP_SYS_ADMIN).
@@ -213,6 +218,11 @@ allocation of vcpu ids. For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
+For virtual cpus that have been created with S390 user controlled virtual
+machines, the resulting vcpu fd can be memory mapped at page offset
+KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
+cpu's hardware control block.
4.8 KVM_GET_DIRTY_LOG (vm ioctl)
Capability: basic
@@ -1159,6 +1169,14 @@ following flags are specified:
/* Depends on KVM_CAP_IOMMU */
+/* The following two depend on KVM_CAP_PCI_2_3 */
+#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
+#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
+If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
+via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with other
+assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
+guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.
The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
isolation of the device. Usages not specifying this flag are deprecated.
@@ -1399,6 +1417,71 @@ The following flags are defined:
If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.
+Capability: KVM_CAP_SW_TLB
+Architectures: ppc
+Type: vcpu ioctl
+Parameters: struct kvm_dirty_tlb (in)
+Returns: 0 on success, -1 on error
+struct kvm_dirty_tlb {
+ __u64 bitmap;
+ __u32 num_dirty;
+This must be called whenever userspace has changed an entry in the shared
+TLB, prior to calling KVM_RUN on the associated vcpu.
+The "bitmap" field is the userspace address of an array. This array
+consists of a number of bits, equal to the total number of TLB entries as
+determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
+nearest multiple of 64.
+Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
+The array is little-endian: the bit 0 is the least significant bit of the
+first byte, bit 8 is the least significant bit of the second byte, etc.
+This avoids any complications with differing word sizes.
+The "num_dirty" field is a performance hint for KVM to determine whether it
+should skip processing the bitmap and just invalidate everything. It must
+be set to the number of set bits in the bitmap.
+Capability: KVM_CAP_PCI_2_3
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+Allows userspace to mask PCI INTx interrupts from the assigned device. The
+kernel will not deliver INTx interrupts to the guest between setting and
+clearing of KVM_ASSIGN_SET_INTX_MASK via this interface. This enables use of
+and emulation of PCI 2.3 INTx disable command register behavior.
+This may be used for both PCI 2.3 devices supporting INTx disable natively and
+older devices lacking this support. Userspace is responsible for emulating the
+read value of the INTx disable bit in the guest visible PCI command register.
+When modifying the INTx disable state, userspace should precede updating the
+physical device command register by calling this ioctl to inform the kernel of
+the new intended INTx mask state.
+Note that the kernel uses the device INTx disable bit to internally manage the
+device interrupt state for PCI 2.3 devices. Reads of this register may
+therefore not match the expected value. Writes should always use the guest
+intended INTx disable value rather than attempting to read-copy-update the
+current physical device state. Races between user and kernel updates to the
+INTx disable bit are handled lazily in the kernel. It's possible the device
+may generate unintended interrupts, but they will not be injected into the
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
@@ -1491,6 +1574,101 @@ following algorithm:
Some guests configure the LINT1 NMI input to cause a panic, aiding in
+4.65 KVM_S390_UCAS_MAP
+Capability: KVM_CAP_S390_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+The parameter is defined like this:
+ struct kvm_s390_ucas_mapping {
+ __u64 user_addr;
+ __u64 vcpu_addr;
+ __u64 length;
+ };
+This ioctl maps the memory at "user_addr" with the length "length" to
+the vcpu's address space starting at "vcpu_addr". All parameters need to
+be alligned by 1 megabyte.
+Capability: KVM_CAP_S390_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+The parameter is defined like this:
+ struct kvm_s390_ucas_mapping {
+ __u64 user_addr;
+ __u64 vcpu_addr;
+ __u64 length;
+ };
+This ioctl unmaps the memory in the vcpu's address space starting at
+"vcpu_addr" with the length "length". The field "user_addr" is ignored.
+All parameters need to be alligned by 1 megabyte.
+Capability: KVM_CAP_S390_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: vcpu absolute address (in)
+Returns: 0 in case of success
+This call creates a page table entry on the virtual cpu's address space
+(for user controlled virtual machines) or the virtual machine's address
+space (for regular virtual machines). This only works for minor faults,
+thus it's recommended to access subject memory page via the user page
+table upfront. This is useful to handle validity intercepts for user
+controlled virtual machines to fault in the virtual cpu's lowcore pages
+prior to calling the KVM_RUN ioctl.
+Capability: KVM_CAP_ONE_REG
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_one_reg (in)
+Returns: 0 on success, negative value on failure
+struct kvm_one_reg {
+ __u64 id;
+ __u64 addr;
+Using this ioctl, a single vcpu register can be set to a specific value
+defined by user space with the passed in struct kvm_one_reg, where id
+refers to the register identifier as described below and addr is a pointer
+to a variable with the respective size. There can be architecture agnostic
+and architecture specific registers. Each have their own range of operation
+and their own constants and width. To keep track of the implemented
+registers, find a list below:
+ Arch | Register | Width (bits)
+ | |
+Capability: KVM_CAP_ONE_REG
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_one_reg (in and out)
+Returns: 0 on success, negative value on failure
+This ioctl allows to receive the value of a single register implemented
+in a vcpu. The register to read is indicated by the "id" field of the
+kvm_one_reg struct passed in. On success, the register value can be found
+at the memory location pointed to by "addr".
+The list of registers accessible using this interface is identical to the
+list in 4.64.
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
@@ -1651,6 +1829,20 @@ s390 specific.
s390 specific.
+ struct {
+ __u64 trans_exc_code;
+ __u32 pgm_code;
+ } s390_ucontrol;
+s390 specific. A page fault has occurred for a user controlled virtual
+machine (KVM_VM_S390_UNCONTROL) on it's host page table that cannot be
+resolved by the kernel.
+The program code and the translation exception code that were placed
+in the cpu's lowcore are presented here as defined by the z Architecture
+Principles of Operation Book in the Chapter for Dynamic Address Translation
struct {
__u32 dcrn;
@@ -1693,6 +1885,29 @@ developer registration required to access it).
/* Fix the size of the union. */
char padding[256];
+ /*
+ * shared registers between kvm and userspace.
+ * kvm_valid_regs specifies the register classes set by the host
+ * kvm_dirty_regs specified the register classes dirtied by userspace
+ * struct kvm_sync_regs is architecture specific, as well as the
+ * bits for kvm_valid_regs and kvm_dirty_regs
+ */
+ __u64 kvm_valid_regs;
+ __u64 kvm_dirty_regs;
+ union {
+ struct kvm_sync_regs regs;
+ char padding[1024];
+ } s;
+If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
+certain guest registers without having to call SET/GET_*REGS. Thus we can
+avoid some system call overhead if userspace has to handle the exit.
+Userspace can query the validity of the structure by checking
+kvm_valid_regs for specific bits. These bits are architecture specific
+and usually define the validity of a groups of registers. (e.g. one bit
+ for general purpose registers)
6. Capabilities that can be enabled
@@ -1741,3 +1956,45 @@ HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
HTAB invisible to the guest.
When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
+Architectures: ppc
+Parameters: args[0] is the address of a struct kvm_config_tlb
+Returns: 0 on success; -1 on error
+struct kvm_config_tlb {
+ __u64 params;
+ __u64 array;
+ __u32 mmu_type;
+ __u32 array_len;
+Configures the virtual CPU's TLB array, establishing a shared memory area
+between userspace and KVM. The "params" and "array" fields are userspace
+addresses of mmu-type-specific data structures. The "array_len" field is an
+safety mechanism, and should be set to the size in bytes of the memory that
+userspace has reserved for the array. It must be at least the size dictated
+by "mmu_type" and "params".
+While KVM_RUN is active, the shared region is under control of KVM. Its
+contents are undefined, and any modification by userspace results in
+boundedly undefined behavior.
+On return from KVM_RUN, the shared region will reflect the current state of
+the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
+to tell KVM which entries have been changed, prior to calling KVM_RUN again
+on this vcpu.
+ - The "params" field is of type "struct kvm_book3e_206_tlb_params".
+ - The "array" field points to an array of type "struct
+ kvm_book3e_206_tlb_entry".
+ - The array consists of all entries in the first TLB, followed by all
+ entries in the second TLB.
+ - Within a TLB, entries are ordered first by increasing set number. Within a
+ set, entries are ordered by way (increasing ESEL).
+ - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
+ where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
+ - The tsize field of mas1 shall be set to 4K on TLB0, even though the
+ hardware ignores this value for TLB0.
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 5dc972c09b5..fa5f1dbc6b2 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -347,7 +347,7 @@ To instantiate a large spte, four constraints must be satisfied:
- the spte must point to a large host page
- the guest pte must be a large pte of at least equivalent size (if tdp is
- enabled, there is no guest pte and this condition is satisified)
+ enabled, there is no guest pte and this condition is satisfied)
- if the spte will be writeable, the large page frame may not overlap any
write-protected pages
- the guest page must be wholly contained by a single memory slot
@@ -356,7 +356,7 @@ To check the last two conditions, the mmu maintains a ->write_count set of
arrays for each memory slot and large page size. Every write protected page
causes its write_count to be incremented, thus preventing instantiation of
a large spte. The frames at the end of an unaligned memory slot have
-artificically inflated ->write_counts so they can never be instantiated.
+artificially inflated ->write_counts so they can never be instantiated.
Further reading
diff --git a/Documentation/virtual/kvm/ppc-pv.txt b/Documentation/virtual/kvm/ppc-pv.txt
index 2b7ce190cde..6e7c3705093 100644
--- a/Documentation/virtual/kvm/ppc-pv.txt
+++ b/Documentation/virtual/kvm/ppc-pv.txt
@@ -81,28 +81,8 @@ additional registers to the magic page. If you add fields to the magic page,
also define a new hypercall feature to indicate that the host can give you more
registers. Only if the host supports the additional features, make use of them.
-The magic page has the following layout as described in
-struct kvm_vcpu_arch_shared {
- __u64 scratch1;
- __u64 scratch2;
- __u64 scratch3;
- __u64 critical; /* Guest may not get interrupts if == r1 */
- __u64 sprg0;
- __u64 sprg1;
- __u64 sprg2;
- __u64 sprg3;
- __u64 srr0;
- __u64 srr1;
- __u64 dar;
- __u64 msr;
- __u32 dsisr;
- __u32 int_pending; /* Tells the guest if we have an interrupt */
-Additions to the page must only occur at the end. Struct fields are always 32
-or 64 bit aligned, depending on them being 32 or 64 bit wide respectively.
+The magic page layout is described by struct kvm_vcpu_arch_shared
+in arch/powerpc/include/asm/kvm_para.h.
Magic page features