summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAgeFilesLines
* KVM: MMU: Fix is_empty_shadow_page() checkAvi Kivity2008-06-061-1/+1
| | | | | | The check is only looking at one of two possible empty ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix printk() format stringAvi Kivity2008-06-061-1/+1
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: reschedule during shadow teardownAvi Kivity2008-06-061-0/+1
| | | | | | | Shadows for large guests can take a long time to tear down, so reschedule occasionally to avoid softlockup warnings. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Clear CR4.VMXE in hardware_disableEli Collins2008-06-061-0/+1
| | | | | | | | | | | | | Clear CR4.VMXE in hardware_disable. There's no reason to leave it set after doing a VMXOFF. VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is in VMX mode, so leaving VMXE set means we'll refuse to power on. With this change the user can power on after unloading the kvm-intel module. I tested on kvm-67 and kvm-69. Signed-off-by: Eli Collins <ecollins@vmware.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: migrate PIT timerMarcelo Tosatti2008-06-066-4/+24
| | | | | | | | | Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on, similarly to what is done for the LAPIC timers, otherwise PIT interrupts will be delayed until an unrelated event causes an exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix hypercall return value on AMDAvi Kivity2008-06-061-1/+2
| | | | | | | | | | | | | | | | | | The hypercall instructions on Intel and AMD are different. KVM allows the guest to choose one or the other (the default is Intel), and if the guest chooses incorrectly, KVM will patch it at runtime to select the correct instruction. This allows live migration between Intel and AMD machines. This patching occurs in the x86 emulator. The current code also executes the hypercall. Unfortunately, the tail end of the x86 emulator code also executes, overwriting the return value of the hypercall with the original contents of rax (which happens to be the hypercall number). Fix not by executing the hypercall in the emulator context; instead let the guest reissue the patched instruction and execute the hypercall via the normal path. Signed-off-by: Avi Kivity <avi@qumranet.com>
* namespacecheck: automated fixesIngo Molnar2008-05-231-1/+1
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* KVM: LAPIC: ignore pending timers if LVTT is disabledMarcelo Tosatti2008-05-181-1/+1
| | | | | | | | | | | Only use the APIC pending timers count to break out of HLT emulation if the timer vector is enabled. Certain configurations of Windows simply mask out the vector without disabling the timer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: PIT: take inject_pending into account when emulating hltMarcelo Tosatti2008-05-181-1/+1
| | | | | | | Otherwise hlt emulation fails if PIT is not injecting IRQ's. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix writes to registers with modrm encodingsAvi Kivity2008-05-181-2/+5
| | | | | | | | | A register destination encoded with a mod=3 encoding left dst.ptr NULL. Normally we don't trap writes to registers, but in the case of smsw, we do. Fix by pointing dst.ptr at the destination register. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Allow more than PAGES_PER_HPAGE write protections per large pageAvi Kivity2008-05-041-1/+0
| | | | | | | | | nonpae guests can call rmap_write_protect twice per page (for page tables) or four times per page (for page directories), triggering a bogus warning. Remove the warning. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: avoid fx_init() schedule in atomicAndrea Arcangeli2008-05-041-1/+10
| | | | | | | | | | This make sure not to schedule in atomic during fx_init. I also changed the name of fpu_init to fx_finit to avoid duplicating the name with fpu_init that is already used in the kernel, this makes grep simpler if nothing else. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid spurious execeptions after setting registersJan Kiszka2008-05-041-0/+2
| | | | | | | | | Clear pending exceptions when setting new register values. This avoids spurious exceptions after restoring a vcpu state or after reset-on-triple-fault. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: PIT: support mode 4Marcelo Tosatti2008-05-041-0/+2
| | | | | | | | | | | | | | | The in-kernel PIT emulation ignores pending timers if operating under mode 4, which for example DragonFlyBSD uses (and Plan9 too, apparently). Mode 4 seems to be similar to one-shot mode, other than the fact that it starts counting after the next CLK pulse once programmed, while mode 1 starts counting immediately, so add a FIXME to enhance precision. Fixes sourceforge bug 1952988. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: disable writeback on lmswAvi Kivity2008-05-041-0/+1
| | | | | | | | | | The recent changes allowing memory operands with lmsw and smsw left lmsw with writeback enabled. Since lmsw has no oridinary destination operand, the dst pointer was not initialized, resulting in an oops. Close the hole by disabling writeback for lmsw. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86: task switch: fix wrong bit setting for the busy flagIzik Eidus2008-05-041-2/+2
| | | | | | | The busy bit is bit 1 of the type field, not bit 8. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Enable EPT feature for KVMSheng Yang2008-05-043-10/+233
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Prepare an identity page table for EPT in real modeSheng Yang2008-05-043-3/+81
| | | | | | | | [aliguory: plug leak] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove #ifdef CONFIG_X86_64 to support 4 level EPTSheng Yang2008-05-041-4/+0
| | | | | | | | Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef for alloc root_hpa and free root_hpa to support EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Add EPT supportSheng Yang2008-05-042-10/+36
| | | | | | | Enable kvm_set_spte() to generate EPT entries. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm_x86_ops get_tdp_level()Sheng Yang2008-05-045-8/+19
| | | | | | | | The function get_tdp_level() provided the number of tdp level for EPT and NPT rather than the NPT specific macro. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move some definitions to a header fileSheng Yang2008-05-042-34/+33
| | | | | | | | Move some definitions to mmu.h in order to allow building common table entries between EPT and non-EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: EPT Feature DetectionSheng Yang2008-05-042-5/+83
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* rename div64_64 to div64_u64Roman Zippel2008-05-012-6/+6
| | | | | | | | | | | | | | | | | | | | | Rename div64_64 to div64_u64 to make it consistent with the other divide functions, so it clearly includes the type of the divide. Move its definition to math64.h as currently no architecture overrides the generic implementation. They can still override it of course, but the duplicated declarations are avoided. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Avi Kivity <avi@qumranet.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: MMU: kvm_pv_mmu_op should not take mmap_semMarcelo Tosatti2008-04-271-3/+0
| | | | | | | | | | | kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: remove selective CR0 commentJoerg Roedel2008-04-271-11/+0
| | | | | | | | | | | There is not selective cr0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed. So its the right behavior that there is no intercept on this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: remove now obsolete FIXME commentJoerg Roedel2008-04-271-7/+0
| | | | | | | With the usage of the V_TPR field this comment is now obsolete. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: disable CR8 intercept when tpr is not masking interruptsJoerg Roedel2008-04-271-4/+27
| | | | | | | | | This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabledJoerg Roedel2008-04-271-0/+12
| | | | | | | | If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be synced with the TPR field in the local apic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: export kvm_lapic_set_tpr() to modulesJoerg Roedel2008-04-271-0/+1
| | | | | | | | This patch exports the kvm_lapic_set_tpr() function from the lapic code to modules. It is required in the kvm-amd module to optimize CR8 intercepts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: sync TPR value to V_TPR field in the VMCBJoerg Roedel2008-04-271-2/+16
| | | | | | | | This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB. With this change we can safely remove the CR8 read intercept. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix lea to really get the effective addressAvi Kivity2008-04-271-1/+1
| | | | | | We never hit this, since there is currently no reason to emulate lea. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix smsw and lmsw with a memory operandAvi Kivity2008-04-271-12/+17
| | | | | | | | lmsw and smsw were implemented only with a register operand. Extend them to support a memory operand as well. Fixes Windows running some display compatibility test on AMD hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: initialize src.val and dst.val for register operandsAvi Kivity2008-04-271-0/+2
| | | | | | This lets us treat the case where mod == 3 in the same manner as other cases. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: force a new asid when initializing the vmcbAvi Kivity2008-04-271-1/+1
| | | | | | | Shutdown interception clears the vmcb, leaving the asid at zero (which is illegal. so force a new asid on vmcb initialization. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: fix kvm_vcpu_kick vs __vcpu_run raceMarcelo Tosatti2008-04-271-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is a window open between testing of pending IRQ's and assignment of guest_mode in __vcpu_run. Injection of IRQ's can race with __vcpu_run as follows: CPU0 CPU1 kvm_x86_ops->run() vcpu->guest_mode = 0 SET_IRQ_LINE ioctl .. kvm_x86_ops->inject_pending_irq kvm_cpu_has_interrupt() apic_test_and_set_irr() kvm_vcpu_kick if (vcpu->guest_mode) send_ipi() vcpu->guest_mode = 1 So move guest_mode=1 assignment before ->inject_pending_irq, and make sure that it won't reorder after it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add ioctls to save/store mpstateMarcelo Tosatti2008-04-271-0/+19
| | | | | | | | | | | | | So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*Avi Kivity2008-04-273-18/+18
| | | | | | We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: hlt emulation should take in-kernel APIC/PIT timers into accountMarcelo Tosatti2008-04-274-0/+38
| | | | | | | | | | | | | | | | | | Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: do not intercept task switch with NPTJoerg Roedel2008-04-271-0/+1
| | | | | | | | When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm trace userspace interfaceFeng(Eric) Liu2008-04-272-0/+14
| | | | | | | | This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add trace markersFeng (Eric) Liu2008-04-272-1/+60
| | | | | | | | Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: add intercept for machine check exceptionJoerg Roedel2008-04-271-1/+16
| | | | | | | | | To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: align shadow CR4.MCE with hostJoerg Roedel2008-04-271-0/+3
| | | | | | | | This patch aligns the host version of the CR4.MCE bit with the CR4 active in the guest. This is necessary to get MCE exceptions when the guest is running. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: indent svm_set_cr4 with tabs instead of spacesJoerg Roedel2008-04-271-4/+4
| | | | | | | | The svm_set_cr4 function is indented with spaces. This patch replaces them with tabs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Don't assume struct page for x86Anthony Liguori2008-04-272-59/+56
| | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: prepopulate guest pages after write-protectingMarcelo Tosatti2008-04-271-1/+1
| | | | | | | | | | | | | | | | Zdenek reported a bug where a looping "dmsetup status" eventually hangs on SMP guests. The problem is that kvm_mmu_get_page() prepopulates the shadow MMU before write protecting the guest page tables. By doing so, it leaves a window open where the guest can mark a pte as present while the host has shadow cached such pte as "notrap". Accesses to such address will fault in the guest without the host having a chance to fix the situation. Fix by moving the write protection before the pte prefetch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Only mark_page_accessed() if the page was accessed by the guestAvi Kivity2008-04-271-1/+2
| | | | | | | | | | If the accessed bit is not set, the guest has never accessed this page (at least through this spte), so there's no need to mark the page accessed. This provides more accurate data for the eviction algortithm. Noted by Andrea Arcangeli. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Free apic access page on vm destructionAvi Kivity2008-04-271-0/+2
| | | | | | Noticed by Marcelo Tosatti. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: allow the vm to shrink the kvm mmu shadow cachesIzik Eidus2008-04-271-2/+56
| | | | | | | Allow the Linux memory manager to reclaim memory in the kvm shadow cache. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>