author	Samuel Thibault <samuel.thibault@ens-lyon.org>	2009-12-16 01:11:51 +0100
committer	Samuel Thibault <samuel.thibault@ens-lyon.org>	2009-12-16 01:11:51 +0100
commit	55dbc2b5d857d35262ad3116803dfb31b733d031
tree	9b9a7d7952ff74855de2a6c348e9c856a531158d
parent	870925205c78415dc4c594bfae9de8eb31745b81
Add Xen support
2009-03-11 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/i386/vm_param.h (VM_MIN_KERNEL_ADDRESS) [MACH_XEN]: Set to
0x20000000.
* i386/i386/i386asm.sym (pfn_list) [VM_MIN_KERNEL_ADDRESS ==
LINEAR_MIN_KERNEL_ADDRESS]: Define to constant PFN_LIST.
2009-02-27 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/intel/pmap.c [MACH_HYP] (INVALIDATE_TLB): Call hyp_invlpg
instead of flush_tlb when e - s is compile-time known to be
PAGE_SIZE.
2008-11-27 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/configfrag.ac (enable_pae): Enable by default on the Xen
platform.
2007-11-14 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/i386at/model_dep.c (machine_relax): New function.
(c_boot_entry): Refuse to boot as dom0.
2007-10-17 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/i386/fpu.c [MACH_XEN]: Disable unused fpintr().
2007-08-12 Samuel Thibault <samuel.thibault@ens-lyon.org>
* Makefile.am (clib_routines): Add _START.
* i386/xen/xen_boothdr: Use _START for VIRT_BASE and PADDR_OFFSET. Add
GUEST_VERSION and XEN_ELFNOTE_FEATURES.
2007-06-13 Samuel Thibault <samuel.thibault@ens-lyon.org>
* i386/i386/user_ldt.h (user_ldt) [MACH_XEN]: Add alloc field.
* i386/i386/user_ldt.c (i386_set_ldt) [MACH_XEN]: Round allocation of
LDT to a page, set back LDT pages read/write before freeing them.
(user_ldt_free) [MACH_XEN]: Likewise.
2007-04-18 Samuel Thibault <samuel.thibault@ens-lyon.org>
* device/ds_routines.c [MACH_HYP]: Add hypervisor block and net devices.
2007-02-19 Thomas Schwinge <tschwinge@gnu.org>
* i386/xen/Makefrag.am [PLATFORM_xen] (gnumach_LINKFLAGS): Define.
* Makefrag.am: Include `xen/Makefrag.am'.
* configure.ac: Include `xen/configfrag.ac'.
(--enable-platform): Support the `xen' platform.
* i386/configfrag.ac: Likewise.
* i386/Makefrag.am [PLATFORM_xen]: Include `i386/xen/Makefrag.am'.
2007-02-19 Samuel Thibault <samuel.thibault@ens-lyon.org>
Thomas Schwinge <tschwinge@gnu.org>
* i386/xen/Makefrag.am: New file.
* xen/Makefrag.am: Likewise.
* xen/configfrag.ac: Likewise.
2007-02-11 (and later dates) Samuel Thibault <samuel.thibault@ens-lyon.org>
Xen support
* Makefile.am (clib_routines): Add _start.
* Makefrag.am (include_mach_HEADERS): Add include/mach/xen.h.
* device/cons.c (cnputc): Call hyp_console_write.
* i386/Makefrag.am (libkernel_a_SOURCES): Move non-Xen source to
[PLATFORM_at].
* i386/i386/debug_trace.S: Include <i386/xen.h>.

* i386/i386/fpu.c [MACH_HYP] (init_fpu): Call set_ts() and clear_ts(),
do not enable CR0_EM.
* i386/i386/gdt.c: Include <mach/xen.h> and <intel/pmap.h>.
[MACH_XEN]: Make gdt array extern.
[MACH_XEN] (gdt_init): Register gdt with hypervisor. Request 4gb
segments assist. Shift la_shift.
[MACH_PSEUDO_PHYS] (gdt_init): Shift pfn_list.
* i386/i386/gdt.h [MACH_XEN]: Don't define KERNEL_LDT and LINEAR_DS.
* i386/i386/i386asm.sym: Include <i386/xen.h>.
[MACH_XEN]: Remove KERNEL_LDT, Add shared_info's CPU_CLI, CPU_PENDING,
CPU_PENDING_SEL, PENDING, EVTMASK and CR2.
* i386/i386/idt.c [MACH_HYP] (idt_init): Register trap table with
hypervisor.
* i386/i386/idt_inittab.S: Include <i386/i386asm.h>.
[MACH_XEN]: Set IDT_ENTRY() for hypervisor. Set trap table terminator.
* i386/i386/ktss.c [MACH_XEN] (ktss_init): Request exception task switch
from hypervisor.
* i386/i386/ldt.c: Include <mach/xen.h> and <intel/pmap.h>.
[MACH_XEN]: Make ldt array extern.
[MACH_XEN] (ldt_init): Set ldt readwrite.
[MACH_HYP] (ldt_init): Register ldt with hypervisor.
* i386/i386/locore.S: Include <i386/xen.h>. Handle KERNEL_RING == 1
case.
[MACH_XEN]: Read hyp_shared_info's CR2 instead of %cr2.
[MACH_PSEUDO_PHYS]: Add mfn_to_pfn computation.
[MACH_HYP]: Drop Cyrix I/O-based detection. Read cr3 instead of %cr3.
Make hypervisor call for pte invalidation.
* i386/i386/mp_desc.c: Include <mach/xen.h>.
[MACH_HYP] (mp_desc_init): Panic.
* i386/i386/pcb.c: Include <mach/xen.h>.
[MACH_XEN] (switch_ktss): Request stack switch from hypervisor.
[MACH_HYP] (switch_ktss): Request ldt and gdt switch from hypervisor.
* i386/i386/phys.c: Include <mach/xen.h>.
[MACH_PSEUDO_PHYS] (kvtophys): Do page translation.
* i386/i386/proc_reg.h [MACH_HYP] (cr3): New declaration.
(set_cr3, get_cr3, set_ts, clear_ts): Implement macros.
* i386/i386/seg.h [MACH_HYP]: Define KERNEL_RING macro. Include
<mach/xen.h>.
[MACH_XEN] (fill_descriptor): Register descriptor with hypervisor.
* i386/i386/spl.S: Include <i386/xen.h> and <i386/i386/asm.h>.
[MACH_XEN] (pic_mask): #define to int_mask.
[MACH_XEN] (SETMASK): Implement.
* i386/i386/vm_param.h [MACH_XEN] (HYP_VIRT_START): New macro.
[MACH_XEN]: Set VM_MAX_KERNEL_ADDRESS to HYP_VIRT_START-
LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS. Increase
KERNEL_STACK_SIZE and INTSTACK_SIZE to 4 pages.
* i386/i386at/conf.c [MACH_HYP]: Remove hardware devices, add hypervisor
console device.
* i386/i386at/cons_conf.c [MACH_HYP]: Add hypervisor console device.
* i386/i386at/model_dep.c: Include <sys/types.h>, <mach/xen.h>.
[MACH_XEN] Include <xen/console.h>, <xen/store.h>, <xen/evt.h>,
<xen/xen.h>.
[MACH_PSEUDO_PHYS]: New boot_info, mfn_list, pfn_list variables.
[MACH_XEN]: New la_shift variable.
[MACH_HYP] (avail_next, mem_size_init): Drop BIOS skipping mechanism.
[MACH_HYP] (machine_init): Call hyp_init(), drop hardware
initialization.
[MACH_HYP] (machine_idle): Call hyp_idle().
[MACH_HYP] (halt_cpu): Call hyp_halt().
[MACH_HYP] (halt_all_cpus): Call hyp_reboot() or hyp_halt().
[MACH_HYP] (i386at_init): Initialize with hypervisor.
[MACH_XEN] (c_boot_entry): Add Xen-specific initialization.
[MACH_HYP] (init_alloc_aligned, pmap_valid_page): Drop zones skipping
mechanism.
* i386/intel/pmap.c: Include <mach/xen.h>.
[MACH_PSEUDO_PHYS] (WRITE_PTE): Do page translation.
[MACH_HYP] (INVALIDATE_TLB): Request invalidation from hypervisor.
[MACH_XEN] (pmap_map_bd, pmap_create, pmap_destroy, pmap_remove_range)
(pmap_page_protect, pmap_protect, pmap_enter, pmap_change_wiring)
(pmap_attribute_clear, pmap_unmap_page_zero, pmap_collect): Request MMU
update from hypervisor.
[MACH_XEN] (pmap_bootstrap): Request pagetable initialization from
hypervisor.
[MACH_XEN] (pmap_set_page_readwrite, pmap_set_page_readonly)
(pmap_set_page_readonly_init, pmap_clear_bootstrap_pagetable)
(pmap_map_mfn): New functions.
* i386/intel/pmap.h [MACH_XEN] (INTEL_PTE_GLOBAL): Disable global page
support.
[MACH_PSEUDO_PHYS] (pte_to_pa): Do page translation.
[MACH_XEN] (pmap_set_page_readwrite, pmap_set_page_readonly)
(pmap_set_page_readonly_init, pmap_clear_bootstrap_pagetable)
(pmap_map_mfn): Declare functions.
* i386/i386/xen.h: New file.
* i386/xen/xen.c: New file.
* i386/xen/xen_boothdr.S: New file.
* i386/xen/xen_locore.S: New file.
* include/mach/xen.h: New file.
* kern/bootstrap.c [MACH_XEN] (boot_info): Declare variable.
[MACH_XEN] (bootstrap_create): Rebase multiboot header.
* kern/debug.c: Include <mach/xen.h>.
[MACH_HYP] (panic): Call hyp_crash() without delay.
* linux/dev/include/asm-i386/segment.h [MACH_HYP] (KERNEL_CS)
(KERNEL_DS): Use ring 1.
* xen/block.c: New file.
* xen/block.h: Likewise.
* xen/console.c: Likewise.
* xen/console.h: Likewise.
* xen/evt.c: Likewise.
* xen/evt.h: Likewise.
* xen/grant.c: Likewise.
* xen/grant.h: Likewise.
* xen/net.c: Likewise.
* xen/net.h: Likewise.
* xen/ring.c: Likewise.
* xen/ring.h: Likewise.
* xen/store.c: Likewise.
* xen/store.h: Likewise.
* xen/time.c: Likewise.
* xen/time.h: Likewise.
* xen/xen.c: Likewise.
* xen/xen.h: Likewise.
* xen/public/COPYING: Import file from Xen.
* xen/public/callback.h: Likewise.
* xen/public/dom0_ops.h: Likewise.
* xen/public/domctl.h: Likewise.
* xen/public/elfnote.h: Likewise.
* xen/public/elfstructs.h: Likewise.
* xen/public/event_channel.h: Likewise.
* xen/public/features.h: Likewise.
* xen/public/grant_table.h: Likewise.
* xen/public/kexec.h: Likewise.
* xen/public/libelf.h: Likewise.
* xen/public/memory.h: Likewise.
* xen/public/nmi.h: Likewise.
* xen/public/physdev.h: Likewise.
* xen/public/platform.h: Likewise.
* xen/public/sched.h: Likewise.
* xen/public/sysctl.h: Likewise.
* xen/public/trace.h: Likewise.
* xen/public/vcpu.h: Likewise.
* xen/public/version.h: Likewise.
* xen/public/xen-compat.h: Likewise.
* xen/public/xen.h: Likewise.
* xen/public/xencomm.h: Likewise.
* xen/public/xenoprof.h: Likewise.
* xen/public/arch-x86/xen-mca.h: Likewise.
* xen/public/arch-x86/xen-x86_32.h: Likewise.
* xen/public/arch-x86/xen-x86_64.h: Likewise.
* xen/public/arch-x86/xen.h: Likewise.
* xen/public/arch-x86_32.h: Likewise.
* xen/public/arch-x86_64.h: Likewise.
* xen/public/io/blkif.h: Likewise.
* xen/public/io/console.h: Likewise.
* xen/public/io/fbif.h: Likewise.
* xen/public/io/fsif.h: Likewise.
* xen/public/io/kbdif.h: Likewise.
* xen/public/io/netif.h: Likewise.
* xen/public/io/pciif.h: Likewise.
* xen/public/io/protocols.h: Likewise.
* xen/public/io/ring.h: Likewise.
* xen/public/io/tpmif.h: Likewise.
* xen/public/io/xenbus.h: Likewise.
* xen/public/io/xs_wire.h: Likewise.
104 files changed, 12952 insertions, 55 deletions
diff --git a/Makefile.am b/Makefile.am
index 25fd403..2907518 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -137,7 +137,7 @@ clib_routines := memcmp memcpy memmove memset bcopy bzero \
 	strchr strstr strsep strpbrk strtok \
 	htonl htons ntohl ntohs \
 	udivdi3 __udivdi3 \
-	etext _edata end _end	# actually ld magic, not libc.
+	_START _start etext _edata end _end	# actually ld magic, not libc.
 gnumach-undef: gnumach.$(OBJEXT)
 	$(NM) -u $< | sed 's/ *U *//' | sort -u > $@
 MOSTLYCLEANFILES += gnumach-undef
diff --git a/Makefrag.am b/Makefrag.am
index 05aaeb4..0377e80 100644
--- a/Makefrag.am
+++ b/Makefrag.am
@@ -375,7 +375,8 @@ include_mach_HEADERS = \
 	include/mach/vm_param.h \
 	include/mach/vm_prot.h \
 	include/mach/vm_statistics.h \
-	include/mach/inline.h
+	include/mach/inline.h \
+	include/mach/xen.h
 
 # If we name this `*_execdir', Automake won't add it to `install-data'...
 include_mach_eXecdir = $(includedir)/mach/exec
@@ -506,6 +507,11 @@ include linux/Makefrag.am
 #
 # Platform specific parts.
 #
+
+# Xen.
+if PLATFORM_xen
+include xen/Makefrag.am
+endif
 
 #
 # Architecture specific parts.
diff --git a/configure.ac b/configure.ac
index 4e37590..fca0dd3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -36,19 +36,22 @@ dnl We require GNU make.
 #
 # Deduce the architecture we're building for.
 #
+# TODO: Should we also support constructs like `i686_xen-pc-gnu' or
+# `i686-pc_xen-gnu'?
 
 AC_CANONICAL_HOST
 
 AC_ARG_ENABLE([platform],
   AS_HELP_STRING([--enable-platform=PLATFORM], [specify the platform to build a
-    kernel for.  Defaults to `at' for `i?86'.  No other possibilities.]),
+    kernel for.  Defaults to `at' for `i?86'.  The other possibility is
+    `xen'.]),
   [host_platform=$enable_platform],
   [host_platform=default])
 [# Supported configurations.
 case $host_platform:$host_cpu in
   default:i?86)
     host_platform=at;;
-  at:i?86)
+  at:i?86 | xen:i?86)
     :;;
   *)]
     AC_MSG_ERROR([unsupported combination of cpu type `$host_cpu' and platform
@@ -124,6 +127,9 @@ esac]
 # PC AT.
 # TODO.  Currently handled in `i386/configfrag.ac'.
 
+# Xen.
+m4_include([xen/configfrag.ac])
+
 # Machine-specific configuration.
 
 # ix86.
diff --git a/device/cons.c b/device/cons.c
index fb96d69..e3e95ff 100644
--- a/device/cons.c
+++ b/device/cons.c
@@ -260,6 +260,15 @@ cnputc(c)
 	kmsg_putchar (c);
 #endif
 
+#if defined(MACH_HYP) && 0
+	{
+		/* Also output on hypervisor's emergency console, for
+		 * debugging */
+		unsigned char d = c;
+		hyp_console_write(&d, 1);
+	}
+#endif	/* MACH_HYP */
+
 	if (cn_tab) {
 		(*cn_tab->cn_putc)(cn_tab->cn_dev, c);
 		if (c == '\n')
diff --git a/device/ds_routines.c b/device/ds_routines.c
index 5b8fb3e..5b57338 100644
--- a/device/ds_routines.c
+++ b/device/ds_routines.c
@@ -104,6 +104,10 @@ extern struct device_emulation_ops linux_pcmcia_emulation_ops;
 #endif
 #endif
 #endif
+#ifdef MACH_HYP
+extern struct device_emulation_ops hyp_block_emulation_ops;
+extern struct device_emulation_ops hyp_net_emulation_ops;
+#endif
 extern struct device_emulation_ops mach_device_emulation_ops;
 
 /* List of emulations.  */
@@ -118,6 +122,10 @@ static struct device_emulation_ops *emulation_list[] =
 #endif
 #endif
 #endif
+#ifdef MACH_HYP
+  &hyp_block_emulation_ops,
+  &hyp_net_emulation_ops,
+#endif
   &mach_device_emulation_ops,
 };
diff --git a/doc/mach.texi b/doc/mach.texi
index bb451e5..858880a 100644
--- a/doc/mach.texi
+++ b/doc/mach.texi
@@ -527,8 +527,8 @@ unpageable memory footprint of the kernel.  @xref{Kernel Debugger}.
 @table @code
 @item --enable-pae
 @acronym{PAE, Physical Address Extension} feature (@samp{ix86}-only),
-which is available on modern @samp{ix86} processors; disabled by
-default.
+which is available on modern @samp{ix86} processors; on @samp{ix86-at} disabled
+by default, on @samp{ix86-xen} enabled by default.
 @end table
 
 @subsection Turning device drivers on or off
diff --git a/i386/Makefrag.am b/i386/Makefrag.am
index bad0ce9..876761c 100644
--- a/i386/Makefrag.am
+++ b/i386/Makefrag.am
@@ -19,33 +19,37 @@
 libkernel_a_SOURCES += \
 	i386/i386at/autoconf.c \
+	i386/i386at/conf.c \
+	i386/i386at/cons_conf.c \
+	i386/i386at/idt.h \
+	i386/i386at/kd_event.c \
+	i386/i386at/kd_event.h \
+	i386/i386at/kd_queue.c \
+	i386/i386at/kd_queue.h \
+	i386/i386at/model_dep.c \
+	i386/include/mach/sa/stdarg.h
+
+if PLATFORM_at
+libkernel_a_SOURCES += \
 	i386/i386at/boothdr.S \
 	i386/i386at/com.c \
 	i386/i386at/comreg.h \
-	i386/i386at/conf.c \
-	i386/i386at/cons_conf.c \
 	i386/i386at/cram.h \
 	i386/i386at/disk.h \
 	i386/i386at/i8250.h \
-	i386/i386at/idt.h \
 	i386/i386at/immc.c \
 	i386/i386at/int_init.c \
 	i386/i386at/interrupt.S \
 	i386/i386at/kd.c \
 	i386/i386at/kd.h \
-	i386/i386at/kd_event.c \
-	i386/i386at/kd_event.h \
 	i386/i386at/kd_mouse.c \
 	i386/i386at/kd_mouse.h \
-	i386/i386at/kd_queue.c \
-	i386/i386at/kd_queue.h \
 	i386/i386at/kdasm.S \
 	i386/i386at/kdsoft.h \
-	i386/i386at/model_dep.c \
 	i386/i386at/pic_isa.c \
 	i386/i386at/rtc.c \
-	i386/i386at/rtc.h \
-	i386/include/mach/sa/stdarg.h
+	i386/i386at/rtc.h
+endif
 
 #
 # `lpr' device support.
@@ -80,11 +84,9 @@ libkernel_a_SOURCES += \
 	i386/i386/fpu.h \
 	i386/i386/gdt.c \
 	i386/i386/gdt.h \
-	i386/i386/hardclock.c \
 	i386/i386/idt-gen.h \
 	i386/i386/idt.c \
 	i386/i386/idt_inittab.S \
-	i386/i386/io_map.c \
 	i386/i386/io_perm.c \
 	i386/i386/io_perm.h \
 	i386/i386/ipl.h \
@@ -107,11 +109,7 @@ libkernel_a_SOURCES += \
 	i386/i386/pcb.c \
 	i386/i386/pcb.h \
 	i386/i386/phys.c \
-	i386/i386/pic.c \
-	i386/i386/pic.h \
 	i386/i386/pio.h \
-	i386/i386/pit.c \
-	i386/i386/pit.h \
 	i386/i386/pmap.h \
 	i386/i386/proc_reg.h \
 	i386/i386/sched_param.h \
@@ -139,6 +137,15 @@ libkernel_a_SOURCES += \
 EXTRA_DIST += \
 	i386/i386/mach_i386.srv
 
+if PLATFORM_at
+libkernel_a_SOURCES += \
+	i386/i386/hardclock.c \
+	i386/i386/io_map.c \
+	i386/i386/pic.c \
+	i386/i386/pic.h \
+	i386/i386/pit.c \
+	i386/i386/pit.h
+endif
 
 #
 # KDB support.
@@ -225,3 +232,11 @@ EXTRA_DIST += \
 # Instead of listing each file individually...
 EXTRA_DIST += \
 	i386/include
+
+#
+# Platform specific parts.
+#
+
+if PLATFORM_xen
+include i386/xen/Makefrag.am
+endif
diff --git a/i386/configfrag.ac b/i386/configfrag.ac
index f95aa86..1132b69 100644
--- a/i386/configfrag.ac
+++ b/i386/configfrag.ac
@@ -51,6 +51,12 @@ case $host_platform:$host_cpu in
     # i386/bogus/platforms.h]
     AC_DEFINE([AT386], [1], [AT386])[;;
+  xen:i?86)
+    # TODO.  That should probably not be needed.
+    ncom=1
+    # TODO.  That should probably not be needed.
+    # i386/bogus/platforms.h]
+    AC_DEFINE([AT386], [1], [AT386])[;;
   *)
     :;;
 esac]
@@ -105,9 +111,11 @@ if [ x"$enable_lpr" = xyes ]; then]
 
 AC_ARG_ENABLE([pae],
-  AS_HELP_STRING([--enable-pae], [PAE feature (ix86-only); disabled by
-    default]))
+  AS_HELP_STRING([--enable-pae], [PAE support (ix86-only); on ix86-at disabled
+    by default, on ix86-xen enabled by default]))
 [case $host_platform:$host_cpu in
+  xen:i?86)
+    enable_pae=${enable_pae-yes};;
   *:i?86)
     :;;
   *)
diff --git a/i386/i386/debug_trace.S b/i386/i386/debug_trace.S
index e741516..f275e1b 100644
--- a/i386/i386/debug_trace.S
+++ b/i386/i386/debug_trace.S
@@ -24,6 +24,7 @@
 #ifdef DEBUG
 
 #include <mach/machine/asm.h>
+#include <i386/xen.h>
 
 #include "debug.h"
 
diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c
index 109d0d7..2a4b9c0 100644
--- a/i386/i386/fpu.c
+++ b/i386/i386/fpu.c
@@ -109,6 +109,10 @@ void
 init_fpu()
 {
 	unsigned short status, control;
+
+#ifdef	MACH_HYP
+	clear_ts();
+#else	/* MACH_HYP */
 	unsigned int native = 0;
 
 	if (machine_slot[cpu_number()].cpu_type >= CPU_TYPE_I486)
@@ -120,6 +124,7 @@ init_fpu()
 	 * the control and status registers.
 	 */
 	set_cr0((get_cr0() & ~(CR0_EM|CR0_TS)) | native);	/* allow use of FPU */
+#endif	/* MACH_HYP */
 
 	fninit();
 	status = fnstsw();
@@ -153,8 +158,10 @@ init_fpu()
 			struct i386_xfp_save save;
 			unsigned long mask;
 			fp_kind = FP_387X;
+#ifndef	MACH_HYP
 			printf("Enabling FXSR\n");
 			set_cr4(get_cr4() | CR4_OSFXSR);
+#endif	/* MACH_HYP */
 			fxsave(&save);
 			mask = save.fp_mxcsr_mask;
 			if (!mask)
@@ -163,10 +170,14 @@ init_fpu()
 		} else
 			fp_kind = FP_387;
 	}
+#ifdef	MACH_HYP
+	set_ts();
+#else	/* MACH_HYP */
 	/*
 	 * Trap wait instructions.  Turn off FPU for now.
 	 */
 	set_cr0(get_cr0() | CR0_TS | CR0_MP);
+#endif	/* MACH_HYP */
     }
     else {
 	/*
@@ -675,6 +686,7 @@ fpexterrflt()
 	/*NOTREACHED*/
 }
 
+#ifndef MACH_XEN
 /*
  * FPU error. Called by AST.
  */
@@ -731,6 +743,7 @@ ASSERT_IPL(SPL0);
 		thread->pcb->ims.ifps->fp_save_state.fp_status);
 	/*NOTREACHED*/
 }
+#endif /* MACH_XEN */
 
 /*
  * Save FPU state.
@@ -846,7 +859,7 @@ fp_state_alloc()
     }
 }
 
-#if	AT386
+#if	AT386 && !defined(MACH_XEN)
 /*
  * Handle a coprocessor error interrupt on the AT386.
  * This comes in on line 5 of the slave PIC at SPL1.
diff --git a/i386/i386/gdt.c b/i386/i386/gdt.c
index 845e7c6..b5fb033 100644
--- a/i386/i386/gdt.c
+++ b/i386/i386/gdt.c
@@ -31,11 +31,18 @@
  * Global descriptor table.
  */
 #include <mach/machine/vm_types.h>
+#include <mach/xen.h>
+
+#include <intel/pmap.h>
 
 #include "vm_param.h"
 #include "seg.h"
 #include "gdt.h"
 
+#ifdef	MACH_XEN
+/* It is actually defined in xen_boothdr.S */
+extern
+#endif	/* MACH_XEN */
 struct real_descriptor gdt[GDTSZ];
 
 void
@@ -50,11 +57,21 @@ gdt_init()
 		LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS,
 		LINEAR_MAX_KERNEL_ADDRESS - (LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) - 1,
 		ACC_PL_K|ACC_DATA_W, SZ_32);
+#ifndef	MACH_HYP
 	fill_gdt_descriptor(LINEAR_DS,
 		0,
 		0xffffffff,
 		ACC_PL_K|ACC_DATA_W, SZ_32);
+#endif	/* MACH_HYP */
 
+#ifdef	MACH_XEN
+	unsigned long frame = kv_to_mfn(gdt);
+	pmap_set_page_readonly(gdt);
+	if (hyp_set_gdt(kv_to_la(&frame), GDTSZ))
+		panic("couldn't set gdt\n");
+	if (hyp_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments))
+		panic("couldn't set 4gb segments vm assist");
+#else	/* MACH_XEN */
 	/* Load the new GDT.  */
 	{
 		struct pseudo_descriptor pdesc;
@@ -63,6 +80,7 @@ gdt_init()
 		pdesc.linear_base = kvtolin(&gdt);
 		lgdt(&pdesc);
 	}
+#endif	/* MACH_XEN */
 
 	/* Reload all the segment registers from the new GDT.
 	   We must load ds and es with 0 before loading them with KERNEL_DS
@@ -79,5 +97,14 @@ gdt_init()
 		 "movw %w1,%%es\n"
 		 "movw %w1,%%ss\n"
		 : : "i" (KERNEL_CS), "r" (KERNEL_DS), "r" (0));
+#ifdef	MACH_XEN
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+	/* things now get shifted */
+#ifdef	MACH_PSEUDO_PHYS
+	pfn_list = (void*) pfn_list + VM_MIN_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS;
+#endif	/* MACH_PSEUDO_PHYS */
+	la_shift += LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS;
+#endif
+#endif	/* MACH_XEN */
 }
diff --git a/i386/i386/gdt.h b/i386/i386/gdt.h
index 50e01e6..41ace79 100644
--- a/i386/i386/gdt.h
+++ b/i386/i386/gdt.h
@@ -40,12 +40,16 @@
  */
 #define	KERNEL_CS	(0x08 | KERNEL_RING)	/* kernel code */
 #define	KERNEL_DS	(0x10 | KERNEL_RING)	/* kernel data */
+#ifndef	MACH_XEN
 #define	KERNEL_LDT	0x18		/* master LDT */
+#endif	/* MACH_XEN */
 #define	KERNEL_TSS	0x20		/* master TSS (uniprocessor) */
 
 #define	USER_LDT	0x28		/* place for per-thread LDT */
 #define	USER_TSS	0x30		/* place for per-thread TSS that holds IO bitmap */
+#ifndef	MACH_HYP
 #define	LINEAR_DS	0x38		/* linear mapping */
+#endif	/* MACH_HYP */
 /* 0x40 was USER_FPREGS, now free */
 
 #define	USER_GDT	0x48		/* user-defined GDT entries */
diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 868bf09..b1670e8 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -45,6 +45,7 @@
 #include <i386/gdt.h>
 #include <i386/ldt.h>
 #include <i386/mp_desc.h>
+#include <i386/xen.h>
 
 offset	thread	th	pcb
 
@@ -90,6 +91,9 @@ expr	VM_MIN_ADDRESS
 expr	VM_MAX_ADDRESS
 expr	VM_MIN_KERNEL_ADDRESS	KERNELBASE
 expr	KERNEL_STACK_SIZE
+#if VM_MIN_KERNEL_ADDRESS == LINEAR_MIN_KERNEL_ADDRESS
+expr	PFN_LIST	pfn_list
+#endif
 
 #if	PAE
 expr	PDPSHIFT
@@ -117,7 +121,9 @@ expr	KERNEL_RING
 expr	KERNEL_CS
 expr	KERNEL_DS
 expr	KERNEL_TSS
+#ifndef	MACH_XEN
 expr	KERNEL_LDT
+#endif	/* MACH_XEN */
 
 expr	(VM_MIN_KERNEL_ADDRESS>>PDESHIFT)*sizeof(pt_entry_t)	KERNELBASEPDE
 
@@ -135,3 +141,12 @@ expr	TIMER_HIGH_UNIT
 offset	thread	th	system_timer
 offset	thread	th	user_timer
 #endif
+
+#ifdef	MACH_XEN
+offset	shared_info	si	vcpu_info[0].evtchn_upcall_mask	CPU_CLI
+offset	shared_info	si	vcpu_info[0].evtchn_upcall_pending	CPU_PENDING
+offset	shared_info	si	vcpu_info[0].evtchn_pending_sel	CPU_PENDING_SEL
+offset	shared_info	si	evtchn_pending	PENDING
+offset	shared_info	si	evtchn_mask	EVTMASK
+offset	shared_info	si	vcpu_info[0].arch.cr2	CR2
+#endif	/* MACH_XEN */
diff --git a/i386/i386/idt.c b/i386/i386/idt.c
index 1a8f917..b5e3d08 100644
--- a/i386/i386/idt.c
+++ b/i386/i386/idt.c
@@ -38,6 +38,10 @@ extern struct idt_init_entry idt_inittab[];
 
 void idt_init()
 {
+#ifdef	MACH_HYP
+	if (hyp_set_trap_table(kvtolin(idt_inittab)))
+		panic("couldn't set trap table\n");
+#else	/* MACH_HYP */
 	struct idt_init_entry *iie = idt_inittab;
 
 	/* Initialize the exception vectors from the idt_inittab.  */
@@ -55,5 +59,6 @@ void idt_init()
 		pdesc.linear_base = kvtolin(&idt);
 		lidt(&pdesc);
 	}
+#endif	/* MACH_HYP */
 }
 
diff --git a/i386/i386/idt_inittab.S b/i386/i386/idt_inittab.S
index 7718568..4dcad8d 100644
--- a/i386/i386/idt_inittab.S
+++ b/i386/i386/idt_inittab.S
@@ -25,7 +25,8 @@
  */
 #include <mach/machine/asm.h>
 
-#include "seg.h"
+#include <i386/seg.h>
+#include <i386/i386asm.h>
 
 
 /* We'll be using macros to fill in a table in data hunk 2
@@ -38,12 +39,22 @@ ENTRY(idt_inittab)
 /*
  * Interrupt descriptor table and code vectors for it.
  */
+#ifdef	MACH_XEN
+#define	IDT_ENTRY(n,entry,type) \
+	.data	2	;\
+	.byte	n	;\
+	.byte	(((type)&ACC_PL)>>5)|((((type)&(ACC_TYPE|ACC_A))==ACC_INTR_GATE)<<2)	;\
+	.word	KERNEL_CS	;\
+	.long	entry	;\
+	.text
+#else	/* MACH_XEN */
 #define	IDT_ENTRY(n,entry,type) \
 	.data	2	;\
 	.long	entry	;\
 	.word	n	;\
 	.word	type	;\
 	.text
+#endif	/* MACH_XEN */
 
 /*
  * No error code.  Clear error code and push trap number.
@@ -118,4 +129,7 @@ EXCEPTION(0x1f,t_trap_1f)
 /* Terminator */
 	.data	2
 	.long	0
+#ifdef	MACH_XEN
+	.long	0
+#endif	/* MACH_XEN */
 
diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c
index 03d9a04..66432f3 100644
--- a/i386/i386/ktss.c
+++ b/i386/i386/ktss.c
@@ -45,6 +45,12 @@ ktss_init()
 	/* XXX temporary exception stack */
 	static int exception_stack[1024];
 
+#ifdef	MACH_XEN
+	/* Xen won't allow us to do any I/O by default anyway, just register
+	 * exception stack */
+	if (hyp_stack_switch(KERNEL_DS, (unsigned)(exception_stack+1024)))
+		panic("couldn't register exception stack\n");
+#else	/* MACH_XEN */
 	/* Initialize the master TSS descriptor.  */
 	fill_gdt_descriptor(KERNEL_TSS,
 			    kvtolin(&ktss), sizeof(struct task_tss) - 1,
@@ -59,5 +65,6 @@ ktss_init()
 
 	/* Load the TSS.  */
 	ltr(KERNEL_TSS);
+#endif	/* MACH_XEN */
 }
 
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index 7299377..0ef7a8c 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -28,6 +28,9 @@
  * same LDT.
  */
 #include <mach/machine/vm_types.h>
+#include <mach/xen.h>
+
+#include <intel/pmap.h>
 
 #include "vm_param.h"
 #include "seg.h"
@@ -36,15 +39,23 @@
 
 extern int syscall();
 
+#ifdef	MACH_XEN
+/* It is actually defined in xen_boothdr.S */
+extern
+#endif	/* MACH_XEN */
 struct real_descriptor ldt[LDTSZ];
 
 void
 ldt_init()
 {
+#ifdef	MACH_XEN
+	pmap_set_page_readwrite(ldt);
+#else	/* MACH_XEN */
 	/* Initialize the master LDT descriptor in the GDT.  */
 	fill_gdt_descriptor(KERNEL_LDT,
 			    kvtolin(&ldt), sizeof(ldt)-1,
 			    ACC_PL_K|ACC_LDT, 0);
+#endif	/* MACH_XEN */
 
 	/* Initialize the LDT descriptors.  */
 	fill_ldt_gate(USER_SCALL,
@@ -61,5 +72,9 @@ ldt_init()
 		      ACC_PL_U|ACC_DATA_W, SZ_32);
 
 	/* Activate the LDT.  */
+#ifdef	MACH_HYP
+	hyp_set_ldt(&ldt, LDTSZ);
+#else	/* MACH_HYP */
 	lldt(KERNEL_LDT);
+#endif	/* MACH_HYP */
 }
diff --git a/i386/i386/locore.S b/i386/i386/locore.S
index 13a44d9..663db43 100644
--- a/i386/i386/locore.S
+++ b/i386/i386/locore.S
@@ -36,6 +36,7 @@
 #include <i386/ldt.h>
 #include <i386/i386asm.h>
 #include <i386/cpu_number.h>
+#include <i386/xen.h>
 
 /*
  * Fault recovery.
@@ -323,8 +324,9 @@ ENTRY(t_segnp)
 trap_check_kernel_exit:
 	testl	$(EFL_VM),16(%esp)	/* is trap from V86 mode? */
 	jnz	EXT(alltraps)		/* isn`t kernel trap if so */
-	testl	$3,12(%esp)		/* is trap from kernel mode? */
-	jne	EXT(alltraps)		/* if so: */
+	/* Note: handling KERNEL_RING value by hand */
+	testl	$2,12(%esp)		/* is trap from kernel mode? */
+	jnz	EXT(alltraps)		/* if so: */
 					/* check for the kernel exit sequence */
 	cmpl	$_kret_iret,8(%esp)	/* on IRET? */
 	je	fault_iret
@@ -410,7 +412,8 @@ push_segregs:
 ENTRY(t_debug)
 	testl	$(EFL_VM),8(%esp)	/* is trap from V86 mode? */
 	jnz	0f			/* isn`t kernel trap if so */
-	testl	$3,4(%esp)		/* is trap from kernel mode? */
+	/* Note: handling KERNEL_RING value by hand */
+	testl	$2,4(%esp)		/* is trap from kernel mode? */
 	jnz	0f			/* if so: */
 	cmpl	$syscall_entry,(%esp)	/* system call entry? */
 	jne	0f			/* if so: */
@@ -429,7 +432,11 @@ ENTRY(t_page_fault)
 	pushl	$(T_PAGE_FAULT)		/* mark a page fault trap */
 	pusha				/* save the general registers */
+#ifdef	MACH_XEN
+	movl	%ss:hyp_shared_info+CR2,%eax
+#else	/* MACH_XEN */
 	movl	%cr2,%eax		/* get the faulting address */
+#endif	/* MACH_XEN */
 	movl	%eax,12(%esp)		/* save in esp save slot */
 	jmp	trap_push_segs		/* continue fault */
@@ -465,7 +472,8 @@ trap_set_segs:
 	cld				/* clear direction flag */
 	testl	$(EFL_VM),R_EFLAGS(%esp) /* in V86 mode? */
 	jnz	trap_from_user		/* user mode trap if so */
-	testb	$3,R_CS(%esp)		/* user mode trap? */
+	/* Note: handling KERNEL_RING value by hand */
+	testb	$2,R_CS(%esp)		/* user mode trap? */
 	jz	trap_from_kernel	/* kernel trap if not */
 trap_from_user:
@@ -679,7 +687,8 @@ LEXT(return_to_iret)	/* ( label for kdb_kintr and hardclock) */
 	testl	$(EFL_VM),I_EFL(%esp)	/* if in V86 */
 	jnz	0f			/* or */
-	testb	$3,I_CS(%esp)		/* user mode, */
+	/* Note: handling KERNEL_RING value by hand */
+	testb	$2,I_CS(%esp)		/* user mode, */
 	jz	1f			/* check for ASTs */
 0:
 	cmpl	$0,CX(EXT(need_ast),%edx)
@@ -1156,9 +1165,14 @@ ENTRY(discover_x86_cpu_type)
 	movl	%esp,%ebp		/* Save stack pointer */
 	and	$~0x3,%esp		/* Align stack pointer */
+#ifdef	MACH_HYP
+#warning Assuming not Cyrix CPU
+#else	/* MACH_HYP */
 	inb	$0xe8,%al		/* Enable ID flag for Cyrix CPU ... */
 	andb	$0x80,%al		/* ... in CCR4 reg bit7 */
 	outb	%al,$0xe8
+#endif	/* MACH_HYP */
+
 	pushfl				/* Fetch flags ... */
 	popl	%eax			/* ... into eax */
 	movl	%eax,%ecx		/* Save original flags for return */
@@ -1266,13 +1280,24 @@ Entry(copyoutmsg)
  *	XXX	only have to do this on 386's.
  */
copyout_retry:
+#ifdef	MACH_HYP
+	movl	cr3,%ecx		/* point to page directory */
+#else	/* MACH_HYP */
	movl	%cr3,%ecx		/* point to page directory */
+#endif	/* MACH_HYP */
 #if	PAE
	movl	%edi,%eax		/* get page directory pointer bits */
	shrl	$(PDPSHIFT),%eax	/* from user address */
	movl	KERNELBASE(%ecx,%eax,PTE_SIZE),%ecx
					/* get page directory pointer */
+#ifdef	MACH_PSEUDO_PHYS
+	shrl	$(PTESHIFT),%ecx
+	movl	pfn_list,%eax
+	movl	(%eax,%ecx,4),%ecx	/* mfn_to_pfn */
+	shll	$(PTESHIFT),%ecx
+#else	/* MACH_PSEUDO_PHYS */
	andl	$(PTE_PFN),%ecx		/* isolate page frame address */
+#endif	/* MACH_PSEUDO_PHYS */
 #endif	/* PAE */
	movl	%edi,%eax		/* get page directory bits */
	shrl	$(PDESHIFT),%eax	/* from user address */
@@ -1283,7 +1308,14 @@ copyout_retry:
					/* get page directory pointer */
	testl	$(PTE_V),%ecx		/* present? */
	jz	0f			/* if not, fault is OK */
+#ifdef	MACH_PSEUDO_PHYS
+	shrl	$(PTESHIFT),%ecx
+	movl	pfn_list,%eax
+	movl	(%eax,%ecx,4),%ecx	/* mfn_to_pfn */
+	shll	$(PTESHIFT),%ecx
+#else	/* MACH_PSEUDO_PHYS */
	andl	$(PTE_PFN),%ecx		/* isolate page frame address */
+#endif	/* MACH_PSEUDO_PHYS */
	movl	%edi,%eax		/* get page table bits */
	shrl	$(PTESHIFT),%eax
	andl	$(PTEMASK),%eax		/* from user address */
@@ -1297,9 +1329,17 @@ copyout_retry:
 /*
  *	Not writable - must fake a fault.  Turn off access to the page.
  */
+#ifdef	MACH_HYP
+	pushl	%edx
+	pushl	%ecx
+	call	hyp_invalidate_pte
+	popl	%ecx
+	popl	%edx
+#else	/* MACH_HYP */
	andl	$(PTE_INVALID),(%ecx)	/* turn off valid bit */
	movl	%cr3,%eax		/* invalidate TLB */
	movl	%eax,%cr3
+#endif	/* MACH_HYP */
 0:
 
 /*
diff --git a/i386/i386/mp_desc.c b/i386/i386/mp_desc.c
index 54660d5..2fd5ec2 100644
--- a/i386/i386/mp_desc.c
+++ b/i386/i386/mp_desc.c
@@ -31,6 +31,7 @@
 #include <kern/cpu_number.h>
 #include <kern/debug.h>
 #include <mach/machine.h>
+#include <mach/xen.h>
 
 #include <vm/vm_kern.h>
 
 #include <i386/mp_desc.h>
@@ -149,6 +150,9 @@ mp_desc_init(mycpu)
 	 * Fix up the entries in the GDT to point to
 	 * this LDT and this TSS.
 	 */
+#ifdef	MACH_HYP
+	panic("TODO %s:%d\n",__FILE__,__LINE__);
+#else	/* MACH_HYP */
 	fill_descriptor(&mpt->gdt[sel_idx(KERNEL_LDT)],
 		(unsigned)&mpt->ldt,
 		LDTSZ * sizeof(struct real_descriptor) - 1,
@@ -161,6 +165,7 @@ mp_desc_init(mycpu)
 	mpt->ktss.tss.ss0 = KERNEL_DS;
 	mpt->ktss.tss.io_bit_map_offset = IOPB_INVAL;
 	mpt->ktss.barrier = 0xFF;
+#endif	/* MACH_HYP */
 
 	return mpt;
 }
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 3226195..b9c52dd 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -31,6 +31,7 @@
 #include <mach/kern_return.h>
 #include <mach/thread_status.h>
 #include <mach/exec/exec.h>
+#include <mach/xen.h>
 
 #include "vm_param.h"
 #include <kern/counters.h>
@@ -152,7 +153,12 @@ void switch_ktss(pcb)
 		? (int) (&pcb->iss + 1)
 		: (int) (&pcb->iss.v86_segs);
 
+#ifdef	MACH_XEN
+	/* No IO mask here */
+	hyp_stack_switch(KERNEL_DS, pcb_stack_top);
+#else	/* MACH_XEN */
 	curr_ktss(mycpu)->tss.esp0 = pcb_stack_top;
+#endif	/* MACH_XEN */
     }
 
     {
@@ -164,22 +170,47 @@ void switch_ktss(pcb)
 	    /*
 	     * Use system LDT.
 	     */
+#ifdef	MACH_HYP
+	    hyp_set_ldt(&ldt, LDTSZ);
+#else	/* MACH_HYP */
 	    set_ldt(KERNEL_LDT);
+#endif	/* MACH_HYP */
 	}
 	else {
 	    /*
 	     * Thread has its own LDT.
 	     */
+#ifdef	MACH_HYP
+	    hyp_set_ldt(tldt->ldt,
+			(tldt->desc.limit_low|(tldt->desc.limit_high<<16)) /
+			sizeof(struct real_descriptor));
+#else	/* MACH_HYP */
 	    *gdt_desc_p(mycpu,USER_LDT) = tldt->desc;
 	    set_ldt(USER_LDT);
+#endif	/* MACH_HYP */
 	}
     }
+#ifdef	MACH_XEN
+    {
+	int i;
+	for (i=0; i < USER_GDT_SLOTS; i++) {
+	    if (memcmp(gdt_desc_p (mycpu, USER_GDT + (i << 3)),
+			&pcb->ims.user_gdt[i], sizeof pcb->ims.user_gdt[i])) {
+		if (hyp_do_update_descriptor(kv_to_ma(gdt_desc_p (mycpu, USER_GDT + (i << 3))),
+			*(unsigned long long *) &pcb->ims.user_gdt[i]))
+		    panic("couldn't set user gdt %d\n",i);
+	    }
+	}
+    }
+#else	/* MACH_XEN */
+
     /* Copy in the per-thread GDT slots.  No reloading is necessary
        because just restoring the segment registers on the way back to
       user mode reloads the shadow registers from the in-memory GDT.  */
     memcpy (gdt_desc_p (mycpu, USER_GDT), pcb->ims.user_gdt,
	     sizeof pcb->ims.user_gdt);
+#endif	/* MACH_XEN */
 
     /*
      * Load the floating-point context, if necessary.
diff --git a/i386/i386/phys.c b/i386/i386/phys.c
index 2c30f17..925593b 100644
--- a/i386/i386/phys.c
+++ b/i386/i386/phys.c
@@ -27,6 +27,7 @@
 #include <string.h>
 
 #include <mach/boolean.h>
+#include <mach/xen.h>
 #include <kern/task.h>
 #include <kern/thread.h>
 #include <vm/vm_map.h>
@@ -104,5 +105,9 @@ vm_offset_t addr;
 	if ((pte = pmap_pte(kernel_pmap, addr)) == PT_ENTRY_NULL)
 		return 0;
-	return i386_trunc_page(*pte) | (addr & INTEL_OFFMASK);
+	return i386_trunc_page(
+#ifdef	MACH_PSEUDO_PHYS
+		ma_to_pa
+#endif	/* MACH_PSEUDO_PHYS */
+		(*pte)) | (addr & INTEL_OFFMASK);
 }
diff --git a/i386/i386/proc_reg.h b/i386/i386/proc_reg.h
index d9f32bc..64d8c43 100644
--- a/i386/i386/proc_reg.h
+++ b/i386/i386/proc_reg.h
@@ -72,8 +72,10 @@
 #ifndef	__ASSEMBLER__
 #ifdef	__GNUC__
 
+#ifndef	MACH_HYP
 #include <i386/gdt.h>
 #include <i386/ldt.h>
+#endif	/* MACH_HYP */
 
 static inline unsigned
 get_eflags(void)
@@ -122,6 +124,16 @@ set_eflags(unsigned eflags)
 	_temp__; \
     })
 
+#ifdef	MACH_HYP
+extern unsigned long cr3;
+#define	get_cr3() (cr3)
+#define	set_cr3(value) \
+    ({ \
+	cr3 = (value); \
+	if (!hyp_set_cr3(value)) \
+		panic("set_cr3"); \
+    })
+#else	/* MACH_HYP */
 #define	get_cr3() \
     ({ \
 	register unsigned int _temp__; \
@@ -134,9 +146,11 @@ set_eflags(unsigned eflags)
 	register unsigned int _temp__ = (value); \
 	asm volatile("mov %0, %%cr3" : : "r" (_temp__)); \
     })
+#endif	/* MACH_HYP */
 
 #define	flush_tlb()	set_cr3(get_cr3())
 
+#ifndef	MACH_HYP
 #define	invlpg(addr) \
     ({ \
 	asm volatile("invlpg (%0)" : : "r" (addr)); \
@@ -164,6 +178,7 @@ set_eflags(unsigned eflags)
 		      : "+r" (var) : "r" (end), \
 		        "q" (LINEAR_DS), "q" (KERNEL_DS), "i" (PAGE_SIZE)); \
     })
+#endif	/* MACH_HYP */
 
 #define	get_cr4() \
     ({ \
@@ -179,11 +194,18 @@ set_eflags(unsigned eflags)
     })
 
+#ifdef	MACH_HYP
+#define	set_ts() \
+	hyp_fpu_taskswitch(1)
+#define	clear_ts() \
+	hyp_fpu_taskswitch(0)
+#else	/* MACH_HYP */
 #define	set_ts() \
 	set_cr0(get_cr0() | CR0_TS)
 #define	clear_ts() \
 	asm volatile("clts")
+#endif	/* MACH_HYP */
#define get_tr() \ ({ \ diff --git a/i386/i386/seg.h b/i386/i386/seg.h index 9a09af5..01b1a2e 100644 --- a/i386/i386/seg.h +++ b/i386/i386/seg.h @@ -37,7 +37,12 @@ * i386 segmentation. */ +/* Note: the value of KERNEL_RING is handled by hand in locore.S */ +#ifdef MACH_HYP +#define KERNEL_RING 1 +#else /* MACH_HYP */ #define KERNEL_RING 0 +#endif /* MACH_HYP */ #ifndef __ASSEMBLER__ @@ -118,6 +123,7 @@ struct real_gate { #ifndef __ASSEMBLER__ #include <mach/inline.h> +#include <mach/xen.h> /* Format of a "pseudo-descriptor", used for loading the IDT and GDT. */ @@ -152,9 +158,15 @@ MACH_INLINE void lldt(unsigned short ldt_selector) /* Fill a segment descriptor. */ MACH_INLINE void -fill_descriptor(struct real_descriptor *desc, unsigned base, unsigned limit, +fill_descriptor(struct real_descriptor *_desc, unsigned base, unsigned limit, unsigned char access, unsigned char sizebits) { + /* TODO: when !MACH_XEN, setting desc and just memcpy isn't simpler actually */ +#ifdef MACH_XEN + struct real_descriptor __desc, *desc = &__desc; +#else /* MACH_XEN */ + struct real_descriptor *desc = _desc; +#endif /* MACH_XEN */ if (limit > 0xfffff) { limit >>= 12; @@ -167,6 +179,10 @@ fill_descriptor(struct real_descriptor *desc, unsigned base, unsigned limit, desc->limit_high = limit >> 16; desc->granularity = sizebits; desc->base_high = base >> 24; +#ifdef MACH_XEN + if (hyp_do_update_descriptor(kv_to_ma(_desc), *(unsigned long long*)desc)) + panic("couldn't update descriptor(%p to %08lx%08lx)\n", kv_to_ma(_desc), *(((unsigned long*)desc)+1), *(unsigned long *)desc); +#endif /* MACH_XEN */ } /* Fill a gate with particular values. */ diff --git a/i386/i386/spl.S b/i386/i386/spl.S index f77b556..f1d4b45 100644 --- a/i386/i386/spl.S +++ b/i386/i386/spl.S @@ -20,6 +20,8 @@ #include <mach/machine/asm.h> #include <i386/ipl.h> #include <i386/pic.h> +#include <i386/i386asm.h> +#include <i386/xen.h> /* * Set IPL to the specified value. 
@@ -42,6 +44,7 @@ /* * Program PICs with mask in %eax. */ +#ifndef MACH_XEN #define SETMASK() \ cmpl EXT(curr_pic_mask),%eax; \ je 9f; \ @@ -50,6 +53,21 @@ movb %ah,%al; \ outb %al,$(PIC_SLAVE_OCW); \ 9: +#else /* MACH_XEN */ +#define pic_mask int_mask +#define SETMASK() \ + pushl %ebx; \ + movl %eax,%ebx; \ + xchgl %eax,hyp_shared_info+EVTMASK; \ + notl %ebx; \ + andl %eax,%ebx; /* Get unmasked events */ \ + testl hyp_shared_info+PENDING, %ebx; \ + popl %ebx; \ + jz 9f; /* Check whether there was some pending */ \ +lock orl $1,hyp_shared_info+CPU_PENDING_SEL; /* Yes, activate it */ \ + movb $1,hyp_shared_info+CPU_PENDING; \ +9: +#endif /* MACH_XEN */ ENTRY(spl0) movl EXT(curr_ipl),%eax /* save current ipl */ diff --git a/i386/i386/trap.c b/i386/i386/trap.c index 4361fcd..28a9e0c 100644 --- a/i386/i386/trap.c +++ b/i386/i386/trap.c @@ -585,6 +585,7 @@ i386_astintr() int mycpu = cpu_number(); (void) splsched(); /* block interrupts to check reasons */ +#ifndef MACH_XEN if (need_ast[mycpu] & AST_I386_FP) { /* * AST was for delayed floating-point exception - @@ -596,7 +597,9 @@ i386_astintr() fpastintr(); } - else { + else +#endif /* MACH_XEN */ + { /* * Not an FPU trap. Handle the AST. * Interrupts are still blocked. 
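The Xen SETMASK() in spl.S above swaps the new event mask in with xchgl, computes `old & ~new` — the events that were masked and have just been unmasked — and, if any of those are currently pending, raises the per-CPU pending flag so the event actually gets delivered. The same logic in C (the three variables are stand-ins for the shared-info fields the assembly touches):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t evt_mask;    /* stand-in for hyp_shared_info+EVTMASK */
static uint32_t evt_pending; /* stand-in for the pending-events word */
static int cpu_pending;      /* stand-in for CPU_PENDING */

/* Install a new event mask; flag the CPU if a newly unmasked event is pending. */
static void set_event_mask(uint32_t new_mask)
{
    uint32_t old_mask = evt_mask;
    evt_mask = new_mask;                      /* the xchgl in the assembly */
    uint32_t unmasked = old_mask & ~new_mask; /* notl + andl: newly unmasked */
    if (evt_pending & unmasked)               /* testl against PENDING */
        cpu_pending = 1;                      /* movb $1, CPU_PENDING */
}
```

Without the pending check, an event that arrived while masked would be lost when the mask is lowered, since Xen only notifies on the pending edge.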
diff --git a/i386/i386/user_ldt.c b/i386/i386/user_ldt.c index 942ad07..dfe6b1e 100644 --- a/i386/i386/user_ldt.c +++ b/i386/i386/user_ldt.c @@ -39,6 +39,7 @@ #include <i386/seg.h> #include <i386/thread.h> #include <i386/user_ldt.h> +#include <stddef.h> #include "ldt.h" #include "vm_param.h" @@ -195,9 +196,17 @@ i386_set_ldt(thread, first_selector, desc_list, count, desc_list_inline) if (new_ldt == 0) { simple_unlock(&pcb->lock); +#ifdef MACH_XEN + /* LDT needs to be aligned on a page */ + vm_offset_t alloc = kalloc(ldt_size_needed + PAGE_SIZE + offsetof(struct user_ldt, ldt)); + new_ldt = (user_ldt_t) (round_page((alloc + offsetof(struct user_ldt, ldt))) - offsetof(struct user_ldt, ldt)); + new_ldt->alloc = alloc; + +#else /* MACH_XEN */ new_ldt = (user_ldt_t) kalloc(ldt_size_needed + sizeof(struct real_descriptor)); +#endif /* MACH_XEN */ /* * Build a descriptor that describes the * LDT itself @@ -263,9 +272,19 @@ i386_set_ldt(thread, first_selector, desc_list, count, desc_list_inline) simple_unlock(&pcb->lock); if (new_ldt) +#ifdef MACH_XEN + { + int i; + for (i=0; i<(new_ldt->desc.limit_low + 1)/sizeof(struct real_descriptor); i+=PAGE_SIZE/sizeof(struct real_descriptor)) + pmap_set_page_readwrite(&new_ldt->ldt[i]); + kfree(new_ldt->alloc, new_ldt->desc.limit_low + 1 + + PAGE_SIZE + offsetof(struct user_ldt, ldt)); + } +#else /* MACH_XEN */ kfree((vm_offset_t)new_ldt, new_ldt->desc.limit_low + 1 + sizeof(struct real_descriptor)); +#endif /* MACH_XEN */ /* * Free the descriptor list, if it was @@ -398,9 +417,17 @@ void user_ldt_free(user_ldt) user_ldt_t user_ldt; { +#ifdef MACH_XEN + int i; + for (i=0; i<(user_ldt->desc.limit_low + 1)/sizeof(struct real_descriptor); i+=PAGE_SIZE/sizeof(struct real_descriptor)) + pmap_set_page_readwrite(&user_ldt->ldt[i]); + kfree(user_ldt->alloc, user_ldt->desc.limit_low + 1 + + PAGE_SIZE + offsetof(struct user_ldt, ldt)); +#else /* MACH_XEN */ kfree((vm_offset_t)user_ldt, user_ldt->desc.limit_low + 1 + sizeof(struct 
real_descriptor)); +#endif /* MACH_XEN */ } diff --git a/i386/i386/user_ldt.h b/i386/i386/user_ldt.h index dd3ad4b..8d16ed8 100644 --- a/i386/i386/user_ldt.h +++ b/i386/i386/user_ldt.h @@ -36,6 +36,9 @@ #include <i386/seg.h> struct user_ldt { +#ifdef MACH_XEN + vm_offset_t alloc; /* allocation before alignment */ +#endif /* MACH_XEN */ struct real_descriptor desc; /* descriptor for self */ struct real_descriptor ldt[1]; /* descriptor table (variable) */ }; diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h index 8e92e79..95df604 100644 --- a/i386/i386/vm_param.h +++ b/i386/i386/vm_param.h @@ -25,10 +25,25 @@ /* XXX use xu/vm_param.h */ #include <mach/vm_param.h> +#include <xen/public/xen.h> /* The kernel address space is 1GB, starting at virtual address 0. */ -#define VM_MIN_KERNEL_ADDRESS (0x00000000) -#define VM_MAX_KERNEL_ADDRESS ((LINEAR_MAX_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS)) +#ifdef MACH_XEN +#define VM_MIN_KERNEL_ADDRESS 0x20000000UL +#else /* MACH_XEN */ +#define VM_MIN_KERNEL_ADDRESS 0x00000000UL +#endif /* MACH_XEN */ + +#ifdef MACH_XEN +#if PAE +#define HYP_VIRT_START HYPERVISOR_VIRT_START_PAE +#else /* PAE */ +#define HYP_VIRT_START HYPERVISOR_VIRT_START_NONPAE +#endif /* PAE */ +#define VM_MAX_KERNEL_ADDRESS (HYP_VIRT_START - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS) +#else /* MACH_XEN */ +#define VM_MAX_KERNEL_ADDRESS (LINEAR_MAX_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS) +#endif /* MACH_XEN */ /* The kernel virtual address space is actually located at high linear addresses. 
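The MACH_XEN VM_MAX_KERNEL_ADDRESS above shifts the kernel window down by the same offset as VM_MIN: the usable span ends where the hypervisor hole begins instead of at 4GiB. With the conventional 32-bit non-PAE value HYPERVISOR_VIRT_START = 0xFC000000 and LINEAR_MIN_KERNEL_ADDRESS = 0xC0000000 (both assumed here for illustration; the real values come from xen/public/xen.h and Mach's vm_param.h), the arithmetic works out as:

```c
#include <assert.h>

/* Assumed values; the real ones come from the Xen and Mach headers. */
#define HYP_VIRT_START            0xFC000000UL /* start of the hypervisor hole */
#define LINEAR_MIN_KERNEL_ADDRESS 0xC0000000UL
#define VM_MIN_KERNEL_ADDRESS     0x20000000UL /* the Xen value from the diff */

/* Same formula as the MACH_XEN branch of vm_param.h. */
#define VM_MAX_KERNEL_ADDRESS \
    (HYP_VIRT_START - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS)
```

Under these assumptions the kernel address space is [0x20000000, 0x5C000000), i.e. 960MiB rather than the native 1GiB — the hypervisor keeps the top of the linear space for itself.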
@@ -36,8 +51,14 @@ #define LINEAR_MIN_KERNEL_ADDRESS (VM_MAX_ADDRESS) #define LINEAR_MAX_KERNEL_ADDRESS (0xffffffffUL) +#ifdef MACH_XEN +/* need room for mmu updates (2*8bytes) */ +#define KERNEL_STACK_SIZE (4*I386_PGBYTES) +#define INTSTACK_SIZE (4*I386_PGBYTES) +#else /* MACH_XEN */ #define KERNEL_STACK_SIZE (1*I386_PGBYTES) #define INTSTACK_SIZE (1*I386_PGBYTES) +#endif /* MACH_XEN */ /* interrupt stack size */ /* diff --git a/i386/i386/xen.h b/i386/i386/xen.h new file mode 100644 index 0000000..a7fb641 --- /dev/null +++ b/i386/i386/xen.h @@ -0,0 +1,357 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef XEN_HYPCALL_H +#define XEN_HYPCALL_H + +#ifdef MACH_XEN +#ifndef __ASSEMBLER__ +#include <kern/printf.h> +#include <mach/machine/vm_types.h> +#include <mach/vm_param.h> +#include <mach/inline.h> +#include <machine/vm_param.h> +#include <intel/pmap.h> +#include <kern/debug.h> +#include <xen/public/xen.h> + +/* TODO: this should be moved in appropriate non-Xen place. 
*/ +#define barrier() __asm__ __volatile__ ("": : :"memory") +#define mb() __asm__ __volatile__("lock; addl $0,0(%esp)") +#define rmb() mb() +#define wmb() mb() +MACH_INLINE unsigned long xchgl(volatile unsigned long *ptr, unsigned long x) +{ + __asm__ __volatile__("xchgl %0, %1" + : "=r" (x) + : "m" (*(ptr)), "0" (x): "memory"); + return x; +} +#define _TOSTR(x) #x +#define TOSTR(x) _TOSTR (x) + + + +/* x86-specific hypercall interface. */ +#define _hypcall0(type, name) \ +MACH_INLINE type hyp_##name(void) \ +{ \ + long __ret; \ + asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret) \ + : : "memory"); \ + return __ret; \ +} + +#define _hypcall1(type, name, type1, arg1) \ +MACH_INLINE type hyp_##name(type1 arg1) \ +{ \ + long __ret; \ + long foo1; \ + asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret), \ + "=b" (foo1) \ + : "1" ((long)arg1) \ + : "memory"); \ + return __ret; \ +} + +#define _hypcall2(type, name, type1, arg1, type2, arg2) \ +MACH_INLINE type hyp_##name(type1 arg1, type2 arg2) \ +{ \ + long __ret; \ + long foo1, foo2; \ + asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret), \ + "=b" (foo1), \ + "=c" (foo2) \ + : "1" ((long)arg1), \ + "2" ((long)arg2) \ + : "memory"); \ + return __ret; \ +} + +#define _hypcall3(type, name, type1, arg1, type2, arg2, type3, arg3) \ +MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3) \ +{ \ + long __ret; \ + long foo1, foo2, foo3; \ + asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret), \ + "=b" (foo1), \ + "=c" (foo2), \ + "=d" (foo3) \ + : "1" ((long)arg1), \ + "2" ((long)arg2), \ + "3" ((long)arg3) \ + : "memory"); \ + return __ret; \ +} + +#define _hypcall4(type, name, type1, arg1, type2, arg2, type3, arg3, type4, arg4) \ +MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3, type4 arg4) \ +{ \ + long __ret; \ + long foo1, foo2, foo3, foo4; \ + asm volatile ("call 
hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret), \ + "=b" (foo1), \ + "=c" (foo2), \ + "=d" (foo3), \ + "=S" (foo4) \ + : "1" ((long)arg1), \ + "2" ((long)arg2), \ + "3" ((long)arg3), \ + "4" ((long)arg4) \ + : "memory"); \ + return __ret; \ +} + +#define _hypcall5(type, name, type1, arg1, type2, arg2, type3, arg3, type4, arg4, type5, arg5) \ +MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5) \ +{ \ + long __ret; \ + long foo1, foo2, foo3, foo4, foo5; \ + asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \ + : "=a" (__ret), \ + "=b" (foo1), \ + "=c" (foo2), \ + "=d" (foo3), \ + "=S" (foo4), \ + "=D" (foo5) \ + : "1" ((long)arg1), \ + "2" ((long)arg2), \ + "3" ((long)arg3), \ + "4" ((long)arg4), \ + "5" ((long)arg5) \ + : "memory"); \ + return __ret; \ +} + +/* x86 Hypercalls */ + +/* Note: since Hypervisor uses flat memory model, remember to always use + * kvtolin when giving pointers as parameters for the hypercall to read data + * at. Use kv_to_la when they may be used before GDT got set up. 
*/ + +_hypcall1(long, set_trap_table, vm_offset_t /* struct trap_info * */, traps); + +_hypcall4(int, mmu_update, vm_offset_t /* struct mmu_update * */, req, int, count, vm_offset_t /* int * */, success_count, domid_t, domid) +MACH_INLINE int hyp_mmu_update_pte(unsigned long pte, unsigned long long val) +{ + struct mmu_update update = + { + .ptr = pte, + .val = val, + }; + int count; + hyp_mmu_update(kv_to_la(&update), 1, kv_to_la(&count), DOMID_SELF); + return count; +} +/* Note: make sure this fits in KERNEL_STACK_SIZE */ +#define HYP_BATCH_MMU_UPDATES 256 + +#define hyp_mmu_update_la(la, val) hyp_mmu_update_pte( \ + (unsigned long)(((pt_entry_t*)(kernel_pmap->dirbase[lin2pdenum((unsigned long)la)] & INTEL_PTE_PFN)) \ + + ptenum((unsigned long)la)), val) + +_hypcall2(long, set_gdt, vm_offset_t /* unsigned long * */, frame_list, unsigned int, entries) + +_hypcall2(long, stack_switch, unsigned long, ss, unsigned long, esp); + +_hypcall4(long, set_callbacks, unsigned long, es, void *, ea, + unsigned long, fss, void *, fsa); +_hypcall1(long, fpu_taskswitch, int, set); + +_hypcall4(long, update_descriptor, unsigned long, ma_lo, unsigned long, ma_hi, unsigned long, desc_lo, unsigned long, desc_hi); +#define hyp_do_update_descriptor(ma, desc) ({ \ + unsigned long long __desc = (desc); \ + hyp_update_descriptor(ma, 0, __desc, __desc >> 32); \ +}) + +#include <xen/public/memory.h> +_hypcall2(long, memory_op, unsigned long, cmd, vm_offset_t /* void * */, arg); +MACH_INLINE void hyp_free_mfn(unsigned long mfn) +{ + struct xen_memory_reservation reservation; + reservation.extent_start = (void*) kvtolin(&mfn); + reservation.nr_extents = 1; + reservation.extent_order = 0; + reservation.address_bits = 0; + reservation.domid = DOMID_SELF; + if (hyp_memory_op(XENMEM_decrease_reservation, kvtolin(&reservation)) != 1) + panic("couldn't free page %d\n", mfn); +} + +_hypcall4(int, update_va_mapping, unsigned long, va, unsigned long, val_lo, unsigned long, val_hi, unsigned long, 
flags); +#define hyp_do_update_va_mapping(va, val, flags) ({ \ + unsigned long long __val = (val); \ + hyp_update_va_mapping(va, __val & 0xffffffffU, __val >> 32, flags); \ +}) + +MACH_INLINE void hyp_free_page(unsigned long pfn, void *va) +{ + /* save mfn */ + unsigned long mfn = pfn_to_mfn(pfn); + + /* remove from mappings */ + if (hyp_do_update_va_mapping(kvtolin(va), 0, UVMF_INVLPG|UVMF_ALL)) + panic("couldn't clear page %d at %p\n", pfn, va); + +#ifdef MACH_PSEUDO_PHYS + /* drop machine page */ + mfn_list[pfn] = ~0; +#endif /* MACH_PSEUDO_PHYS */ + + /* and free from Xen */ + hyp_free_mfn(mfn); +} + +_hypcall4(int, mmuext_op, vm_offset_t /* struct mmuext_op * */, op, int, count, vm_offset_t /* int * */, success_count, domid_t, domid); +MACH_INLINE int hyp_mmuext_op_void(unsigned int cmd) +{ + struct mmuext_op op = { + .cmd = cmd, + }; + int count; + hyp_mmuext_op(kv_to_la(&op), 1, kv_to_la(&count), DOMID_SELF); + return count; +} +MACH_INLINE int hyp_mmuext_op_mfn(unsigned int cmd, unsigned long mfn) +{ + struct mmuext_op op = { + .cmd = cmd, + .arg1.mfn = mfn, + }; + int count; + hyp_mmuext_op(kv_to_la(&op), 1, kv_to_la(&count), DOMID_SELF); + return count; +} +MACH_INLINE void hyp_set_ldt(void *ldt, unsigned long nbentries) { + struct mmuext_op op = { + .cmd = MMUEXT_SET_LDT, + .arg1.linear_addr = kvtolin(ldt), + .arg2.nr_ents = nbentries, + }; + int count; + if (((unsigned long)ldt) & PAGE_MASK) + panic("ldt %p is not aligned on a page\n", ldt); + for (count=0; count<nbentries; count+= PAGE_SIZE/8) + pmap_set_page_readonly(ldt+count*8); + hyp_mmuext_op(kvtolin(&op), 1, kvtolin(&count), DOMID_SELF); + if (!count) + panic("couldn't set LDT\n"); +} +/* TODO: use xen_pfn_to_cr3/xen_cr3_to_pfn to cope with pdp above 4GB */ +#define hyp_set_cr3(value) hyp_mmuext_op_mfn(MMUEXT_NEW_BASEPTR, pa_to_mfn(value)) +MACH_INLINE void hyp_invlpg(vm_offset_t lin) { + struct mmuext_op ops; + int n; + ops.cmd = MMUEXT_INVLPG_ALL; + ops.arg1.linear_addr = lin; + 
hyp_mmuext_op(kvtolin(&ops), 1, kvtolin(&n), DOMID_SELF); + if (n < 1) + panic("couldn't invlpg\n"); +} + +_hypcall2(long, set_timer_op, unsigned long, absolute_lo, unsigned long, absolute_hi); +#define hyp_do_set_timer_op(absolute_nsec) ({ \ + unsigned long long __absolute = (absolute_nsec); \ + hyp_set_timer_op(__absolute, __absolute >> 32); \ +}) + +#include <xen/public/event_channel.h> +_hypcall1(int, event_channel_op, vm_offset_t /* evtchn_op_t * */, op); +MACH_INLINE int hyp_event_channel_send(evtchn_port_t port) { + evtchn_op_t op = { + .cmd = EVTCHNOP_send, + .u.send.port = port, + }; + return hyp_event_channel_op(kvtolin(&op)); +} +MACH_INLINE evtchn_port_t hyp_event_channel_alloc(domid_t domid) { + evtchn_op_t op = { + .cmd = EVTCHNOP_alloc_unbound, + .u.alloc_unbound.dom = DOMID_SELF, + .u.alloc_unbound.remote_dom = domid, + }; + if (hyp_event_channel_op(kvtolin(&op))) + panic("couldn't allocate event channel"); + return op.u.alloc_unbound.port; +} +MACH_INLINE evtchn_port_t hyp_event_channel_bind_virq(uint32_t virq, uint32_t vcpu) { + evtchn_op_t op = { .cmd = EVTCHNOP_bind_virq, .u.bind_virq = { .virq = virq, .vcpu = vcpu }}; + if (hyp_event_channel_op(kvtolin(&op))) + panic("can't bind virq %d\n",virq); + return op.u.bind_virq.port; +} + +_hypcall3(int, console_io, int, cmd, int, count, vm_offset_t /* const char * */, buffer); + +_hypcall3(long, grant_table_op, unsigned int, cmd, vm_offset_t /* void * */, uop, unsigned int, count); + +_hypcall2(long, vm_assist, unsigned int, cmd, unsigned int, type); + +_hypcall0(long, iret); + +#include <xen/public/sched.h> +_hypcall2(long, sched_op, int, cmd, vm_offset_t /* void* */, arg) +#define hyp_yield() hyp_sched_op(SCHEDOP_yield, 0) +#define hyp_block() hyp_sched_op(SCHEDOP_block, 0) +MACH_INLINE void __attribute__((noreturn)) hyp_crash(void) +{ + unsigned int shut = SHUTDOWN_crash; + hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut)); + /* really shouldn't return */ + printf("uh, shutdown returned?!\n"); + 
for(;;); +} + +MACH_INLINE void __attribute__((noreturn)) hyp_halt(void) +{ + unsigned int shut = SHUTDOWN_poweroff; + hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut)); + /* really shouldn't return */ + printf("uh, shutdown returned?!\n"); + for(;;); +} + +MACH_INLINE void __attribute__((noreturn)) hyp_reboot(void) +{ + unsigned int shut = SHUTDOWN_reboot; + hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut)); + /* really shouldn't return */ + printf("uh, reboot returned?!\n"); + for(;;); +} + +/* x86-specific */ +MACH_INLINE unsigned64_t hyp_cpu_clock(void) { + unsigned64_t tsc; + asm volatile("rdtsc":"=A"(tsc)); + return tsc; +} + +#else /* __ASSEMBLER__ */ +/* TODO: SMP */ +#define cli movb $0xff,hyp_shared_info+CPU_CLI +#define sti call hyp_sti +#endif /* ASSEMBLER */ +#endif /* MACH_XEN */ + +#endif /* XEN_HYPCALL_H */ diff --git a/i386/i386at/conf.c b/i386/i386at/conf.c index 23c2a6f..f5ab36c 100644 --- a/i386/i386at/conf.c +++ b/i386/i386at/conf.c @@ -34,6 +34,7 @@ extern int timeopen(), timeclose(); extern vm_offset_t timemmap(); #define timename "time" +#ifndef MACH_HYP extern int kdopen(), kdclose(), kdread(), kdwrite(); extern int kdgetstat(), kdsetstat(), kdportdeath(); extern vm_offset_t kdmmap(); @@ -50,17 +51,26 @@ extern int lpropen(), lprclose(), lprread(), lprwrite(); extern int lprgetstat(), lprsetstat(), lprportdeath(); #define lprname "lpr" #endif /* NLPR > 0 */ +#endif /* MACH_HYP */ extern int kbdopen(), kbdclose(), kbdread(); extern int kbdgetstat(), kbdsetstat(); #define kbdname "kbd" +#ifndef MACH_HYP extern int mouseopen(), mouseclose(), mouseread(), mousegetstat(); #define mousename "mouse" +#endif /* MACH_HYP */ extern int kmsgopen(), kmsgclose(), kmsgread(), kmsggetstat(); #define kmsgname "kmsg" +#ifdef MACH_HYP +extern int hypcnopen(), hypcnclose(), hypcnread(), hypcnwrite(); +extern int hypcngetstat(), hypcnsetstat(), hypcnportdeath(); +#define hypcnname "hyp" +#endif /* MACH_HYP */ + /* * List of devices - console must be at slot 0 */ @@ 
-79,16 +89,19 @@ struct dev_ops dev_name_list[] = nodev, nulldev, nulldev, 0, nodev }, +#ifndef MACH_HYP { kdname, kdopen, kdclose, kdread, kdwrite, kdgetstat, kdsetstat, kdmmap, nodev, nulldev, kdportdeath, 0, nodev }, +#endif /* MACH_HYP */ { timename, timeopen, timeclose, nulldev, nulldev, nulldev, nulldev, timemmap, nodev, nulldev, nulldev, 0, nodev }, +#ifndef MACH_HYP #if NCOM > 0 { comname, comopen, comclose, comread, comwrite, comgetstat, comsetstat, nomap, @@ -107,6 +120,7 @@ struct dev_ops dev_name_list[] = nodev, mousegetstat, nulldev, nomap, nodev, nulldev, nulldev, 0, nodev }, +#endif /* MACH_HYP */ { kbdname, kbdopen, kbdclose, kbdread, nodev, kbdgetstat, kbdsetstat, nomap, @@ -120,6 +134,13 @@ struct dev_ops dev_name_list[] = nodev }, #endif +#ifdef MACH_HYP + { hypcnname, hypcnopen, hypcnclose, hypcnread, + hypcnwrite, hypcngetstat, hypcnsetstat, nomap, + nodev, nulldev, hypcnportdeath, 0, + nodev }, +#endif /* MACH_HYP */ + }; int dev_name_count = sizeof(dev_name_list)/sizeof(dev_name_list[0]); diff --git a/i386/i386at/cons_conf.c b/i386/i386at/cons_conf.c index 8784ed9..ea8ccb5 100644 --- a/i386/i386at/cons_conf.c +++ b/i386/i386at/cons_conf.c @@ -30,19 +30,27 @@ #include <sys/types.h> #include <device/cons.h> +#ifdef MACH_HYP +extern int hypcnprobe(), hypcninit(), hypcngetc(), hypcnputc(); +#else /* MACH_HYP */ extern int kdcnprobe(), kdcninit(), kdcngetc(), kdcnputc(); #if NCOM > 0 && RCLINE >= 0 extern int comcnprobe(), comcninit(), comcngetc(), comcnputc(); #endif +#endif /* MACH_HYP */ /* * The rest of the consdev fields are filled in by the respective * cnprobe routine. 
*/ struct consdev constab[] = { +#ifdef MACH_HYP + {"hyp", hypcnprobe, hypcninit, hypcngetc, hypcnputc}, +#else /* MACH_HYP */ {"kd", kdcnprobe, kdcninit, kdcngetc, kdcnputc}, #if NCOM > 0 && RCLINE >= 0 && 1 {"com", comcnprobe, comcninit, comcngetc, comcnputc}, #endif +#endif /* MACH_HYP */ {0} }; diff --git a/i386/i386at/model_dep.c b/i386/i386at/model_dep.c index 3ebe2e6..61605a1 100644 --- a/i386/i386at/model_dep.c +++ b/i386/i386at/model_dep.c @@ -40,6 +40,7 @@ #include <mach/vm_prot.h> #include <mach/machine.h> #include <mach/machine/multiboot.h> +#include <mach/xen.h> #include <i386/vm_param.h> #include <kern/assert.h> @@ -48,6 +49,7 @@ #include <kern/mach_clock.h> #include <kern/printf.h> #include <sys/time.h> +#include <sys/types.h> #include <vm/vm_page.h> #include <i386/fpu.h> #include <i386/gdt.h> @@ -65,6 +67,12 @@ #include <i386at/int_init.h> #include <i386at/kd.h> #include <i386at/rtc.h> +#ifdef MACH_XEN +#include <xen/console.h> +#include <xen/store.h> +#include <xen/evt.h> +#include <xen/xen.h> +#endif /* MACH_XEN */ /* Location of the kernel's symbol table. Both of these are 0 if none is available. */ @@ -81,7 +89,20 @@ vm_offset_t phys_first_addr = 0; vm_offset_t phys_last_addr; /* A copy of the multiboot info structure passed by the boot loader. */ +#ifdef MACH_XEN +struct start_info boot_info; +#ifdef MACH_PSEUDO_PHYS +unsigned long *mfn_list; +#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS +unsigned long *pfn_list = (void*) PFN_LIST; +#endif +#endif /* MACH_PSEUDO_PHYS */ +#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS +unsigned long la_shift = VM_MIN_KERNEL_ADDRESS; +#endif +#else /* MACH_XEN */ struct multiboot_info boot_info; +#endif /* MACH_XEN */ /* Command line supplied to kernel. */ char *kernel_cmdline = ""; @@ -90,7 +111,11 @@ char *kernel_cmdline = ""; it gets bumped up through physical memory that exists and is not occupied by boot gunk. It is not necessarily page-aligned. 
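The comment above describes avail_next as a bump cursor that is "not necessarily page-aligned": init_alloc_aligned() page-aligns it on entry, hands out the current position, and advances by the requested size. A stripped-down sketch of that cursor (without the hole-skipping and wrap logic the real function has):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 0x1000UL
#define round_page(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

static unsigned long avail_next = 0x1000; /* native default from the diff */

/* Minimal bump allocator in the style of init_alloc_aligned(). */
static unsigned long init_alloc(size_t size)
{
    avail_next = round_page(avail_next); /* page-align the start address */
    unsigned long addr = avail_next;
    avail_next += size;                  /* cursor may end up unaligned */
    return addr;
}
```

Two sub-page allocations in a row therefore land on distinct pages: the cursor is re-aligned on every call, not after every call.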
*/ -static vm_offset_t avail_next = 0x1000; /* XX end of BIOS data area */ +static vm_offset_t avail_next +#ifndef MACH_HYP + = 0x1000 /* XX end of BIOS data area */ +#endif /* MACH_HYP */ + ; /* Possibly overestimated amount of available memory still remaining to be handed to the VM system. */ @@ -135,6 +160,9 @@ void machine_init(void) */ init_fpu(); +#ifdef MACH_HYP + hyp_init(); +#else /* MACH_HYP */ #ifdef LINUX_DEV /* * Initialize Linux drivers. @@ -146,16 +174,19 @@ void machine_init(void) * Find the devices */ probeio(); +#endif /* MACH_HYP */ /* * Get the time */ inittodr(); +#ifndef MACH_HYP /* * Tell the BIOS not to clear and test memory. */ *(unsigned short *)phystokv(0x472) = 0x1234; +#endif /* MACH_HYP */ /* * Unmap page 0 to trap NULL references. @@ -166,8 +197,17 @@ void machine_init(void) /* Conserve power on processor CPU. */ void machine_idle (int cpu) { +#ifdef MACH_HYP + hyp_idle(); +#else /* MACH_HYP */ assert (cpu == cpu_number ()); asm volatile ("hlt" : : : "memory"); +#endif /* MACH_HYP */ +} + +void machine_relax () +{ + asm volatile ("rep; nop" : : : "memory"); } /* @@ -175,9 +215,13 @@ void machine_idle (int cpu) */ void halt_cpu(void) { +#ifdef MACH_HYP + hyp_halt(); +#else /* MACH_HYP */ asm volatile("cli"); while (TRUE) machine_idle (cpu_number ()); +#endif /* MACH_HYP */ } /* @@ -187,10 +231,16 @@ void halt_all_cpus(reboot) boolean_t reboot; { if (reboot) { +#ifdef MACH_HYP + hyp_reboot(); +#endif /* MACH_HYP */ kdreboot(); } else { rebootflag = 1; +#ifdef MACH_HYP + hyp_halt(); +#endif /* MACH_HYP */ printf("In tight loop: hit ctl-alt-del to reboot\n"); (void) spl0(); } @@ -215,22 +265,26 @@ void db_reset_cpu(void) void mem_size_init(void) { - vm_size_t phys_last_kb; - /* Physical memory on all PCs starts at physical address 0. XX make it a constant. 
*/ phys_first_addr = 0; - phys_last_kb = 0x400 + boot_info.mem_upper; +#ifdef MACH_HYP + if (boot_info.nr_pages >= 0x100000) { + printf("Truncating memory size to 4GiB\n"); + phys_last_addr = 0xffffffffU; + } else + phys_last_addr = boot_info.nr_pages * 0x1000; +#else /* MACH_HYP */ + /* TODO: support mmap */ + vm_size_t phys_last_kb = 0x400 + boot_info.mem_upper; /* Avoid 4GiB overflow. */ if (phys_last_kb < 0x400 || phys_last_kb >= 0x400000) { printf("Truncating memory size to 4GiB\n"); - phys_last_kb = 0x400000 - 1; - } - - /* TODO: support mmap */ - - phys_last_addr = phys_last_kb * 0x400; + phys_last_addr = 0xffffffffU; + } else + phys_last_addr = phys_last_kb * 0x400; +#endif /* MACH_HYP */ printf("AT386 boot: physical memory from 0x%x to 0x%x\n", phys_first_addr, phys_last_addr); @@ -240,14 +294,20 @@ mem_size_init(void) if (phys_last_addr > ((VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 6) * 5) { phys_last_addr = ((VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 6) * 5; printf("Truncating memory size to %dMiB\n", (phys_last_addr - phys_first_addr) / (1024 * 1024)); + /* TODO Xen: free lost memory */ } phys_first_addr = round_page(phys_first_addr); phys_last_addr = trunc_page(phys_last_addr); +#ifdef MACH_HYP + /* Memory is just contiguous */ + avail_remaining = phys_last_addr; +#else /* MACH_HYP */ avail_remaining = phys_last_addr - (0x100000 - (boot_info.mem_lower * 0x400) - 0x1000); +#endif /* MACH_HYP */ } /* @@ -263,13 +323,20 @@ i386at_init(void) /* * Initialize the PIC prior to any possible call to an spl. */ +#ifndef MACH_HYP picinit(); +#else /* MACH_HYP */ + hyp_intrinit(); +#endif /* MACH_HYP */ /* * Find memory size parameters. */ mem_size_init(); +#ifdef MACH_XEN + kernel_cmdline = (char*) boot_info.cmd_line; +#else /* MACH_XEN */ /* Copy content pointed by boot_info before losing access to it when it * is too far in physical memory. 
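In the MACH_HYP branch of mem_size_init() above, guest memory is simply boot_info.nr_pages contiguous 4KiB frames, and the byte count is clamped because phys_last_addr is a 32-bit quantity: 2^20 pages already reaches 4GiB. A sketch of just that clamping arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the MACH_HYP branch of mem_size_init(): nr_pages 4KiB frames,
 * clamped so the byte count fits in 32 bits. */
static uint32_t xen_phys_last(unsigned long nr_pages)
{
    if (nr_pages >= 0x100000)  /* 2^20 pages * 4KiB = 4GiB: would overflow */
        return 0xffffffffU;    /* "Truncating memory size to 4GiB" */
    return (uint32_t)(nr_pages * 0x1000);
}
```

The native branch does the equivalent in KiB units (`0x400 + mem_upper`), with the same 4GiB cap.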
*/ if (boot_info.flags & MULTIBOOT_CMDLINE) { @@ -304,6 +371,7 @@ i386at_init(void) m[i].string = addr; } } +#endif /* MACH_XEN */ /* * Initialize kernel physical map, mapping the @@ -325,19 +393,42 @@ i386at_init(void) kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)] = kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS)]; #if PAE + /* PAE page tables are 2MB only */ kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1] = kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS) + 1]; + kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2] = + kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS) + 2]; +#endif /* PAE */ +#ifdef MACH_XEN + { + int i; + for (i = 0; i < PDPNUM; i++) + pmap_set_page_readonly_init((void*) kernel_page_dir + i * INTEL_PGBYTES); +#if PAE + pmap_set_page_readonly_init(kernel_pmap->pdpbase); +#endif /* PAE */ + } +#endif /* MACH_XEN */ +#if PAE set_cr3((unsigned)_kvtophys(kernel_pmap->pdpbase)); +#ifndef MACH_HYP if (!CPU_HAS_FEATURE(CPU_FEATURE_PAE)) panic("CPU doesn't have support for PAE."); set_cr4(get_cr4() | CR4_PAE); +#endif /* MACH_HYP */ #else set_cr3((unsigned)_kvtophys(kernel_page_dir)); #endif /* PAE */ +#ifndef MACH_HYP if (CPU_HAS_FEATURE(CPU_FEATURE_PGE)) set_cr4(get_cr4() | CR4_PGE); + /* already set by Hypervisor */ set_cr0(get_cr0() | CR0_PG | CR0_WP); +#endif /* MACH_HYP */ flush_instr_queue(); +#ifdef MACH_XEN + pmap_clear_bootstrap_pagetable((void *)boot_info.pt_base); +#endif /* MACH_XEN */ /* Interrupt stacks are allocated in physical memory, while kernel stacks are allocated in kernel virtual memory, @@ -349,18 +440,47 @@ i386at_init(void) */ gdt_init(); idt_init(); +#ifndef MACH_HYP int_init(); +#endif /* MACH_HYP */ ldt_init(); ktss_init(); /* Get rid of the temporary direct mapping and flush it out of the TLB. 
*/ +#ifdef MACH_XEN +#ifdef MACH_PSEUDO_PHYS + if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)]), 0)) +#else /* MACH_PSEUDO_PHYS */ + if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS, 0, UVMF_INVLPG | UVMF_ALL)) +#endif /* MACH_PSEUDO_PHYS */ + printf("couldn't unmap frame 0\n"); +#if PAE +#ifdef MACH_PSEUDO_PHYS + if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1]), 0)) +#else /* MACH_PSEUDO_PHYS */ + if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS + INTEL_PGBYTES, 0, UVMF_INVLPG | UVMF_ALL)) +#endif /* MACH_PSEUDO_PHYS */ + printf("couldn't unmap frame 1\n"); +#ifdef MACH_PSEUDO_PHYS + if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2]), 0)) +#else /* MACH_PSEUDO_PHYS */ + if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS + 2*INTEL_PGBYTES, 0, UVMF_INVLPG | UVMF_ALL)) +#endif /* MACH_PSEUDO_PHYS */ + printf("couldn't unmap frame 2\n"); +#endif /* PAE */ + hyp_free_page(0, (void*) VM_MIN_KERNEL_ADDRESS); +#else /* MACH_XEN */ kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)] = 0; #if PAE kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1] = 0; + kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2] = 0; #endif /* PAE */ +#endif /* MACH_XEN */ flush_tlb(); - +#ifdef MACH_XEN + hyp_p2m_init(); +#endif /* MACH_XEN */ /* XXX We'll just use the initialization stack we're already running on as the interrupt stack for now. Later this will have to change, @@ -384,6 +504,15 @@ void c_boot_entry(vm_offset_t bi) printf(version); printf("\n"); +#ifdef MACH_XEN + printf("Running on %s.\n", boot_info.magic); + if (boot_info.flags & SIF_PRIVILEGED) + panic("Mach can't run as dom0."); +#ifdef MACH_PSEUDO_PHYS + mfn_list = (void*)boot_info.mfn_list; +#endif +#else /* MACH_XEN */ + #if MACH_KDB /* * Locate the kernel's symbol table, if the boot loader provided it. 
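The kv_to_ma()/pa_to_ma() calls in the hunks above exist because, under MACH_PSEUDO_PHYS, the guest's "physical" frame numbers are fictions that must be translated to machine frames through the mfn_list table Xen hands over in start_info. A toy model of the pseudo-physical-to-machine translation, using a hand-built three-entry table (the real one is boot_info.mfn_list, one entry per guest page):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  0xfffUL

/* Toy map: guest pfn i lives in machine frame mfn_list[i]. */
static unsigned long mfn_list[] = { 7, 42, 3 };

/* Pseudo-physical address -> machine address: translate the frame number,
 * keep the in-page offset. */
static unsigned long pa_to_ma(unsigned long pa)
{
    unsigned long pfn = pa >> PAGE_SHIFT;
    return (mfn_list[pfn] << PAGE_SHIFT) | (pa & PAGE_MASK);
}
```

This is also why hyp_free_page() above sets `mfn_list[pfn] = ~0` before returning the machine frame to Xen: once freed, the guest frame no longer has a machine backing.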
@@ -405,6 +534,7 @@ void c_boot_entry(vm_offset_t bi) symtab_size, strtab_size); } #endif /* MACH_KDB */ +#endif /* MACH_XEN */ cpu_type = discover_x86_cpu_type (); @@ -525,6 +655,12 @@ boolean_t init_alloc_aligned(vm_size_t size, vm_offset_t *addrp) { vm_offset_t addr; + +#ifdef MACH_HYP + /* There is none */ + if (!avail_next) + avail_next = _kvtophys(boot_info.pt_base) + (boot_info.nr_pt_frames + 3) * 0x1000; +#else /* MACH_HYP */ extern char start[], end[]; int i; static int wrapped = 0; @@ -543,11 +679,14 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp) : 0; retry: +#endif /* MACH_HYP */ /* Page-align the start address. */ avail_next = round_page(avail_next); +#ifndef MACH_HYP /* Start with memory above 16MB, reserving the low memory for later. */ + /* Don't care on Xen */ if (!wrapped && phys_last_addr > 16 * 1024*1024) { if (avail_next < 16 * 1024*1024) @@ -563,9 +702,15 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp) wrapped = 1; } } +#endif /* MACH_HYP */ /* Check if we have reached the end of memory. */ - if (avail_next == (wrapped ? 16 * 1024*1024 : phys_last_addr)) + if (avail_next == + ( +#ifndef MACH_HYP + wrapped ? 16 * 1024*1024 : +#endif /* MACH_HYP */ + phys_last_addr)) return FALSE; /* Tentatively assign the current location to the caller. */ @@ -575,6 +720,7 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp) and see where that puts us. */ avail_next += size; +#ifndef MACH_HYP /* Skip past the I/O and ROM area. */ if ((avail_next > (boot_info.mem_lower * 0x400)) && (addr < 0x100000)) { @@ -620,6 +766,7 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp) /* XXX string */ } } +#endif /* MACH_HYP */ avail_remaining -= size; @@ -649,6 +796,11 @@ boolean_t pmap_valid_page(x) vm_offset_t x; { /* XXX is this OK? What does it matter for? 
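The native init_alloc_aligned() logic shown above makes two passes over physical memory: it first hands out pages above 16MiB (reserving low memory for DMA-style uses), then "wraps" back to low memory once the high range is exhausted — and the MACH_HYP branch skips all of this because Xen guest memory is contiguous with no BIOS holes. A simplified model of the two-pass cursor (sizes are assumptions for the example):

```c
#include <assert.h>
#include <stdbool.h>

#define MB (1024UL * 1024)

static unsigned long phys_last = 64 * MB;  /* assumed machine size */
static unsigned long avail_next = 16 * MB; /* pass 1 starts above 16MiB */
static bool wrapped = false;

/* Two-pass page cursor in the spirit of init_alloc_aligned(): high memory
 * first, then wrap to low memory; false once both passes are exhausted. */
static bool alloc_page(unsigned long *addr)
{
    if (!wrapped && avail_next == phys_last) {
        avail_next = 0x1000;  /* restart just above the BIOS data area */
        wrapped = true;
    }
    if (avail_next == (wrapped ? 16 * MB : phys_last))
        return false;         /* same end-of-memory test as the diff */
    *addr = avail_next;
    avail_next += 0x1000;
    return true;
}
```

The `wrapped ? 16MiB : phys_last_addr` termination test in the diff is exactly the second `if` here: on the wrapped pass, memory ends where the first pass began.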
*/ - return (((phys_first_addr <= x) && (x < phys_last_addr)) && - !(((boot_info.mem_lower * 1024) <= x) && (x < 1024*1024))); + return (((phys_first_addr <= x) && (x < phys_last_addr)) +#ifndef MACH_HYP + && !( + ((boot_info.mem_lower * 1024) <= x) && + (x < 1024*1024)) +#endif /* MACH_HYP */ + ); } diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c index c633fd9..ee19c4b 100644 --- a/i386/intel/pmap.c +++ b/i386/intel/pmap.c @@ -77,13 +77,18 @@ #include <vm/vm_user.h> #include <mach/machine/vm_param.h> +#include <mach/xen.h> #include <machine/thread.h> #include <i386/cpu_number.h> #include <i386/proc_reg.h> #include <i386/locore.h> #include <i386/model_dep.h> +#ifdef MACH_PSEUDO_PHYS +#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = pte_entry?pa_to_ma(pte_entry):0; +#else /* MACH_PSEUDO_PHYS */ #define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry); +#endif /* MACH_PSEUDO_PHYS */ /* * Private data structures. @@ -325,6 +330,19 @@ lock_data_t pmap_system_lock; #define MAX_TBIS_SIZE 32 /* > this -> TBIA */ /* XXX */ +#ifdef MACH_HYP +#if 1 +#define INVALIDATE_TLB(pmap, s, e) hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL) +#else +#define INVALIDATE_TLB(pmap, s, e) do { \ + if (__builtin_constant_p((e) - (s)) \ + && (e) - (s) == PAGE_SIZE) \ + hyp_invlpg((pmap) == kernel_pmap ? kvtolin(s) : (s)); \ + else \ + hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL); \ +} while(0) +#endif +#else /* MACH_HYP */ #if 0 /* It is hard to know when a TLB flush becomes less expensive than a bunch of * invlpgs. But it surely is more expensive than just one invlpg. 
*/ @@ -338,6 +356,7 @@ lock_data_t pmap_system_lock; #else #define INVALIDATE_TLB(pmap, s, e) flush_tlb() #endif +#endif /* MACH_HYP */ #if NCPUS > 1 @@ -507,6 +526,10 @@ vm_offset_t pmap_map_bd(virt, start, end, prot) register pt_entry_t template; register pt_entry_t *pte; int spl; +#ifdef MACH_XEN + int n, i = 0; + struct mmu_update update[HYP_BATCH_MMU_UPDATES]; +#endif /* MACH_XEN */ template = pa_to_pte(start) | INTEL_PTE_NCACHE|INTEL_PTE_WTHRU @@ -521,11 +544,30 @@ vm_offset_t pmap_map_bd(virt, start, end, prot) pte = pmap_pte(kernel_pmap, virt); if (pte == PT_ENTRY_NULL) panic("pmap_map_bd: Invalid kernel address\n"); +#ifdef MACH_XEN + update[i].ptr = kv_to_ma(pte); + update[i].val = pa_to_ma(template); + i++; + if (i == HYP_BATCH_MMU_UPDATES) { + hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF); + if (n != i) + panic("couldn't pmap_map_bd\n"); + i = 0; + } +#else /* MACH_XEN */ WRITE_PTE(pte, template) +#endif /* MACH_XEN */ pte_increment_pa(template); virt += PAGE_SIZE; start += PAGE_SIZE; } +#ifdef MACH_XEN + if (i > HYP_BATCH_MMU_UPDATES) + panic("overflowed array in pmap_map_bd"); + hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF); + if (n != i) + panic("couldn't pmap_map_bd\n"); +#endif /* MACH_XEN */ PMAP_READ_UNLOCK(pmap, spl); return(virt); } @@ -583,6 +625,8 @@ void pmap_bootstrap() /* * Allocate and clear a kernel page directory. */ + /* Note: the initial Xen mapping holds at least 512kB of free mapped pages. + * We use those for directly building our linear mapping. */ #if PAE { vm_offset_t addr; @@ -604,6 +648,53 @@ void pmap_bootstrap() kernel_pmap->dirbase[i] = 0; } +#ifdef MACH_XEN + /* + * Xen may provide as little as 512KB of extra bootstrap linear memory, + * which is far from enough to map all available memory, so we need to + * map more bootstrap linear memory. Here we map 1 additional L1 table + * (resp. 4 for PAE), i.e. 4MiB (resp. 8MiB) of extra linear memory, + * which is enough for a page table mapping 4GiB.
+ */ +#ifdef PAE +#define NSUP_L1 4 +#else +#define NSUP_L1 1 +#endif + pt_entry_t *l1_map[NSUP_L1]; + { + pt_entry_t *base = (pt_entry_t*) boot_info.pt_base; + int i; + int n_l1map; +#ifdef PAE + pt_entry_t *l2_map = (pt_entry_t*) phystokv(pte_to_pa(base[0])); +#else /* PAE */ + pt_entry_t *l2_map = base; +#endif /* PAE */ + for (n_l1map = 0, i = lin2pdenum(VM_MIN_KERNEL_ADDRESS); i < NPTES; i++) { + if (!(l2_map[i] & INTEL_PTE_VALID)) { + struct mmu_update update; + int j, n; + + l1_map[n_l1map] = (pt_entry_t*) phystokv(pmap_grab_page()); + for (j = 0; j < NPTES; j++) + l1_map[n_l1map][j] = intel_ptob(pfn_to_mfn((i - lin2pdenum(VM_MIN_KERNEL_ADDRESS)) * NPTES + j)) | INTEL_PTE_VALID | INTEL_PTE_WRITE; + pmap_set_page_readonly_init(l1_map[n_l1map]); + if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn (l1_map[n_l1map]))) + panic("couldn't pin page %p(%p)", l1_map[n_l1map], kv_to_ma (l1_map[n_l1map])); + update.ptr = kv_to_ma(&l2_map[i]); + update.val = kv_to_ma(l1_map[n_l1map]) | INTEL_PTE_VALID | INTEL_PTE_WRITE; + hyp_mmu_update(kv_to_la(&update), 1, kv_to_la(&n), DOMID_SELF); + if (n != 1) + panic("couldn't complete bootstrap map"); + /* added the last L1 table, can stop */ + if (++n_l1map >= NSUP_L1) + break; + } + } + } +#endif /* MACH_XEN */ + /* * Allocate and set up the kernel page tables. 
*/ @@ -640,19 +731,42 @@ void pmap_bootstrap() WRITE_PTE(pte, 0); } else +#ifdef MACH_XEN + if (va == (vm_offset_t) &hyp_shared_info) + { + *pte = boot_info.shared_info | INTEL_PTE_VALID | INTEL_PTE_WRITE; + va += INTEL_PGBYTES; + } + else +#endif /* MACH_XEN */ { extern char _start[], etext[]; - if ((va >= (vm_offset_t)_start) + if (((va >= (vm_offset_t) _start) && (va + INTEL_PGBYTES <= (vm_offset_t)etext)) +#ifdef MACH_XEN + || (va >= (vm_offset_t) boot_info.pt_base + && (va + INTEL_PGBYTES <= + (vm_offset_t) ptable + INTEL_PGBYTES)) +#endif /* MACH_XEN */ + ) { WRITE_PTE(pte, pa_to_pte(_kvtophys(va)) | INTEL_PTE_VALID | global); } else { - WRITE_PTE(pte, pa_to_pte(_kvtophys(va)) - | INTEL_PTE_VALID | INTEL_PTE_WRITE | global); +#ifdef MACH_XEN + int i; + for (i = 0; i < NSUP_L1; i++) + if (va == (vm_offset_t) l1_map[i]) + WRITE_PTE(pte, pa_to_pte(_kvtophys(va)) + | INTEL_PTE_VALID | global); + if (i == NSUP_L1) +#endif /* MACH_XEN */ + WRITE_PTE(pte, pa_to_pte(_kvtophys(va)) + | INTEL_PTE_VALID | INTEL_PTE_WRITE | global) + } va += INTEL_PGBYTES; } @@ -662,6 +776,11 @@ void pmap_bootstrap() WRITE_PTE(pte, 0); va += INTEL_PGBYTES; } +#ifdef MACH_XEN + pmap_set_page_readonly_init(ptable); + if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn (ptable))) + panic("couldn't pin page %p(%p)\n", ptable, kv_to_ma (ptable)); +#endif /* MACH_XEN */ } } @@ -669,6 +788,100 @@ void pmap_bootstrap() soon after we return from here. 
*/ } +#ifdef MACH_XEN +/* These are only required because of Xen security policies */ + +/* Set a page back to read/write */ +void pmap_set_page_readwrite(void *_vaddr) { + vm_offset_t vaddr = (vm_offset_t) _vaddr; + vm_offset_t paddr = kvtophys(vaddr); + vm_offset_t canon_vaddr = phystokv(paddr); + if (hyp_do_update_va_mapping (kvtolin(vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID | INTEL_PTE_WRITE, UVMF_NONE)) + panic("couldn't set hiMMU readwrite for addr %p(%p)\n", vaddr, pa_to_ma (paddr)); + if (canon_vaddr != vaddr) + if (hyp_do_update_va_mapping (kvtolin(canon_vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID | INTEL_PTE_WRITE, UVMF_NONE)) + panic("couldn't set hiMMU readwrite for paddr %p(%p)\n", canon_vaddr, pa_to_ma (paddr)); +} + +/* Set a page read-only (e.g. so that it can be pinned) */ +void pmap_set_page_readonly(void *_vaddr) { + vm_offset_t vaddr = (vm_offset_t) _vaddr; + vm_offset_t paddr = kvtophys(vaddr); + vm_offset_t canon_vaddr = phystokv(paddr); + if (*pmap_pde(kernel_pmap, vaddr) & INTEL_PTE_VALID) { + if (hyp_do_update_va_mapping (kvtolin(vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID, UVMF_NONE)) + panic("couldn't set hiMMU readonly for vaddr %p(%p)\n", vaddr, pa_to_ma (paddr)); + } + if (canon_vaddr != vaddr && + *pmap_pde(kernel_pmap, canon_vaddr) & INTEL_PTE_VALID) { + if (hyp_do_update_va_mapping (kvtolin(canon_vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID, UVMF_NONE)) + panic("couldn't set hiMMU readonly for vaddr %p canon_vaddr %p paddr %p (%p)\n", vaddr, canon_vaddr, paddr, pa_to_ma (paddr)); + } +} + +/* This needs to be called instead of pmap_set_page_readonly as long as CR3 + * still points to the bootstrap dirbase.
*/ +void pmap_set_page_readonly_init(void *_vaddr) { + vm_offset_t vaddr = (vm_offset_t) _vaddr; +#if PAE + pt_entry_t *pdpbase = (void*) boot_info.pt_base; + vm_offset_t dirbase = ptetokv(pdpbase[0]); +#else + vm_offset_t dirbase = boot_info.pt_base; +#endif + struct pmap linear_pmap = { + .dirbase = (void*) dirbase, + }; + /* Modify our future kernel map (can't use update_va_mapping for this)... */ + if (*pmap_pde(kernel_pmap, vaddr) & INTEL_PTE_VALID) + if (!hyp_mmu_update_la (kvtolin(vaddr), pa_to_pte (kv_to_ma(vaddr)) | INTEL_PTE_VALID)) + panic("couldn't set hiMMU readonly for vaddr %p(%p)\n", vaddr, kv_to_ma (vaddr)); + /* ... and the bootstrap map. */ + if (*pmap_pde(&linear_pmap, vaddr) & INTEL_PTE_VALID) + if (hyp_do_update_va_mapping (vaddr, pa_to_pte (kv_to_ma(vaddr)) | INTEL_PTE_VALID, UVMF_NONE)) + panic("couldn't set MMU readonly for vaddr %p(%p)\n", vaddr, kv_to_ma (vaddr)); +} + +void pmap_clear_bootstrap_pagetable(pt_entry_t *base) { + int i; + pt_entry_t *dir; + vm_offset_t va = 0; +#if PAE + int j; +#endif /* PAE */ + if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, kv_to_mfn(base))) + panic("pmap_clear_bootstrap_pagetable: couldn't unpin page %p(%p)\n", base, kv_to_ma(base)); +#if PAE + for (j = 0; j < PDPNUM; j++) + { + pt_entry_t pdpe = base[j]; + if (pdpe & INTEL_PTE_VALID) { + dir = (pt_entry_t *) phystokv(pte_to_pa(pdpe)); +#else /* PAE */ + dir = base; +#endif /* PAE */ + for (i = 0; i < NPTES; i++) { + pt_entry_t pde = dir[i]; + unsigned long pfn = mfn_to_pfn(atop(pde)); + void *pgt = (void*) phystokv(ptoa(pfn)); + if (pde & INTEL_PTE_VALID) + hyp_free_page(pfn, pgt); + va += NPTES * INTEL_PGBYTES; + if (va >= HYP_VIRT_START) + break; + } +#if PAE + hyp_free_page(atop(_kvtophys(dir)), dir); + } else + va += NPTES * NPTES * INTEL_PGBYTES; + if (va >= HYP_VIRT_START) + break; + } +#endif /* PAE */ + hyp_free_page(atop(_kvtophys(base)), base); +} +#endif /* MACH_XEN */ + void pmap_virtual_space(startp, endp) vm_offset_t *startp; vm_offset_t 
*endp; @@ -823,6 +1036,29 @@ pmap_page_table_page_alloc() return pa; } +#ifdef MACH_XEN +void pmap_map_mfn(void *_addr, unsigned long mfn) { + vm_offset_t addr = (vm_offset_t) _addr; + pt_entry_t *pte, *pdp; + vm_offset_t ptp; + if ((pte = pmap_pte(kernel_pmap, addr)) == PT_ENTRY_NULL) { + ptp = phystokv(pmap_page_table_page_alloc()); + pmap_set_page_readonly((void*) ptp); + if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn(ptp))) + panic("couldn't pin page %p(%p)\n",ptp,kv_to_ma(ptp)); + pdp = pmap_pde(kernel_pmap, addr); + if (!hyp_mmu_update_pte(kv_to_ma(pdp), + pa_to_pte(kv_to_ma(ptp)) | INTEL_PTE_VALID + | INTEL_PTE_USER + | INTEL_PTE_WRITE)) + panic("%s:%d could not set pde %p(%p) to %p(%p)\n",__FILE__,__LINE__,kvtophys((vm_offset_t)pdp),kv_to_ma(pdp), ptp, kv_to_ma(ptp)); + pte = pmap_pte(kernel_pmap, addr); + } + if (!hyp_mmu_update_pte(kv_to_ma(pte), ptoa(mfn) | INTEL_PTE_VALID | INTEL_PTE_WRITE)) + panic("%s:%d could not set pte %p(%p) to %p(%p)\n",__FILE__,__LINE__,pte,kv_to_ma(pte), ptoa(mfn), pa_to_ma(ptoa(mfn))); +} +#endif /* MACH_XEN */ + /* * Deallocate a page-table page.
* The page-table page must have all mappings removed, @@ -884,6 +1120,13 @@ pmap_t pmap_create(size) panic("pmap_create"); memcpy(p->dirbase, kernel_page_dir, PDPNUM * INTEL_PGBYTES); +#ifdef MACH_XEN + { + int i; + for (i = 0; i < PDPNUM; i++) + pmap_set_page_readonly((void*) p->dirbase + i * INTEL_PGBYTES); + } +#endif /* MACH_XEN */ #if PAE if (kmem_alloc_wired(kernel_map, @@ -895,6 +1138,9 @@ pmap_t pmap_create(size) for (i = 0; i < PDPNUM; i++) WRITE_PTE(&p->pdpbase[i], pa_to_pte(kvtophys((vm_offset_t) p->dirbase + i * INTEL_PGBYTES)) | INTEL_PTE_VALID); } +#ifdef MACH_XEN + pmap_set_page_readonly(p->pdpbase); +#endif /* MACH_XEN */ #endif /* PAE */ p->ref_count = 1; @@ -954,14 +1200,29 @@ void pmap_destroy(p) if (m == VM_PAGE_NULL) panic("pmap_destroy: pte page not in object"); vm_page_lock_queues(); +#ifdef MACH_XEN + if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, pa_to_mfn(pa))) + panic("pmap_destroy: couldn't unpin page %p(%p)\n", pa, kv_to_ma(pa)); + pmap_set_page_readwrite((void*) phystokv(pa)); +#endif /* MACH_XEN */ vm_page_free(m); inuse_ptepages_count--; vm_page_unlock_queues(); vm_object_unlock(pmap_object); } } +#ifdef MACH_XEN + { + int i; + for (i = 0; i < PDPNUM; i++) + pmap_set_page_readwrite((void*) p->dirbase + i * INTEL_PGBYTES); + } +#endif /* MACH_XEN */ kmem_free(kernel_map, (vm_offset_t)p->dirbase, PDPNUM * INTEL_PGBYTES); #if PAE +#ifdef MACH_XEN + pmap_set_page_readwrite(p->pdpbase); +#endif /* MACH_XEN */ kmem_free(kernel_map, (vm_offset_t)p->pdpbase, INTEL_PGBYTES); #endif /* PAE */ zfree(pmap_zone, (vm_offset_t) p); @@ -1007,6 +1268,10 @@ void pmap_remove_range(pmap, va, spte, epte) int num_removed, num_unwired; int pai; vm_offset_t pa; +#ifdef MACH_XEN + int n, ii = 0; + struct mmu_update update[HYP_BATCH_MMU_UPDATES]; +#endif /* MACH_XEN */ #if DEBUG_PTE_PAGE if (pmap != kernel_pmap) @@ -1035,7 +1300,19 @@ void pmap_remove_range(pmap, va, spte, epte) register int i = ptes_per_vm_page; register pt_entry_t *lpte = cpte; do { +#ifdef 
MACH_XEN + update[ii].ptr = kv_to_ma(lpte); + update[ii].val = 0; + ii++; + if (ii == HYP_BATCH_MMU_UPDATES) { + hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF); + if (n != ii) + panic("couldn't pmap_remove_range\n"); + ii = 0; + } +#else /* MACH_XEN */ *lpte = 0; +#endif /* MACH_XEN */ lpte++; } while (--i > 0); continue; @@ -1056,7 +1333,19 @@ void pmap_remove_range(pmap, va, spte, epte) do { pmap_phys_attributes[pai] |= *lpte & (PHYS_MODIFIED|PHYS_REFERENCED); +#ifdef MACH_XEN + update[ii].ptr = kv_to_ma(lpte); + update[ii].val = 0; + ii++; + if (ii == HYP_BATCH_MMU_UPDATES) { + hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF); + if (n != ii) + panic("couldn't pmap_remove_range\n"); + ii = 0; + } +#else /* MACH_XEN */ *lpte = 0; +#endif /* MACH_XEN */ lpte++; } while (--i > 0); } @@ -1102,6 +1391,14 @@ void pmap_remove_range(pmap, va, spte, epte) } } +#ifdef MACH_XEN + if (ii > HYP_BATCH_MMU_UPDATES) + panic("overflowed array in pmap_remove_range"); + hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF); + if (n != ii) + panic("couldn't pmap_remove_range\n"); +#endif /* MACH_XEN */ + /* * Update the counts */ @@ -1246,7 +1543,12 @@ void pmap_page_protect(phys, prot) do { pmap_phys_attributes[pai] |= *pte & (PHYS_MODIFIED|PHYS_REFERENCED); +#ifdef MACH_XEN + if (!hyp_mmu_update_pte(kv_to_ma(pte++), 0)) + panic("%s:%d could not clear pte %p\n",__FILE__,__LINE__,pte-1); +#else /* MACH_XEN */ *pte++ = 0; +#endif /* MACH_XEN */ } while (--i > 0); } @@ -1276,7 +1578,12 @@ void pmap_page_protect(phys, prot) register int i = ptes_per_vm_page; do { +#ifdef MACH_XEN + if (!hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~INTEL_PTE_WRITE)) + panic("%s:%d could not enable write on pte %p\n",__FILE__,__LINE__,pte); +#else /* MACH_XEN */ *pte &= ~INTEL_PTE_WRITE; +#endif /* MACH_XEN */ pte++; } while (--i > 0); @@ -1365,11 +1672,36 @@ void pmap_protect(map, s, e, prot) spte = &spte[ptenum(s)]; epte = &spte[intel_btop(l-s)]; +#ifdef MACH_XEN + int 
n, i = 0; + struct mmu_update update[HYP_BATCH_MMU_UPDATES]; +#endif /* MACH_XEN */ + while (spte < epte) { - if (*spte & INTEL_PTE_VALID) + if (*spte & INTEL_PTE_VALID) { +#ifdef MACH_XEN + update[i].ptr = kv_to_ma(spte); + update[i].val = *spte & ~INTEL_PTE_WRITE; + i++; + if (i == HYP_BATCH_MMU_UPDATES) { + hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF); + if (n != i) + panic("couldn't pmap_protect\n"); + i = 0; + } +#else /* MACH_XEN */ *spte &= ~INTEL_PTE_WRITE; +#endif /* MACH_XEN */ + } spte++; } +#ifdef MACH_XEN + if (i > HYP_BATCH_MMU_UPDATES) + panic("overflowed array in pmap_protect"); + hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF); + if (n != i) + panic("couldn't pmap_protect\n"); +#endif /* MACH_XEN */ } s = l; pde++; @@ -1412,6 +1744,8 @@ if (pmap_debug) printf("pmap(%x, %x)\n", v, pa); if (pmap == PMAP_NULL) return; + if (pmap == kernel_pmap && (v < kernel_virtual_start || v >= kernel_virtual_end)) + panic("pmap_enter(%p, %p) falls in physical memory area!\n", v, pa); if (pmap == kernel_pmap && (prot & VM_PROT_WRITE) == 0 && !wired /* hack for io_wire */ ) { /* @@ -1502,9 +1836,20 @@ Retry: /*XX pdp = &pmap->dirbase[pdenum(v) & ~(i-1)];*/ pdp = pmap_pde(pmap, v); do { +#ifdef MACH_XEN + pmap_set_page_readonly((void *) ptp); + if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn(ptp))) + panic("couldn't pin page %p(%p)\n",ptp,kv_to_ma(ptp)); + if (!hyp_mmu_update_pte(pa_to_ma(kvtophys((vm_offset_t)pdp)), + pa_to_pte(pa_to_ma(kvtophys(ptp))) | INTEL_PTE_VALID + | INTEL_PTE_USER + | INTEL_PTE_WRITE)) + panic("%s:%d could not set pde %p(%p,%p) to %p(%p,%p) %p\n",__FILE__,__LINE__, pdp, kvtophys((vm_offset_t)pdp), pa_to_ma(kvtophys((vm_offset_t)pdp)), ptp, kvtophys(ptp), pa_to_ma(kvtophys(ptp)), pa_to_pte(kv_to_ma(ptp))); +#else /* MACH_XEN */ *pdp = pa_to_pte(ptp) | INTEL_PTE_VALID | INTEL_PTE_USER | INTEL_PTE_WRITE; +#endif /* MACH_XEN */ pdp++; ptp += INTEL_PGBYTES; } while (--i > 0); @@ -1544,7 +1889,12 @@ Retry: do { 
if (*pte & INTEL_PTE_MOD) template |= INTEL_PTE_MOD; +#ifdef MACH_XEN + if (!hyp_mmu_update_pte(kv_to_ma(pte), pa_to_ma(template))) + panic("%s:%d could not set pte %p to %p\n",__FILE__,__LINE__,pte,template); +#else /* MACH_XEN */ WRITE_PTE(pte, template) +#endif /* MACH_XEN */ pte++; pte_increment_pa(template); } while (--i > 0); @@ -1649,7 +1999,12 @@ Retry: template |= INTEL_PTE_WIRED; i = ptes_per_vm_page; do { +#ifdef MACH_XEN + if (!(hyp_mmu_update_pte(kv_to_ma(pte), pa_to_ma(template)))) + panic("%s:%d could not set pte %p to %p\n",__FILE__,__LINE__,pte,template); +#else /* MACH_XEN */ WRITE_PTE(pte, template) +#endif /* MACH_XEN */ pte++; pte_increment_pa(template); } while (--i > 0); @@ -1704,7 +2059,12 @@ void pmap_change_wiring(map, v, wired) map->stats.wired_count--; i = ptes_per_vm_page; do { +#ifdef MACH_XEN + if (!(hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~INTEL_PTE_WIRED))) + panic("%s:%d could not wire down pte %p\n",__FILE__,__LINE__,pte); +#else /* MACH_XEN */ *pte &= ~INTEL_PTE_WIRED; +#endif /* MACH_XEN */ pte++; } while (--i > 0); } @@ -1835,7 +2195,17 @@ void pmap_collect(p) register int i = ptes_per_vm_page; register pt_entry_t *pdep = pdp; do { +#ifdef MACH_XEN + unsigned long pte = *pdep; + void *ptable = (void*) ptetokv(pte); + if (!(hyp_mmu_update_pte(pa_to_ma(kvtophys((vm_offset_t)pdep++)), 0))) + panic("%s:%d could not clear pde %p\n",__FILE__,__LINE__,pdep-1); + if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, kv_to_mfn(ptable))) + panic("couldn't unpin page %p(%p)\n", ptable, pa_to_ma(kvtophys((vm_offset_t)ptable))); + pmap_set_page_readwrite(ptable); +#else /* MACH_XEN */ *pdep++ = 0; +#endif /* MACH_XEN */ } while (--i > 0); } @@ -2052,7 +2422,12 @@ phys_attribute_clear(phys, bits) { register int i = ptes_per_vm_page; do { +#ifdef MACH_XEN + if (!(hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~bits))) + panic("%s:%d could not clear bits %lx from pte %p\n",__FILE__,__LINE__,bits,pte); +#else /* MACH_XEN */ *pte &= ~bits; +#endif /* MACH_XEN 
*/ } while (--i > 0); } PMAP_UPDATE_TLBS(pmap, va, va + PAGE_SIZE); @@ -2413,7 +2788,12 @@ pmap_unmap_page_zero () if (!pte) return; assert (pte); +#ifdef MACH_XEN + if (!hyp_mmu_update_pte(kv_to_ma(pte), 0)) + printf("couldn't unmap page 0\n"); +#else /* MACH_XEN */ *pte = 0; INVALIDATE_TLB(kernel_pmap, 0, PAGE_SIZE); +#endif /* MACH_XEN */ } #endif /* i386 */ diff --git a/i386/intel/pmap.h b/i386/intel/pmap.h index 7354a0f..a2b6442 100644 --- a/i386/intel/pmap.h +++ b/i386/intel/pmap.h @@ -126,12 +126,21 @@ typedef unsigned int pt_entry_t; #define INTEL_PTE_NCACHE 0x00000010 #define INTEL_PTE_REF 0x00000020 #define INTEL_PTE_MOD 0x00000040 +#ifdef MACH_XEN +/* Not supported */ +#define INTEL_PTE_GLOBAL 0x00000000 +#else /* MACH_XEN */ #define INTEL_PTE_GLOBAL 0x00000100 +#endif /* MACH_XEN */ #define INTEL_PTE_WIRED 0x00000200 #define INTEL_PTE_PFN 0xfffff000 #define pa_to_pte(a) ((a) & INTEL_PTE_PFN) +#ifdef MACH_PSEUDO_PHYS +#define pte_to_pa(p) ma_to_pa((p) & INTEL_PTE_PFN) +#else /* MACH_PSEUDO_PHYS */ #define pte_to_pa(p) ((p) & INTEL_PTE_PFN) +#endif /* MACH_PSEUDO_PHYS */ #define pte_increment_pa(p) ((p) += INTEL_OFFMASK+1) /* @@ -159,6 +168,14 @@ typedef struct pmap *pmap_t; #define PMAP_NULL ((pmap_t) 0) +#ifdef MACH_XEN +extern void pmap_set_page_readwrite(void *addr); +extern void pmap_set_page_readonly(void *addr); +extern void pmap_set_page_readonly_init(void *addr); +extern void pmap_map_mfn(void *addr, unsigned long mfn); +extern void pmap_clear_bootstrap_pagetable(pt_entry_t *addr); +#endif /* MACH_XEN */ + #if PAE #define set_pmap(pmap) set_cr3(kvtophys((vm_offset_t)(pmap)->pdpbase)) #else /* PAE */ diff --git a/i386/xen/Makefrag.am b/i386/xen/Makefrag.am new file mode 100644 index 0000000..b15b7db --- /dev/null +++ b/i386/xen/Makefrag.am @@ -0,0 +1,33 @@ +# Makefile fragment for the ix86 specific part of the Xen platform. + +# Copyright (C) 2007 Free Software Foundation, Inc. 
+ +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation; either version 2, or (at your option) any later +# version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + +# +# Xen support. +# + +libkernel_a_SOURCES += \ + i386/xen/xen.c \ + i386/xen/xen_locore.S \ + i386/xen/xen_boothdr.S + + +if PLATFORM_xen +gnumach_LINKFLAGS += \ + --defsym _START=0x20000000 \ + -T '$(srcdir)'/i386/ldscript +endif diff --git a/i386/xen/xen.c b/i386/xen/xen.c new file mode 100644 index 0000000..aa3c2cc --- /dev/null +++ b/i386/xen/xen.c @@ -0,0 +1,77 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <kern/printf.h> +#include <kern/debug.h> + +#include <mach/machine/eflags.h> +#include <machine/thread.h> +#include <machine/ipl.h> + +#include <machine/model_dep.h> + +unsigned long cr3; + +struct failsafe_callback_regs { + unsigned int ds; + unsigned int es; + unsigned int fs; + unsigned int gs; + unsigned int ip; + unsigned int cs_and_mask; + unsigned int flags; +}; + +void hyp_failsafe_c_callback(struct failsafe_callback_regs *regs) { + printf("Fail-Safe callback!\n"); + printf("IP: %08X CS: %4X DS: %4X ES: %4X FS: %4X GS: %4X FLAGS %08X MASK %04X\n", regs->ip, regs->cs_and_mask & 0xffff, regs->ds, regs->es, regs->fs, regs->gs, regs->flags, regs->cs_and_mask >> 16); + panic("failsafe"); +} + +extern void clock_interrupt(); +extern void return_to_iret; + +void hypclock_machine_intr(int old_ipl, void *ret_addr, struct i386_interrupt_state *regs, unsigned64_t delta) { + if (ret_addr == &return_to_iret) { + clock_interrupt(delta/1000, /* usec per tick */ + (regs->efl & EFL_VM) || /* user mode */ + ((regs->cs & 0x02) != 0), /* user mode */ + old_ipl == SPL0); /* base priority */ + } else + clock_interrupt(delta/1000, FALSE, FALSE); +} + +void hyp_p2m_init(void) { + unsigned long nb_pfns = atop(phys_last_addr); +#ifdef MACH_PSEUDO_PHYS +#define P2M_PAGE_ENTRIES (PAGE_SIZE / sizeof(unsigned long)) + unsigned long *l3 = (unsigned long *)phystokv(pmap_grab_page()), *l2 = NULL; + unsigned long i; + + for (i = 0; i < (nb_pfns + P2M_PAGE_ENTRIES) / P2M_PAGE_ENTRIES; i++) { + if (!(i % P2M_PAGE_ENTRIES)) { + l2 = (unsigned long *) phystokv(pmap_grab_page()); + l3[i / P2M_PAGE_ENTRIES] = kv_to_mfn(l2); + } + l2[i % P2M_PAGE_ENTRIES] = kv_to_mfn(&mfn_list[i * P2M_PAGE_ENTRIES]); + } + + hyp_shared_info.arch.pfn_to_mfn_frame_list_list = kv_to_mfn(l3); +#endif + hyp_shared_info.arch.max_pfn = nb_pfns; +} diff --git a/i386/xen/xen_boothdr.S b/i386/xen/xen_boothdr.S new file mode 100644 index 0000000..3d84e0c --- /dev/null +++ b/i386/xen/xen_boothdr.S @@ -0,0 
+1,167 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <xen/public/elfnote.h> + +.section __xen_guest + .ascii "GUEST_OS=GNU Mach" + .ascii ",GUEST_VERSION=1.3" + .ascii ",XEN_VER=xen-3.0" + .ascii ",VIRT_BASE=0x20000000" + .ascii ",ELF_PADDR_OFFSET=0x20000000" + .ascii ",HYPERCALL_PAGE=0x2" +#if PAE + .ascii ",PAE=yes" +#else + .ascii ",PAE=no" +#endif + .ascii ",LOADER=generic" +#ifndef MACH_PSEUDO_PHYS + .ascii ",FEATURES=!auto_translated_physmap" +#endif + .byte 0 + +/* Macro taken from linux/include/linux/elfnote.h */ +#define ELFNOTE(name, type, desctype, descdata) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz "name" ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection ; + + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "GNU Mach") + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "1.3") + ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0") + ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .long, _START) + ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .long, _START) + ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .long, start) + ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .long, hypcalls) +#if PAE + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, 
.asciz, "yes") +#else + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "no") +#endif + ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic") + ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "" +#ifndef MACH_PSEUDO_PHYS + "!auto_translated_physmap" +#endif + ) + +#include <mach/machine/asm.h> + +#include <i386/i386/i386asm.h> + + .text + .globl gdt, ldt + .globl start, _start, gdt +start: +_start: + + /* Switch to our own interrupt stack. */ + movl $(_intstack+INTSTACK_SIZE),%eax + movl %eax,%esp + + /* Reset EFLAGS to a known state. */ + pushl $0 + popf + + /* Push the start_info pointer to be the second argument. */ + subl $KERNELBASE,%esi + pushl %esi + + /* Jump into C code. */ + call EXT(c_boot_entry) + +/* Those need to be aligned on page boundaries. */ +.global hyp_shared_info, hypcalls + + .org (start + 0x1000) +hyp_shared_info: + .org hyp_shared_info + 0x1000 + +/* Labels just for debuggers */ +#define hypcall(name, n) \ + .org hypcalls + n*32 ; \ +__hyp_##name: + +hypcalls: + hypcall(set_trap_table, 0) + hypcall(mmu_update, 1) + hypcall(set_gdt, 2) + hypcall(stack_switch, 3) + hypcall(set_callbacks, 4) + hypcall(fpu_taskswitch, 5) + hypcall(sched_op_compat, 6) + hypcall(platform_op, 7) + hypcall(set_debugreg, 8) + hypcall(get_debugreg, 9) + hypcall(update_descriptor, 10) + hypcall(memory_op, 12) + hypcall(multicall, 13) + hypcall(update_va_mapping, 14) + hypcall(set_timer_op, 15) + hypcall(event_channel_op_compat, 16) + hypcall(xen_version, 17) + hypcall(console_io, 18) + hypcall(physdev_op_compat, 19) + hypcall(grant_table_op, 20) + hypcall(vm_assist, 21) + hypcall(update_va_mapping_otherdomain, 22) + hypcall(iret, 23) + hypcall(vcpu_op, 24) + hypcall(set_segment_base, 25) + hypcall(mmuext_op, 26) + hypcall(acm_op, 27) + hypcall(nmi_op, 28) + hypcall(sched_op, 29) + hypcall(callback_op, 30) + hypcall(xenoprof_op, 31) + hypcall(event_channel_op, 32) + hypcall(physdev_op, 33) + hypcall(hvm_op, 34) + hypcall(sysctl, 35) + hypcall(domctl, 36) + hypcall(kexec_op, 37) + 
+ hypcall(arch_0, 48) + hypcall(arch_1, 49) + hypcall(arch_2, 50) + hypcall(arch_3, 51) + hypcall(arch_4, 52) + hypcall(arch_5, 53) + hypcall(arch_6, 54) + hypcall(arch_7, 55) + + .org hypcalls + 0x1000 + +gdt: + .org gdt + 0x1000 + +ldt: + .org ldt + 0x1000 + +stack: + .long _intstack+INTSTACK_SIZE,0xe021 + .comm _intstack,INTSTACK_SIZE + diff --git a/i386/xen/xen_locore.S b/i386/xen/xen_locore.S new file mode 100644 index 0000000..51f823f --- /dev/null +++ b/i386/xen/xen_locore.S @@ -0,0 +1,110 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <mach/machine/asm.h> + +#include <i386/i386asm.h> +#include <i386/cpu_number.h> +#include <i386/xen.h> + + .data 2 +int_active: + .long 0 + + + .text + .globl hyp_callback, hyp_failsafe_callback + P2ALIGN(TEXT_ALIGN) +hyp_callback: + pushl %eax + jmp EXT(all_intrs) + +ENTRY(interrupt) + incl int_active /* currently handling interrupts */ + call EXT(hyp_c_callback) /* call generic interrupt routine */ + decl int_active /* stopped handling interrupts */ + sti + ret + +/* FIXME: if we're _very_ unlucky, we may be re-interrupted, filling stack + * + * Far from trivial, see mini-os. 
That said, maybe we could just, before popping + everything (which is _not_ destructive), save sp into a known place and use + it+jmp back? + * + * Mmm, there seems to be an iret hypcall that does exactly what we want: + * perform iret, and if IF is set, clear the interrupt mask. + */ + +/* Pfff, we have to check pending interrupts ourselves. Some other DomUs just make a hypercall for retriggering the irq. Not sure it's really easier/faster */ +ENTRY(hyp_sti) + pushl %ebp + movl %esp, %ebp +_hyp_sti: + movb $0,hyp_shared_info+CPU_CLI /* Enable interrupts */ + cmpl $0,int_active /* Check whether we were already checking pending interrupts */ + jz 0f + popl %ebp + ret /* Already active, just return */ +0: + /* Not active, check pending interrupts by hand */ + /* no memory barrier needed on x86 */ + cmpb $0,hyp_shared_info+CPU_PENDING + jne 0f + popl %ebp + ret +0: + movb $0xff,hyp_shared_info+CPU_CLI +1: + pushl %eax + pushl %ecx + pushl %edx + incl int_active /* currently handling interrupts */ + + pushl $0 + pushl $0 + call EXT(hyp_c_callback) + popl %edx + popl %edx + + popl %edx + popl %ecx + popl %eax + decl int_active /* stopped handling interrupts */ + cmpb $0,hyp_shared_info+CPU_PENDING + jne 1b + jmp _hyp_sti + +/* Hypervisor failed to reload segments. Dump them.
*/
+hyp_failsafe_callback:
+#if 1
+	/* load sane segments */
+	mov	%ss, %ax
+	mov	%ax, %ds
+	mov	%ax, %es
+	mov	%ax, %fs
+	mov	%ax, %gs
+	push	%esp
+	call	EXT(hyp_failsafe_c_callback)
+#else
+	popl	%ds
+	popl	%es
+	popl	%fs
+	popl	%gs
+	iret
+#endif
diff --git a/include/mach/xen.h b/include/mach/xen.h
new file mode 100644
index 0000000..f954701
--- /dev/null
+++ b/include/mach/xen.h
@@ -0,0 +1,80 @@
+
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _MACH_XEN_H
+#define _MACH_XEN_H
+#ifdef MACH_XEN
+#include <sys/types.h>
+#include <xen/public/xen.h>
+#include <i386/vm_param.h>
+
+extern struct start_info boot_info;
+
+extern volatile struct shared_info hyp_shared_info;
+
+/* Memory translations */
+
+/* pa are physical addresses, from 0 to size of memory */
+/* ma are machine addresses, i.e. _real_ hardware addresses */
+/* la are linear addresses, i.e.
without segmentation */ + +/* This might also be useful out of Xen */ +#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS +extern unsigned long la_shift; +#else +#define la_shift LINEAR_MIN_KERNEL_ADDRESS +#endif +#define la_to_pa(a) ((vm_offset_t)(((vm_offset_t)(a)) - la_shift)) +#define pa_to_la(a) ((vm_offset_t)(((vm_offset_t)(a)) + la_shift)) + +#define kv_to_la(a) pa_to_la(_kvtophys(a)) +#define la_to_kv(a) phystokv(la_to_pa(a)) + +#ifdef MACH_PSEUDO_PHYS +#if PAE +#define PFN_LIST MACH2PHYS_VIRT_START_PAE +#else +#define PFN_LIST MACH2PHYS_VIRT_START_NONPAE +#endif +#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS +extern unsigned long *pfn_list; +#else +#define pfn_list ((unsigned long *) PFN_LIST) +#endif +#define mfn_to_pfn(n) (pfn_list[n]) + +extern unsigned long *mfn_list; +#define pfn_to_mfn(n) (mfn_list[n]) +#else +#define mfn_to_pfn(n) (n) +#define pfn_to_mfn(n) (n) +#endif /* MACH_PSEUDO_PHYS */ + +#define pa_to_mfn(a) (pfn_to_mfn(atop(a))) +#define pa_to_ma(a) ({ vm_offset_t __a = (vm_offset_t) (a); ptoa(pa_to_mfn(__a)) | (__a & PAGE_MASK); }) +#define ma_to_pa(a) ({ vm_offset_t __a = (vm_offset_t) (a); (mfn_to_pfn(atop((vm_offset_t)(__a))) << PAGE_SHIFT) | (__a & PAGE_MASK); }) + +#define kv_to_mfn(a) pa_to_mfn(_kvtophys(a)) +#define kv_to_ma(a) pa_to_ma(_kvtophys(a)) +#define mfn_to_kv(mfn) (phystokv(ma_to_pa(ptoa(mfn)))) + +#include <machine/xen.h> + +#endif /* MACH_XEN */ +#endif /* _MACH_XEN_H */ diff --git a/kern/bootstrap.c b/kern/bootstrap.c index b7db2df..cf10d67 100644 --- a/kern/bootstrap.c +++ b/kern/bootstrap.c @@ -63,7 +63,12 @@ #else #include <mach/machine/multiboot.h> #include <mach/exec/exec.h> +#ifdef MACH_XEN +#include <mach/xen.h> +extern struct start_info boot_info; /* XXX put this in a header! */ +#else /* MACH_XEN */ extern struct multiboot_info boot_info; /* XXX put this in a header! 
*/ +#endif /* MACH_XEN */ #endif #include "boot_script.h" @@ -101,10 +106,23 @@ task_insert_send_right( void bootstrap_create() { + int compat; +#ifdef MACH_XEN + struct multiboot_module *bmods = ((struct multiboot_module *) + boot_info.mod_start); + int n; + for (n = 0; bmods[n].mod_start; n++) { + bmods[n].mod_start = kvtophys(bmods[n].mod_start + (vm_offset_t) bmods); + bmods[n].mod_end = kvtophys(bmods[n].mod_end + (vm_offset_t) bmods); + bmods[n].string = kvtophys(bmods[n].string + (vm_offset_t) bmods); + } + boot_info.mods_count = n; + boot_info.flags |= MULTIBOOT_MODS; +#else /* MACH_XEN */ struct multiboot_module *bmods = ((struct multiboot_module *) phystokv(boot_info.mods_addr)); - int compat; +#endif /* MACH_XEN */ if (!(boot_info.flags & MULTIBOOT_MODS) || (boot_info.mods_count == 0)) panic ("No bootstrap code loaded with the kernel!"); diff --git a/kern/debug.c b/kern/debug.c index 67b04ad..57a8126 100644 --- a/kern/debug.c +++ b/kern/debug.c @@ -24,6 +24,8 @@ * the rights to redistribute these changes. */ +#include <mach/xen.h> + #include <kern/printf.h> #include <stdarg.h> @@ -164,6 +166,9 @@ panic(const char *s, ...) #if MACH_KDB Debugger("panic"); #else +# ifdef MACH_HYP + hyp_crash(); +# else /* Give the user time to see the message */ { int i = 1000; /* seconds */ @@ -172,6 +177,7 @@ panic(const char *s, ...) 
} halt_all_cpus (reboot_on_panic); +# endif /* MACH_HYP */ #endif } diff --git a/linux/dev/include/asm-i386/segment.h b/linux/dev/include/asm-i386/segment.h index c7b3ff5..300ba53 100644 --- a/linux/dev/include/asm-i386/segment.h +++ b/linux/dev/include/asm-i386/segment.h @@ -3,8 +3,13 @@ #ifdef MACH +#ifdef MACH_HYP +#define KERNEL_CS 0x09 +#define KERNEL_DS 0x11 +#else /* MACH_HYP */ #define KERNEL_CS 0x08 #define KERNEL_DS 0x10 +#endif /* MACH_HYP */ #define USER_CS 0x17 #define USER_DS 0x1F diff --git a/xen/Makefrag.am b/xen/Makefrag.am new file mode 100644 index 0000000..c5792a9 --- /dev/null +++ b/xen/Makefrag.am @@ -0,0 +1,32 @@ +# Makefile fragment for the Xen platform. + +# Copyright (C) 2007 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation; either version 2, or (at your option) any later +# version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + +# +# Xen support. 
+# + +libkernel_a_SOURCES += \ + xen/block.c \ + xen/console.c \ + xen/evt.c \ + xen/grant.c \ + xen/net.c \ + xen/ring.c \ + xen/store.c \ + xen/time.c \ + xen/xen.c diff --git a/xen/block.c b/xen/block.c new file mode 100644 index 0000000..3c188bf --- /dev/null +++ b/xen/block.c @@ -0,0 +1,689 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <sys/types.h> +#include <mach/mig_errors.h> +#include <ipc/ipc_port.h> +#include <ipc/ipc_space.h> +#include <vm/vm_kern.h> +#include <vm/vm_user.h> +#include <device/device_types.h> +#include <device/device_port.h> +#include <device/disk_status.h> +#include <device/device_reply.user.h> +#include <device/device_emul.h> +#include <device/ds_routines.h> +#include <xen/public/io/blkif.h> +#include <xen/evt.h> +#include <string.h> +#include <util/atoi.h> +#include "store.h" +#include "block.h" +#include "grant.h" +#include "ring.h" +#include "xen.h" + +/* Hypervisor part */ + +struct block_data { + struct device device; + char *name; + int open_count; + char *backend; + domid_t domid; + char *vbd; + int handle; + unsigned info; + dev_mode_t mode; + unsigned sector_size; + unsigned long nr_sectors; + ipc_port_t port; + blkif_front_ring_t ring; + evtchn_port_t evt; + simple_lock_data_t lock; + simple_lock_data_t pushlock; +}; + +static int n_vbds; +static struct block_data *vbd_data; + +struct device_emulation_ops hyp_block_emulation_ops; + +static void hyp_block_intr(int unit) { + struct block_data *bd = &vbd_data[unit]; + blkif_response_t *rsp; + int more; + io_return_t *err; + + simple_lock(&bd->lock); + more = RING_HAS_UNCONSUMED_RESPONSES(&bd->ring); + while (more) { + rmb(); /* make sure we see responses */ + rsp = RING_GET_RESPONSE(&bd->ring, bd->ring.rsp_cons++); + err = (void *) (unsigned long) rsp->id; + switch (rsp->status) { + case BLKIF_RSP_ERROR: + *err = D_IO_ERROR; + break; + case BLKIF_RSP_OKAY: + break; + default: + printf("Unrecognized blkif status %d\n", rsp->status); + goto drop; + } + thread_wakeup(err); +drop: + thread_wakeup_one(bd); + RING_FINAL_CHECK_FOR_RESPONSES(&bd->ring, more); + } + simple_unlock(&bd->lock); +} + +#define VBD_PATH "device/vbd" +void hyp_block_init(void) { + char **vbds, **vbd; + char *c; + int i, disk, partition; + int n; + int grant; + char port_name[10]; + char *prefix; + char device_name[32]; + domid_t 
domid;
+	evtchn_port_t evt;
+	hyp_store_transaction_t t;
+	vm_offset_t addr;
+	struct block_data *bd;
+	blkif_sring_t *ring;
+
+	vbds = hyp_store_ls(0, 1, VBD_PATH);
+	if (!vbds) {
+		printf("hd: No block device (%s). Hoping you don't need any\n", hyp_store_error);
+		n_vbds = 0;
+		return;
+	}
+
+	n = 0;
+	for (vbd = vbds; *vbd; vbd++)
+		n++;
+
+	vbd_data = (void*) kalloc(n * sizeof(*vbd_data));
+	if (!vbd_data) {
+		printf("hd: Not enough memory for VBDs\n");
+		n_vbds = 0;
+		return;
+	}
+	n_vbds = n;
+
+	for (n = 0; n < n_vbds; n++) {
+		bd = &vbd_data[n];
+		mach_atoi((u_char *) vbds[n], &bd->handle);
+		if (bd->handle == MACH_ATOI_DEFAULT)
+			continue;
+
+		bd->open_count = -2;
+		bd->vbd = vbds[n];
+
+		/* Get virtual number. */
+		i = hyp_store_read_int(0, 5, VBD_PATH, "/", vbds[n], "/", "virtual-device");
+		if (i == -1)
+			panic("hd: couldn't read virtual device of VBD %s\n",vbds[n]);
+		if ((i >> 28) == 1) {
+			/* xvd, new format */
+			prefix = "xvd";
+			disk = (i >> 8) & ((1 << 20) - 1);
+			partition = i & ((1 << 8) - 1);
+		} else if ((i >> 8) == 202) {
+			/* xvd, old format */
+			prefix = "xvd";
+			disk = (i >> 4) & ((1 << 4) - 1);
+			partition = i & ((1 << 4) - 1);
+		} else if ((i >> 8) == 8) {
+			/* SCSI */
+			prefix = "sd";
+			disk = (i >> 4) & ((1 << 4) - 1);
+			partition = i & ((1 << 4) - 1);
+		} else if ((i >> 8) == 22) {
+			/* IDE secondary */
+			prefix = "hd";
+			disk = ((i >> 6) & ((1 << 2) - 1)) + 2;
+			partition = i & ((1 << 6) - 1);
+		} else if ((i >> 8) == 3) {
+			/* IDE primary */
+			prefix = "hd";
+			disk = (i >> 6) & ((1 << 2) - 1);
+			partition = i & ((1 << 6) - 1);
+		} else {
+			printf("hd: unrecognized virtual device number %d\n", i);
+			continue;
+		}
+		if (partition)
+			sprintf(device_name, "%s%us%u", prefix, disk, partition);
+		else
+			sprintf(device_name, "%s%u", prefix, disk);
+		bd->name = (char*) kalloc(strlen(device_name) + 1);
+		strcpy(bd->name, device_name);
+
+		/* Get domain id of backend driver.
*/ + i = hyp_store_read_int(0, 5, VBD_PATH, "/", vbds[n], "/", "backend-id"); + if (i == -1) + panic("%s: couldn't read backend domid (%s)", device_name, hyp_store_error); + bd->domid = domid = i; + + do { + t = hyp_store_transaction_start(); + + /* Get a page for ring */ + if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS) + panic("%s: couldn't allocate space for store ring\n", device_name); + ring = (void*) addr; + SHARED_RING_INIT(ring); + FRONT_RING_INIT(&bd->ring, ring, PAGE_SIZE); + grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0); + + /* and give it to backend. */ + i = sprintf(port_name, "%u", grant); + c = hyp_store_write(t, port_name, 5, VBD_PATH, "/", vbds[n], "/", "ring-ref"); + if (!c) + panic("%s: couldn't store ring reference (%s)", device_name, hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + /* Allocate an event channel and give it to backend. */ + bd->evt = evt = hyp_event_channel_alloc(domid); + hyp_evt_handler(evt, hyp_block_intr, n, SPL7); + i = sprintf(port_name, "%lu", evt); + c = hyp_store_write(t, port_name, 5, VBD_PATH, "/", vbds[n], "/", "event-channel"); + if (!c) + panic("%s: couldn't store event channel (%s)", device_name, hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + c = hyp_store_write(t, hyp_store_state_initialized, 5, VBD_PATH, "/", vbds[n], "/", "state"); + if (!c) + panic("%s: couldn't store state (%s)", device_name, hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + } while (!hyp_store_transaction_stop(t)); + + c = hyp_store_read(0, 5, VBD_PATH, "/", vbds[n], "/", "backend"); + if (!c) + panic("%s: couldn't get path to backend (%s)", device_name, hyp_store_error); + bd->backend = c; + + while(1) { + i = hyp_store_read_int(0, 3, bd->backend, "/", "state"); + if (i == MACH_ATOI_DEFAULT) + panic("can't read state from %s", bd->backend); + if (i == XenbusStateConnected) + break; + hyp_yield(); + } + + i = hyp_store_read_int(0, 3, bd->backend, "/", "sectors"); + if (i == -1) + 
panic("%s: couldn't get number of sectors (%s)", device_name, hyp_store_error); + bd->nr_sectors = i; + + i = hyp_store_read_int(0, 3, bd->backend, "/", "sector-size"); + if (i == -1) + panic("%s: couldn't get sector size (%s)", device_name, hyp_store_error); + if (i & ~(2*(i-1)+1)) + panic("sector size %d is not a power of 2\n", i); + if (i > PAGE_SIZE || PAGE_SIZE % i != 0) + panic("%s: couldn't handle sector size %d with pages of size %d\n", device_name, i, PAGE_SIZE); + bd->sector_size = i; + + i = hyp_store_read_int(0, 3, bd->backend, "/", "info"); + if (i == -1) + panic("%s: couldn't get info (%s)", device_name, hyp_store_error); + bd->info = i; + + c = hyp_store_read(0, 3, bd->backend, "/", "mode"); + if (!c) + panic("%s: couldn't get backend's mode (%s)", device_name, hyp_store_error); + if ((c[0] == 'w') && !(bd->info & VDISK_READONLY)) + bd->mode = D_READ|D_WRITE; + else + bd->mode = D_READ; + + c = hyp_store_read(0, 3, bd->backend, "/", "params"); + if (!c) + panic("%s: couldn't get backend's real device (%s)", device_name, hyp_store_error); + + /* TODO: change suffix */ + printf("%s: dom%d's VBD %s (%s,%c%s) %ldMB\n", device_name, domid, + vbds[n], c, bd->mode & D_WRITE ? 'w' : 'r', + bd->info & VDISK_CDROM ? 
", cdrom" : "", + bd->nr_sectors / ((1<<20) / 512)); + kfree((vm_offset_t) c, strlen(c)+1); + + c = hyp_store_write(0, hyp_store_state_connected, 5, VBD_PATH, "/", bd->vbd, "/", "state"); + if (!c) + panic("couldn't store state for %s (%s)", device_name, hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + bd->open_count = -1; + bd->device.emul_ops = &hyp_block_emulation_ops; + bd->device.emul_data = bd; + simple_lock_init(&bd->lock); + simple_lock_init(&bd->pushlock); + } +} + +static ipc_port_t +dev_to_port(void *d) +{ + struct block_data *b = d; + if (!d) + return IP_NULL; + return ipc_port_make_send(b->port); +} + +static int +device_close(void *devp) +{ + struct block_data *bd = devp; + if (--bd->open_count < 0) + panic("too many closes on %s", bd->name); + printf("close, %s count %d\n", bd->name, bd->open_count); + if (bd->open_count) + return 0; + ipc_kobject_set(bd->port, IKO_NULL, IKOT_NONE); + ipc_port_dealloc_kernel(bd->port); + return 0; +} + +static io_return_t +device_open (ipc_port_t reply_port, mach_msg_type_name_t reply_port_type, + dev_mode_t mode, char *name, device_t *devp /* out */) +{ + int i, err = 0; + ipc_port_t port, notify; + struct block_data *bd; + + for (i = 0; i < n_vbds; i++) + if (!strcmp(name, vbd_data[i].name)) + break; + + if (i == n_vbds) + return D_NO_SUCH_DEVICE; + + bd = &vbd_data[i]; + if (bd->open_count == -2) + /* couldn't be initialized */ + return D_NO_SUCH_DEVICE; + + if ((mode & D_WRITE) && !(bd->mode & D_WRITE)) + return D_READ_ONLY; + + if (bd->open_count >= 0) { + *devp = &bd->device ; + bd->open_count++ ; + printf("re-open, %s count %d\n", bd->name, bd->open_count); + return D_SUCCESS; + } + + bd->open_count = 1; + printf("%s count %d\n", bd->name, bd->open_count); + + port = ipc_port_alloc_kernel(); + if (port == IP_NULL) { + err = KERN_RESOURCE_SHORTAGE; + goto out; + } + bd->port = port; + + *devp = &bd->device; + + ipc_kobject_set (port, (ipc_kobject_t) &bd->device, IKOT_DEVICE); + + notify = 
ipc_port_make_sonce (bd->port); + ip_lock (bd->port); + ipc_port_nsrequest (bd->port, 1, notify, ¬ify); + assert (notify == IP_NULL); + +out: + if (IP_VALID (reply_port)) + ds_device_open_reply (reply_port, reply_port_type, D_SUCCESS, port); + else + device_close(bd); + return MIG_NO_REPLY; +} + +static io_return_t +device_read (void *d, ipc_port_t reply_port, + mach_msg_type_name_t reply_port_type, dev_mode_t mode, + recnum_t bn, int count, io_buf_ptr_t *data, + unsigned *bytes_read) +{ + int resid, amt; + io_return_t err = 0; + vm_page_t pages[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + int nbpages; + vm_map_copy_t copy; + vm_offset_t offset, alloc_offset, o; + vm_object_t object; + vm_page_t m; + vm_size_t len, size; + struct block_data *bd = d; + struct blkif_request *req; + + *data = 0; + *bytes_read = 0; + + if (count < 0) + return D_INVALID_SIZE; + if (count == 0) + return 0; + + /* Allocate an object to hold the data. */ + size = round_page (count); + object = vm_object_allocate (size); + if (! object) + { + err = D_NO_MEMORY; + goto out; + } + alloc_offset = offset = 0; + resid = count; + + while (resid && !err) + { + unsigned reqn; + int i; + int last_sect; + + nbpages = 0; + + /* Determine size of I/O this time around. */ + len = round_page(offset + resid) - offset; + if (len > PAGE_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST) + len = PAGE_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST; + + /* Allocate pages. */ + while (alloc_offset < offset + len) + { + while ((m = vm_page_grab (FALSE)) == 0) + VM_PAGE_WAIT (0); + assert (! m->active && ! m->inactive); + m->busy = TRUE; + assert(nbpages < BLKIF_MAX_SEGMENTS_PER_REQUEST); + pages[nbpages++] = m; + alloc_offset += PAGE_SIZE; + } + + /* Do the read. 
*/ + amt = len; + if (amt > resid) + amt = resid; + + /* allocate a request */ + spl_t spl = splsched(); + while(1) { + simple_lock(&bd->lock); + if (!RING_FULL(&bd->ring)) + break; + thread_sleep(bd, &bd->lock, FALSE); + } + mb(); + reqn = bd->ring.req_prod_pvt++;; + simple_lock(&bd->pushlock); + simple_unlock(&bd->lock); + (void) splx(spl); + + req = RING_GET_REQUEST(&bd->ring, reqn); + req->operation = BLKIF_OP_READ; + req->nr_segments = nbpages; + req->handle = bd->handle; + req->id = (unsigned64_t) (unsigned long) &err; /* pointer on the stack */ + req->sector_number = bn + offset / 512; + for (i = 0; i < nbpages; i++) { + req->seg[i].gref = gref[i] = hyp_grant_give(bd->domid, atop(pages[i]->phys_addr), 0); + req->seg[i].first_sect = 0; + req->seg[i].last_sect = PAGE_SIZE/512 - 1; + } + last_sect = ((amt - 1) & PAGE_MASK) / 512; + req->seg[nbpages-1].last_sect = last_sect; + + memset((void*) phystokv(pages[nbpages-1]->phys_addr + + (last_sect + 1) * 512), + 0, PAGE_SIZE - (last_sect + 1) * 512); + + /* no need for a lock: as long as the request is not pushed, the event won't be triggered */ + assert_wait((event_t) &err, FALSE); + + int notify; + wmb(); /* make sure it sees requests */ + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bd->ring, notify); + if (notify) + hyp_event_channel_send(bd->evt); + simple_unlock(&bd->pushlock); + + thread_block(NULL); + + if (err) + printf("error reading %d bytes at sector %d\n", amt, + bn + offset / 512); + + for (i = 0; i < nbpages; i++) + hyp_grant_takeback(gref[i]); + + /* Compute number of pages to insert in object. */ + o = offset; + + resid -= amt; + if (resid == 0) + offset = o + len; + else + offset += amt; + + /* Add pages to the object. 
*/ + vm_object_lock (object); + for (i = 0; i < nbpages; i++) + { + m = pages[i]; + assert (m->busy); + vm_page_lock_queues (); + PAGE_WAKEUP_DONE (m); + m->dirty = TRUE; + vm_page_insert (m, object, o); + vm_page_unlock_queues (); + o += PAGE_SIZE; + } + vm_object_unlock (object); + } + +out: + if (! err) + err = vm_map_copyin_object (object, 0, round_page (count), ©); + if (! err) + { + *data = (io_buf_ptr_t) copy; + *bytes_read = count - resid; + } + else + vm_object_deallocate (object); + return err; +} + +static io_return_t +device_write(void *d, ipc_port_t reply_port, + mach_msg_type_name_t reply_port_type, dev_mode_t mode, + recnum_t bn, io_buf_ptr_t data, unsigned int count, + int *bytes_written) +{ + io_return_t err = 0; + vm_map_copy_t copy = (vm_map_copy_t) data; + vm_offset_t aligned_buffer = 0; + int copy_npages = atop(round_page(count)); + vm_offset_t phys_addrs[copy_npages]; + struct block_data *bd = d; + blkif_request_t *req; + grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + unsigned reqn, size; + int i, nbpages, j; + + if (!(bd->mode & D_WRITE)) + return D_READ_ONLY; + + if (count == 0) { + vm_map_copy_discard(copy); + return 0; + } + + if (count % bd->sector_size) + return D_INVALID_SIZE; + + if (count > copy->size) + return D_INVALID_SIZE; + + if (copy->type != VM_MAP_COPY_PAGE_LIST || copy->offset & PAGE_MASK) { + /* Unaligned write. Has to copy data before passing it to the backend. 
*/ + kern_return_t kr; + vm_offset_t buffer; + + kr = kmem_alloc(device_io_map, &aligned_buffer, count); + if (kr != KERN_SUCCESS) + return kr; + + kr = vm_map_copyout(device_io_map, &buffer, vm_map_copy_copy(copy)); + if (kr != KERN_SUCCESS) { + kmem_free(device_io_map, aligned_buffer, count); + return kr; + } + + memcpy((void*) aligned_buffer, (void*) buffer, count); + + vm_deallocate (device_io_map, buffer, count); + + for (i = 0; i < copy_npages; i++) + phys_addrs[i] = kvtophys(aligned_buffer + ptoa(i)); + } else { + for (i = 0; i < copy_npages; i++) + phys_addrs[i] = copy->cpy_page_list[i]->phys_addr; + } + + for (i=0; i<copy_npages; i+=nbpages) { + + nbpages = BLKIF_MAX_SEGMENTS_PER_REQUEST; + if (nbpages > copy_npages-i) + nbpages = copy_npages-i; + + /* allocate a request */ + spl_t spl = splsched(); + while(1) { + simple_lock(&bd->lock); + if (!RING_FULL(&bd->ring)) + break; + thread_sleep(bd, &bd->lock, FALSE); + } + mb(); + reqn = bd->ring.req_prod_pvt++;; + simple_lock(&bd->pushlock); + simple_unlock(&bd->lock); + (void) splx(spl); + + req = RING_GET_REQUEST(&bd->ring, reqn); + req->operation = BLKIF_OP_WRITE; + req->nr_segments = nbpages; + req->handle = bd->handle; + req->id = (unsigned64_t) (unsigned long) &err; /* pointer on the stack */ + req->sector_number = bn + i*PAGE_SIZE / 512; + + for (j = 0; j < nbpages; j++) { + req->seg[j].gref = gref[j] = hyp_grant_give(bd->domid, atop(phys_addrs[i + j]), 1); + req->seg[j].first_sect = 0; + size = PAGE_SIZE; + if ((i + j + 1) * PAGE_SIZE > count) + size = count - (i + j) * PAGE_SIZE; + req->seg[j].last_sect = size/512 - 1; + } + + /* no need for a lock: as long as the request is not pushed, the event won't be triggered */ + assert_wait((event_t) &err, FALSE); + + int notify; + wmb(); /* make sure it sees requests */ + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bd->ring, notify); + if (notify) + hyp_event_channel_send(bd->evt); + simple_unlock(&bd->pushlock); + + thread_block(NULL); + + for (j = 0; j < nbpages; 
j++) + hyp_grant_takeback(gref[j]); + + if (err) { + printf("error writing %d bytes at sector %d\n", count, bn); + break; + } + } + + if (aligned_buffer) + kmem_free(device_io_map, aligned_buffer, count); + + vm_map_copy_discard (copy); + + if (!err) + *bytes_written = count; + + if (IP_VALID(reply_port)) + ds_device_write_reply (reply_port, reply_port_type, err, count); + + return MIG_NO_REPLY; +} + +static io_return_t +device_get_status(void *d, dev_flavor_t flavor, dev_status_t status, + mach_msg_type_number_t *status_count) +{ + struct block_data *bd = d; + + switch (flavor) + { + case DEV_GET_SIZE: + status[DEV_GET_SIZE_DEVICE_SIZE] = (unsigned long long) bd->nr_sectors * 512; + status[DEV_GET_SIZE_RECORD_SIZE] = bd->sector_size; + *status_count = DEV_GET_SIZE_COUNT; + break; + case DEV_GET_RECORDS: + status[DEV_GET_RECORDS_DEVICE_RECORDS] = ((unsigned long long) bd->nr_sectors * 512) / bd->sector_size; + status[DEV_GET_RECORDS_RECORD_SIZE] = bd->sector_size; + *status_count = DEV_GET_RECORDS_COUNT; + break; + default: + printf("TODO: block_%s(%d)\n", __func__, flavor); + return D_INVALID_OPERATION; + } + return D_SUCCESS; +} + +struct device_emulation_ops hyp_block_emulation_ops = { + NULL, /* dereference */ + NULL, /* deallocate */ + dev_to_port, + device_open, + device_close, + device_write, + NULL, /* write_inband */ + device_read, + NULL, /* read_inband */ + NULL, /* set_status */ + device_get_status, + NULL, /* set_filter */ + NULL, /* map */ + NULL, /* no_senders */ + NULL, /* write_trap */ + NULL, /* writev_trap */ +}; diff --git a/xen/block.h b/xen/block.h new file mode 100644 index 0000000..5955968 --- /dev/null +++ b/xen/block.h @@ -0,0 +1,24 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your 
option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef XEN_BLOCK_H +#define XEN_BLOCK_H + +void hyp_block_init(void); + +#endif /* XEN_BLOCK_H */ diff --git a/xen/configfrag.ac b/xen/configfrag.ac new file mode 100644 index 0000000..eb68996 --- /dev/null +++ b/xen/configfrag.ac @@ -0,0 +1,44 @@ +dnl Configure fragment for the Xen platform. + +dnl Copyright (C) 2007 Free Software Foundation, Inc. + +dnl This program is free software; you can redistribute it and/or modify it +dnl under the terms of the GNU General Public License as published by the +dnl Free Software Foundation; either version 2, or (at your option) any later +dnl version. +dnl +dnl This program is distributed in the hope that it will be useful, but +dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +dnl or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +dnl for more details. +dnl +dnl You should have received a copy of the GNU General Public License along +dnl with this program; if not, write to the Free Software Foundation, Inc., +dnl 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + +# +# Xen platform. 
+# + +[if [ "$host_platform" = xen ]; then] + AC_DEFINE([MACH_XEN], [], [build a MachXen kernel]) + AC_DEFINE([MACH_HYP], [], [be a hypervisor guest]) + AM_CONDITIONAL([PLATFORM_xen], [true]) + + AC_ARG_ENABLE([pseudo-phys], + AS_HELP_STRING([--enable-pseudo-phys], [Pseudo physical support])) + [if [ x"$enable_pseudo_phys" = xno ]; then] + AM_CONDITIONAL([enable_pseudo_phys], [false]) + [else] + AC_DEFINE([MACH_PSEUDO_PHYS], [], [Enable pseudo physical memory support]) + AM_CONDITIONAL([enable_pseudo_phys], [true]) + [fi] + +[else] + AM_CONDITIONAL([PLATFORM_xen], [false]) + AM_CONDITIONAL([enable_pseudo_phys], [false]) +[fi] + +dnl Local Variables: +dnl mode: autoconf +dnl End: diff --git a/xen/console.c b/xen/console.c new file mode 100644 index 0000000..c65e6d2 --- /dev/null +++ b/xen/console.c @@ -0,0 +1,234 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <sys/types.h> +#include <device/tty.h> +#include <device/cons.h> +#include <machine/pmap.h> +#include <machine/machspl.h> +#include <xen/public/io/console.h> +#include "console.h" +#include "ring.h" +#include "evt.h" + +/* Hypervisor part */ + +decl_simple_lock_data(static, outlock); +decl_simple_lock_data(static, inlock); +static struct xencons_interface *console; +static int kd_pollc; +int kb_mode; /* XXX: actually don't care. */ + +#undef hyp_console_write +void hyp_console_write(const char *str, int len) +{ + hyp_console_io (CONSOLEIO_write, len, kvtolin(str)); +} + +int hypputc(int c) +{ + if (!console) { + char d = c; + hyp_console_io(CONSOLEIO_write, 1, kvtolin(&d)); + } else { + spl_t spl = splhigh(); + simple_lock(&outlock); + while (hyp_ring_smash(console->out, console->out_prod, console->out_cons)) { + hyp_console_put("ring smash\n"); + /* TODO: are we allowed to sleep in putc? */ + hyp_yield(); + } + hyp_ring_cell(console->out, console->out_prod) = c; + wmb(); + console->out_prod++; + hyp_event_channel_send(boot_info.console_evtchn); + simple_unlock(&outlock); + splx(spl); + } + return 0; +} + +int hypcnputc(dev_t dev, int c) +{ + return hypputc(c); +} + +/* get char by polling, used by debugger */ +int hypcngetc(dev_t dev, int wait) +{ + int ret; + if (wait) + while (console->in_prod == console->in_cons) + hyp_yield(); + else + if (console->in_prod == console->in_cons) + return -1; + ret = hyp_ring_cell(console->in, console->in_cons); + mb(); + console->in_cons++; + hyp_event_channel_send(boot_info.console_evtchn); + return ret; +} + +void cnpollc(boolean_t on) { + if (on) { + kd_pollc++; + } else { + --kd_pollc; + } +} + +void kd_setleds1(u_char val) +{ + /* Can't do this. 
*/ +} + +/* Mach part */ + +struct tty hypcn_tty; + +static void hypcnintr(int unit, spl_t spl, void *ret_addr, void *regs) { + struct tty *tp = &hypcn_tty; + if (kd_pollc) + return; + simple_lock(&inlock); + while (console->in_prod != console->in_cons) { + int c = hyp_ring_cell(console->in, console->in_cons); + mb(); + console->in_cons++; +#ifdef MACH_KDB + if (c == (char)'£') + panic("£ pressed"); +#endif /* MACH_KDB */ + if ((tp->t_state & (TS_ISOPEN|TS_WOPEN))) + (*linesw[tp->t_line].l_rint)(c, tp); + } + hyp_event_channel_send(boot_info.console_evtchn); + simple_unlock(&inlock); +} + +int hypcnread(int dev, io_req_t ior) +{ + struct tty *tp = &hypcn_tty; + tp->t_state |= TS_CARR_ON; + return char_read(tp, ior); +} + +int hypcnwrite(int dev, io_req_t ior) +{ + return char_write(&hypcn_tty, ior); +} + +void hypcnstart(struct tty *tp) +{ + spl_t o_pri; + int ch; + unsigned char c; + + if (tp->t_state & TS_TTSTOP) + return; + while (1) { + tp->t_state &= ~TS_BUSY; + if (tp->t_state & TS_TTSTOP) + break; + if ((tp->t_outq.c_cc <= 0) || (ch = getc(&tp->t_outq)) == -1) + break; + c = ch; + o_pri = splsoftclock(); + hypputc(c); + splx(o_pri); + } + if (tp->t_outq.c_cc <= TTLOWAT(tp)) { + tt_write_wakeup(tp); + } +} + +void hypcnstop() +{ +} + +io_return_t hypcngetstat(dev_t dev, int flavor, int *data, unsigned int *count) +{ + return tty_get_status(&hypcn_tty, flavor, data, count); +} + +io_return_t hypcnsetstat(dev_t dev, int flavor, int *data, unsigned int count) +{ + return tty_set_status(&hypcn_tty, flavor, data, count); +} + +int hypcnportdeath(dev_t dev, mach_port_t port) +{ + return tty_portdeath(&hypcn_tty, (ipc_port_t) port); +} + +int hypcnopen(dev_t dev, int flag, io_req_t ior) +{ + struct tty *tp = &hypcn_tty; + spl_t o_pri; + + o_pri = spltty(); + simple_lock(&tp->t_lock); + if (!(tp->t_state & (TS_ISOPEN|TS_WOPEN))) { + /* XXX ttychars allocates memory */ + simple_unlock(&tp->t_lock); + ttychars(tp); + simple_lock(&tp->t_lock); + tp->t_oproc = 
hypcnstart; + tp->t_stop = hypcnstop; + tp->t_ospeed = tp->t_ispeed = B9600; + tp->t_flags = ODDP|EVENP|ECHO|CRMOD|XTABS; + } + tp->t_state |= TS_CARR_ON; + simple_unlock(&tp->t_lock); + splx(o_pri); + return (char_open(dev, tp, flag, ior)); +} + +int hypcnclose(int dev, int flag) +{ + struct tty *tp = &hypcn_tty; + spl_t s = spltty(); + simple_lock(&tp->t_lock); + ttyclose(tp); + simple_unlock(&tp->t_lock); + splx(s); + return 0; +} + +int hypcnprobe(struct consdev *cp) +{ + struct xencons_interface *my_console; + my_console = (void*) mfn_to_kv(boot_info.console_mfn); + + cp->cn_dev = makedev(0, 0); + cp->cn_pri = CN_INTERNAL; + return 0; +} + +int hypcninit(struct consdev *cp) +{ + if (console) + return 0; + simple_lock_init(&outlock); + simple_lock_init(&inlock); + console = (void*) mfn_to_kv(boot_info.console_mfn); + pmap_set_page_readwrite(console); + hyp_evt_handler(boot_info.console_evtchn, hypcnintr, 0, SPL6); + return 0; +} diff --git a/xen/console.h b/xen/console.h new file mode 100644 index 0000000..fa13dc0 --- /dev/null +++ b/xen/console.h @@ -0,0 +1,33 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#ifndef XEN_CONSOLE_H +#define XEN_CONSOLE_H +#include <machine/xen.h> +#include <string.h> + +#define hyp_console_write(str, len) hyp_console_io (CONSOLEIO_write, (len), kvtolin(str)) + +#define hyp_console_put(str) ({ \ + const char *__str = (void*) (str); \ + hyp_console_write (__str, strlen (__str)); \ +}) + +extern void hyp_console_init(void); + +#endif /* XEN_CONSOLE_H */ diff --git a/xen/evt.c b/xen/evt.c new file mode 100644 index 0000000..345e1d0 --- /dev/null +++ b/xen/evt.c @@ -0,0 +1,109 @@ +/* + * Copyright (C) 2007 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <sys/types.h> +#include <string.h> +#include <mach/xen.h> +#include <machine/xen.h> +#include <machine/ipl.h> +#include <machine/gdt.h> +#include <xen/console.h> +#include "evt.h" + +#define NEVNT (sizeof(unsigned long) * sizeof(unsigned long) * 8) +int int_mask[NSPL]; + +spl_t curr_ipl; + +void (*ivect[NEVNT])(); +int intpri[NEVNT]; +int iunit[NEVNT]; + +void hyp_c_callback(void *ret_addr, void *regs) +{ + int i, j, n; + int cpu = 0; + unsigned long pending_sel; + + hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending = 0; + /* no need for a barrier on x86, xchg is already one */ +#if !(defined(__i386__) || defined(__x86_64__)) + wmb(); +#endif + while ((pending_sel = xchgl(&hyp_shared_info.vcpu_info[cpu].evtchn_pending_sel, 0))) { + + for (i = 0; pending_sel; i++, pending_sel >>= 1) { + unsigned long pending; + + if (!(pending_sel & 1)) + continue; + + while ((pending = (hyp_shared_info.evtchn_pending[i] & ~hyp_shared_info.evtchn_mask[i]))) { + + n = i * sizeof(unsigned long); + for (j = 0; pending; j++, n++, pending >>= 1) { + if (!(pending & 1)) + continue; + + if (ivect[n]) { + spl_t spl = splx(intpri[n]); + asm ("lock; andl %1,%0":"=m"(hyp_shared_info.evtchn_pending[i]):"r"(~(1<<j))); + ivect[n](iunit[n], spl, ret_addr, regs); + splx_cli(spl); + } else { + printf("warning: lost unbound event %d\n", n); + asm ("lock; andl %1,%0":"=m"(hyp_shared_info.evtchn_pending[i]):"r"(~(1<<j))); + } + } + } + } + } +} + +void form_int_mask(void) +{ + unsigned int i, j, bit, mask; + + for (i=SPL0; i < NSPL; i++) { + for (j=0x00, bit=0x01, mask = 0; j < NEVNT; j++, bit<<=1) + if (intpri[j] <= i) + mask |= bit; + int_mask[i] = mask; + } +} + +extern void hyp_callback(void); +extern void hyp_failsafe_callback(void); + +void hyp_intrinit() { + form_int_mask(); + curr_ipl = SPLHI; + hyp_shared_info.evtchn_mask[0] = int_mask[SPLHI]; + hyp_set_callbacks(KERNEL_CS, hyp_callback, + KERNEL_CS, hyp_failsafe_callback); +} + +void hyp_evt_handler(evtchn_port_t port, 
void (*handler)(), int unit, spl_t spl) { + if (port >= NEVNT) + panic("event channel port %d >= %d not supported\n", port, NEVNT); + intpri[port] = spl; + iunit[port] = unit; + form_int_mask(); + wmb(); + ivect[port] = handler; +} diff --git a/xen/evt.h b/xen/evt.h new file mode 100644 index 0000000..a583977 --- /dev/null +++ b/xen/evt.h @@ -0,0 +1,29 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef XEN_EVT_H +#define XEN_EVT_H + +#include <machine/spl.h> + +void hyp_intrinit(void); +void form_int_mask(void); +void hyp_evt_handler(evtchn_port_t port, void (*handler)(), int unit, spl_t spl); +void hyp_c_callback(void *ret_addr, void *regs); + +#endif /* XEN_EVT_H */ diff --git a/xen/grant.c b/xen/grant.c new file mode 100644 index 0000000..505d202 --- /dev/null +++ b/xen/grant.c @@ -0,0 +1,142 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. 
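The two-level scan in hyp_c_callback earlier (evt.c) first walks the bits of evtchn_pending_sel to select words of evtchn_pending[], then walks the unmasked bits of each selected word to find events to deliver. A simplified, testable sketch of that scan, assuming 32-bit words and indexing events by bits per word; it merely collects event numbers instead of dispatching through ivect[]:

```c
#include <assert.h>
#include <stdint.h>

#define BITS 32  /* bits per pending word, illustrative */

/* Two-level pending-event scan: sel selects words of pending[];
 * within each selected word, set bits not covered by mask[] name
 * the events.  Returns how many event numbers were written to out. */
static int scan_events(uint32_t sel, const uint32_t *pending,
                       const uint32_t *mask, int *out, int max)
{
    int i, j, n = 0;
    for (i = 0; sel; i++, sel >>= 1) {
        uint32_t p;
        if (!(sel & 1))
            continue;
        p = pending[i] & ~mask[i];
        for (j = 0; p; j++, p >>= 1)
            if ((p & 1) && n < max)
                out[n++] = i * BITS + j;
    }
    return n;
}
```

In the kernel the equivalent of clearing a bit of `p` is the locked `andl` on the shared evtchn_pending word, done before calling the handler so a re-raised event is not lost.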
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <sys/types.h> +#include <mach/vm_param.h> +#include <machine/spl.h> +#include <vm/pmap.h> +#include <vm/vm_map.h> +#include <vm/vm_kern.h> +#include "grant.h" + +#define NR_RESERVED_ENTRIES 8 +#define NR_GRANT_PAGES 4 + +decl_simple_lock_data(static,lock); +static struct grant_entry *grants; +static vm_map_entry_t grants_map_entry; +static int last_grant = NR_RESERVED_ENTRIES; + +static grant_ref_t free_grants = -1; + +static grant_ref_t grant_alloc(void) { + grant_ref_t grant; + if (free_grants != -1) { + grant = free_grants; + free_grants = grants[grant].frame; + } else { + grant = last_grant++; + if (grant == (NR_GRANT_PAGES * PAGE_SIZE)/sizeof(*grants)) + panic("not enough grant entries, increase NR_GRANT_PAGES"); + } + return grant; +} + +static void grant_free(grant_ref_t grant) { + grants[grant].frame = free_grants; + free_grants = grant; +} + +static grant_ref_t grant_set(domid_t domid, unsigned long mfn, uint16_t flags) { + spl_t spl = splhigh(); + simple_lock(&lock); + + grant_ref_t grant = grant_alloc(); + grants[grant].domid = domid; + grants[grant].frame = mfn; + wmb(); + grants[grant].flags = flags; + + simple_unlock(&lock); + splx(spl); + return grant; +} + +grant_ref_t hyp_grant_give(domid_t domid, unsigned long frame, int readonly) { + return grant_set(domid, pfn_to_mfn(frame), + GTF_permit_access | (readonly ? 
GTF_readonly : 0)); +} + +grant_ref_t hyp_grant_accept_transfer(domid_t domid, unsigned long frame) { + return grant_set(domid, frame, GTF_accept_transfer); +} + +unsigned long hyp_grant_finish_transfer(grant_ref_t grant) { + unsigned long frame; + spl_t spl = splhigh(); + simple_lock(&lock); + + if (!(grants[grant].flags & GTF_transfer_committed)) + panic("grant transfer %x not committed\n", grant); + while (!(grants[grant].flags & GTF_transfer_completed)) + machine_relax(); + rmb(); + frame = grants[grant].frame; + grant_free(grant); + + simple_unlock(&lock); + splx(spl); + return frame; +} + +void hyp_grant_takeback(grant_ref_t grant) { + spl_t spl = splhigh(); + simple_lock(&lock); + + if (grants[grant].flags & (GTF_reading|GTF_writing)) + panic("grant %d still in use (%lx)\n", grant, grants[grant].flags); + + /* Note: this is not safe, a cmpxchg is needed, see grant_table.h */ + grants[grant].flags = 0; + wmb(); + + grant_free(grant); + + simple_unlock(&lock); + splx(spl); +} + +void *hyp_grant_address(grant_ref_t grant) { + return &grants[grant]; +} + +void hyp_grant_init(void) { + struct gnttab_setup_table setup; + unsigned long frame[NR_GRANT_PAGES]; + long ret; + int i; + vm_offset_t addr; + + setup.dom = DOMID_SELF; + setup.nr_frames = NR_GRANT_PAGES; + setup.frame_list = (void*) kvtolin(frame); + + ret = hyp_grant_table_op(GNTTABOP_setup_table, kvtolin(&setup), 1); + if (ret) + panic("setup grant table error %d", ret); + if (setup.status) + panic("setup grant table: %d\n", setup.status); + + simple_lock_init(&lock); + vm_map_find_entry(kernel_map, &addr, NR_GRANT_PAGES * PAGE_SIZE, + (vm_offset_t) 0, kernel_object, &grants_map_entry); + grants = (void*) addr; + + for (i = 0; i < NR_GRANT_PAGES; i++) + pmap_map_mfn((void *)grants + i * PAGE_SIZE, frame[i]); +} diff --git a/xen/grant.h b/xen/grant.h new file mode 100644 index 0000000..ff8617d --- /dev/null +++ b/xen/grant.h @@ -0,0 +1,33 @@ +/* + * Copyright (C) 2006 Samuel Thibault 
<samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef XEN_GRANT_H +#define XEN_GRANT_H +#include <sys/types.h> +#include <machine/xen.h> +#include <xen/public/xen.h> +#include <xen/public/grant_table.h> + +void hyp_grant_init(void); +grant_ref_t hyp_grant_give(domid_t domid, unsigned long frame_nr, int readonly); +void hyp_grant_takeback(grant_ref_t grant); +grant_ref_t hyp_grant_accept_transfer(domid_t domid, unsigned long frame_nr); +unsigned long hyp_grant_finish_transfer(grant_ref_t grant); +void *hyp_grant_address(grant_ref_t grant); + +#endif /* XEN_GRANT_H */ diff --git a/xen/net.c b/xen/net.c new file mode 100644 index 0000000..1bb217b --- /dev/null +++ b/xen/net.c @@ -0,0 +1,665 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
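grant_alloc and grant_free above thread their free list through the otherwise unused frame field of each grant entry: freeing pushes the reference onto the list, allocating pops it, and only when the list is empty does last_grant advance. A standalone sketch of that allocator, with illustrative names and table size:

```c
#include <assert.h>
#include <stdint.h>

#define N_GRANTS 16

struct entry { uint32_t frame; };   /* stands in for struct grant_entry */

static struct entry table[N_GRANTS];
static uint32_t free_head = (uint32_t)-1;  /* -1: free list empty */
static uint32_t last = 0;                  /* high-water mark */

static uint32_t g_alloc(void)
{
    uint32_t g;
    if (free_head != (uint32_t)-1) {
        g = free_head;
        free_head = table[g].frame; /* pop: frame holds the next free ref */
    } else {
        g = last++;                 /* kernel panics past NR_GRANT_PAGES */
    }
    return g;
}

static void g_free(uint32_t g)
{
    table[g].frame = free_head;     /* push onto the free list */
    free_head = g;
}
```

The trick works because a freed entry's frame field is dead until the grant is reused, so no extra storage is needed for the list.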
+ * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <sys/types.h> +#include <mach/mig_errors.h> +#include <ipc/ipc_port.h> +#include <ipc/ipc_space.h> +#include <vm/vm_kern.h> +#include <device/device_types.h> +#include <device/device_port.h> +#include <device/if_hdr.h> +#include <device/if_ether.h> +#include <device/net_io.h> +#include <device/device_reply.user.h> +#include <device/device_emul.h> +#include <intel/pmap.h> +#include <xen/public/io/netif.h> +#include <xen/public/memory.h> +#include <string.h> +#include <util/atoi.h> +#include "evt.h" +#include "store.h" +#include "net.h" +#include "grant.h" +#include "ring.h" +#include "time.h" +#include "xen.h" + +/* Hypervisor part */ + +#define ADDRESS_SIZE 6 +#define WINDOW __RING_SIZE((netif_rx_sring_t*)0, PAGE_SIZE) + +/* TODO: use rx-copy instead, since we're memcpying anyway */ + +/* Are we paranoid enough to not leak anything to backend? 
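WINDOW is computed with Xen's __RING_SIZE macro, which rounds the number of entries that fit in a shared page (after the ring header) down to a power of two, so that ring indices can be reduced by masking instead of division. A hedged re-statement of that sizing rule; ring_entries is a hypothetical helper and the sizes in the test are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Largest power-of-two number of entries of size `entry` that fit in a
 * page of size `page` after a header of size `header` bytes. */
static unsigned ring_entries(size_t page, size_t header, size_t entry)
{
    unsigned n = (unsigned)((page - header) / entry);
    unsigned p = 1;
    while (p * 2 <= n)              /* round down to a power of two */
        p *= 2;
    return p;
}
```

Rounding down wastes a little page space but keeps producer/consumer arithmetic to a single AND, matching the masked indexing used throughout the ring code.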
*/ +static const int paranoia = 0; + +struct net_data { + struct device device; + struct ifnet ifnet; + int open_count; + char *backend; + domid_t domid; + char *vif; + u_char address[ADDRESS_SIZE]; + int handle; + ipc_port_t port; + netif_tx_front_ring_t tx; + netif_rx_front_ring_t rx; + void *rx_buf[WINDOW]; + grant_ref_t rx_buf_gnt[WINDOW]; + unsigned long rx_buf_pfn[WINDOW]; + evtchn_port_t evt; + simple_lock_data_t lock; + simple_lock_data_t pushlock; +}; + +static int n_vifs; +static struct net_data *vif_data; + +struct device_emulation_ops hyp_net_emulation_ops; + +int hextoi(char *cp, int *nump) +{ + int number; + char *original; + char c; + + original = cp; + for (number = 0, c = *cp | 0x20; (('0' <= c) && (c <= '9')) || (('a' <= c) && (c <= 'f')); c = *(++cp)) { + number *= 16; + if (c <= '9') + number += c - '0'; + else + number += c - 'a' + 10; + } + if (original == cp) + *nump = -1; + else + *nump = number; + return(cp - original); +} + +static void enqueue_rx_buf(struct net_data *nd, int number) { + unsigned reqn = nd->rx.req_prod_pvt++; + netif_rx_request_t *req = RING_GET_REQUEST(&nd->rx, reqn); + + assert(number < WINDOW); + + req->id = number; + req->gref = nd->rx_buf_gnt[number] = hyp_grant_accept_transfer(nd->domid, nd->rx_buf_pfn[number]); + + /* give back page */ + hyp_free_page(nd->rx_buf_pfn[number], nd->rx_buf[number]); +} + +static void hyp_net_intr(int unit) { + ipc_kmsg_t kmsg; + struct ether_header *eh; + struct packet_header *ph; + netif_rx_response_t *rx_rsp; + netif_tx_response_t *tx_rsp; + void *data; + int len, more; + struct net_data *nd = &vif_data[unit]; + + simple_lock(&nd->lock); + if ((nd->rx.sring->rsp_prod - nd->rx.rsp_cons) >= (WINDOW*3)/4) + printf("window %ld a bit small!\n", WINDOW); + + more = RING_HAS_UNCONSUMED_RESPONSES(&nd->rx); + while (more) { + rmb(); /* make sure we see responses */ + rx_rsp = RING_GET_RESPONSE(&nd->rx, nd->rx.rsp_cons++); + + unsigned number = rx_rsp->id; + assert(number < WINDOW); + unsigned 
long mfn = hyp_grant_finish_transfer(nd->rx_buf_gnt[number]); + +#ifdef MACH_PSEUDO_PHYS + mfn_list[nd->rx_buf_pfn[number]] = mfn; +#endif /* MACH_PSEUDO_PHYS */ + pmap_map_mfn(nd->rx_buf[number], mfn); + + kmsg = net_kmsg_get(); + if (!kmsg) + /* gasp! Drop */ + goto drop; + + if (rx_rsp->status <= 0) + switch (rx_rsp->status) { + case NETIF_RSP_DROPPED: + printf("Packet dropped\n"); + goto drop; + case NETIF_RSP_ERROR: + panic("Packet error"); + case 0: + printf("nul packet\n"); + goto drop; + default: + printf("Unknown error %d\n", rx_rsp->status); + goto drop; + } + + data = nd->rx_buf[number] + rx_rsp->offset; + len = rx_rsp->status; + + eh = (void*) (net_kmsg(kmsg)->header); + ph = (void*) (net_kmsg(kmsg)->packet); + memcpy(eh, data, sizeof (struct ether_header)); + memcpy(ph + 1, data + sizeof (struct ether_header), len - sizeof(struct ether_header)); + RING_FINAL_CHECK_FOR_RESPONSES(&nd->rx, more); + enqueue_rx_buf(nd, number); + ph->type = eh->ether_type; + ph->length = len - sizeof(struct ether_header) + sizeof (struct packet_header); + + net_kmsg(kmsg)->sent = FALSE; /* Mark packet as received. 
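The hextoi helper defined earlier (used below to parse the VIF's MAC string from the store) accumulates hex digits, reports the number of characters consumed, and stores -1 when no digit was found. A compact restatement for illustration; note this sketch folds case on every character, whereas the original applies the `| 0x20` fold only to the first one (harmless for the lowercase strings xenstore provides):

```c
#include <assert.h>

/* Parse a hex number; return chars consumed, store -1 in *nump if none. */
static int hextoi_sketch(const char *cp, int *nump)
{
    const char *orig = cp;
    int number = 0;
    char c;
    for (c = *cp | 0x20;                         /* fold to lowercase */
         (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
         c = *(++cp) | 0x20) {
        number *= 16;
        number += (c <= '9') ? c - '0' : c - 'a' + 10;
    }
    *nump = (orig == cp) ? -1 : number;
    return (int)(cp - orig);
}
```

The -1 sentinel is what the MAC-parsing loop below relies on to detect a malformed "aa:bb:cc:dd:ee:ff" string.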
*/ + + net_packet(&nd->ifnet, kmsg, ph->length, ethernet_priority(kmsg)); + continue; + +drop: + RING_FINAL_CHECK_FOR_RESPONSES(&nd->rx, more); + enqueue_rx_buf(nd, number); + } + + /* commit new requests */ + int notify; + wmb(); /* make sure it sees requests */ + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->rx, notify); + if (notify) + hyp_event_channel_send(nd->evt); + + /* Now the tx side */ + more = RING_HAS_UNCONSUMED_RESPONSES(&nd->tx); + spl_t s = splsched (); + while (more) { + rmb(); /* make sure we see responses */ + tx_rsp = RING_GET_RESPONSE(&nd->tx, nd->tx.rsp_cons++); + switch (tx_rsp->status) { + case NETIF_RSP_DROPPED: + printf("Packet dropped\n"); + break; + case NETIF_RSP_ERROR: + panic("Packet error"); + case NETIF_RSP_OKAY: + break; + default: + printf("Unknown error %d\n", tx_rsp->status); + goto drop_tx; + } + thread_wakeup((event_t) hyp_grant_address(tx_rsp->id)); +drop_tx: + thread_wakeup_one(nd); + RING_FINAL_CHECK_FOR_RESPONSES(&nd->tx, more); + } + splx(s); + + simple_unlock(&nd->lock); +} + +#define VIF_PATH "device/vif" +void hyp_net_init(void) { + char **vifs, **vif; + char *c; + int i; + int n; + int grant; + char port_name[10]; + domid_t domid; + evtchn_port_t evt; + hyp_store_transaction_t t; + vm_offset_t addr; + struct net_data *nd; + struct ifnet *ifp; + netif_tx_sring_t *tx_ring; + netif_rx_sring_t *rx_ring; + + vifs = hyp_store_ls(0, 1, VIF_PATH); + if (!vifs) { + printf("eth: No net device (%s). Hoping you don't need any\n", hyp_store_error); + n_vifs = 0; + return; + } + + n = 0; + for (vif = vifs; *vif; vif++) + n++; + + vif_data = (void*) kalloc(n * sizeof(*vif_data)); + if (!vif_data) { + printf("eth: No memory room for VIF\n"); + n_vifs = 0; + return; + } + n_vifs = n; + + for (n = 0; n < n_vifs; n++) { + nd = &vif_data[n]; + mach_atoi((u_char *) vifs[n], &nd->handle); + if (nd->handle == MACH_ATOI_DEFAULT) + continue; + + nd->open_count = -2; + nd->vif = vifs[n]; + + /* Get domain id of frontend driver. 
*/ + i = hyp_store_read_int(0, 5, VIF_PATH, "/", vifs[n], "/", "backend-id"); + if (i == -1) + panic("eth: couldn't read frontend domid of VIF %s (%s)",vifs[n], hyp_store_error); + nd->domid = domid = i; + + do { + t = hyp_store_transaction_start(); + + /* Get a page for tx_ring */ + if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS) + panic("eth: couldn't allocate space for store tx_ring"); + tx_ring = (void*) addr; + SHARED_RING_INIT(tx_ring); + FRONT_RING_INIT(&nd->tx, tx_ring, PAGE_SIZE); + grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0); + + /* and give it to backend. */ + i = sprintf(port_name, "%u", grant); + c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "tx-ring-ref"); + if (!c) + panic("eth: couldn't store tx_ring reference for VIF %s (%s)", vifs[n], hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + /* Get a page for rx_ring */ + if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS) + panic("eth: couldn't allocate space for store tx_ring"); + rx_ring = (void*) addr; + SHARED_RING_INIT(rx_ring); + FRONT_RING_INIT(&nd->rx, rx_ring, PAGE_SIZE); + grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0); + + /* and give it to backend. */ + i = sprintf(port_name, "%u", grant); + c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "rx-ring-ref"); + if (!c) + panic("eth: couldn't store rx_ring reference for VIF %s (%s)", vifs[n], hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + /* tell we need csums. */ + c = hyp_store_write(t, "1", 5, VIF_PATH, "/", vifs[n], "/", "feature-no-csum-offload"); + if (!c) + panic("eth: couldn't store feature-no-csum-offload reference for VIF %s (%s)", vifs[n], hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + /* Allocate an event channel and give it to backend. 
*/ + nd->evt = evt = hyp_event_channel_alloc(domid); + i = sprintf(port_name, "%lu", evt); + c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "event-channel"); + if (!c) + panic("eth: couldn't store event channel for VIF %s (%s)", vifs[n], hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + c = hyp_store_write(t, hyp_store_state_initialized, 5, VIF_PATH, "/", vifs[n], "/", "state"); + if (!c) + panic("eth: couldn't store state for VIF %s (%s)", vifs[n], hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + } while ((!hyp_store_transaction_stop(t))); + /* TODO randomly wait? */ + + c = hyp_store_read(0, 5, VIF_PATH, "/", vifs[n], "/", "backend"); + if (!c) + panic("eth: couldn't get path to VIF %s backend (%s)", vifs[n], hyp_store_error); + nd->backend = c; + + while(1) { + i = hyp_store_read_int(0, 3, nd->backend, "/", "state"); + if (i == MACH_ATOI_DEFAULT) + panic("can't read state from %s", nd->backend); + if (i == XenbusStateInitWait) + break; + hyp_yield(); + } + + c = hyp_store_read(0, 3, nd->backend, "/", "mac"); + if (!c) + panic("eth: couldn't get VIF %s's mac (%s)", vifs[n], hyp_store_error); + + for (i=0; ; i++) { + int val; + hextoi(&c[3*i], &val); + if (val == -1) + panic("eth: couldn't understand %dth number of VIF %s's mac %s", i, vifs[n], c); + nd->address[i] = val; + if (i==ADDRESS_SIZE-1) + break; + if (c[3*i+2] != ':') + panic("eth: couldn't understand %dth separator of VIF %s's mac %s", i, vifs[n], c); + } + kfree((vm_offset_t) c, strlen(c)+1); + + printf("eth%d: dom%d's VIF %s ", n, domid, vifs[n]); + for (i=0; ; i++) { + printf("%02x", nd->address[i]); + if (i==ADDRESS_SIZE-1) + break; + printf(":"); + } + printf("\n"); + + c = hyp_store_write(0, hyp_store_state_connected, 5, VIF_PATH, "/", nd->vif, "/", "state"); + if (!c) + panic("couldn't store state for eth%d (%s)", nd - vif_data, hyp_store_error); + kfree((vm_offset_t) c, strlen(c)+1); + + /* Get a page for packet reception */ + for (i= 0; i<WINDOW; i++) { + 
if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS) + panic("eth: couldn't allocate space for store tx_ring"); + nd->rx_buf[i] = (void*)phystokv(kvtophys(addr)); + nd->rx_buf_pfn[i] = atop(kvtophys((vm_offset_t)nd->rx_buf[i])); + if (hyp_do_update_va_mapping(kvtolin(addr), 0, UVMF_INVLPG|UVMF_ALL)) + panic("eth: couldn't clear rx kv buf %d at %p", i, addr); + /* and enqueue it to backend. */ + enqueue_rx_buf(nd, i); + } + int notify; + wmb(); /* make sure it sees requests */ + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->rx, notify); + if (notify) + hyp_event_channel_send(nd->evt); + + + nd->open_count = -1; + nd->device.emul_ops = &hyp_net_emulation_ops; + nd->device.emul_data = nd; + simple_lock_init(&nd->lock); + simple_lock_init(&nd->pushlock); + + ifp = &nd->ifnet; + ifp->if_unit = n; + ifp->if_flags = IFF_UP | IFF_RUNNING; + ifp->if_header_size = 14; + ifp->if_header_format = HDR_ETHERNET; + /* Set to the maximum that we can handle in device_write. */ + ifp->if_mtu = PAGE_SIZE - ifp->if_header_size; + ifp->if_address_size = ADDRESS_SIZE; + ifp->if_address = (void*) nd->address; + if_init_queues (ifp); + + /* Now we can start receiving */ + hyp_evt_handler(evt, hyp_net_intr, n, SPL6); + } +} + +static ipc_port_t +dev_to_port(void *d) +{ + struct net_data *b = d; + if (!d) + return IP_NULL; + return ipc_port_make_send(b->port); +} + +static int +device_close(void *devp) +{ + struct net_data *nd = devp; + if (--nd->open_count < 0) + panic("too many closes on eth%d", nd - vif_data); + printf("close, eth%d count %d\n",nd-vif_data,nd->open_count); + if (nd->open_count) + return 0; + ipc_kobject_set(nd->port, IKO_NULL, IKOT_NONE); + ipc_port_dealloc_kernel(nd->port); + return 0; +} + +static io_return_t +device_open (ipc_port_t reply_port, mach_msg_type_name_t reply_port_type, + dev_mode_t mode, char *name, device_t *devp /* out */) +{ + int i, n, err = 0; + ipc_port_t port, notify; + struct net_data *nd; + + if (name[0] != 'e' || name[1] != 't' || 
name[2] != 'h' || name[3] < '0' || name[3] > '9') + return D_NO_SUCH_DEVICE; + i = mach_atoi((u_char *) &name[3], &n); + if (n == MACH_ATOI_DEFAULT) + return D_NO_SUCH_DEVICE; + if (name[3 + i]) + return D_NO_SUCH_DEVICE; + if (n >= n_vifs) + return D_NO_SUCH_DEVICE; + nd = &vif_data[n]; + if (nd->open_count == -2) + /* couldn't be initialized */ + return D_NO_SUCH_DEVICE; + + if (nd->open_count >= 0) { + *devp = &nd->device ; + nd->open_count++ ; + printf("re-open, eth%d count %d\n",nd-vif_data,nd->open_count); + return D_SUCCESS; + } + + nd->open_count = 1; + printf("eth%d count %d\n",nd-vif_data,nd->open_count); + + port = ipc_port_alloc_kernel(); + if (port == IP_NULL) { + err = KERN_RESOURCE_SHORTAGE; + goto out; + } + nd->port = port; + + *devp = &nd->device; + + ipc_kobject_set (port, (ipc_kobject_t) &nd->device, IKOT_DEVICE); + + notify = ipc_port_make_sonce (nd->port); + ip_lock (nd->port); + ipc_port_nsrequest (nd->port, 1, notify, ¬ify); + assert (notify == IP_NULL); + +out: + if (IP_VALID (reply_port)) + ds_device_open_reply (reply_port, reply_port_type, D_SUCCESS, dev_to_port(nd)); + else + device_close(nd); + return MIG_NO_REPLY; +} + +static io_return_t +device_write(void *d, ipc_port_t reply_port, + mach_msg_type_name_t reply_port_type, dev_mode_t mode, + recnum_t bn, io_buf_ptr_t data, unsigned int count, + int *bytes_written) +{ + vm_map_copy_t copy = (vm_map_copy_t) data; + grant_ref_t gref; + struct net_data *nd = d; + struct ifnet *ifp = &nd->ifnet; + netif_tx_request_t *req; + unsigned reqn; + vm_offset_t offset; + vm_page_t m; + vm_size_t size; + + /* The maximum that we can handle. 
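device_open above accepts only names of the form eth<N>: the literal prefix, at least one decimal digit, and nothing after the number. An equivalent standalone parse; eth_unit is a hypothetical stand-in for the mach_atoi-based check, returning the unit number or -1 on rejection:

```c
#include <assert.h>
#include <string.h>

/* Accept exactly "eth<N>" with N decimal; return N, or -1 to reject. */
static int eth_unit(const char *name)
{
    int n = 0;
    const char *p;
    if (strncmp(name, "eth", 3) != 0)
        return -1;
    p = name + 3;
    if (*p < '0' || *p > '9')       /* require at least one digit */
        return -1;
    for (; *p >= '0' && *p <= '9'; p++)
        n = n * 10 + (*p - '0');
    if (*p != '\0')                 /* trailing junk, e.g. "eth0x" */
        return -1;
    return n;
}
```

The kernel additionally bounds the result against n_vifs and rejects devices whose initialization failed (open_count == -2).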
*/ + assert(ifp->if_header_size + ifp->if_mtu <= PAGE_SIZE); + + if (count < ifp->if_header_size || + count > ifp->if_header_size + ifp->if_mtu) + return D_INVALID_SIZE; + + assert(copy->type == VM_MAP_COPY_PAGE_LIST); + + assert(copy->cpy_npages <= 2); + assert(copy->cpy_npages >= 1); + + offset = copy->offset & PAGE_MASK; + if (paranoia || copy->cpy_npages == 2) { + /* have to copy :/ */ + while ((m = vm_page_grab(FALSE)) == 0) + VM_PAGE_WAIT (0); + assert (! m->active && ! m->inactive); + m->busy = TRUE; + + if (copy->cpy_npages == 1) + size = count; + else + size = PAGE_SIZE - offset; + + memcpy((void*)phystokv(m->phys_addr), (void*)phystokv(copy->cpy_page_list[0]->phys_addr + offset), size); + if (copy->cpy_npages == 2) + memcpy((void*)phystokv(m->phys_addr + size), (void*)phystokv(copy->cpy_page_list[1]->phys_addr), count - size); + + offset = 0; + } else + m = copy->cpy_page_list[0]; + + /* allocate a request */ + spl_t spl = splimp(); + while (1) { + simple_lock(&nd->lock); + if (!RING_FULL(&nd->tx)) + break; + thread_sleep(nd, &nd->lock, FALSE); + } + mb(); + reqn = nd->tx.req_prod_pvt++;; + simple_lock(&nd->pushlock); + simple_unlock(&nd->lock); + (void) splx(spl); + + req = RING_GET_REQUEST(&nd->tx, reqn); + req->gref = gref = hyp_grant_give(nd->domid, atop(m->phys_addr), 1); + req->offset = offset; + req->flags = 0; + req->id = gref; + req->size = count; + + assert_wait(hyp_grant_address(gref), FALSE); + + int notify; + wmb(); /* make sure it sees requests */ + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->tx, notify); + if (notify) + hyp_event_channel_send(nd->evt); + simple_unlock(&nd->pushlock); + + thread_block(NULL); + + hyp_grant_takeback(gref); + + /* Send packet to filters. */ + { + struct packet_header *packet; + struct ether_header *header; + ipc_kmsg_t kmsg; + + kmsg = net_kmsg_get (); + + if (kmsg != IKM_NULL) + { + /* Suitable for Ethernet only. 
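When a packet straddles two pages of the copy's page list, the bounce copy above takes PAGE_SIZE - offset bytes from the first page and the remainder from the second, then transmits from a single fresh page at offset 0. That arithmetic in isolation, with PG as an illustrative page size:

```c
#include <assert.h>

#define PG 4096  /* illustrative page size */

/* Sizes of the two memcpy halves when bouncing a packet of `count`
 * bytes starting at `offset` within the first of `npages` pages. */
static void split_copy_sizes(unsigned offset, unsigned count, int npages,
                             unsigned *first, unsigned *second)
{
    *first = (npages == 1) ? count : PG - offset;
    *second = count - *first;
}
```

Because count is bounded by if_header_size + if_mtu <= PAGE_SIZE, the two halves always fit in one destination page, which is why a single vm_page_grab suffices.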
*/ + header = (struct ether_header *) (net_kmsg (kmsg)->header); + packet = (struct packet_header *) (net_kmsg (kmsg)->packet); + memcpy (header, (void*)phystokv(m->phys_addr + offset), sizeof (struct ether_header)); + + /* packet is prefixed with a struct packet_header, + see include/device/net_status.h. */ + memcpy (packet + 1, (void*)phystokv(m->phys_addr + offset + sizeof (struct ether_header)), + count - sizeof (struct ether_header)); + packet->length = count - sizeof (struct ether_header) + + sizeof (struct packet_header); + packet->type = header->ether_type; + net_kmsg (kmsg)->sent = TRUE; /* Mark packet as sent. */ + spl_t s = splimp (); + net_packet (&nd->ifnet, kmsg, packet->length, + ethernet_priority (kmsg)); + splx (s); + } + } + + if (paranoia || copy->cpy_npages == 2) + VM_PAGE_FREE(m); + + vm_map_copy_discard (copy); + + *bytes_written = count; + + if (IP_VALID(reply_port)) + ds_device_write_reply (reply_port, reply_port_type, 0, count); + + return MIG_NO_REPLY; +} + +static io_return_t +device_get_status(void *d, dev_flavor_t flavor, dev_status_t status, + mach_msg_type_number_t *status_count) +{ + struct net_data *nd = d; + + return net_getstat (&nd->ifnet, flavor, status, status_count); +} + +static io_return_t +device_set_status(void *d, dev_flavor_t flavor, dev_status_t status, + mach_msg_type_number_t count) +{ + struct net_data *nd = d; + + switch (flavor) + { + default: + printf("TODO: net_%s(%p, 0x%x)\n", __func__, nd, flavor); + return D_INVALID_OPERATION; + } + return D_SUCCESS; +} + +static io_return_t +device_set_filter(void *d, ipc_port_t port, int priority, + filter_t * filter, unsigned filter_count) +{ + struct net_data *nd = d; + + if (!nd) + return D_NO_SUCH_DEVICE; + + return net_set_filter (&nd->ifnet, port, priority, filter, filter_count); +} + +struct device_emulation_ops hyp_net_emulation_ops = { + NULL, /* dereference */ + NULL, /* deallocate */ + dev_to_port, + device_open, + device_close, + device_write, + NULL, /* 
write_inband */ + NULL, + NULL, /* read_inband */ + device_set_status, /* set_status */ + device_get_status, + device_set_filter, /* set_filter */ + NULL, /* map */ + NULL, /* no_senders */ + NULL, /* write_trap */ + NULL, /* writev_trap */ +}; diff --git a/xen/net.h b/xen/net.h new file mode 100644 index 0000000..6683870 --- /dev/null +++ b/xen/net.h @@ -0,0 +1,24 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef XEN_NET_H +#define XEN_NET_H + +void hyp_net_init(void); + +#endif /* XEN_NET_H */ diff --git a/xen/public/COPYING b/xen/public/COPYING new file mode 100644 index 0000000..ffc6d61 --- /dev/null +++ b/xen/public/COPYING @@ -0,0 +1,38 @@ +XEN NOTICE +========== + +This copyright applies to all files within this subdirectory and its +subdirectories: + include/public/*.h + include/public/hvm/*.h + include/public/io/*.h + +The intention is that these files can be freely copied into the source +tree of an operating system when porting that OS to run on Xen. Doing +so does *not* cause the OS to become subject to the terms of the GPL. + +All other files in the Xen source distribution are covered by version +2 of the GNU General Public License except where explicitly stated +otherwise within individual source files. 
+ + -- Keir Fraser (on behalf of the Xen team) + +===================================================================== + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to +deal in the Software without restriction, including without limitation the +rights to use, copy, modify, merge, publish, distribute, sublicense, and/or +sell copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. diff --git a/xen/public/arch-x86/xen-mca.h b/xen/public/arch-x86/xen-mca.h new file mode 100644 index 0000000..103d41f --- /dev/null +++ b/xen/public/arch-x86/xen-mca.h @@ -0,0 +1,279 @@ +/****************************************************************************** + * arch-x86/mca.h + * + * Contributed by Advanced Micro Devices, Inc. + * Author: Christoph Egger <Christoph.Egger@amd.com> + * + * Guest OS machine check interface to x86 Xen. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +/* Full MCA functionality has the following Usecases from the guest side: + * + * Must have's: + * 1. Dom0 and DomU register machine check trap callback handlers + * (already done via "set_trap_table" hypercall) + * 2. Dom0 registers machine check event callback handler + * (doable via EVTCHNOP_bind_virq) + * 3. Dom0 and DomU fetches machine check data + * 4. Dom0 wants Xen to notify a DomU + * 5. Dom0 gets DomU ID from physical address + * 6. Dom0 wants Xen to kill DomU (already done for "xm destroy") + * + * Nice to have's: + * 7. Dom0 wants Xen to deactivate a physical CPU + * This is better done as separate task, physical CPU hotplugging, + * and hypercall(s) should be sysctl's + * 8. Page migration proposed from Xen NUMA work, where Dom0 can tell Xen to + * move a DomU (or Dom0 itself) away from a malicious page + * producing correctable errors. + * 9. 
offlining physical page: + * Xen frees and never re-uses a certain physical page. + * 10. Test facility: Allow Dom0 to write values into machine check MSRs + * and tell Xen to trigger a machine check + */ + +#ifndef __XEN_PUBLIC_ARCH_X86_MCA_H__ +#define __XEN_PUBLIC_ARCH_X86_MCA_H__ + +/* Hypercall */ +#define __HYPERVISOR_mca __HYPERVISOR_arch_0 + +#define XEN_MCA_INTERFACE_VERSION 0x03000001 + +/* IN: Dom0 calls hypercall from MC event handler. */ +#define XEN_MC_CORRECTABLE 0x0 +/* IN: Dom0/DomU calls hypercall from MC trap handler. */ +#define XEN_MC_TRAP 0x1 +/* XEN_MC_CORRECTABLE and XEN_MC_TRAP are mutually exclusive. */ + +/* OUT: All is ok */ +#define XEN_MC_OK 0x0 +/* OUT: Domain could not fetch data. */ +#define XEN_MC_FETCHFAILED 0x1 +/* OUT: There was no machine check data to fetch. */ +#define XEN_MC_NODATA 0x2 +/* OUT: Between notification time and this hypercall another + * (most likely) correctable error happened. The fetched data + * does not match the original machine check data. */ +#define XEN_MC_NOMATCH 0x4 + +/* OUT: DomU did not register MC NMI handler. Try something else. */ +#define XEN_MC_CANNOTHANDLE 0x8 +/* OUT: Notifying DomU failed. Retry later or try something else. */ +#define XEN_MC_NOTDELIVERED 0x10 +/* Note, XEN_MC_CANNOTHANDLE and XEN_MC_NOTDELIVERED are mutually exclusive. */ + + +#ifndef __ASSEMBLY__ + +#define VIRQ_MCA VIRQ_ARCH_0 /* G. (DOM0) Machine Check Architecture */ + +/* + * Machine Check Architecture: + * structs are read-only and used to report all kinds of + * correctable and uncorrectable errors detected by the HW. + * Dom0 and DomU: register a handler to get notified.
+ * Dom0 only: Correctable errors are reported via VIRQ_MCA + * Dom0 and DomU: Uncorrectable errors are reported via nmi handlers + */ +#define MC_TYPE_GLOBAL 0 +#define MC_TYPE_BANK 1 +#define MC_TYPE_EXTENDED 2 + +struct mcinfo_common { + uint16_t type; /* structure type */ + uint16_t size; /* size of this struct in bytes */ +}; + + +#define MC_FLAG_CORRECTABLE (1 << 0) +#define MC_FLAG_UNCORRECTABLE (1 << 1) + +/* contains global x86 mc information */ +struct mcinfo_global { + struct mcinfo_common common; + + /* running domain at the time in error (most likely the impacted one) */ + uint16_t mc_domid; + uint32_t mc_socketid; /* physical socket of the physical core */ + uint16_t mc_coreid; /* physical impacted core */ + uint16_t mc_core_threadid; /* core thread of physical core */ + uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */ + uint64_t mc_gstatus; /* global status */ + uint32_t mc_flags; +}; + +/* contains bank local x86 mc information */ +struct mcinfo_bank { + struct mcinfo_common common; + + uint16_t mc_bank; /* bank nr */ + uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0 + * and if mc_addr is valid. Never valid on DomU. */ + uint64_t mc_status; /* bank status */ + uint64_t mc_addr; /* bank address, only valid + * if addr bit is set in mc_status */ + uint64_t mc_misc; +}; + + +struct mcinfo_msr { + uint64_t reg; /* MSR */ + uint64_t value; /* MSR value */ +}; + +/* contains mc information from other + * or additional mc MSRs */ +struct mcinfo_extended { + struct mcinfo_common common; + + /* You can fill up to five registers. + * If you need more, then use this structure + * multiple times. */ + + uint32_t mc_msrs; /* Number of msr with valid values. 
*/ + struct mcinfo_msr mc_msr[5]; +}; + +#define MCINFO_HYPERCALLSIZE 1024 +#define MCINFO_MAXSIZE 768 + +struct mc_info { + /* Number of mcinfo_* entries in mi_data */ + uint32_t mi_nentries; + + uint8_t mi_data[MCINFO_MAXSIZE - sizeof(uint32_t)]; +}; +typedef struct mc_info mc_info_t; + + + +/* + * OS's should use these instead of writing their own lookup function + * each with its own bugs and drawbacks. + * We use macros instead of static inline functions to allow guests + * to include this header in assembly files (*.S). + */ +/* Prototype: + * uint32_t x86_mcinfo_nentries(struct mc_info *mi); + */ +#define x86_mcinfo_nentries(_mi) \ + (_mi)->mi_nentries +/* Prototype: + * struct mcinfo_common *x86_mcinfo_first(struct mc_info *mi); + */ +#define x86_mcinfo_first(_mi) \ + (struct mcinfo_common *)((_mi)->mi_data) +/* Prototype: + * struct mcinfo_common *x86_mcinfo_next(struct mcinfo_common *mic); + */ +#define x86_mcinfo_next(_mic) \ + (struct mcinfo_common *)((uint8_t *)(_mic) + (_mic)->size) + +/* Prototype: + * void x86_mcinfo_lookup(void *ret, struct mc_info *mi, uint16_t type); + */ +#define x86_mcinfo_lookup(_ret, _mi, _type) \ + do { \ + uint32_t found, i; \ + struct mcinfo_common *_mic; \ + \ + found = 0; \ + (_ret) = NULL; \ + if (_mi == NULL) break; \ + _mic = x86_mcinfo_first(_mi); \ + for (i = 0; i < x86_mcinfo_nentries(_mi); i++) { \ + if (_mic->type == (_type)) { \ + found = 1; \ + break; \ + } \ + _mic = x86_mcinfo_next(_mic); \ + } \ + (_ret) = found ? _mic : NULL; \ + } while (0) + + +/* Usecase 1 + * Register machine check trap callback handler + * (already done via "set_trap_table" hypercall) + */ + +/* Usecase 2 + * Dom0 registers machine check event callback handler + * done by EVTCHNOP_bind_virq + */ + +/* Usecase 3 + * Fetch machine check data from hypervisor. + * Note, this hypercall is special, because both Dom0 and DomU must use this. + */ +#define XEN_MC_fetch 1 +struct xen_mc_fetch { + /* IN/OUT variables. 
*/ + uint32_t flags; + +/* IN: XEN_MC_CORRECTABLE, XEN_MC_TRAP */ +/* OUT: XEN_MC_OK, XEN_MC_FETCHFAILED, XEN_MC_NODATA, XEN_MC_NOMATCH */ + + /* OUT variables. */ + uint32_t fetch_idx; /* only useful for Dom0 for the notify hypercall */ + struct mc_info mc_info; +}; +typedef struct xen_mc_fetch xen_mc_fetch_t; +DEFINE_XEN_GUEST_HANDLE(xen_mc_fetch_t); + + +/* Usecase 4 + * This tells the hypervisor to notify a DomU about the machine check error + */ +#define XEN_MC_notifydomain 2 +struct xen_mc_notifydomain { + /* IN variables. */ + uint16_t mc_domid; /* The unprivileged domain to notify. */ + uint16_t mc_vcpuid; /* The vcpu in mc_domid to notify. + * Usually echo'd value from the fetch hypercall. */ + uint32_t fetch_idx; /* echo'd value from the fetch hypercall. */ + + /* IN/OUT variables. */ + uint32_t flags; + +/* IN: XEN_MC_CORRECTABLE, XEN_MC_TRAP */ +/* OUT: XEN_MC_OK, XEN_MC_CANNOTHANDLE, XEN_MC_NOTDELIVERED, XEN_MC_NOMATCH */ +}; +typedef struct xen_mc_notifydomain xen_mc_notifydomain_t; +DEFINE_XEN_GUEST_HANDLE(xen_mc_notifydomain_t); + + +struct xen_mc { + uint32_t cmd; + uint32_t interface_version; /* XEN_MCA_INTERFACE_VERSION */ + union { + struct xen_mc_fetch mc_fetch; + struct xen_mc_notifydomain mc_notifydomain; + uint8_t pad[MCINFO_HYPERCALLSIZE]; + } u; +}; +typedef struct xen_mc xen_mc_t; +DEFINE_XEN_GUEST_HANDLE(xen_mc_t); + +#endif /* __ASSEMBLY__ */ + +#endif /* __XEN_PUBLIC_ARCH_X86_MCA_H__ */ diff --git a/xen/public/arch-x86/xen-x86_32.h b/xen/public/arch-x86/xen-x86_32.h new file mode 100644 index 0000000..7cb6a01 --- /dev/null +++ b/xen/public/arch-x86/xen-x86_32.h @@ -0,0 +1,180 @@ +/****************************************************************************** + * xen-x86_32.h + * + * Guest OS interface to x86 32-bit Xen. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004-2007, K A Fraser + */ + +#ifndef __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__ +#define __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__ + +/* + * Hypercall interface: + * Input: %ebx, %ecx, %edx, %esi, %edi (arguments 1-5) + * Output: %eax + * Access is via hypercall page (set up by guest loader or via a Xen MSR): + * call hypercall_page + hypercall-number * 32 + * Clobbered: Argument registers (e.g., 2-arg hypercall clobbers %ebx,%ecx) + */ + +#if __XEN_INTERFACE_VERSION__ < 0x00030203 +/* + * Legacy hypercall interface: + * As above, except the entry sequence to the hypervisor is: + * mov $hypercall-number*32,%eax ; int $0x82 + */ +#define TRAP_INSTR "int $0x82" +#endif + +/* + * These flat segments are in the Xen-private section of every GDT. Since these + * are also present in the initial GDT, many OSes will be able to avoid + * installing their own GDT. 
+ */ +#define FLAT_RING1_CS 0xe019 /* GDT index 259 */ +#define FLAT_RING1_DS 0xe021 /* GDT index 260 */ +#define FLAT_RING1_SS 0xe021 /* GDT index 260 */ +#define FLAT_RING3_CS 0xe02b /* GDT index 261 */ +#define FLAT_RING3_DS 0xe033 /* GDT index 262 */ +#define FLAT_RING3_SS 0xe033 /* GDT index 262 */ + +#define FLAT_KERNEL_CS FLAT_RING1_CS +#define FLAT_KERNEL_DS FLAT_RING1_DS +#define FLAT_KERNEL_SS FLAT_RING1_SS +#define FLAT_USER_CS FLAT_RING3_CS +#define FLAT_USER_DS FLAT_RING3_DS +#define FLAT_USER_SS FLAT_RING3_SS + +#define __HYPERVISOR_VIRT_START_PAE 0xF5800000 +#define __MACH2PHYS_VIRT_START_PAE 0xF5800000 +#define __MACH2PHYS_VIRT_END_PAE 0xF6800000 +#define HYPERVISOR_VIRT_START_PAE \ + mk_unsigned_long(__HYPERVISOR_VIRT_START_PAE) +#define MACH2PHYS_VIRT_START_PAE \ + mk_unsigned_long(__MACH2PHYS_VIRT_START_PAE) +#define MACH2PHYS_VIRT_END_PAE \ + mk_unsigned_long(__MACH2PHYS_VIRT_END_PAE) + +/* Non-PAE bounds are obsolete. */ +#define __HYPERVISOR_VIRT_START_NONPAE 0xFC000000 +#define __MACH2PHYS_VIRT_START_NONPAE 0xFC000000 +#define __MACH2PHYS_VIRT_END_NONPAE 0xFC400000 +#define HYPERVISOR_VIRT_START_NONPAE \ + mk_unsigned_long(__HYPERVISOR_VIRT_START_NONPAE) +#define MACH2PHYS_VIRT_START_NONPAE \ + mk_unsigned_long(__MACH2PHYS_VIRT_START_NONPAE) +#define MACH2PHYS_VIRT_END_NONPAE \ + mk_unsigned_long(__MACH2PHYS_VIRT_END_NONPAE) + +#define __HYPERVISOR_VIRT_START __HYPERVISOR_VIRT_START_PAE +#define __MACH2PHYS_VIRT_START __MACH2PHYS_VIRT_START_PAE +#define __MACH2PHYS_VIRT_END __MACH2PHYS_VIRT_END_PAE + +#ifndef HYPERVISOR_VIRT_START +#define HYPERVISOR_VIRT_START mk_unsigned_long(__HYPERVISOR_VIRT_START) +#endif + +#define MACH2PHYS_VIRT_START mk_unsigned_long(__MACH2PHYS_VIRT_START) +#define MACH2PHYS_VIRT_END mk_unsigned_long(__MACH2PHYS_VIRT_END) +#define MACH2PHYS_NR_ENTRIES ((MACH2PHYS_VIRT_END-MACH2PHYS_VIRT_START)>>2) +#ifndef machine_to_phys_mapping +#define machine_to_phys_mapping ((unsigned long *)MACH2PHYS_VIRT_START) +#endif + +/* 
32-/64-bit invariability for control interfaces (domctl/sysctl). */ +#if defined(__XEN__) || defined(__XEN_TOOLS__) +#undef ___DEFINE_XEN_GUEST_HANDLE +#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \ + typedef struct { type *p; } \ + __guest_handle_ ## name; \ + typedef struct { union { type *p; uint64_aligned_t q; }; } \ + __guest_handle_64_ ## name +#undef set_xen_guest_handle +#define set_xen_guest_handle(hnd, val) \ + do { if ( sizeof(hnd) == 8 ) *(uint64_t *)&(hnd) = 0; \ + (hnd).p = val; \ + } while ( 0 ) +#define uint64_aligned_t uint64_t __attribute__((aligned(8))) +#define __XEN_GUEST_HANDLE_64(name) __guest_handle_64_ ## name +#define XEN_GUEST_HANDLE_64(name) __XEN_GUEST_HANDLE_64(name) +#endif + +#ifndef __ASSEMBLY__ + +struct cpu_user_regs { + uint32_t ebx; + uint32_t ecx; + uint32_t edx; + uint32_t esi; + uint32_t edi; + uint32_t ebp; + uint32_t eax; + uint16_t error_code; /* private */ + uint16_t entry_vector; /* private */ + uint32_t eip; + uint16_t cs; + uint8_t saved_upcall_mask; + uint8_t _pad0; + uint32_t eflags; /* eflags.IF == !saved_upcall_mask */ + uint32_t esp; + uint16_t ss, _pad1; + uint16_t es, _pad2; + uint16_t ds, _pad3; + uint16_t fs, _pad4; + uint16_t gs, _pad5; +}; +typedef struct cpu_user_regs cpu_user_regs_t; +DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t); + +/* + * Page-directory addresses above 4GB do not fit into architectural %cr3. + * When accessing %cr3, or equivalent field in vcpu_guest_context, guests + * must use the following accessor macros to pack/unpack valid MFNs. 
+ */ +#define xen_pfn_to_cr3(pfn) (((unsigned)(pfn) << 12) | ((unsigned)(pfn) >> 20)) +#define xen_cr3_to_pfn(cr3) (((unsigned)(cr3) >> 12) | ((unsigned)(cr3) << 20)) + +struct arch_vcpu_info { + unsigned long cr2; + unsigned long pad[5]; /* sizeof(vcpu_info_t) == 64 */ +}; +typedef struct arch_vcpu_info arch_vcpu_info_t; + +struct xen_callback { + unsigned long cs; + unsigned long eip; +}; +typedef struct xen_callback xen_callback_t; + +#endif /* !__ASSEMBLY__ */ + +#endif /* __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/arch-x86/xen-x86_64.h b/xen/public/arch-x86/xen-x86_64.h new file mode 100644 index 0000000..1e54cf9 --- /dev/null +++ b/xen/public/arch-x86/xen-x86_64.h @@ -0,0 +1,212 @@ +/****************************************************************************** + * xen-x86_64.h + * + * Guest OS interface to x86 64-bit Xen. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004-2006, K A Fraser + */ + +#ifndef __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__ +#define __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__ + +/* + * Hypercall interface: + * Input: %rdi, %rsi, %rdx, %r10, %r8 (arguments 1-5) + * Output: %rax + * Access is via hypercall page (set up by guest loader or via a Xen MSR): + * call hypercall_page + hypercall-number * 32 + * Clobbered: argument registers (e.g., 2-arg hypercall clobbers %rdi,%rsi) + */ + +#if __XEN_INTERFACE_VERSION__ < 0x00030203 +/* + * Legacy hypercall interface: + * As above, except the entry sequence to the hypervisor is: + * mov $hypercall-number*32,%eax ; syscall + * Clobbered: %rcx, %r11, argument registers (as above) + */ +#define TRAP_INSTR "syscall" +#endif + +/* + * 64-bit segment selectors + * These flat segments are in the Xen-private section of every GDT. Since these + * are also present in the initial GDT, many OSes will be able to avoid + * installing their own GDT. 
+ */ + +#define FLAT_RING3_CS32 0xe023 /* GDT index 260 */ +#define FLAT_RING3_CS64 0xe033 /* GDT index 261 */ +#define FLAT_RING3_DS32 0xe02b /* GDT index 262 */ +#define FLAT_RING3_DS64 0x0000 /* NULL selector */ +#define FLAT_RING3_SS32 0xe02b /* GDT index 262 */ +#define FLAT_RING3_SS64 0xe02b /* GDT index 262 */ + +#define FLAT_KERNEL_DS64 FLAT_RING3_DS64 +#define FLAT_KERNEL_DS32 FLAT_RING3_DS32 +#define FLAT_KERNEL_DS FLAT_KERNEL_DS64 +#define FLAT_KERNEL_CS64 FLAT_RING3_CS64 +#define FLAT_KERNEL_CS32 FLAT_RING3_CS32 +#define FLAT_KERNEL_CS FLAT_KERNEL_CS64 +#define FLAT_KERNEL_SS64 FLAT_RING3_SS64 +#define FLAT_KERNEL_SS32 FLAT_RING3_SS32 +#define FLAT_KERNEL_SS FLAT_KERNEL_SS64 + +#define FLAT_USER_DS64 FLAT_RING3_DS64 +#define FLAT_USER_DS32 FLAT_RING3_DS32 +#define FLAT_USER_DS FLAT_USER_DS64 +#define FLAT_USER_CS64 FLAT_RING3_CS64 +#define FLAT_USER_CS32 FLAT_RING3_CS32 +#define FLAT_USER_CS FLAT_USER_CS64 +#define FLAT_USER_SS64 FLAT_RING3_SS64 +#define FLAT_USER_SS32 FLAT_RING3_SS32 +#define FLAT_USER_SS FLAT_USER_SS64 + +#define __HYPERVISOR_VIRT_START 0xFFFF800000000000 +#define __HYPERVISOR_VIRT_END 0xFFFF880000000000 +#define __MACH2PHYS_VIRT_START 0xFFFF800000000000 +#define __MACH2PHYS_VIRT_END 0xFFFF804000000000 + +#ifndef HYPERVISOR_VIRT_START +#define HYPERVISOR_VIRT_START mk_unsigned_long(__HYPERVISOR_VIRT_START) +#define HYPERVISOR_VIRT_END mk_unsigned_long(__HYPERVISOR_VIRT_END) +#endif + +#define MACH2PHYS_VIRT_START mk_unsigned_long(__MACH2PHYS_VIRT_START) +#define MACH2PHYS_VIRT_END mk_unsigned_long(__MACH2PHYS_VIRT_END) +#define MACH2PHYS_NR_ENTRIES ((MACH2PHYS_VIRT_END-MACH2PHYS_VIRT_START)>>3) +#ifndef machine_to_phys_mapping +#define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START) +#endif + +/* + * int HYPERVISOR_set_segment_base(unsigned int which, unsigned long base) + * @which == SEGBASE_* ; @base == 64-bit base address + * Returns 0 on success. 
*/ +#define SEGBASE_FS 0 +#define SEGBASE_GS_USER 1 +#define SEGBASE_GS_KERNEL 2 +#define SEGBASE_GS_USER_SEL 3 /* Set user %gs specified in base[15:0] */ + +/* + * int HYPERVISOR_iret(void) + * All arguments are on the kernel stack, in the following format. + * Never returns if successful. Current kernel context is lost. + * The saved CS is mapped as follows: + * RING0 -> RING3 kernel mode. + * RING1 -> RING3 kernel mode. + * RING2 -> RING3 kernel mode. + * RING3 -> RING3 user mode. + * However RING0 indicates that the guest kernel should return to itself + * directly with + * orb $3,1*8(%rsp) + * iretq + * If flags contains VGCF_in_syscall: + * Restore RAX, RIP, RFLAGS, RSP. + * Discard R11, RCX, CS, SS. + * Otherwise: + * Restore RAX, R11, RCX, CS:RIP, RFLAGS, SS:RSP. + * All other registers are saved on hypercall entry and restored to user. + */ +/* Guest exited in SYSCALL context? Return to guest with SYSRET? */ +#define _VGCF_in_syscall 8 +#define VGCF_in_syscall (1<<_VGCF_in_syscall) +#define VGCF_IN_SYSCALL VGCF_in_syscall + +#ifndef __ASSEMBLY__ + +struct iret_context { + /* Top of stack (%rsp at point of hypercall). */ + uint64_t rax, r11, rcx, flags, rip, cs, rflags, rsp, ss; + /* Bottom of iret stack frame. */ +}; + +#if defined(__GNUC__) && !defined(__STRICT_ANSI__) +/* Anonymous union includes both 32- and 64-bit names (e.g., eax/rax). */ +#define __DECL_REG(name) union { \ + uint64_t r ## name, e ## name; \ + uint32_t _e ## name; \ +} +#else +/* Non-gcc sources must always use the proper 64-bit name (e.g., rax).

*/ +#define __DECL_REG(name) uint64_t r ## name +#endif + +struct cpu_user_regs { + uint64_t r15; + uint64_t r14; + uint64_t r13; + uint64_t r12; + __DECL_REG(bp); + __DECL_REG(bx); + uint64_t r11; + uint64_t r10; + uint64_t r9; + uint64_t r8; + __DECL_REG(ax); + __DECL_REG(cx); + __DECL_REG(dx); + __DECL_REG(si); + __DECL_REG(di); + uint32_t error_code; /* private */ + uint32_t entry_vector; /* private */ + __DECL_REG(ip); + uint16_t cs, _pad0[1]; + uint8_t saved_upcall_mask; + uint8_t _pad1[3]; + __DECL_REG(flags); /* rflags.IF == !saved_upcall_mask */ + __DECL_REG(sp); + uint16_t ss, _pad2[3]; + uint16_t es, _pad3[3]; + uint16_t ds, _pad4[3]; + uint16_t fs, _pad5[3]; /* Non-zero => takes precedence over fs_base. */ + uint16_t gs, _pad6[3]; /* Non-zero => takes precedence over gs_base_usr. */ +}; +typedef struct cpu_user_regs cpu_user_regs_t; +DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t); + +#undef __DECL_REG + +#define xen_pfn_to_cr3(pfn) ((unsigned long)(pfn) << 12) +#define xen_cr3_to_pfn(cr3) ((unsigned long)(cr3) >> 12) + +struct arch_vcpu_info { + unsigned long cr2; + unsigned long pad; /* sizeof(vcpu_info_t) == 64 */ +}; +typedef struct arch_vcpu_info arch_vcpu_info_t; + +typedef unsigned long xen_callback_t; + +#endif /* !__ASSEMBLY__ */ + +#endif /* __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/arch-x86/xen.h b/xen/public/arch-x86/xen.h new file mode 100644 index 0000000..084348f --- /dev/null +++ b/xen/public/arch-x86/xen.h @@ -0,0 +1,204 @@ +/****************************************************************************** + * arch-x86/xen.h + * + * Guest OS interface to x86 Xen. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004-2006, K A Fraser + */ + +#include "../xen.h" + +#ifndef __XEN_PUBLIC_ARCH_X86_XEN_H__ +#define __XEN_PUBLIC_ARCH_X86_XEN_H__ + +/* Structural guest handles introduced in 0x00030201. 
*/ +#if __XEN_INTERFACE_VERSION__ >= 0x00030201 +#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \ + typedef struct { type *p; } __guest_handle_ ## name +#else +#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \ + typedef type * __guest_handle_ ## name +#endif + +#define __DEFINE_XEN_GUEST_HANDLE(name, type) \ + ___DEFINE_XEN_GUEST_HANDLE(name, type); \ + ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type) +#define DEFINE_XEN_GUEST_HANDLE(name) __DEFINE_XEN_GUEST_HANDLE(name, name) +#define __XEN_GUEST_HANDLE(name) __guest_handle_ ## name +#define XEN_GUEST_HANDLE(name) __XEN_GUEST_HANDLE(name) +#define set_xen_guest_handle(hnd, val) do { (hnd).p = val; } while (0) +#ifdef __XEN_TOOLS__ +#define get_xen_guest_handle(val, hnd) do { val = (hnd).p; } while (0) +#endif + +#if defined(__i386__) +#include "xen-x86_32.h" +#elif defined(__x86_64__) +#include "xen-x86_64.h" +#endif + +#ifndef __ASSEMBLY__ +typedef unsigned long xen_pfn_t; +#define PRI_xen_pfn "lx" +#endif + +/* + * SEGMENT DESCRIPTOR TABLES + */ +/* + * A number of GDT entries are reserved by Xen. These are not situated at the + * start of the GDT because some stupid OSes export hard-coded selector values + * in their ABI. These hard-coded values are always near the start of the GDT, + * so Xen places itself out of the way, at the far end of the GDT. + */ +#define FIRST_RESERVED_GDT_PAGE 14 +#define FIRST_RESERVED_GDT_BYTE (FIRST_RESERVED_GDT_PAGE * 4096) +#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8) + +/* Maximum number of virtual CPUs in multi-processor guests. */ +#define MAX_VIRT_CPUS 32 + + +/* Machine check support */ +#include "xen-mca.h" + +#ifndef __ASSEMBLY__ + +typedef unsigned long xen_ulong_t; + +/* + * Send an array of these to HYPERVISOR_set_trap_table(). + * The privilege level specifies which modes may enter a trap via a software + * interrupt. 
On x86/64, since rings 1 and 2 are unavailable, we allocate + * privilege levels as follows: + * Level == 0: Noone may enter + * Level == 1: Kernel may enter + * Level == 2: Kernel may enter + * Level == 3: Everyone may enter + */ +#define TI_GET_DPL(_ti) ((_ti)->flags & 3) +#define TI_GET_IF(_ti) ((_ti)->flags & 4) +#define TI_SET_DPL(_ti,_dpl) ((_ti)->flags |= (_dpl)) +#define TI_SET_IF(_ti,_if) ((_ti)->flags |= ((!!(_if))<<2)) +struct trap_info { + uint8_t vector; /* exception vector */ + uint8_t flags; /* 0-3: privilege level; 4: clear event enable? */ + uint16_t cs; /* code selector */ + unsigned long address; /* code offset */ +}; +typedef struct trap_info trap_info_t; +DEFINE_XEN_GUEST_HANDLE(trap_info_t); + +typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */ + +/* + * The following is all CPU context. Note that the fpu_ctxt block is filled + * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used. + */ +struct vcpu_guest_context { + /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. 
*/ + struct { char x[512]; } fpu_ctxt; /* User-level FPU registers */ +#define VGCF_I387_VALID (1<<0) +#define VGCF_IN_KERNEL (1<<2) +#define _VGCF_i387_valid 0 +#define VGCF_i387_valid (1<<_VGCF_i387_valid) +#define _VGCF_in_kernel 2 +#define VGCF_in_kernel (1<<_VGCF_in_kernel) +#define _VGCF_failsafe_disables_events 3 +#define VGCF_failsafe_disables_events (1<<_VGCF_failsafe_disables_events) +#define _VGCF_syscall_disables_events 4 +#define VGCF_syscall_disables_events (1<<_VGCF_syscall_disables_events) +#define _VGCF_online 5 +#define VGCF_online (1<<_VGCF_online) + unsigned long flags; /* VGCF_* flags */ + struct cpu_user_regs user_regs; /* User-level CPU registers */ + struct trap_info trap_ctxt[256]; /* Virtual IDT */ + unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */ + unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */ + unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */ + /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ + unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */ + unsigned long debugreg[8]; /* DB0-DB7 (debug registers) */ +#ifdef __i386__ + unsigned long event_callback_cs; /* CS:EIP of event callback */ + unsigned long event_callback_eip; + unsigned long failsafe_callback_cs; /* CS:EIP of failsafe callback */ + unsigned long failsafe_callback_eip; +#else + unsigned long event_callback_eip; + unsigned long failsafe_callback_eip; +#ifdef __XEN__ + union { + unsigned long syscall_callback_eip; + struct { + unsigned int event_callback_cs; /* compat CS of event cb */ + unsigned int failsafe_callback_cs; /* compat CS of failsafe cb */ + }; + }; +#else + unsigned long syscall_callback_eip; +#endif +#endif + unsigned long vm_assist; /* VMASST_TYPE_* bitmap */ +#ifdef __x86_64__ + /* Segment base addresses. 
*/ + uint64_t fs_base; + uint64_t gs_base_kernel; + uint64_t gs_base_user; +#endif +}; +typedef struct vcpu_guest_context vcpu_guest_context_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t); + +struct arch_shared_info { + unsigned long max_pfn; /* max pfn that appears in table */ + /* Frame containing list of mfns containing list of mfns containing p2m. */ + xen_pfn_t pfn_to_mfn_frame_list_list; + unsigned long nmi_reason; + uint64_t pad[32]; +}; +typedef struct arch_shared_info arch_shared_info_t; + +#endif /* !__ASSEMBLY__ */ + +/* + * Prefix forces emulation of some non-trapping instructions. + * Currently only CPUID. + */ +#ifdef __ASSEMBLY__ +#define XEN_EMULATE_PREFIX .byte 0x0f,0x0b,0x78,0x65,0x6e ; +#define XEN_CPUID XEN_EMULATE_PREFIX cpuid +#else +#define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; " +#define XEN_CPUID XEN_EMULATE_PREFIX "cpuid" +#endif + +#endif /* __XEN_PUBLIC_ARCH_X86_XEN_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/arch-x86_32.h b/xen/public/arch-x86_32.h new file mode 100644 index 0000000..45842b2 --- /dev/null +++ b/xen/public/arch-x86_32.h @@ -0,0 +1,27 @@ +/****************************************************************************** + * arch-x86_32.h + * + * Guest OS interface to x86 32-bit Xen. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004-2006, K A Fraser + */ + +#include "arch-x86/xen.h" diff --git a/xen/public/arch-x86_64.h b/xen/public/arch-x86_64.h new file mode 100644 index 0000000..fbb2639 --- /dev/null +++ b/xen/public/arch-x86_64.h @@ -0,0 +1,27 @@ +/****************************************************************************** + * arch-x86_64.h + * + * Guest OS interface to x86 64-bit Xen. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (c) 2004-2006, K A Fraser + */ + +#include "arch-x86/xen.h" diff --git a/xen/public/callback.h b/xen/public/callback.h new file mode 100644 index 0000000..f4962f6 --- /dev/null +++ b/xen/public/callback.h @@ -0,0 +1,121 @@ +/****************************************************************************** + * callback.h + * + * Register guest OS callbacks with Xen. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2006, Ian Campbell + */ + +#ifndef __XEN_PUBLIC_CALLBACK_H__ +#define __XEN_PUBLIC_CALLBACK_H__ + +#include "xen.h" + +/* + * Prototype for this hypercall is: + * long callback_op(int cmd, void *extra_args) + * @cmd == CALLBACKOP_??? (callback operation). + * @extra_args == Operation-specific extra arguments (NULL if none). + */ + +/* ia64, x86: Callback for event delivery. */ +#define CALLBACKTYPE_event 0 + +/* x86: Failsafe callback when guest state cannot be restored by Xen. 
*/ +#define CALLBACKTYPE_failsafe 1 + +/* x86/64 hypervisor: Syscall by 64-bit guest app ('64-on-64-on-64'). */ +#define CALLBACKTYPE_syscall 2 + +/* + * x86/32 hypervisor: Only available on x86/32 when supervisor_mode_kernel + * feature is enabled. Do not use this callback type in new code. + */ +#define CALLBACKTYPE_sysenter_deprecated 3 + +/* x86: Callback for NMI delivery. */ +#define CALLBACKTYPE_nmi 4 + +/* + * x86: sysenter is only available as follows: + * - 32-bit hypervisor: with the supervisor_mode_kernel feature enabled + * - 64-bit hypervisor: 32-bit guest applications on Intel CPUs + * ('32-on-32-on-64', '32-on-64-on-64') + * [nb. also 64-bit guest applications on Intel CPUs + * ('64-on-64-on-64'), but syscall is preferred] + */ +#define CALLBACKTYPE_sysenter 5 + +/* + * x86/64 hypervisor: Syscall by 32-bit guest app on AMD CPUs + * ('32-on-32-on-64', '32-on-64-on-64') + */ +#define CALLBACKTYPE_syscall32 7 + +/* + * Disable event delivery during callback? This flag is ignored for event and + * NMI callbacks: event delivery is unconditionally disabled. + */ +#define _CALLBACKF_mask_events 0 +#define CALLBACKF_mask_events (1U << _CALLBACKF_mask_events) + +/* + * Register a callback. + */ +#define CALLBACKOP_register 0 +struct callback_register { + uint16_t type; + uint16_t flags; + xen_callback_t address; +}; +typedef struct callback_register callback_register_t; +DEFINE_XEN_GUEST_HANDLE(callback_register_t); + +/* + * Unregister a callback. + * + * Not all callbacks can be unregistered. -EINVAL will be returned if + * you attempt to unregister such a callback. 
+ */ +#define CALLBACKOP_unregister 1 +struct callback_unregister { + uint16_t type; + uint16_t _unused; +}; +typedef struct callback_unregister callback_unregister_t; +DEFINE_XEN_GUEST_HANDLE(callback_unregister_t); + +#if __XEN_INTERFACE_VERSION__ < 0x00030207 +#undef CALLBACKTYPE_sysenter +#define CALLBACKTYPE_sysenter CALLBACKTYPE_sysenter_deprecated +#endif + +#endif /* __XEN_PUBLIC_CALLBACK_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/dom0_ops.h b/xen/public/dom0_ops.h new file mode 100644 index 0000000..5d2b324 --- /dev/null +++ b/xen/public/dom0_ops.h @@ -0,0 +1,120 @@ +/****************************************************************************** + * dom0_ops.h + * + * Process command requests from domain-0 guest OS. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (c) 2002-2003, B Dragovic + * Copyright (c) 2002-2006, K Fraser + */ + +#ifndef __XEN_PUBLIC_DOM0_OPS_H__ +#define __XEN_PUBLIC_DOM0_OPS_H__ + +#include "xen.h" +#include "platform.h" + +#if __XEN_INTERFACE_VERSION__ >= 0x00030204 +#error "dom0_ops.h is a compatibility interface only" +#endif + +#define DOM0_INTERFACE_VERSION XENPF_INTERFACE_VERSION + +#define DOM0_SETTIME XENPF_settime +#define dom0_settime xenpf_settime +#define dom0_settime_t xenpf_settime_t + +#define DOM0_ADD_MEMTYPE XENPF_add_memtype +#define dom0_add_memtype xenpf_add_memtype +#define dom0_add_memtype_t xenpf_add_memtype_t + +#define DOM0_DEL_MEMTYPE XENPF_del_memtype +#define dom0_del_memtype xenpf_del_memtype +#define dom0_del_memtype_t xenpf_del_memtype_t + +#define DOM0_READ_MEMTYPE XENPF_read_memtype +#define dom0_read_memtype xenpf_read_memtype +#define dom0_read_memtype_t xenpf_read_memtype_t + +#define DOM0_MICROCODE XENPF_microcode_update +#define dom0_microcode xenpf_microcode_update +#define dom0_microcode_t xenpf_microcode_update_t + +#define DOM0_PLATFORM_QUIRK XENPF_platform_quirk +#define dom0_platform_quirk xenpf_platform_quirk +#define dom0_platform_quirk_t xenpf_platform_quirk_t + +typedef uint64_t cpumap_t; + +/* Unsupported legacy operation -- defined for API compatibility. */ +#define DOM0_MSR 15 +struct dom0_msr { + /* IN variables. */ + uint32_t write; + cpumap_t cpu_mask; + uint32_t msr; + uint32_t in1; + uint32_t in2; + /* OUT variables. */ + uint32_t out1; + uint32_t out2; +}; +typedef struct dom0_msr dom0_msr_t; +DEFINE_XEN_GUEST_HANDLE(dom0_msr_t); + +/* Unsupported legacy operation -- defined for API compatibility. 
*/ +#define DOM0_PHYSICAL_MEMORY_MAP 40 +struct dom0_memory_map_entry { + uint64_t start, end; + uint32_t flags; /* reserved */ + uint8_t is_ram; +}; +typedef struct dom0_memory_map_entry dom0_memory_map_entry_t; +DEFINE_XEN_GUEST_HANDLE(dom0_memory_map_entry_t); + +struct dom0_op { + uint32_t cmd; + uint32_t interface_version; /* DOM0_INTERFACE_VERSION */ + union { + struct dom0_msr msr; + struct dom0_settime settime; + struct dom0_add_memtype add_memtype; + struct dom0_del_memtype del_memtype; + struct dom0_read_memtype read_memtype; + struct dom0_microcode microcode; + struct dom0_platform_quirk platform_quirk; + struct dom0_memory_map_entry physical_memory_map; + uint8_t pad[128]; + } u; +}; +typedef struct dom0_op dom0_op_t; +DEFINE_XEN_GUEST_HANDLE(dom0_op_t); + +#endif /* __XEN_PUBLIC_DOM0_OPS_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/domctl.h b/xen/public/domctl.h new file mode 100644 index 0000000..b7075ac --- /dev/null +++ b/xen/public/domctl.h @@ -0,0 +1,680 @@ +/****************************************************************************** + * domctl.h + * + * Domain management operations. For use by node control stack. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2002-2003, B Dragovic + * Copyright (c) 2002-2006, K Fraser + */ + +#ifndef __XEN_PUBLIC_DOMCTL_H__ +#define __XEN_PUBLIC_DOMCTL_H__ + +#if !defined(__XEN__) && !defined(__XEN_TOOLS__) +#error "domctl operations are intended for use by node control tools only" +#endif + +#include "xen.h" + +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000005 + +struct xenctl_cpumap { + XEN_GUEST_HANDLE_64(uint8) bitmap; + uint32_t nr_cpus; +}; + +/* + * NB. xen_domctl.domain is an IN/OUT parameter for this operation. + * If it is specified as zero, an id is auto-allocated and returned. + */ +#define XEN_DOMCTL_createdomain 1 +struct xen_domctl_createdomain { + /* IN parameters */ + uint32_t ssidref; + xen_domain_handle_t handle; + /* Is this an HVM guest (as opposed to a PV guest)? */ +#define _XEN_DOMCTL_CDF_hvm_guest 0 +#define XEN_DOMCTL_CDF_hvm_guest (1U<<_XEN_DOMCTL_CDF_hvm_guest) + /* Use hardware-assisted paging if available? */ +#define _XEN_DOMCTL_CDF_hap 1 +#define XEN_DOMCTL_CDF_hap (1U<<_XEN_DOMCTL_CDF_hap) + uint32_t flags; +}; +typedef struct xen_domctl_createdomain xen_domctl_createdomain_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_createdomain_t); + +#define XEN_DOMCTL_destroydomain 2 +#define XEN_DOMCTL_pausedomain 3 +#define XEN_DOMCTL_unpausedomain 4 +#define XEN_DOMCTL_resumedomain 27 + +#define XEN_DOMCTL_getdomaininfo 5 +struct xen_domctl_getdomaininfo { + /* OUT variables. */ + domid_t domain; /* Also echoed in domctl.domain */ + /* Domain is scheduled to die. 
*/ +#define _XEN_DOMINF_dying 0 +#define XEN_DOMINF_dying (1U<<_XEN_DOMINF_dying) + /* Domain is an HVM guest (as opposed to a PV guest). */ +#define _XEN_DOMINF_hvm_guest 1 +#define XEN_DOMINF_hvm_guest (1U<<_XEN_DOMINF_hvm_guest) + /* The guest OS has shut down. */ +#define _XEN_DOMINF_shutdown 2 +#define XEN_DOMINF_shutdown (1U<<_XEN_DOMINF_shutdown) + /* Currently paused by control software. */ +#define _XEN_DOMINF_paused 3 +#define XEN_DOMINF_paused (1U<<_XEN_DOMINF_paused) + /* Currently blocked pending an event. */ +#define _XEN_DOMINF_blocked 4 +#define XEN_DOMINF_blocked (1U<<_XEN_DOMINF_blocked) + /* Domain is currently running. */ +#define _XEN_DOMINF_running 5 +#define XEN_DOMINF_running (1U<<_XEN_DOMINF_running) + /* Being debugged. */ +#define _XEN_DOMINF_debugged 6 +#define XEN_DOMINF_debugged (1U<<_XEN_DOMINF_debugged) + /* CPU to which this domain is bound. */ +#define XEN_DOMINF_cpumask 255 +#define XEN_DOMINF_cpushift 8 + /* XEN_DOMINF_shutdown guest-supplied code. */ +#define XEN_DOMINF_shutdownmask 255 +#define XEN_DOMINF_shutdownshift 16 + uint32_t flags; /* XEN_DOMINF_* */ + uint64_aligned_t tot_pages; + uint64_aligned_t max_pages; + uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */ + uint64_aligned_t cpu_time; + uint32_t nr_online_vcpus; /* Number of VCPUs currently online. */ + uint32_t max_vcpu_id; /* Maximum VCPUID in use by this domain. */ + uint32_t ssidref; + xen_domain_handle_t handle; +}; +typedef struct xen_domctl_getdomaininfo xen_domctl_getdomaininfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t); + + +#define XEN_DOMCTL_getmemlist 6 +struct xen_domctl_getmemlist { + /* IN variables. */ + /* Max entries to write to output buffer. */ + uint64_aligned_t max_pfns; + /* Start index in guest's page list. */ + uint64_aligned_t start_pfn; + XEN_GUEST_HANDLE_64(uint64) buffer; + /* OUT variables. 
*/ + uint64_aligned_t num_pfns; +}; +typedef struct xen_domctl_getmemlist xen_domctl_getmemlist_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_getmemlist_t); + + +#define XEN_DOMCTL_getpageframeinfo 7 + +#define XEN_DOMCTL_PFINFO_LTAB_SHIFT 28 +#define XEN_DOMCTL_PFINFO_NOTAB (0x0U<<28) +#define XEN_DOMCTL_PFINFO_L1TAB (0x1U<<28) +#define XEN_DOMCTL_PFINFO_L2TAB (0x2U<<28) +#define XEN_DOMCTL_PFINFO_L3TAB (0x3U<<28) +#define XEN_DOMCTL_PFINFO_L4TAB (0x4U<<28) +#define XEN_DOMCTL_PFINFO_LTABTYPE_MASK (0x7U<<28) +#define XEN_DOMCTL_PFINFO_LPINTAB (0x1U<<31) +#define XEN_DOMCTL_PFINFO_XTAB (0xfU<<28) /* invalid page */ +#define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28) + +struct xen_domctl_getpageframeinfo { + /* IN variables. */ + uint64_aligned_t gmfn; /* GMFN to query */ + /* OUT variables. */ + /* Is the page PINNED to a type? */ + uint32_t type; /* see above type defs */ +}; +typedef struct xen_domctl_getpageframeinfo xen_domctl_getpageframeinfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_getpageframeinfo_t); + + +#define XEN_DOMCTL_getpageframeinfo2 8 +struct xen_domctl_getpageframeinfo2 { + /* IN variables. */ + uint64_aligned_t num; + /* IN/OUT variables. */ + XEN_GUEST_HANDLE_64(uint32) array; +}; +typedef struct xen_domctl_getpageframeinfo2 xen_domctl_getpageframeinfo2_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_getpageframeinfo2_t); + + +/* + * Control shadow pagetables operation + */ +#define XEN_DOMCTL_shadow_op 10 + +/* Disable shadow mode. */ +#define XEN_DOMCTL_SHADOW_OP_OFF 0 + +/* Enable shadow mode (mode contains ORed XEN_DOMCTL_SHADOW_ENABLE_* flags). */ +#define XEN_DOMCTL_SHADOW_OP_ENABLE 32 + +/* Log-dirty bitmap operations. */ + /* Return the bitmap and clean internal copy for next round. */ +#define XEN_DOMCTL_SHADOW_OP_CLEAN 11 + /* Return the bitmap but do not modify internal copy. */ +#define XEN_DOMCTL_SHADOW_OP_PEEK 12 + +/* Memory allocation accessors. 
*/ +#define XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION 30 +#define XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION 31 + +/* Legacy enable operations. */ + /* Equiv. to ENABLE with no mode flags. */ +#define XEN_DOMCTL_SHADOW_OP_ENABLE_TEST 1 + /* Equiv. to ENABLE with mode flag ENABLE_LOG_DIRTY. */ +#define XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY 2 + /* Equiv. to ENABLE with mode flags ENABLE_REFCOUNT and ENABLE_TRANSLATE. */ +#define XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE 3 + +/* Mode flags for XEN_DOMCTL_SHADOW_OP_ENABLE. */ + /* + * Shadow pagetables are refcounted: guest does not use explicit mmu + * operations nor write-protect its pagetables. + */ +#define XEN_DOMCTL_SHADOW_ENABLE_REFCOUNT (1 << 1) + /* + * Log pages in a bitmap as they are dirtied. + * Used for live relocation to determine which pages must be re-sent. + */ +#define XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY (1 << 2) + /* + * Automatically translate GPFNs into MFNs. + */ +#define XEN_DOMCTL_SHADOW_ENABLE_TRANSLATE (1 << 3) + /* + * Xen does not steal virtual address space from the guest. + * Requires HVM support. + */ +#define XEN_DOMCTL_SHADOW_ENABLE_EXTERNAL (1 << 4) + +struct xen_domctl_shadow_op_stats { + uint32_t fault_count; + uint32_t dirty_count; +}; +typedef struct xen_domctl_shadow_op_stats xen_domctl_shadow_op_stats_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_shadow_op_stats_t); + +struct xen_domctl_shadow_op { + /* IN variables. */ + uint32_t op; /* XEN_DOMCTL_SHADOW_OP_* */ + + /* OP_ENABLE */ + uint32_t mode; /* XEN_DOMCTL_SHADOW_ENABLE_* */ + + /* OP_GET_ALLOCATION / OP_SET_ALLOCATION */ + uint32_t mb; /* Shadow memory allocation in MB */ + + /* OP_PEEK / OP_CLEAN */ + XEN_GUEST_HANDLE_64(uint8) dirty_bitmap; + uint64_aligned_t pages; /* Size of buffer. Updated with actual size. 
*/ + struct xen_domctl_shadow_op_stats stats; +}; +typedef struct xen_domctl_shadow_op xen_domctl_shadow_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_shadow_op_t); + + +#define XEN_DOMCTL_max_mem 11 +struct xen_domctl_max_mem { + /* IN variables. */ + uint64_aligned_t max_memkb; +}; +typedef struct xen_domctl_max_mem xen_domctl_max_mem_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_mem_t); + + +#define XEN_DOMCTL_setvcpucontext 12 +#define XEN_DOMCTL_getvcpucontext 13 +struct xen_domctl_vcpucontext { + uint32_t vcpu; /* IN */ + XEN_GUEST_HANDLE_64(vcpu_guest_context_t) ctxt; /* IN/OUT */ +}; +typedef struct xen_domctl_vcpucontext xen_domctl_vcpucontext_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpucontext_t); + + +#define XEN_DOMCTL_getvcpuinfo 14 +struct xen_domctl_getvcpuinfo { + /* IN variables. */ + uint32_t vcpu; + /* OUT variables. */ + uint8_t online; /* currently online (not hotplugged)? */ + uint8_t blocked; /* blocked waiting for an event? */ + uint8_t running; /* currently scheduled on its CPU? */ + uint64_aligned_t cpu_time; /* total cpu time consumed (ns) */ + uint32_t cpu; /* current mapping */ +}; +typedef struct xen_domctl_getvcpuinfo xen_domctl_getvcpuinfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t); + + +/* Get/set which physical cpus a vcpu can execute on. */ +#define XEN_DOMCTL_setvcpuaffinity 9 +#define XEN_DOMCTL_getvcpuaffinity 25 +struct xen_domctl_vcpuaffinity { + uint32_t vcpu; /* IN */ + struct xenctl_cpumap cpumap; /* IN/OUT */ +}; +typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); + + +#define XEN_DOMCTL_max_vcpus 15 +struct xen_domctl_max_vcpus { + uint32_t max; /* maximum number of vcpus */ +}; +typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t); + + +#define XEN_DOMCTL_scheduler_op 16 +/* Scheduler types. */ +#define XEN_SCHEDULER_SEDF 4 +#define XEN_SCHEDULER_CREDIT 5 +/* Set or get info? 
*/ +#define XEN_DOMCTL_SCHEDOP_putinfo 0 +#define XEN_DOMCTL_SCHEDOP_getinfo 1 +struct xen_domctl_scheduler_op { + uint32_t sched_id; /* XEN_SCHEDULER_* */ + uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ + union { + struct xen_domctl_sched_sedf { + uint64_aligned_t period; + uint64_aligned_t slice; + uint64_aligned_t latency; + uint32_t extratime; + uint32_t weight; + } sedf; + struct xen_domctl_sched_credit { + uint16_t weight; + uint16_t cap; + } credit; + } u; +}; +typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_scheduler_op_t); + + +#define XEN_DOMCTL_setdomainhandle 17 +struct xen_domctl_setdomainhandle { + xen_domain_handle_t handle; +}; +typedef struct xen_domctl_setdomainhandle xen_domctl_setdomainhandle_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_setdomainhandle_t); + + +#define XEN_DOMCTL_setdebugging 18 +struct xen_domctl_setdebugging { + uint8_t enable; +}; +typedef struct xen_domctl_setdebugging xen_domctl_setdebugging_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_setdebugging_t); + + +#define XEN_DOMCTL_irq_permission 19 +struct xen_domctl_irq_permission { + uint8_t pirq; + uint8_t allow_access; /* flag to specify enable/disable of IRQ access */ +}; +typedef struct xen_domctl_irq_permission xen_domctl_irq_permission_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_irq_permission_t); + + +#define XEN_DOMCTL_iomem_permission 20 +struct xen_domctl_iomem_permission { + uint64_aligned_t first_mfn;/* first page (physical page number) in range */ + uint64_aligned_t nr_mfns; /* number of pages in range (>0) */ + uint8_t allow_access; /* allow (!0) or deny (0) access to range? 
*/ +}; +typedef struct xen_domctl_iomem_permission xen_domctl_iomem_permission_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_iomem_permission_t); + + +#define XEN_DOMCTL_ioport_permission 21 +struct xen_domctl_ioport_permission { + uint32_t first_port; /* first port in range */ + uint32_t nr_ports; /* size of port range */ + uint8_t allow_access; /* allow or deny access to range? */ +}; +typedef struct xen_domctl_ioport_permission xen_domctl_ioport_permission_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_ioport_permission_t); + + +#define XEN_DOMCTL_hypercall_init 22 +struct xen_domctl_hypercall_init { + uint64_aligned_t gmfn; /* GMFN to be initialised */ +}; +typedef struct xen_domctl_hypercall_init xen_domctl_hypercall_init_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_hypercall_init_t); + + +#define XEN_DOMCTL_arch_setup 23 +#define _XEN_DOMAINSETUP_hvm_guest 0 +#define XEN_DOMAINSETUP_hvm_guest (1UL<<_XEN_DOMAINSETUP_hvm_guest) +#define _XEN_DOMAINSETUP_query 1 /* Get parameters (for save) */ +#define XEN_DOMAINSETUP_query (1UL<<_XEN_DOMAINSETUP_query) +#define _XEN_DOMAINSETUP_sioemu_guest 2 +#define XEN_DOMAINSETUP_sioemu_guest (1UL<<_XEN_DOMAINSETUP_sioemu_guest) +typedef struct xen_domctl_arch_setup { + uint64_aligned_t flags; /* XEN_DOMAINSETUP_* */ +#ifdef __ia64__ + uint64_aligned_t bp; /* mpaddr of boot param area */ + uint64_aligned_t maxmem; /* Highest memory address for MDT. */ + uint64_aligned_t xsi_va; /* Xen shared_info area virtual address. */ + uint32_t hypercall_imm; /* Break imm for Xen hypercalls. */ + int8_t vhpt_size_log2; /* Log2 of VHPT size. 
*/ +#endif +} xen_domctl_arch_setup_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_arch_setup_t); + + +#define XEN_DOMCTL_settimeoffset 24 +struct xen_domctl_settimeoffset { + int32_t time_offset_seconds; /* applied to domain wallclock time */ +}; +typedef struct xen_domctl_settimeoffset xen_domctl_settimeoffset_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_settimeoffset_t); + + +#define XEN_DOMCTL_gethvmcontext 33 +#define XEN_DOMCTL_sethvmcontext 34 +typedef struct xen_domctl_hvmcontext { + uint32_t size; /* IN/OUT: size of buffer / bytes filled */ + XEN_GUEST_HANDLE_64(uint8) buffer; /* IN/OUT: data, or call + * gethvmcontext with NULL + * buffer to get size req'd */ +} xen_domctl_hvmcontext_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_hvmcontext_t); + + +#define XEN_DOMCTL_set_address_size 35 +#define XEN_DOMCTL_get_address_size 36 +typedef struct xen_domctl_address_size { + uint32_t size; +} xen_domctl_address_size_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_address_size_t); + + +#define XEN_DOMCTL_real_mode_area 26 +struct xen_domctl_real_mode_area { + uint32_t log; /* log2 of Real Mode Area size */ +}; +typedef struct xen_domctl_real_mode_area xen_domctl_real_mode_area_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_real_mode_area_t); + + +#define XEN_DOMCTL_sendtrigger 28 +#define XEN_DOMCTL_SENDTRIGGER_NMI 0 +#define XEN_DOMCTL_SENDTRIGGER_RESET 1 +#define XEN_DOMCTL_SENDTRIGGER_INIT 2 +struct xen_domctl_sendtrigger { + uint32_t trigger; /* IN */ + uint32_t vcpu; /* IN */ +}; +typedef struct xen_domctl_sendtrigger xen_domctl_sendtrigger_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sendtrigger_t); + + +/* Assign PCI device to HVM guest. Sets up IOMMU structures. 
*/ +#define XEN_DOMCTL_assign_device 37 +#define XEN_DOMCTL_test_assign_device 45 +#define XEN_DOMCTL_deassign_device 47 +struct xen_domctl_assign_device { + uint32_t machine_bdf; /* machine PCI ID of assigned device */ +}; +typedef struct xen_domctl_assign_device xen_domctl_assign_device_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t); + +/* Retrieve sibling devices information of machine_bdf */ +#define XEN_DOMCTL_get_device_group 50 +struct xen_domctl_get_device_group { + uint32_t machine_bdf; /* IN */ + uint32_t max_sdevs; /* IN */ + uint32_t num_sdevs; /* OUT */ + XEN_GUEST_HANDLE_64(uint32) sdev_array; /* OUT */ +}; +typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t); + +/* Pass-through interrupts: bind real irq -> hvm devfn. */ +#define XEN_DOMCTL_bind_pt_irq 38 +#define XEN_DOMCTL_unbind_pt_irq 48 +typedef enum pt_irq_type_e { + PT_IRQ_TYPE_PCI, + PT_IRQ_TYPE_ISA, + PT_IRQ_TYPE_MSI, +} pt_irq_type_t; +struct xen_domctl_bind_pt_irq { + uint32_t machine_irq; + pt_irq_type_t irq_type; + uint32_t hvm_domid; + + union { + struct { + uint8_t isa_irq; + } isa; + struct { + uint8_t bus; + uint8_t device; + uint8_t intx; + } pci; + struct { + uint8_t gvec; + uint32_t gflags; + } msi; + } u; +}; +typedef struct xen_domctl_bind_pt_irq xen_domctl_bind_pt_irq_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_bind_pt_irq_t); + + +/* Bind machine I/O address range -> HVM address range. 
*/ +#define XEN_DOMCTL_memory_mapping 39 +#define DPCI_ADD_MAPPING 1 +#define DPCI_REMOVE_MAPPING 0 +struct xen_domctl_memory_mapping { + uint64_aligned_t first_gfn; /* first page (hvm guest phys page) in range */ + uint64_aligned_t first_mfn; /* first page (machine page) in range */ + uint64_aligned_t nr_mfns; /* number of pages in range (>0) */ + uint32_t add_mapping; /* add or remove mapping */ + uint32_t padding; /* padding for 64-bit aligned structure */ +}; +typedef struct xen_domctl_memory_mapping xen_domctl_memory_mapping_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_memory_mapping_t); + + +/* Bind machine I/O port range -> HVM I/O port range. */ +#define XEN_DOMCTL_ioport_mapping 40 +struct xen_domctl_ioport_mapping { + uint32_t first_gport; /* first guest IO port */ + uint32_t first_mport; /* first machine IO port */ + uint32_t nr_ports; /* size of port range */ + uint32_t add_mapping; /* add or remove mapping */ +}; +typedef struct xen_domctl_ioport_mapping xen_domctl_ioport_mapping_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_ioport_mapping_t); + + +/* + * Pin caching type of RAM space for x86 HVM domU. + */ +#define XEN_DOMCTL_pin_mem_cacheattr 41 +/* Caching types: these happen to be the same as x86 MTRR/PAT type codes. */ +#define XEN_DOMCTL_MEM_CACHEATTR_UC 0 +#define XEN_DOMCTL_MEM_CACHEATTR_WC 1 +#define XEN_DOMCTL_MEM_CACHEATTR_WT 4 +#define XEN_DOMCTL_MEM_CACHEATTR_WP 5 +#define XEN_DOMCTL_MEM_CACHEATTR_WB 6 +#define XEN_DOMCTL_MEM_CACHEATTR_UCM 7 +struct xen_domctl_pin_mem_cacheattr { + uint64_aligned_t start, end; + unsigned int type; /* XEN_DOMCTL_MEM_CACHEATTR_* */ +}; +typedef struct xen_domctl_pin_mem_cacheattr xen_domctl_pin_mem_cacheattr_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_pin_mem_cacheattr_t); + + +#define XEN_DOMCTL_set_ext_vcpucontext 42 +#define XEN_DOMCTL_get_ext_vcpucontext 43 +struct xen_domctl_ext_vcpucontext { + /* IN: VCPU that this call applies to. 
*/ + uint32_t vcpu; + /* + * SET: Size of struct (IN) + * GET: Size of struct (OUT) + */ + uint32_t size; +#if defined(__i386__) || defined(__x86_64__) + /* SYSCALL from 32-bit mode and SYSENTER callback information. */ + /* NB. SYSCALL from 64-bit mode is contained in vcpu_guest_context_t */ + uint64_aligned_t syscall32_callback_eip; + uint64_aligned_t sysenter_callback_eip; + uint16_t syscall32_callback_cs; + uint16_t sysenter_callback_cs; + uint8_t syscall32_disables_events; + uint8_t sysenter_disables_events; +#endif +}; +typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_ext_vcpucontext_t); + +/* + * Set optimization features for a domain + */ +#define XEN_DOMCTL_set_opt_feature 44 +struct xen_domctl_set_opt_feature { +#if defined(__ia64__) + struct xen_ia64_opt_feature optf; +#else + /* Make struct non-empty: do not depend on this field name! */ + uint64_t dummy; +#endif +}; +typedef struct xen_domctl_set_opt_feature xen_domctl_set_opt_feature_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_opt_feature_t); + +/* + * Set the target domain for a domain + */ +#define XEN_DOMCTL_set_target 46 +struct xen_domctl_set_target { + domid_t target; +}; +typedef struct xen_domctl_set_target xen_domctl_set_target_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_target_t); + +#if defined(__i386__) || defined(__x86_64__) +# define XEN_CPUID_INPUT_UNUSED 0xFFFFFFFF +# define XEN_DOMCTL_set_cpuid 49 +struct xen_domctl_cpuid { + unsigned int input[2]; + unsigned int eax; + unsigned int ebx; + unsigned int ecx; + unsigned int edx; +}; +typedef struct xen_domctl_cpuid xen_domctl_cpuid_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_cpuid_t); +#endif + +#define XEN_DOMCTL_subscribe 29 +struct xen_domctl_subscribe { + uint32_t port; /* IN */ +}; +typedef struct xen_domctl_subscribe xen_domctl_subscribe_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_subscribe_t); + +/* + * Define the maximum machine address size which should be allocated + * to a guest. 
+ */ +#define XEN_DOMCTL_set_machine_address_size 51 +#define XEN_DOMCTL_get_machine_address_size 52 + +/* + * Do not inject spurious page faults into this domain. + */ +#define XEN_DOMCTL_suppress_spurious_page_faults 53 + +struct xen_domctl { + uint32_t cmd; + uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */ + domid_t domain; + union { + struct xen_domctl_createdomain createdomain; + struct xen_domctl_getdomaininfo getdomaininfo; + struct xen_domctl_getmemlist getmemlist; + struct xen_domctl_getpageframeinfo getpageframeinfo; + struct xen_domctl_getpageframeinfo2 getpageframeinfo2; + struct xen_domctl_vcpuaffinity vcpuaffinity; + struct xen_domctl_shadow_op shadow_op; + struct xen_domctl_max_mem max_mem; + struct xen_domctl_vcpucontext vcpucontext; + struct xen_domctl_getvcpuinfo getvcpuinfo; + struct xen_domctl_max_vcpus max_vcpus; + struct xen_domctl_scheduler_op scheduler_op; + struct xen_domctl_setdomainhandle setdomainhandle; + struct xen_domctl_setdebugging setdebugging; + struct xen_domctl_irq_permission irq_permission; + struct xen_domctl_iomem_permission iomem_permission; + struct xen_domctl_ioport_permission ioport_permission; + struct xen_domctl_hypercall_init hypercall_init; + struct xen_domctl_arch_setup arch_setup; + struct xen_domctl_settimeoffset settimeoffset; + struct xen_domctl_real_mode_area real_mode_area; + struct xen_domctl_hvmcontext hvmcontext; + struct xen_domctl_address_size address_size; + struct xen_domctl_sendtrigger sendtrigger; + struct xen_domctl_get_device_group get_device_group; + struct xen_domctl_assign_device assign_device; + struct xen_domctl_bind_pt_irq bind_pt_irq; + struct xen_domctl_memory_mapping memory_mapping; + struct xen_domctl_ioport_mapping ioport_mapping; + struct xen_domctl_pin_mem_cacheattr pin_mem_cacheattr; + struct xen_domctl_ext_vcpucontext ext_vcpucontext; + struct xen_domctl_set_opt_feature set_opt_feature; + struct xen_domctl_set_target set_target; + struct xen_domctl_subscribe subscribe; 
+#if defined(__i386__) || defined(__x86_64__) + struct xen_domctl_cpuid cpuid; +#endif + uint8_t pad[128]; + } u; +}; +typedef struct xen_domctl xen_domctl_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_t); + +#endif /* __XEN_PUBLIC_DOMCTL_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/elfnote.h b/xen/public/elfnote.h new file mode 100644 index 0000000..77be41b --- /dev/null +++ b/xen/public/elfnote.h @@ -0,0 +1,233 @@ +/****************************************************************************** + * elfnote.h + * + * Definitions used for the Xen ELF notes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2006, Ian Campbell, XenSource Ltd. + */ + +#ifndef __XEN_PUBLIC_ELFNOTE_H__ +#define __XEN_PUBLIC_ELFNOTE_H__ + +/* + * The notes should live in a PT_NOTE segment and have "Xen" in the + * name field. 
+ * + * Numeric types are either 4 or 8 bytes depending on the content of + * the desc field. + * + * LEGACY indicates the fields in the legacy __xen_guest string which + * this note type replaces. + */ + +/* + * NAME=VALUE pair (string). + */ +#define XEN_ELFNOTE_INFO 0 + +/* + * The virtual address of the entry point (numeric). + * + * LEGACY: VIRT_ENTRY + */ +#define XEN_ELFNOTE_ENTRY 1 + +/* The virtual address of the hypercall transfer page (numeric). + * + * LEGACY: HYPERCALL_PAGE. (n.b. legacy value is a physical page + * number not a virtual address) + */ +#define XEN_ELFNOTE_HYPERCALL_PAGE 2 + +/* The virtual address where the kernel image should be mapped (numeric). + * + * Defaults to 0. + * + * LEGACY: VIRT_BASE + */ +#define XEN_ELFNOTE_VIRT_BASE 3 + +/* + * The offset of the ELF paddr field from the actual required + * pseudo-physical address (numeric). + * + * This is used to maintain backwards compatibility with older kernels + * which wrote __PAGE_OFFSET into that field. This field defaults to 0 + * if not present. + * + * LEGACY: ELF_PADDR_OFFSET. (n.b. legacy default is VIRT_BASE) + */ +#define XEN_ELFNOTE_PADDR_OFFSET 4 + +/* + * The version of Xen that we work with (string). + * + * LEGACY: XEN_VER + */ +#define XEN_ELFNOTE_XEN_VERSION 5 + +/* + * The name of the guest operating system (string). + * + * LEGACY: GUEST_OS + */ +#define XEN_ELFNOTE_GUEST_OS 6 + +/* + * The version of the guest operating system (string). + * + * LEGACY: GUEST_VER + */ +#define XEN_ELFNOTE_GUEST_VERSION 7 + +/* + * The loader type (string). + * + * LEGACY: LOADER + */ +#define XEN_ELFNOTE_LOADER 8 + +/* + * The kernel supports PAE (x86/32 only, string = "yes", "no" or + * "bimodal"). + * + * For compatibility with Xen 3.0.3 and earlier the "bimodal" setting + * may be given as "yes,bimodal" which will cause older Xen to treat + * this kernel as PAE. + * + * LEGACY: PAE (n.b.
The legacy interface included a provision to + * indicate 'extended-cr3' support allowing L3 page tables to be + * placed above 4G. It is assumed that any kernel new enough to use + * these ELF notes will include this and therefore "yes" here is + * equivalent to "yes[extended-cr3]" in the __xen_guest interface.) + */ +#define XEN_ELFNOTE_PAE_MODE 9 + +/* + * The features supported/required by this kernel (string). + * + * The string must consist of a list of feature names (as given in + * features.h, without the "XENFEAT_" prefix) separated by '|' + * characters. If a feature is required for the kernel to function + * then the feature name must be preceded by a '!' character. + * + * LEGACY: FEATURES + */ +#define XEN_ELFNOTE_FEATURES 10 + +/* + * The kernel requires the symbol table to be loaded (string = "yes" or "no") + * LEGACY: BSD_SYMTAB (n.b. The legacy interface treated the presence or absence + * of this string as a boolean flag rather than requiring "yes" or + * "no".) + */ +#define XEN_ELFNOTE_BSD_SYMTAB 11 + +/* + * The lowest address the hypervisor hole can begin at (numeric). + * + * This must not be set higher than HYPERVISOR_VIRT_START. Its presence + * also indicates to the hypervisor that the kernel can deal with the + * hole starting at a higher address. + */ +#define XEN_ELFNOTE_HV_START_LOW 12 + +/* + * List of maddr_t-sized mask/value pairs describing how to recognize + * (non-present) L1 page table entries carrying valid MFNs (numeric). + */ +#define XEN_ELFNOTE_L1_MFN_VALID 13 + +/* + * Whether or not the guest supports cooperative suspend cancellation. + */ +#define XEN_ELFNOTE_SUSPEND_CANCEL 14 + +/* + * The number of the highest elfnote defined. + */ +#define XEN_ELFNOTE_MAX XEN_ELFNOTE_SUSPEND_CANCEL + +/* + * System information exported through crash notes. + * + * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_INFO + * note in case of a system crash.
This note will contain various + * information about the system, see xen/include/xen/elfcore.h. + */ +#define XEN_ELFNOTE_CRASH_INFO 0x1000001 + +/* + * System registers exported through crash notes. + * + * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_REGS + * note per cpu in case of a system crash. This note is architecture + * specific and will contain registers not saved in the "CORE" note. + * See xen/include/xen/elfcore.h for more information. + */ +#define XEN_ELFNOTE_CRASH_REGS 0x1000002 + + +/* + * xen dump-core none note. + * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_NONE + * in its dump file to indicate that the file is a xen dump-core + * file. This note doesn't have any other information. + * See tools/libxc/xc_core.h for more information. + */ +#define XEN_ELFNOTE_DUMPCORE_NONE 0x2000000 + +/* + * xen dump-core header note. + * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_HEADER + * in its dump file. + * See tools/libxc/xc_core.h for more information. + */ +#define XEN_ELFNOTE_DUMPCORE_HEADER 0x2000001 + +/* + * xen dump-core xen version note. + * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_XEN_VERSION + * in its dump file. It contains the xen version obtained via the + * XENVER hypercall. + * See tools/libxc/xc_core.h for more information. + */ +#define XEN_ELFNOTE_DUMPCORE_XEN_VERSION 0x2000002 + +/* + * xen dump-core format version note. + * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION + * in its dump file. It contains a format version identifier. + * See tools/libxc/xc_core.h for more information.
+ */ +#define XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION 0x2000003 + +#endif /* __XEN_PUBLIC_ELFNOTE_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/elfstructs.h b/xen/public/elfstructs.h new file mode 100644 index 0000000..77362f3 --- /dev/null +++ b/xen/public/elfstructs.h @@ -0,0 +1,527 @@ +#ifndef __XEN_PUBLIC_ELFSTRUCTS_H__ +#define __XEN_PUBLIC_ELFSTRUCTS_H__ 1 +/* + * Copyright (c) 1995, 1996 Erik Theisen. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +typedef uint8_t Elf_Byte; + +typedef uint32_t Elf32_Addr; /* Unsigned program address */ +typedef uint32_t Elf32_Off; /* Unsigned file offset */ +typedef int32_t Elf32_Sword; /* Signed large integer */ +typedef uint32_t Elf32_Word; /* Unsigned large integer */ +typedef uint16_t Elf32_Half; /* Unsigned medium integer */ + +typedef uint64_t Elf64_Addr; +typedef uint64_t Elf64_Off; +typedef int32_t Elf64_Shalf; + +typedef int32_t Elf64_Sword; +typedef uint32_t Elf64_Word; + +typedef int64_t Elf64_Sxword; +typedef uint64_t Elf64_Xword; + +typedef uint32_t Elf64_Half; +typedef uint16_t Elf64_Quarter; + +/* + * e_ident[] identification indexes + * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html + */ +#define EI_MAG0 0 /* file ID */ +#define EI_MAG1 1 /* file ID */ +#define EI_MAG2 2 /* file ID */ +#define EI_MAG3 3 /* file ID */ +#define EI_CLASS 4 /* file class */ +#define EI_DATA 5 /* data encoding */ +#define EI_VERSION 6 /* ELF header version */ +#define EI_OSABI 7 /* OS/ABI ID */ +#define EI_ABIVERSION 8 /* ABI version */ +#define EI_PAD 9 /* start of pad bytes */ +#define EI_NIDENT 16 /* Size of e_ident[] */ + +/* e_ident[] magic number */ +#define ELFMAG0 0x7f /* e_ident[EI_MAG0] */ +#define ELFMAG1 'E' /* e_ident[EI_MAG1] */ +#define ELFMAG2 'L' /* e_ident[EI_MAG2] */ +#define ELFMAG3 'F' /* e_ident[EI_MAG3] */ +#define ELFMAG "\177ELF" /* magic */ +#define SELFMAG 4 /* size of magic */ + +/* e_ident[] file class */ +#define ELFCLASSNONE 0 /* invalid */ +#define ELFCLASS32 1 /* 32-bit objs */ +#define ELFCLASS64 2 /* 64-bit objs */ +#define ELFCLASSNUM 3 /* number of classes */ + +/* e_ident[] data encoding */ +#define ELFDATANONE 0 /* invalid */ +#define ELFDATA2LSB 1 /* Little-Endian */ +#define ELFDATA2MSB 2 /* Big-Endian */ +#define ELFDATANUM 3 /* number of data encode defines */ + +/* e_ident[] Operating System/ABI */ +#define ELFOSABI_SYSV 0 /* UNIX System V ABI */ +#define ELFOSABI_HPUX 1 /* HP-UX operating system */ 
+#define ELFOSABI_NETBSD 2 /* NetBSD */ +#define ELFOSABI_LINUX 3 /* GNU/Linux */ +#define ELFOSABI_HURD 4 /* GNU/Hurd */ +#define ELFOSABI_86OPEN 5 /* 86Open common IA32 ABI */ +#define ELFOSABI_SOLARIS 6 /* Solaris */ +#define ELFOSABI_MONTEREY 7 /* Monterey */ +#define ELFOSABI_IRIX 8 /* IRIX */ +#define ELFOSABI_FREEBSD 9 /* FreeBSD */ +#define ELFOSABI_TRU64 10 /* TRU64 UNIX */ +#define ELFOSABI_MODESTO 11 /* Novell Modesto */ +#define ELFOSABI_OPENBSD 12 /* OpenBSD */ +#define ELFOSABI_ARM 97 /* ARM */ +#define ELFOSABI_STANDALONE 255 /* Standalone (embedded) application */ + +/* e_ident */ +#define IS_ELF(ehdr) ((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \ + (ehdr).e_ident[EI_MAG1] == ELFMAG1 && \ + (ehdr).e_ident[EI_MAG2] == ELFMAG2 && \ + (ehdr).e_ident[EI_MAG3] == ELFMAG3) + +/* ELF Header */ +typedef struct elfhdr { + unsigned char e_ident[EI_NIDENT]; /* ELF Identification */ + Elf32_Half e_type; /* object file type */ + Elf32_Half e_machine; /* machine */ + Elf32_Word e_version; /* object file version */ + Elf32_Addr e_entry; /* virtual entry point */ + Elf32_Off e_phoff; /* program header table offset */ + Elf32_Off e_shoff; /* section header table offset */ + Elf32_Word e_flags; /* processor-specific flags */ + Elf32_Half e_ehsize; /* ELF header size */ + Elf32_Half e_phentsize; /* program header entry size */ + Elf32_Half e_phnum; /* number of program header entries */ + Elf32_Half e_shentsize; /* section header entry size */ + Elf32_Half e_shnum; /* number of section header entries */ + Elf32_Half e_shstrndx; /* section header table's "section + header string table" entry offset */ +} Elf32_Ehdr; + +typedef struct { + unsigned char e_ident[EI_NIDENT]; /* Id bytes */ + Elf64_Quarter e_type; /* file type */ + Elf64_Quarter e_machine; /* machine type */ + Elf64_Half e_version; /* version number */ + Elf64_Addr e_entry; /* entry point */ + Elf64_Off e_phoff; /* Program hdr offset */ + Elf64_Off e_shoff; /* Section hdr offset */ + Elf64_Half e_flags; /* 
Processor flags */ + Elf64_Quarter e_ehsize; /* sizeof ehdr */ + Elf64_Quarter e_phentsize; /* Program header entry size */ + Elf64_Quarter e_phnum; /* Number of program headers */ + Elf64_Quarter e_shentsize; /* Section header entry size */ + Elf64_Quarter e_shnum; /* Number of section headers */ + Elf64_Quarter e_shstrndx; /* String table index */ +} Elf64_Ehdr; + +/* e_type */ +#define ET_NONE 0 /* No file type */ +#define ET_REL 1 /* relocatable file */ +#define ET_EXEC 2 /* executable file */ +#define ET_DYN 3 /* shared object file */ +#define ET_CORE 4 /* core file */ +#define ET_NUM 5 /* number of types */ +#define ET_LOPROC 0xff00 /* reserved range for processor */ +#define ET_HIPROC 0xffff /* specific e_type */ + +/* e_machine */ +#define EM_NONE 0 /* No Machine */ +#define EM_M32 1 /* AT&T WE 32100 */ +#define EM_SPARC 2 /* SPARC */ +#define EM_386 3 /* Intel 80386 */ +#define EM_68K 4 /* Motorola 68000 */ +#define EM_88K 5 /* Motorola 88000 */ +#define EM_486 6 /* Intel 80486 - unused? 
*/ +#define EM_860 7 /* Intel 80860 */ +#define EM_MIPS 8 /* MIPS R3000 Big-Endian only */ +/* + * Don't know if EM_MIPS_RS4_BE, + * EM_SPARC64, EM_PARISC, + * or EM_PPC are ABI compliant + */ +#define EM_MIPS_RS4_BE 10 /* MIPS R4000 Big-Endian */ +#define EM_SPARC64 11 /* SPARC v9 64-bit unofficial */ +#define EM_PARISC 15 /* HPPA */ +#define EM_SPARC32PLUS 18 /* Enhanced instruction set SPARC */ +#define EM_PPC 20 /* PowerPC */ +#define EM_PPC64 21 /* PowerPC 64-bit */ +#define EM_ARM 40 /* Advanced RISC Machines ARM */ +#define EM_ALPHA 41 /* DEC ALPHA */ +#define EM_SPARCV9 43 /* SPARC version 9 */ +#define EM_ALPHA_EXP 0x9026 /* DEC ALPHA */ +#define EM_IA_64 50 /* Intel Merced */ +#define EM_X86_64 62 /* AMD x86-64 architecture */ +#define EM_VAX 75 /* DEC VAX */ + +/* Version */ +#define EV_NONE 0 /* Invalid */ +#define EV_CURRENT 1 /* Current */ +#define EV_NUM 2 /* number of versions */ + +/* Section Header */ +typedef struct { + Elf32_Word sh_name; /* name - index into section header + string table section */ + Elf32_Word sh_type; /* type */ + Elf32_Word sh_flags; /* flags */ + Elf32_Addr sh_addr; /* address */ + Elf32_Off sh_offset; /* file offset */ + Elf32_Word sh_size; /* section size */ + Elf32_Word sh_link; /* section header table index link */ + Elf32_Word sh_info; /* extra information */ + Elf32_Word sh_addralign; /* address alignment */ + Elf32_Word sh_entsize; /* section entry size */ +} Elf32_Shdr; + +typedef struct { + Elf64_Half sh_name; /* section name */ + Elf64_Half sh_type; /* section type */ + Elf64_Xword sh_flags; /* section flags */ + Elf64_Addr sh_addr; /* virtual address */ + Elf64_Off sh_offset; /* file offset */ + Elf64_Xword sh_size; /* section size */ + Elf64_Half sh_link; /* link to another */ + Elf64_Half sh_info; /* misc info */ + Elf64_Xword sh_addralign; /* memory alignment */ + Elf64_Xword sh_entsize; /* table entry size */ +} Elf64_Shdr; + +/* Special Section Indexes */ +#define SHN_UNDEF 0 /* undefined */ +#define 
SHN_LORESERVE 0xff00 /* lower bounds of reserved indexes */ +#define SHN_LOPROC 0xff00 /* reserved range for processor */ +#define SHN_HIPROC 0xff1f /* specific section indexes */ +#define SHN_ABS 0xfff1 /* absolute value */ +#define SHN_COMMON 0xfff2 /* common symbol */ +#define SHN_HIRESERVE 0xffff /* upper bounds of reserved indexes */ + +/* sh_type */ +#define SHT_NULL 0 /* inactive */ +#define SHT_PROGBITS 1 /* program defined information */ +#define SHT_SYMTAB 2 /* symbol table section */ +#define SHT_STRTAB 3 /* string table section */ +#define SHT_RELA 4 /* relocation section with addends */ +#define SHT_HASH 5 /* symbol hash table section */ +#define SHT_DYNAMIC 6 /* dynamic section */ +#define SHT_NOTE 7 /* note section */ +#define SHT_NOBITS 8 /* no space section */ +#define SHT_REL 9 /* relocation section without addends */ +#define SHT_SHLIB 10 /* reserved - purpose unknown */ +#define SHT_DYNSYM 11 /* dynamic symbol table section */ +#define SHT_NUM 12 /* number of section types */ +#define SHT_LOPROC 0x70000000 /* reserved range for processor */ +#define SHT_HIPROC 0x7fffffff /* specific section header types */ +#define SHT_LOUSER 0x80000000 /* reserved range for application */ +#define SHT_HIUSER 0xffffffff /* specific indexes */ + +/* Section names */ +#define ELF_BSS ".bss" /* uninitialized data */ +#define ELF_DATA ".data" /* initialized data */ +#define ELF_DEBUG ".debug" /* debug */ +#define ELF_DYNAMIC ".dynamic" /* dynamic linking information */ +#define ELF_DYNSTR ".dynstr" /* dynamic string table */ +#define ELF_DYNSYM ".dynsym" /* dynamic symbol table */ +#define ELF_FINI ".fini" /* termination code */ +#define ELF_GOT ".got" /* global offset table */ +#define ELF_HASH ".hash" /* symbol hash table */ +#define ELF_INIT ".init" /* initialization code */ +#define ELF_REL_DATA ".rel.data" /* relocation data */ +#define ELF_REL_FINI ".rel.fini" /* relocation termination code */ +#define ELF_REL_INIT ".rel.init" /* relocation initialization code */ 
+#define ELF_REL_DYN ".rel.dyn" /* relocation dynamic link info */ +#define ELF_REL_RODATA ".rel.rodata" /* relocation read-only data */ +#define ELF_REL_TEXT ".rel.text" /* relocation code */ +#define ELF_RODATA ".rodata" /* read-only data */ +#define ELF_SHSTRTAB ".shstrtab" /* section header string table */ +#define ELF_STRTAB ".strtab" /* string table */ +#define ELF_SYMTAB ".symtab" /* symbol table */ +#define ELF_TEXT ".text" /* code */ + + +/* Section Attribute Flags - sh_flags */ +#define SHF_WRITE 0x1 /* Writable */ +#define SHF_ALLOC 0x2 /* occupies memory */ +#define SHF_EXECINSTR 0x4 /* executable */ +#define SHF_MASKPROC 0xf0000000 /* reserved bits for processor */ + /* specific section attributes */ + +/* Symbol Table Entry */ +typedef struct elf32_sym { + Elf32_Word st_name; /* name - index into string table */ + Elf32_Addr st_value; /* symbol value */ + Elf32_Word st_size; /* symbol size */ + unsigned char st_info; /* type and binding */ + unsigned char st_other; /* 0 - no defined meaning */ + Elf32_Half st_shndx; /* section header index */ +} Elf32_Sym; + +typedef struct { + Elf64_Half st_name; /* Symbol name index in str table */ + Elf_Byte st_info; /* type / binding attrs */ + Elf_Byte st_other; /* unused */ + Elf64_Quarter st_shndx; /* section index of symbol */ + Elf64_Xword st_value; /* value of symbol */ + Elf64_Xword st_size; /* size of symbol */ +} Elf64_Sym; + +/* Symbol table index */ +#define STN_UNDEF 0 /* undefined */ + +/* Extract symbol info - st_info */ +#define ELF32_ST_BIND(x) ((x) >> 4) +#define ELF32_ST_TYPE(x) (((unsigned int) x) & 0xf) +#define ELF32_ST_INFO(b,t) (((b) << 4) + ((t) & 0xf)) + +#define ELF64_ST_BIND(x) ((x) >> 4) +#define ELF64_ST_TYPE(x) (((unsigned int) x) & 0xf) +#define ELF64_ST_INFO(b,t) (((b) << 4) + ((t) & 0xf)) + +/* Symbol Binding - ELF32_ST_BIND - st_info */ +#define STB_LOCAL 0 /* Local symbol */ +#define STB_GLOBAL 1 /* Global symbol */ +#define STB_WEAK 2 /* like global - lower precedence */ 
+#define STB_NUM 3 /* number of symbol bindings */ +#define STB_LOPROC 13 /* reserved range for processor */ +#define STB_HIPROC 15 /* specific symbol bindings */ + +/* Symbol type - ELF32_ST_TYPE - st_info */ +#define STT_NOTYPE 0 /* not specified */ +#define STT_OBJECT 1 /* data object */ +#define STT_FUNC 2 /* function */ +#define STT_SECTION 3 /* section */ +#define STT_FILE 4 /* file */ +#define STT_NUM 5 /* number of symbol types */ +#define STT_LOPROC 13 /* reserved range for processor */ +#define STT_HIPROC 15 /* specific symbol types */ + +/* Relocation entry with implicit addend */ +typedef struct { + Elf32_Addr r_offset; /* offset of relocation */ + Elf32_Word r_info; /* symbol table index and type */ +} Elf32_Rel; + +/* Relocation entry with explicit addend */ +typedef struct { + Elf32_Addr r_offset; /* offset of relocation */ + Elf32_Word r_info; /* symbol table index and type */ + Elf32_Sword r_addend; +} Elf32_Rela; + +/* Extract relocation info - r_info */ +#define ELF32_R_SYM(i) ((i) >> 8) +#define ELF32_R_TYPE(i) ((unsigned char) (i)) +#define ELF32_R_INFO(s,t) (((s) << 8) + (unsigned char)(t)) + +typedef struct { + Elf64_Xword r_offset; /* where to do it */ + Elf64_Xword r_info; /* index & type of relocation */ +} Elf64_Rel; + +typedef struct { + Elf64_Xword r_offset; /* where to do it */ + Elf64_Xword r_info; /* index & type of relocation */ + Elf64_Sxword r_addend; /* adjustment value */ +} Elf64_Rela; + +#define ELF64_R_SYM(info) ((info) >> 32) +#define ELF64_R_TYPE(info) ((info) & 0xFFFFFFFF) +#define ELF64_R_INFO(s,t) (((s) << 32) + (u_int32_t)(t)) + +/* Program Header */ +typedef struct { + Elf32_Word p_type; /* segment type */ + Elf32_Off p_offset; /* segment offset */ + Elf32_Addr p_vaddr; /* virtual address of segment */ + Elf32_Addr p_paddr; /* physical address - ignored? */ + Elf32_Word p_filesz; /* number of bytes in file for seg. */ + Elf32_Word p_memsz; /* number of bytes in mem. for seg. 
*/ + Elf32_Word p_flags; /* flags */ + Elf32_Word p_align; /* memory alignment */ +} Elf32_Phdr; + +typedef struct { + Elf64_Half p_type; /* entry type */ + Elf64_Half p_flags; /* flags */ + Elf64_Off p_offset; /* offset */ + Elf64_Addr p_vaddr; /* virtual address */ + Elf64_Addr p_paddr; /* physical address */ + Elf64_Xword p_filesz; /* file size */ + Elf64_Xword p_memsz; /* memory size */ + Elf64_Xword p_align; /* memory & file alignment */ +} Elf64_Phdr; + +/* Segment types - p_type */ +#define PT_NULL 0 /* unused */ +#define PT_LOAD 1 /* loadable segment */ +#define PT_DYNAMIC 2 /* dynamic linking section */ +#define PT_INTERP 3 /* the RTLD */ +#define PT_NOTE 4 /* auxiliary information */ +#define PT_SHLIB 5 /* reserved - purpose undefined */ +#define PT_PHDR 6 /* program header */ +#define PT_NUM 7 /* Number of segment types */ +#define PT_LOPROC 0x70000000 /* reserved range for processor */ +#define PT_HIPROC 0x7fffffff /* specific segment types */ + +/* Segment flags - p_flags */ +#define PF_X 0x1 /* Executable */ +#define PF_W 0x2 /* Writable */ +#define PF_R 0x4 /* Readable */ +#define PF_MASKPROC 0xf0000000 /* reserved bits for processor */ + /* specific segment flags */ + +/* Dynamic structure */ +typedef struct { + Elf32_Sword d_tag; /* controls meaning of d_val */ + union { + Elf32_Word d_val; /* Multiple meanings - see d_tag */ + Elf32_Addr d_ptr; /* program virtual address */ + } d_un; +} Elf32_Dyn; + +typedef struct { + Elf64_Xword d_tag; /* controls meaning of d_val */ + union { + Elf64_Addr d_ptr; + Elf64_Xword d_val; + } d_un; +} Elf64_Dyn; + +/* Dynamic Array Tags - d_tag */ +#define DT_NULL 0 /* marks end of _DYNAMIC array */ +#define DT_NEEDED 1 /* string table offset of needed lib */ +#define DT_PLTRELSZ 2 /* size of relocation entries in PLT */ +#define DT_PLTGOT 3 /* address PLT/GOT */ +#define DT_HASH 4 /* address of symbol hash table */ +#define DT_STRTAB 5 /* address of string table */ +#define DT_SYMTAB 6 /* address of symbol table */ 
+#define DT_RELA 7 /* address of relocation table */ +#define DT_RELASZ 8 /* size of relocation table */ +#define DT_RELAENT 9 /* size of relocation entry */ +#define DT_STRSZ 10 /* size of string table */ +#define DT_SYMENT 11 /* size of symbol table entry */ +#define DT_INIT 12 /* address of initialization func. */ +#define DT_FINI 13 /* address of termination function */ +#define DT_SONAME 14 /* string table offset of shared obj */ +#define DT_RPATH 15 /* string table offset of library + search path */ +#define DT_SYMBOLIC 16 /* start sym search in shared obj. */ +#define DT_REL 17 /* address of rel. tbl. w addends */ +#define DT_RELSZ 18 /* size of DT_REL relocation table */ +#define DT_RELENT 19 /* size of DT_REL relocation entry */ +#define DT_PLTREL 20 /* PLT referenced relocation entry */ +#define DT_DEBUG 21 /* bugger */ +#define DT_TEXTREL 22 /* Allow rel. mod. to unwritable seg */ +#define DT_JMPREL 23 /* add. of PLT's relocation entries */ +#define DT_BIND_NOW 24 /* Bind now regardless of env setting */ +#define DT_NUM 25 /* Number used. 
*/ +#define DT_LOPROC 0x70000000 /* reserved range for processor */ +#define DT_HIPROC 0x7fffffff /* specific dynamic array tags */ + +/* Standard ELF hashing function */ +unsigned int elf_hash(const unsigned char *name); + +/* + * Note Definitions + */ +typedef struct { + Elf32_Word namesz; + Elf32_Word descsz; + Elf32_Word type; +} Elf32_Note; + +typedef struct { + Elf64_Half namesz; + Elf64_Half descsz; + Elf64_Half type; +} Elf64_Note; + + +#if defined(ELFSIZE) +#define CONCAT(x,y) __CONCAT(x,y) +#define ELFNAME(x) CONCAT(elf,CONCAT(ELFSIZE,CONCAT(_,x))) +#define ELFNAME2(x,y) CONCAT(x,CONCAT(_elf,CONCAT(ELFSIZE,CONCAT(_,y)))) +#define ELFNAMEEND(x) CONCAT(x,CONCAT(_elf,ELFSIZE)) +#define ELFDEFNNAME(x) CONCAT(ELF,CONCAT(ELFSIZE,CONCAT(_,x))) +#endif + +#if defined(ELFSIZE) && (ELFSIZE == 32) +#define Elf_Ehdr Elf32_Ehdr +#define Elf_Phdr Elf32_Phdr +#define Elf_Shdr Elf32_Shdr +#define Elf_Sym Elf32_Sym +#define Elf_Rel Elf32_Rel +#define Elf_RelA Elf32_Rela +#define Elf_Dyn Elf32_Dyn +#define Elf_Word Elf32_Word +#define Elf_Sword Elf32_Sword +#define Elf_Addr Elf32_Addr +#define Elf_Off Elf32_Off +#define Elf_Nhdr Elf32_Nhdr +#define Elf_Note Elf32_Note + +#define ELF_R_SYM ELF32_R_SYM +#define ELF_R_TYPE ELF32_R_TYPE +#define ELF_R_INFO ELF32_R_INFO +#define ELFCLASS ELFCLASS32 + +#define ELF_ST_BIND ELF32_ST_BIND +#define ELF_ST_TYPE ELF32_ST_TYPE +#define ELF_ST_INFO ELF32_ST_INFO + +#define AuxInfo Aux32Info +#elif defined(ELFSIZE) && (ELFSIZE == 64) +#define Elf_Ehdr Elf64_Ehdr +#define Elf_Phdr Elf64_Phdr +#define Elf_Shdr Elf64_Shdr +#define Elf_Sym Elf64_Sym +#define Elf_Rel Elf64_Rel +#define Elf_RelA Elf64_Rela +#define Elf_Dyn Elf64_Dyn +#define Elf_Word Elf64_Word +#define Elf_Sword Elf64_Sword +#define Elf_Addr Elf64_Addr +#define Elf_Off Elf64_Off +#define Elf_Nhdr Elf64_Nhdr +#define Elf_Note Elf64_Note + +#define ELF_R_SYM ELF64_R_SYM +#define ELF_R_TYPE ELF64_R_TYPE +#define ELF_R_INFO ELF64_R_INFO +#define ELFCLASS ELFCLASS64 + +#define 
ELF_ST_BIND ELF64_ST_BIND +#define ELF_ST_TYPE ELF64_ST_TYPE +#define ELF_ST_INFO ELF64_ST_INFO + +#define AuxInfo Aux64Info +#endif + +#endif /* __XEN_PUBLIC_ELFSTRUCTS_H__ */ diff --git a/xen/public/event_channel.h b/xen/public/event_channel.h new file mode 100644 index 0000000..d35cce5 --- /dev/null +++ b/xen/public/event_channel.h @@ -0,0 +1,264 @@ +/****************************************************************************** + * event_channel.h + * + * Event channels between domains. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2003-2004, K A Fraser. + */ + +#ifndef __XEN_PUBLIC_EVENT_CHANNEL_H__ +#define __XEN_PUBLIC_EVENT_CHANNEL_H__ + +/* + * Prototype for this hypercall is: + * int event_channel_op(int cmd, void *args) + * @cmd == EVTCHNOP_??? (event-channel operation). + * @args == Operation-specific extra arguments (NULL if none). 
+ */ + +typedef uint32_t evtchn_port_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_port_t); + +/* + * EVTCHNOP_alloc_unbound: Allocate a port in domain <dom> and mark as + * accepting interdomain bindings from domain <remote_dom>. A fresh port + * is allocated in <dom> and returned as <port>. + * NOTES: + * 1. If the caller is unprivileged then <dom> must be DOMID_SELF. + * 2. <remote_dom> may be DOMID_SELF, allowing loopback connections. + */ +#define EVTCHNOP_alloc_unbound 6 +struct evtchn_alloc_unbound { + /* IN parameters */ + domid_t dom, remote_dom; + /* OUT parameters */ + evtchn_port_t port; +}; +typedef struct evtchn_alloc_unbound evtchn_alloc_unbound_t; + +/* + * EVTCHNOP_bind_interdomain: Construct an interdomain event channel between + * the calling domain and <remote_dom>. <remote_dom,remote_port> must identify + * a port that is unbound and marked as accepting bindings from the calling + * domain. A fresh port is allocated in the calling domain and returned as + * <local_port>. + * NOTES: + * 1. <remote_dom> may be DOMID_SELF, allowing loopback connections. + */ +#define EVTCHNOP_bind_interdomain 0 +struct evtchn_bind_interdomain { + /* IN parameters. */ + domid_t remote_dom; + evtchn_port_t remote_port; + /* OUT parameters. */ + evtchn_port_t local_port; +}; +typedef struct evtchn_bind_interdomain evtchn_bind_interdomain_t; + +/* + * EVTCHNOP_bind_virq: Bind a local event channel to VIRQ <irq> on specified + * vcpu. + * NOTES: + * 1. Virtual IRQs are classified as per-vcpu or global. See the VIRQ list + * in xen.h for the classification of each VIRQ. + * 2. Global VIRQs must be allocated on VCPU0 but can subsequently be + * re-bound via EVTCHNOP_bind_vcpu. + * 3. Per-vcpu VIRQs may be bound to at most one event channel per vcpu. + * The allocated event channel is bound to the specified vcpu and the + * binding cannot be changed. + */ +#define EVTCHNOP_bind_virq 1 +struct evtchn_bind_virq { + /* IN parameters. */ + uint32_t virq; + uint32_t vcpu; + /* OUT parameters. 
*/ + evtchn_port_t port; +}; +typedef struct evtchn_bind_virq evtchn_bind_virq_t; + +/* + * EVTCHNOP_bind_pirq: Bind a local event channel to PIRQ <irq>. + * NOTES: + * 1. A physical IRQ may be bound to at most one event channel per domain. + * 2. Only a sufficiently-privileged domain may bind to a physical IRQ. + */ +#define EVTCHNOP_bind_pirq 2 +struct evtchn_bind_pirq { + /* IN parameters. */ + uint32_t pirq; +#define BIND_PIRQ__WILL_SHARE 1 + uint32_t flags; /* BIND_PIRQ__* */ + /* OUT parameters. */ + evtchn_port_t port; +}; +typedef struct evtchn_bind_pirq evtchn_bind_pirq_t; + +/* + * EVTCHNOP_bind_ipi: Bind a local event channel to receive events. + * NOTES: + * 1. The allocated event channel is bound to the specified vcpu. The binding + * may not be changed. + */ +#define EVTCHNOP_bind_ipi 7 +struct evtchn_bind_ipi { + uint32_t vcpu; + /* OUT parameters. */ + evtchn_port_t port; +}; +typedef struct evtchn_bind_ipi evtchn_bind_ipi_t; + +/* + * EVTCHNOP_close: Close a local event channel <port>. If the channel is + * interdomain then the remote end is placed in the unbound state + * (EVTCHNSTAT_unbound), awaiting a new connection. + */ +#define EVTCHNOP_close 3 +struct evtchn_close { + /* IN parameters. */ + evtchn_port_t port; +}; +typedef struct evtchn_close evtchn_close_t; + +/* + * EVTCHNOP_send: Send an event to the remote end of the channel whose local + * endpoint is <port>. + */ +#define EVTCHNOP_send 4 +struct evtchn_send { + /* IN parameters. */ + evtchn_port_t port; +}; +typedef struct evtchn_send evtchn_send_t; + +/* + * EVTCHNOP_status: Get the current status of the communication channel which + * has an endpoint at <dom, port>. + * NOTES: + * 1. <dom> may be specified as DOMID_SELF. + * 2. Only a sufficiently-privileged domain may obtain the status of an event + * channel for which <dom> is not DOMID_SELF. 
+ */ +#define EVTCHNOP_status 5 +struct evtchn_status { + /* IN parameters */ + domid_t dom; + evtchn_port_t port; + /* OUT parameters */ +#define EVTCHNSTAT_closed 0 /* Channel is not in use. */ +#define EVTCHNSTAT_unbound 1 /* Channel is waiting interdom connection.*/ +#define EVTCHNSTAT_interdomain 2 /* Channel is connected to remote domain. */ +#define EVTCHNSTAT_pirq 3 /* Channel is bound to a phys IRQ line. */ +#define EVTCHNSTAT_virq 4 /* Channel is bound to a virtual IRQ line */ +#define EVTCHNSTAT_ipi 5 /* Channel is bound to a virtual IPI line */ + uint32_t status; + uint32_t vcpu; /* VCPU to which this channel is bound. */ + union { + struct { + domid_t dom; + } unbound; /* EVTCHNSTAT_unbound */ + struct { + domid_t dom; + evtchn_port_t port; + } interdomain; /* EVTCHNSTAT_interdomain */ + uint32_t pirq; /* EVTCHNSTAT_pirq */ + uint32_t virq; /* EVTCHNSTAT_virq */ + } u; +}; +typedef struct evtchn_status evtchn_status_t; + +/* + * EVTCHNOP_bind_vcpu: Specify which vcpu a channel should notify when an + * event is pending. + * NOTES: + * 1. IPI-bound channels always notify the vcpu specified at bind time. + * This binding cannot be changed. + * 2. Per-VCPU VIRQ channels always notify the vcpu specified at bind time. + * This binding cannot be changed. + * 3. All other channels notify vcpu0 by default. This default is set when + * the channel is allocated (a port that is freed and subsequently reused + * has its binding reset to vcpu0). + */ +#define EVTCHNOP_bind_vcpu 8 +struct evtchn_bind_vcpu { + /* IN parameters. */ + evtchn_port_t port; + uint32_t vcpu; +}; +typedef struct evtchn_bind_vcpu evtchn_bind_vcpu_t; + +/* + * EVTCHNOP_unmask: Unmask the specified local event-channel port and deliver + * a notification to the appropriate VCPU if an event is pending. + */ +#define EVTCHNOP_unmask 9 +struct evtchn_unmask { + /* IN parameters. 
*/ + evtchn_port_t port; +}; +typedef struct evtchn_unmask evtchn_unmask_t; + +/* + * EVTCHNOP_reset: Close all event channels associated with specified domain. + * NOTES: + * 1. <dom> may be specified as DOMID_SELF. + * 2. Only a sufficiently-privileged domain may specify other than DOMID_SELF. + */ +#define EVTCHNOP_reset 10 +struct evtchn_reset { + /* IN parameters. */ + domid_t dom; +}; +typedef struct evtchn_reset evtchn_reset_t; + +/* + * Argument to event_channel_op_compat() hypercall. Superseded by the new + * event_channel_op() hypercall since 0x00030202. + */ +struct evtchn_op { + uint32_t cmd; /* EVTCHNOP_* */ + union { + struct evtchn_alloc_unbound alloc_unbound; + struct evtchn_bind_interdomain bind_interdomain; + struct evtchn_bind_virq bind_virq; + struct evtchn_bind_pirq bind_pirq; + struct evtchn_bind_ipi bind_ipi; + struct evtchn_close close; + struct evtchn_send send; + struct evtchn_status status; + struct evtchn_bind_vcpu bind_vcpu; + struct evtchn_unmask unmask; + } u; +}; +typedef struct evtchn_op evtchn_op_t; +DEFINE_XEN_GUEST_HANDLE(evtchn_op_t); + +#endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/features.h b/xen/public/features.h new file mode 100644 index 0000000..879131c --- /dev/null +++ b/xen/public/features.h @@ -0,0 +1,83 @@ +/****************************************************************************** + * features.h + * + * Feature flags, reported by XENVER_get_features. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2006, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_FEATURES_H__ +#define __XEN_PUBLIC_FEATURES_H__ + +/* + * If set, the guest does not need to write-protect its pagetables, and can + * update them via direct writes. + */ +#define XENFEAT_writable_page_tables 0 + +/* + * If set, the guest does not need to write-protect its segment descriptor + * tables, and can update them via direct writes. + */ +#define XENFEAT_writable_descriptor_tables 1 + +/* + * If set, translation between the guest's 'pseudo-physical' address space + * and the host's machine address space is handled by the hypervisor. In this + * mode the guest does not need to perform phys-to/from-machine translations + * when performing page table operations. + */ +#define XENFEAT_auto_translated_physmap 2 + +/* If set, the guest is running in supervisor mode (e.g., x86 ring 0). 
*/ +#define XENFEAT_supervisor_mode_kernel 3 + +/* + * If set, the guest does not need to allocate x86 PAE page directories + * below 4GB. This flag is usually implied by auto_translated_physmap. + */ +#define XENFEAT_pae_pgdir_above_4gb 4 + +/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */ +#define XENFEAT_mmu_pt_update_preserve_ad 5 + +/* x86: Does this Xen host support the MMU_{CLEAR,COPY}_PAGE hypercall? */ +#define XENFEAT_highmem_assist 6 + +/* + * If set, GNTTABOP_map_grant_ref honors flags to be placed into guest kernel + * available pte bits. + */ +#define XENFEAT_gnttab_map_avail_bits 7 + +#define XENFEAT_NR_SUBMAPS 1 + +#endif /* __XEN_PUBLIC_FEATURES_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/grant_table.h b/xen/public/grant_table.h new file mode 100644 index 0000000..ad116e7 --- /dev/null +++ b/xen/public/grant_table.h @@ -0,0 +1,438 @@ +/****************************************************************************** + * grant_table.h + * + * Interface for granting foreign access to page frames, and receiving + * page-ownership transfers. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004, K A Fraser + */ + +#ifndef __XEN_PUBLIC_GRANT_TABLE_H__ +#define __XEN_PUBLIC_GRANT_TABLE_H__ + + +/*********************************** + * GRANT TABLE REPRESENTATION + */ + +/* Some rough guidelines on accessing and updating grant-table entries + * in a concurrency-safe manner. For more information, Linux contains a + * reference implementation for guest OSes (arch/xen/kernel/grant_table.c). + * + * NB. WMB is a no-op on current-generation x86 processors. However, a + * compiler barrier will still be required. + * + * Introducing a valid entry into the grant table: + * 1. Write ent->domid. + * 2. Write ent->frame: + * GTF_permit_access: Frame to which access is permitted. + * GTF_accept_transfer: Pseudo-phys frame slot being filled by new + * frame, or zero if none. + * 3. Write memory barrier (WMB). + * 4. Write ent->flags, inc. valid type. + * + * Invalidating an unused GTF_permit_access entry: + * 1. flags = ent->flags. + * 2. Observe that !(flags & (GTF_reading|GTF_writing)). + * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0). + * NB. No need for WMB as reuse of entry is control-dependent on success of + * step 3, and all architectures guarantee ordering of ctrl-dep writes. + * + * Invalidating an in-use GTF_permit_access entry: + * This cannot be done directly. Request assistance from the domain controller + * which can set a timeout on the use of a grant entry and take necessary + * action. (NB. This is not yet implemented!). 
+ * + * Invalidating an unused GTF_accept_transfer entry: + * 1. flags = ent->flags. + * 2. Observe that !(flags & GTF_transfer_committed). [*] + * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0). + * NB. No need for WMB as reuse of entry is control-dependent on success of + * step 3, and all architectures guarantee ordering of ctrl-dep writes. + * [*] If GTF_transfer_committed is set then the grant entry is 'committed'. + * The guest must /not/ modify the grant entry until the address of the + * transferred frame is written. It is safe for the guest to spin waiting + * for this to occur (detect by observing GTF_transfer_completed in + * ent->flags). + * + * Invalidating a committed GTF_accept_transfer entry: + * 1. Wait for (ent->flags & GTF_transfer_completed). + * + * Changing a GTF_permit_access from writable to read-only: + * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing. + * + * Changing a GTF_permit_access from read-only to writable: + * Use SMP-safe bit-setting instruction. + */ + +/* + * A grant table comprises a packed array of grant entries in one or more + * page frames shared between Xen and a guest. + * [XEN]: This field is written by Xen and read by the sharing guest. + * [GST]: This field is written by the guest and read by Xen. + */ +struct grant_entry { + /* GTF_xxx: various type and flag information. [XEN,GST] */ + uint16_t flags; + /* The domain being granted foreign privileges. [GST] */ + domid_t domid; + /* + * GTF_permit_access: Frame that @domid is allowed to map and access. [GST] + * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN] + */ + uint32_t frame; +}; +typedef struct grant_entry grant_entry_t; + +/* + * Type of grant entry. + * GTF_invalid: This grant entry grants no privileges. + * GTF_permit_access: Allow @domid to map/access @frame. + * GTF_accept_transfer: Allow @domid to transfer ownership of one page frame + * to this guest. Xen writes the page number to @frame. 
+ */ +#define GTF_invalid (0U<<0) +#define GTF_permit_access (1U<<0) +#define GTF_accept_transfer (2U<<0) +#define GTF_type_mask (3U<<0) + +/* + * Subflags for GTF_permit_access. + * GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST] + * GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN] + * GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN] + * GTF_PAT, GTF_PWT, GTF_PCD: (x86) cache attribute flags for the grant [GST] + */ +#define _GTF_readonly (2) +#define GTF_readonly (1U<<_GTF_readonly) +#define _GTF_reading (3) +#define GTF_reading (1U<<_GTF_reading) +#define _GTF_writing (4) +#define GTF_writing (1U<<_GTF_writing) +#define _GTF_PWT (5) +#define GTF_PWT (1U<<_GTF_PWT) +#define _GTF_PCD (6) +#define GTF_PCD (1U<<_GTF_PCD) +#define _GTF_PAT (7) +#define GTF_PAT (1U<<_GTF_PAT) + +/* + * Subflags for GTF_accept_transfer: + * GTF_transfer_committed: Xen sets this flag to indicate that it is committed + * to transferring ownership of a page frame. When a guest sees this flag + * it must /not/ modify the grant entry until GTF_transfer_completed is + * set by Xen. + * GTF_transfer_completed: It is safe for the guest to spin-wait on this flag + * after reading GTF_transfer_committed. Xen will always write the frame + * address, followed by ORing this flag, in a timely manner. + */ +#define _GTF_transfer_committed (2) +#define GTF_transfer_committed (1U<<_GTF_transfer_committed) +#define _GTF_transfer_completed (3) +#define GTF_transfer_completed (1U<<_GTF_transfer_completed) + + +/*********************************** + * GRANT TABLE QUERIES AND USES + */ + +/* + * Reference to a grant entry in a specified domain's grant table. + */ +typedef uint32_t grant_ref_t; + +/* + * Handle to track a mapping created via a grant reference. + */ +typedef uint32_t grant_handle_t; + +/* + * GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access + * by devices and/or host CPUs. 
If successful, <handle> is a tracking number + * that must be presented later to destroy the mapping(s). On error, <handle> + * is a negative status code. + * NOTES: + * 1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address + * via which I/O devices may access the granted frame. + * 2. If GNTMAP_host_map is specified then a mapping will be added at + * either a host virtual address in the current address space, or at + * a PTE at the specified machine address. The type of mapping to + * perform is selected through the GNTMAP_contains_pte flag, and the + * address is specified in <host_addr>. + * 3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a + * host mapping is destroyed by other means then it is *NOT* guaranteed + * to be accounted to the correct grant reference! + */ +#define GNTTABOP_map_grant_ref 0 +struct gnttab_map_grant_ref { + /* IN parameters. */ + uint64_t host_addr; + uint32_t flags; /* GNTMAP_* */ + grant_ref_t ref; + domid_t dom; + /* OUT parameters. */ + int16_t status; /* GNTST_* */ + grant_handle_t handle; + uint64_t dev_bus_addr; +}; +typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t); + +/* + * GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings + * tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that + * field is ignored. If non-zero, each must refer to a device/host mapping + * that is tracked by <handle>. + * NOTES: + * 1. The call may fail in an undefined manner if either mapping is not + * tracked by <handle>. + * 2. After executing a batch of unmaps, it is guaranteed that no stale + * mappings will remain in the device or host TLBs. + */ +#define GNTTABOP_unmap_grant_ref 1 +struct gnttab_unmap_grant_ref { + /* IN parameters. */ + uint64_t host_addr; + uint64_t dev_bus_addr; + grant_handle_t handle; + /* OUT parameters. 
*/ + int16_t status; /* GNTST_* */ +}; +typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t); + +/* + * GNTTABOP_setup_table: Set up a grant table for <dom> comprising at least + * <nr_frames> pages. The frame addresses are written to the <frame_list>. + * Only <nr_frames> addresses are written, even if the table is larger. + * NOTES: + * 1. <dom> may be specified as DOMID_SELF. + * 2. Only a sufficiently-privileged domain may specify <dom> != DOMID_SELF. + * 3. Xen may not support more than a single grant-table page per domain. + */ +#define GNTTABOP_setup_table 2 +struct gnttab_setup_table { + /* IN parameters. */ + domid_t dom; + uint32_t nr_frames; + /* OUT parameters. */ + int16_t status; /* GNTST_* */ + XEN_GUEST_HANDLE(ulong) frame_list; +}; +typedef struct gnttab_setup_table gnttab_setup_table_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_setup_table_t); + +/* + * GNTTABOP_dump_table: Dump the contents of the grant table to the + * xen console. Debugging use only. + */ +#define GNTTABOP_dump_table 3 +struct gnttab_dump_table { + /* IN parameters. */ + domid_t dom; + /* OUT parameters. */ + int16_t status; /* GNTST_* */ +}; +typedef struct gnttab_dump_table gnttab_dump_table_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_dump_table_t); + +/* + * GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The + * foreign domain has previously registered its interest in the transfer via + * <domid, ref>. + * + * Note that, even if the transfer fails, the specified page no longer belongs + * to the calling domain *unless* the error is GNTST_bad_page. + */ +#define GNTTABOP_transfer 4 +struct gnttab_transfer { + /* IN parameters. */ + xen_pfn_t mfn; + domid_t domid; + grant_ref_t ref; + /* OUT parameters. 
*/ + int16_t status; +}; +typedef struct gnttab_transfer gnttab_transfer_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t); + + +/* + * GNTTABOP_copy: Hypervisor-based copy. + * Source and destination can be either MFNs or, for foreign domains, + * grant references. The foreign domain has to grant read/write access + * in its grant table. + * + * The flags specify what type source and destinations are (either MFN + * or grant reference). + * + * Note that this can also be used to copy data between two domains + * via a third party if the source and destination domains have previously + * granted appropriate access to their pages to the third party. + * + * source_offset specifies an offset in the source frame, dest_offset + * the offset in the target frame and len specifies the number of + * bytes to be copied. + */ + +#define _GNTCOPY_source_gref (0) +#define GNTCOPY_source_gref (1<<_GNTCOPY_source_gref) +#define _GNTCOPY_dest_gref (1) +#define GNTCOPY_dest_gref (1<<_GNTCOPY_dest_gref) + +#define GNTTABOP_copy 5 +typedef struct gnttab_copy { + /* IN parameters. */ + struct { + union { + grant_ref_t ref; + xen_pfn_t gmfn; + } u; + domid_t domid; + uint16_t offset; + } source, dest; + uint16_t len; + uint16_t flags; /* GNTCOPY_* */ + /* OUT parameters. */ + int16_t status; +} gnttab_copy_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t); + +/* + * GNTTABOP_query_size: Query the current and maximum sizes of the shared + * grant table. + * NOTES: + * 1. <dom> may be specified as DOMID_SELF. + * 2. Only a sufficiently-privileged domain may specify <dom> != DOMID_SELF. + */ +#define GNTTABOP_query_size 6 +struct gnttab_query_size { + /* IN parameters. */ + domid_t dom; + /* OUT parameters. 
*/ + uint32_t nr_frames; + uint32_t max_nr_frames; + int16_t status; /* GNTST_* */ +}; +typedef struct gnttab_query_size gnttab_query_size_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_query_size_t); + +/* + * GNTTABOP_unmap_and_replace: Destroy one or more grant-reference mappings + * tracked by <handle> but atomically replace the page table entry with one + * pointing to the machine address under <new_addr>. <new_addr> will be + * redirected to the null entry. + * NOTES: + * 1. The call may fail in an undefined manner if either mapping is not + * tracked by <handle>. + * 2. After executing a batch of unmaps, it is guaranteed that no stale + * mappings will remain in the device or host TLBs. + */ +#define GNTTABOP_unmap_and_replace 7 +struct gnttab_unmap_and_replace { + /* IN parameters. */ + uint64_t host_addr; + uint64_t new_addr; + grant_handle_t handle; + /* OUT parameters. */ + int16_t status; /* GNTST_* */ +}; +typedef struct gnttab_unmap_and_replace gnttab_unmap_and_replace_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t); + + +/* + * Bitfield values for gnttab_map_grant_ref.flags. + */ + /* Map the grant entry for access by I/O devices. */ +#define _GNTMAP_device_map (0) +#define GNTMAP_device_map (1<<_GNTMAP_device_map) + /* Map the grant entry for access by host CPUs. */ +#define _GNTMAP_host_map (1) +#define GNTMAP_host_map (1<<_GNTMAP_host_map) + /* Accesses to the granted frame will be restricted to read-only access. */ +#define _GNTMAP_readonly (2) +#define GNTMAP_readonly (1<<_GNTMAP_readonly) + /* + * GNTMAP_host_map subflag: + * 0 => The host mapping is usable only by the guest OS. + * 1 => The host mapping is usable by guest OS + current application. + */ +#define _GNTMAP_application_map (3) +#define GNTMAP_application_map (1<<_GNTMAP_application_map) + + /* + * GNTMAP_contains_pte subflag: + * 0 => This map request contains a host virtual address. + * 1 => This map request contains the machine address of the PTE to update. 
+ */ +#define _GNTMAP_contains_pte (4) +#define GNTMAP_contains_pte (1<<_GNTMAP_contains_pte) + +/* + * Bits to be placed in guest kernel available PTE bits (architecture + * dependent; only supported when XENFEAT_gnttab_map_avail_bits is set). + */ +#define _GNTMAP_guest_avail0 (16) +#define GNTMAP_guest_avail_mask ((uint32_t)~0 << _GNTMAP_guest_avail0) + +/* + * Values for error status returns. All errors are -ve. + */ +#define GNTST_okay (0) /* Normal return. */ +#define GNTST_general_error (-1) /* General undefined error. */ +#define GNTST_bad_domain (-2) /* Unrecognised domain id. */ +#define GNTST_bad_gntref (-3) /* Unrecognised or inappropriate gntref. */ +#define GNTST_bad_handle (-4) /* Unrecognised or inappropriate handle. */ +#define GNTST_bad_virt_addr (-5) /* Inappropriate virtual address to map. */ +#define GNTST_bad_dev_addr (-6) /* Inappropriate device address to unmap.*/ +#define GNTST_no_device_space (-7) /* Out of space in I/O MMU. */ +#define GNTST_permission_denied (-8) /* Not enough privilege for operation. */ +#define GNTST_bad_page (-9) /* Specified page was invalid for op. */ +#define GNTST_bad_copy_arg (-10) /* copy arguments cross page boundary. */ +#define GNTST_address_too_big (-11) /* transfer page address too large. 
*/ + +#define GNTTABOP_error_msgs { \ + "okay", \ + "undefined error", \ + "unrecognised domain id", \ + "invalid grant reference", \ + "invalid mapping handle", \ + "invalid virtual address", \ + "invalid device address", \ + "no spare translation slot in the I/O MMU", \ + "permission denied", \ + "bad page", \ + "copy arguments cross page boundary", \ + "page address size too large" \ +} + +#endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/blkif.h b/xen/public/io/blkif.h new file mode 100644 index 0000000..2380066 --- /dev/null +++ b/xen/public/io/blkif.h @@ -0,0 +1,141 @@ +/****************************************************************************** + * blkif.h + * + * Unified block-device I/O interface for Xen guest OSes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (c) 2003-2004, Keir Fraser + */ + +#ifndef __XEN_PUBLIC_IO_BLKIF_H__ +#define __XEN_PUBLIC_IO_BLKIF_H__ + +#include "ring.h" +#include "../grant_table.h" + +/* + * Front->back notifications: When enqueuing a new request, sending a + * notification can be made conditional on req_event (i.e., the generic + * hold-off mechanism provided by the ring macros). Backends must set + * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()). + * + * Back->front notifications: When enqueuing a new response, sending a + * notification can be made conditional on rsp_event (i.e., the generic + * hold-off mechanism provided by the ring macros). Frontends must set + * rsp_event appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()). + */ + +#ifndef blkif_vdev_t +#define blkif_vdev_t uint16_t +#endif +#define blkif_sector_t uint64_t + +/* + * REQUEST CODES. + */ +#define BLKIF_OP_READ 0 +#define BLKIF_OP_WRITE 1 +/* + * Recognised only if "feature-barrier" is present in backend xenbus info. + * The "feature-barrier" node contains a boolean indicating whether barrier + * requests are likely to succeed or fail. Either way, a barrier request + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by + * the underlying block-device hardware. The boolean simply indicates whether + * or not it is worthwhile for the frontend to attempt barrier requests. + * If a backend does not recognise BLKIF_OP_WRITE_BARRIER, it should *not* + * create the "feature-barrier" node! + */ +#define BLKIF_OP_WRITE_BARRIER 2 +/* + * Recognised if "feature-flush-cache" is present in backend xenbus + * info. A flush will ask the underlying storage hardware to flush its + * non-volatile caches as appropriate. The "feature-flush-cache" node + * contains a boolean indicating whether flush requests are likely to + * succeed or fail. 
Either way, a flush request may fail at any time + * with BLKIF_RSP_EOPNOTSUPP if it is unsupported by the underlying + * block-device hardware. The boolean simply indicates whether or not it + * is worthwhile for the frontend to attempt flushes. If a backend does + * not recognise BLKIF_OP_WRITE_FLUSH_CACHE, it should *not* create the + * "feature-flush-cache" node! + */ +#define BLKIF_OP_FLUSH_DISKCACHE 3 + +/* + * Maximum scatter/gather segments per request. + * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE. + * NB. This could be 12 if the ring indexes weren't stored in the same page. + */ +#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11 + +struct blkif_request_segment { + grant_ref_t gref; /* reference to I/O buffer frame */ + /* @first_sect: first sector in frame to transfer (inclusive). */ + /* @last_sect: last sector in frame to transfer (inclusive). */ + uint8_t first_sect, last_sect; +}; + +struct blkif_request { + uint8_t operation; /* BLKIF_OP_??? */ + uint8_t nr_segments; /* number of segments */ + blkif_vdev_t handle; /* only for read/write requests */ + uint64_t id; /* private guest value, echoed in resp */ + blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */ + struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; +typedef struct blkif_request blkif_request_t; + +struct blkif_response { + uint64_t id; /* copied from request */ + uint8_t operation; /* copied from request */ + int16_t status; /* BLKIF_RSP_??? */ +}; +typedef struct blkif_response blkif_response_t; + +/* + * STATUS RETURN CODES. + */ + /* Operation not supported (only happens on barrier writes). */ +#define BLKIF_RSP_EOPNOTSUPP -2 + /* Operation failed for some unspecified reason (-EIO). */ +#define BLKIF_RSP_ERROR -1 + /* Operation completed successfully. */ +#define BLKIF_RSP_OKAY 0 + +/* + * Generate blkif ring structures and types. 
+ */ + +DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); + +#define VDISK_CDROM 0x1 +#define VDISK_REMOVABLE 0x2 +#define VDISK_READONLY 0x4 + +#endif /* __XEN_PUBLIC_IO_BLKIF_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/console.h b/xen/public/io/console.h new file mode 100644 index 0000000..4b8c01a --- /dev/null +++ b/xen/public/io/console.h @@ -0,0 +1,51 @@ +/****************************************************************************** + * console.h + * + * Console I/O interface for Xen guest OSes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (c) 2005, Keir Fraser + */ + +#ifndef __XEN_PUBLIC_IO_CONSOLE_H__ +#define __XEN_PUBLIC_IO_CONSOLE_H__ + +typedef uint32_t XENCONS_RING_IDX; + +#define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring)-1)) + +struct xencons_interface { + char in[1024]; + char out[2048]; + XENCONS_RING_IDX in_cons, in_prod; + XENCONS_RING_IDX out_cons, out_prod; +}; + +#endif /* __XEN_PUBLIC_IO_CONSOLE_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/fbif.h b/xen/public/io/fbif.h new file mode 100644 index 0000000..95377a0 --- /dev/null +++ b/xen/public/io/fbif.h @@ -0,0 +1,176 @@ +/* + * fbif.h -- Xen virtual frame buffer device + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (C) 2005 Anthony Liguori <aliguori@us.ibm.com> + * Copyright (C) 2006 Red Hat, Inc., Markus Armbruster <armbru@redhat.com> + */ + +#ifndef __XEN_PUBLIC_IO_FBIF_H__ +#define __XEN_PUBLIC_IO_FBIF_H__ + +/* Out events (frontend -> backend) */ + +/* + * Out events may be sent only when requested by backend, and receipt + * of an unknown out event is an error. + */ + +/* Event type 1 currently not used */ +/* + * Framebuffer update notification event + * Capable frontend sets feature-update in xenstore. + * Backend requests it by setting request-update in xenstore. + */ +#define XENFB_TYPE_UPDATE 2 + +struct xenfb_update +{ + uint8_t type; /* XENFB_TYPE_UPDATE */ + int32_t x; /* source x */ + int32_t y; /* source y */ + int32_t width; /* rect width */ + int32_t height; /* rect height */ +}; + +/* + * Framebuffer resize notification event + * Capable backend sets feature-resize in xenstore. + */ +#define XENFB_TYPE_RESIZE 3 + +struct xenfb_resize +{ + uint8_t type; /* XENFB_TYPE_RESIZE */ + int32_t width; /* width in pixels */ + int32_t height; /* height in pixels */ + int32_t stride; /* stride in bytes */ + int32_t depth; /* depth in bits */ + int32_t offset; /* offset of the framebuffer in bytes */ +}; + +#define XENFB_OUT_EVENT_SIZE 40 + +union xenfb_out_event +{ + uint8_t type; + struct xenfb_update update; + struct xenfb_resize resize; + char pad[XENFB_OUT_EVENT_SIZE]; +}; + +/* In events (backend -> frontend) */ + +/* + * Frontends should ignore unknown in events. + */ + +/* + * Framebuffer refresh period advice + * Backend sends it to advise the frontend of its preferred period of + * refresh. Frontends that keep the framebuffer constantly up-to-date + * just ignore it. Frontends that use the advice should immediately + * refresh the framebuffer (and send an update notification event if + * those have been requested), then use the update frequency to guide + * their periodic refreshes. 
+ */
+#define XENFB_TYPE_REFRESH_PERIOD 1
+#define XENFB_NO_REFRESH 0
+
+struct xenfb_refresh_period
+{
+    uint8_t type;     /* XENFB_TYPE_REFRESH_PERIOD */
+    uint32_t period;  /* period of refresh, in ms,
+                       * XENFB_NO_REFRESH if no refresh is needed */
+};
+
+#define XENFB_IN_EVENT_SIZE 40
+
+union xenfb_in_event
+{
+    uint8_t type;
+    struct xenfb_refresh_period refresh_period;
+    char pad[XENFB_IN_EVENT_SIZE];
+};
+
+/* shared page */
+
+#define XENFB_IN_RING_SIZE 1024
+#define XENFB_IN_RING_LEN (XENFB_IN_RING_SIZE / XENFB_IN_EVENT_SIZE)
+#define XENFB_IN_RING_OFFS 1024
+#define XENFB_IN_RING(page) \
+    ((union xenfb_in_event *)((char *)(page) + XENFB_IN_RING_OFFS))
+#define XENFB_IN_RING_REF(page, idx) \
+    (XENFB_IN_RING((page))[(idx) % XENFB_IN_RING_LEN])
+
+#define XENFB_OUT_RING_SIZE 2048
+#define XENFB_OUT_RING_LEN (XENFB_OUT_RING_SIZE / XENFB_OUT_EVENT_SIZE)
+#define XENFB_OUT_RING_OFFS (XENFB_IN_RING_OFFS + XENFB_IN_RING_SIZE)
+#define XENFB_OUT_RING(page) \
+    ((union xenfb_out_event *)((char *)(page) + XENFB_OUT_RING_OFFS))
+#define XENFB_OUT_RING_REF(page, idx) \
+    (XENFB_OUT_RING((page))[(idx) % XENFB_OUT_RING_LEN])
+
+struct xenfb_page
+{
+    uint32_t in_cons, in_prod;
+    uint32_t out_cons, out_prod;
+
+    int32_t width;         /* the width of the framebuffer (in pixels) */
+    int32_t height;        /* the height of the framebuffer (in pixels) */
+    uint32_t line_length;  /* the length of a row of pixels (in bytes) */
+    uint32_t mem_length;   /* the length of the framebuffer (in bytes) */
+    uint8_t depth;         /* the depth of a pixel (in bits) */
+
+    /*
+     * Framebuffer page directory
+     *
+     * Each directory page holds PAGE_SIZE / sizeof(*pd)
+     * framebuffer pages, and can thus map up to PAGE_SIZE *
+     * PAGE_SIZE / sizeof(*pd) bytes. With PAGE_SIZE == 4096 and
+     * sizeof(unsigned long) == 4/8, that's 4 Megs 32 bit and 2 Megs
+     * 64 bit. 256 directories give enough room for a 512 Meg
+     * framebuffer with a max resolution of 12,800x10,240.
Should + * be enough for a while with room leftover for expansion. + */ + unsigned long pd[256]; +}; + +/* + * Wart: xenkbd needs to know default resolution. Put it here until a + * better solution is found, but don't leak it to the backend. + */ +#ifdef __KERNEL__ +#define XENFB_WIDTH 800 +#define XENFB_HEIGHT 600 +#define XENFB_DEPTH 32 +#endif + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/fsif.h b/xen/public/io/fsif.h new file mode 100644 index 0000000..04ef928 --- /dev/null +++ b/xen/public/io/fsif.h @@ -0,0 +1,191 @@ +/****************************************************************************** + * fsif.h + * + * Interface to FS level split device drivers. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2007, Grzegorz Milos, <gm281@cam.ac.uk>. 
+ */ + +#ifndef __XEN_PUBLIC_IO_FSIF_H__ +#define __XEN_PUBLIC_IO_FSIF_H__ + +#include "ring.h" +#include "../grant_table.h" + +#define REQ_FILE_OPEN 1 +#define REQ_FILE_CLOSE 2 +#define REQ_FILE_READ 3 +#define REQ_FILE_WRITE 4 +#define REQ_STAT 5 +#define REQ_FILE_TRUNCATE 6 +#define REQ_REMOVE 7 +#define REQ_RENAME 8 +#define REQ_CREATE 9 +#define REQ_DIR_LIST 10 +#define REQ_CHMOD 11 +#define REQ_FS_SPACE 12 +#define REQ_FILE_SYNC 13 + +struct fsif_open_request { + grant_ref_t gref; +}; + +struct fsif_close_request { + uint32_t fd; +}; + +struct fsif_read_request { + uint32_t fd; + int32_t pad; + uint64_t len; + uint64_t offset; + grant_ref_t grefs[1]; /* Variable length */ +}; + +struct fsif_write_request { + uint32_t fd; + int32_t pad; + uint64_t len; + uint64_t offset; + grant_ref_t grefs[1]; /* Variable length */ +}; + +struct fsif_stat_request { + uint32_t fd; +}; + +/* This structure is a copy of some fields from stat structure, returned + * via the ring. */ +struct fsif_stat_response { + int32_t stat_mode; + uint32_t stat_uid; + uint32_t stat_gid; + int32_t stat_ret; + int64_t stat_size; + int64_t stat_atime; + int64_t stat_mtime; + int64_t stat_ctime; +}; + +struct fsif_truncate_request { + uint32_t fd; + int32_t pad; + int64_t length; +}; + +struct fsif_remove_request { + grant_ref_t gref; +}; + +struct fsif_rename_request { + uint16_t old_name_offset; + uint16_t new_name_offset; + grant_ref_t gref; +}; + +struct fsif_create_request { + int8_t directory; + int8_t pad; + int16_t pad2; + int32_t mode; + grant_ref_t gref; +}; + +struct fsif_list_request { + uint32_t offset; + grant_ref_t gref; +}; + +#define NR_FILES_SHIFT 0 +#define NR_FILES_SIZE 16 /* 16 bits for the number of files mask */ +#define NR_FILES_MASK (((1ULL << NR_FILES_SIZE) - 1) << NR_FILES_SHIFT) +#define ERROR_SIZE 32 /* 32 bits for the error mask */ +#define ERROR_SHIFT (NR_FILES_SIZE + NR_FILES_SHIFT) +#define ERROR_MASK (((1ULL << ERROR_SIZE) - 1) << ERROR_SHIFT) +#define 
HAS_MORE_SHIFT (ERROR_SHIFT + ERROR_SIZE) +#define HAS_MORE_FLAG (1ULL << HAS_MORE_SHIFT) + +struct fsif_chmod_request { + uint32_t fd; + int32_t mode; +}; + +struct fsif_space_request { + grant_ref_t gref; +}; + +struct fsif_sync_request { + uint32_t fd; +}; + + +/* FS operation request */ +struct fsif_request { + uint8_t type; /* Type of the request */ + uint8_t pad; + uint16_t id; /* Request ID, copied to the response */ + uint32_t pad2; + union { + struct fsif_open_request fopen; + struct fsif_close_request fclose; + struct fsif_read_request fread; + struct fsif_write_request fwrite; + struct fsif_stat_request fstat; + struct fsif_truncate_request ftruncate; + struct fsif_remove_request fremove; + struct fsif_rename_request frename; + struct fsif_create_request fcreate; + struct fsif_list_request flist; + struct fsif_chmod_request fchmod; + struct fsif_space_request fspace; + struct fsif_sync_request fsync; + } u; +}; +typedef struct fsif_request fsif_request_t; + +/* FS operation response */ +struct fsif_response { + uint16_t id; + uint16_t pad1; + uint32_t pad2; + union { + uint64_t ret_val; + struct fsif_stat_response fstat; + }; +}; + +typedef struct fsif_response fsif_response_t; + +#define FSIF_RING_ENTRY_SIZE 64 + +#define FSIF_NR_READ_GNTS ((FSIF_RING_ENTRY_SIZE - sizeof(struct fsif_read_request)) / \ + sizeof(grant_ref_t) + 1) +#define FSIF_NR_WRITE_GNTS ((FSIF_RING_ENTRY_SIZE - sizeof(struct fsif_write_request)) / \ + sizeof(grant_ref_t) + 1) + +DEFINE_RING_TYPES(fsif, struct fsif_request, struct fsif_response); + +#define STATE_INITIALISED "init" +#define STATE_READY "ready" + + + +#endif diff --git a/xen/public/io/kbdif.h b/xen/public/io/kbdif.h new file mode 100644 index 0000000..e1d66a5 --- /dev/null +++ b/xen/public/io/kbdif.h @@ -0,0 +1,132 @@ +/* + * kbdif.h -- Xen virtual keyboard/mouse + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), 
to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 Anthony Liguori <aliguori@us.ibm.com>
+ * Copyright (C) 2006 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ */
+
+#ifndef __XEN_PUBLIC_IO_KBDIF_H__
+#define __XEN_PUBLIC_IO_KBDIF_H__
+
+/* In events (backend -> frontend) */
+
+/*
+ * Frontends should ignore unknown in events.
+ */
+
+/* Pointer movement event */
+#define XENKBD_TYPE_MOTION 1
+/* Event type 2 currently not used */
+/* Key event (includes pointer buttons) */
+#define XENKBD_TYPE_KEY 3
+/*
+ * Pointer position event
+ * Capable backend sets feature-abs-pointer in xenstore.
+ * Frontend requests it instead of XENKBD_TYPE_MOTION by setting
+ * request-abs-update in xenstore.
+ */ +#define XENKBD_TYPE_POS 4 + +struct xenkbd_motion +{ + uint8_t type; /* XENKBD_TYPE_MOTION */ + int32_t rel_x; /* relative X motion */ + int32_t rel_y; /* relative Y motion */ + int32_t rel_z; /* relative Z motion (wheel) */ +}; + +struct xenkbd_key +{ + uint8_t type; /* XENKBD_TYPE_KEY */ + uint8_t pressed; /* 1 if pressed; 0 otherwise */ + uint32_t keycode; /* KEY_* from linux/input.h */ +}; + +struct xenkbd_position +{ + uint8_t type; /* XENKBD_TYPE_POS */ + int32_t abs_x; /* absolute X position (in FB pixels) */ + int32_t abs_y; /* absolute Y position (in FB pixels) */ + int32_t rel_z; /* relative Z motion (wheel) */ +}; + +#define XENKBD_IN_EVENT_SIZE 40 + +union xenkbd_in_event +{ + uint8_t type; + struct xenkbd_motion motion; + struct xenkbd_key key; + struct xenkbd_position pos; + char pad[XENKBD_IN_EVENT_SIZE]; +}; + +/* Out events (frontend -> backend) */ + +/* + * Out events may be sent only when requested by backend, and receipt + * of an unknown out event is an error. + * No out events currently defined. 
+ */ + +#define XENKBD_OUT_EVENT_SIZE 40 + +union xenkbd_out_event +{ + uint8_t type; + char pad[XENKBD_OUT_EVENT_SIZE]; +}; + +/* shared page */ + +#define XENKBD_IN_RING_SIZE 2048 +#define XENKBD_IN_RING_LEN (XENKBD_IN_RING_SIZE / XENKBD_IN_EVENT_SIZE) +#define XENKBD_IN_RING_OFFS 1024 +#define XENKBD_IN_RING(page) \ + ((union xenkbd_in_event *)((char *)(page) + XENKBD_IN_RING_OFFS)) +#define XENKBD_IN_RING_REF(page, idx) \ + (XENKBD_IN_RING((page))[(idx) % XENKBD_IN_RING_LEN]) + +#define XENKBD_OUT_RING_SIZE 1024 +#define XENKBD_OUT_RING_LEN (XENKBD_OUT_RING_SIZE / XENKBD_OUT_EVENT_SIZE) +#define XENKBD_OUT_RING_OFFS (XENKBD_IN_RING_OFFS + XENKBD_IN_RING_SIZE) +#define XENKBD_OUT_RING(page) \ + ((union xenkbd_out_event *)((char *)(page) + XENKBD_OUT_RING_OFFS)) +#define XENKBD_OUT_RING_REF(page, idx) \ + (XENKBD_OUT_RING((page))[(idx) % XENKBD_OUT_RING_LEN]) + +struct xenkbd_page +{ + uint32_t in_cons, in_prod; + uint32_t out_cons, out_prod; +}; + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/netif.h b/xen/public/io/netif.h new file mode 100644 index 0000000..fbb5c27 --- /dev/null +++ b/xen/public/io/netif.h @@ -0,0 +1,205 @@ +/****************************************************************************** + * netif.h + * + * Unified network-device I/O interface for Xen guest OSes. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2003-2004, Keir Fraser + */ + +#ifndef __XEN_PUBLIC_IO_NETIF_H__ +#define __XEN_PUBLIC_IO_NETIF_H__ + +#include "ring.h" +#include "../grant_table.h" + +/* + * Notifications after enqueuing any type of message should be conditional on + * the appropriate req_event or rsp_event field in the shared ring. + * If the client sends notification for rx requests then it should specify + * feature 'feature-rx-notify' via xenbus. Otherwise the backend will assume + * that it cannot safely queue packets (as it may not be kicked to send them). 
+ */ + +/* + * This is the 'wire' format for packets: + * Request 1: netif_tx_request -- NETTXF_* (any flags) + * [Request 2: netif_tx_extra] (only if request 1 has NETTXF_extra_info) + * [Request 3: netif_tx_extra] (only if request 2 has XEN_NETIF_EXTRA_MORE) + * Request 4: netif_tx_request -- NETTXF_more_data + * Request 5: netif_tx_request -- NETTXF_more_data + * ... + * Request N: netif_tx_request -- 0 + */ + +/* Protocol checksum field is blank in the packet (hardware offload)? */ +#define _NETTXF_csum_blank (0) +#define NETTXF_csum_blank (1U<<_NETTXF_csum_blank) + +/* Packet data has been validated against protocol checksum. */ +#define _NETTXF_data_validated (1) +#define NETTXF_data_validated (1U<<_NETTXF_data_validated) + +/* Packet continues in the next request descriptor. */ +#define _NETTXF_more_data (2) +#define NETTXF_more_data (1U<<_NETTXF_more_data) + +/* Packet to be followed by extra descriptor(s). */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) + +struct netif_tx_request { + grant_ref_t gref; /* Reference to buffer page */ + uint16_t offset; /* Offset within buffer page */ + uint16_t flags; /* NETTXF_* */ + uint16_t id; /* Echoed in response message. */ + uint16_t size; /* Packet size in bytes. */ +}; +typedef struct netif_tx_request netif_tx_request_t; + +/* Types of netif_extra_info descriptors. */ +#define XEN_NETIF_EXTRA_TYPE_NONE (0) /* Never used - invalid */ +#define XEN_NETIF_EXTRA_TYPE_GSO (1) /* u.gso */ +#define XEN_NETIF_EXTRA_TYPE_MCAST_ADD (2) /* u.mcast */ +#define XEN_NETIF_EXTRA_TYPE_MCAST_DEL (3) /* u.mcast */ +#define XEN_NETIF_EXTRA_TYPE_MAX (4) + +/* netif_extra_info flags. */ +#define _XEN_NETIF_EXTRA_FLAG_MORE (0) +#define XEN_NETIF_EXTRA_FLAG_MORE (1U<<_XEN_NETIF_EXTRA_FLAG_MORE) + +/* GSO types - only TCPv4 currently supported. */ +#define XEN_NETIF_GSO_TYPE_TCPV4 (1) + +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. 
+ */ +struct netif_extra_info { + uint8_t type; /* XEN_NETIF_EXTRA_TYPE_* */ + uint8_t flags; /* XEN_NETIF_EXTRA_FLAG_* */ + + union { + /* + * XEN_NETIF_EXTRA_TYPE_GSO: + */ + struct { + /* + * Maximum payload size of each segment. For example, for TCP this + * is just the path MSS. + */ + uint16_t size; + + /* + * GSO type. This determines the protocol of the packet and any + * extra features required to segment the packet properly. + */ + uint8_t type; /* XEN_NETIF_GSO_TYPE_* */ + + /* Future expansion. */ + uint8_t pad; + + /* + * GSO features. This specifies any extra GSO features required + * to process this packet, such as ECN support for TCPv4. + */ + uint16_t features; /* XEN_NETIF_GSO_FEAT_* */ + } gso; + + /* + * XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}: + * Backend advertises availability via 'feature-multicast-control' + * xenbus node containing value '1'. + * Frontend requests this feature by advertising + * 'request-multicast-control' xenbus node containing value '1'. + * If multicast control is requested then multicast flooding is + * disabled and the frontend must explicitly register its interest + * in multicast groups using dummy transmit requests containing + * MCAST_{ADD,DEL} extra-info fragments. + */ + struct { + uint8_t addr[6]; /* Address to add/remove. */ + } mcast; + + uint16_t pad[3]; + } u; +}; +typedef struct netif_extra_info netif_extra_info_t; + +struct netif_tx_response { + uint16_t id; + int16_t status; /* NETIF_RSP_* */ +}; +typedef struct netif_tx_response netif_tx_response_t; + +struct netif_rx_request { + uint16_t id; /* Echoed in response message. */ + grant_ref_t gref; /* Reference to incoming granted frame */ +}; +typedef struct netif_rx_request netif_rx_request_t; + +/* Packet data has been validated against protocol checksum. */ +#define _NETRXF_data_validated (0) +#define NETRXF_data_validated (1U<<_NETRXF_data_validated) + +/* Protocol checksum field is blank in the packet (hardware offload)? 
*/ +#define _NETRXF_csum_blank (1) +#define NETRXF_csum_blank (1U<<_NETRXF_csum_blank) + +/* Packet continues in the next request descriptor. */ +#define _NETRXF_more_data (2) +#define NETRXF_more_data (1U<<_NETRXF_more_data) + +/* Packet to be followed by extra descriptor(s). */ +#define _NETRXF_extra_info (3) +#define NETRXF_extra_info (1U<<_NETRXF_extra_info) + +struct netif_rx_response { + uint16_t id; + uint16_t offset; /* Offset in page of start of received packet */ + uint16_t flags; /* NETRXF_* */ + int16_t status; /* -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */ +}; +typedef struct netif_rx_response netif_rx_response_t; + +/* + * Generate netif ring structures and types. + */ + +DEFINE_RING_TYPES(netif_tx, struct netif_tx_request, struct netif_tx_response); +DEFINE_RING_TYPES(netif_rx, struct netif_rx_request, struct netif_rx_response); + +#define NETIF_RSP_DROPPED -2 +#define NETIF_RSP_ERROR -1 +#define NETIF_RSP_OKAY 0 +/* No response: used for auxiliary requests (e.g., netif_tx_extra). 
*/ +#define NETIF_RSP_NULL 1 + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/pciif.h b/xen/public/io/pciif.h new file mode 100644 index 0000000..0a0ffcc --- /dev/null +++ b/xen/public/io/pciif.h @@ -0,0 +1,101 @@ +/* + * PCI Backend/Frontend Common Data Structures & Macros + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ *
+ * Author: Ryan Wilson <hap9@epoch.ncsc.mil>
+ */
+#ifndef __XEN_PCI_COMMON_H__
+#define __XEN_PCI_COMMON_H__
+
+/* Be sure to bump this number if you change this file */
+#define XEN_PCI_MAGIC "7"
+
+/* xen_pci_sharedinfo flags */
+#define _XEN_PCIF_active (0)
+#define XEN_PCIF_active (1<<_XEN_PCIF_active)
+
+/* xen_pci_op commands */
+#define XEN_PCI_OP_conf_read (0)
+#define XEN_PCI_OP_conf_write (1)
+#define XEN_PCI_OP_enable_msi (2)
+#define XEN_PCI_OP_disable_msi (3)
+#define XEN_PCI_OP_enable_msix (4)
+#define XEN_PCI_OP_disable_msix (5)
+
+/* xen_pci_op error numbers */
+#define XEN_PCI_ERR_success (0)
+#define XEN_PCI_ERR_dev_not_found (-1)
+#define XEN_PCI_ERR_invalid_offset (-2)
+#define XEN_PCI_ERR_access_denied (-3)
+#define XEN_PCI_ERR_not_implemented (-4)
+/* XEN_PCI_ERR_op_failed - backend failed to complete the operation */
+#define XEN_PCI_ERR_op_failed (-5)
+
+/*
+ * It should be (PAGE_SIZE - sizeof(struct xen_pci_op)) / sizeof(struct xen_msix_entry).
+ * Should not exceed 128.
+ */
+#define SH_INFO_MAX_VEC 128
+
+struct xen_msix_entry {
+    uint16_t vector;
+    uint16_t entry;
+};
+struct xen_pci_op {
+    /* IN: what action to perform: XEN_PCI_OP_* */
+    uint32_t cmd;
+
+    /* OUT: will contain an error number (if any) from errno.h */
+    int32_t err;
+
+    /* IN: which device to touch */
+    uint32_t domain; /* PCI Domain/Segment */
+    uint32_t bus;
+    uint32_t devfn;
+
+    /* IN: which configuration registers to touch */
+    int32_t offset;
+    int32_t size;
+
+    /* IN/OUT: Contains the result after a READ or the value to WRITE */
+    uint32_t value;
+    /* IN: Contains extra info for this operation */
+    uint32_t info;
+    /* IN: param for MSI-X */
+    struct xen_msix_entry msix_entries[SH_INFO_MAX_VEC];
+};
+
+struct xen_pci_sharedinfo {
+    /* flags - XEN_PCIF_* */
+    uint32_t flags;
+    struct xen_pci_op op;
+};
+
+#endif /* __XEN_PCI_COMMON_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */ diff --git a/xen/public/io/protocols.h b/xen/public/io/protocols.h new file mode 100644 index 0000000..77bd1bd --- /dev/null +++ b/xen/public/io/protocols.h @@ -0,0 +1,40 @@ +/****************************************************************************** + * protocols.h + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ */ + +#ifndef __XEN_PROTOCOLS_H__ +#define __XEN_PROTOCOLS_H__ + +#define XEN_IO_PROTO_ABI_X86_32 "x86_32-abi" +#define XEN_IO_PROTO_ABI_X86_64 "x86_64-abi" +#define XEN_IO_PROTO_ABI_IA64 "ia64-abi" + +#if defined(__i386__) +# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_X86_32 +#elif defined(__x86_64__) +# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_X86_64 +#elif defined(__ia64__) +# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_IA64 +#else +# error arch fixup needed here +#endif + +#endif diff --git a/xen/public/io/ring.h b/xen/public/io/ring.h new file mode 100644 index 0000000..6ce1d0d --- /dev/null +++ b/xen/public/io/ring.h @@ -0,0 +1,307 @@ +/****************************************************************************** + * ring.h + * + * Shared producer-consumer ring macros. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Tim Deegan and Andrew Warfield November 2004. 
+ */ + +#ifndef __XEN_PUBLIC_IO_RING_H__ +#define __XEN_PUBLIC_IO_RING_H__ + +#include "../xen-compat.h" + +#if __XEN_INTERFACE_VERSION__ < 0x00030208 +#define xen_mb() mb() +#define xen_rmb() rmb() +#define xen_wmb() wmb() +#endif + +typedef unsigned int RING_IDX; + +/* Round a 32-bit unsigned constant down to the nearest power of two. */ +#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1)) +#define __RD4(_x) (((_x) & 0x0000000c) ? __RD2((_x)>>2)<<2 : __RD2(_x)) +#define __RD8(_x) (((_x) & 0x000000f0) ? __RD4((_x)>>4)<<4 : __RD4(_x)) +#define __RD16(_x) (((_x) & 0x0000ff00) ? __RD8((_x)>>8)<<8 : __RD8(_x)) +#define __RD32(_x) (((_x) & 0xffff0000) ? __RD16((_x)>>16)<<16 : __RD16(_x)) + +/* + * Calculate size of a shared ring, given the total available space for the + * ring and indexes (_sz), and the name tag of the request/response structure. + * A ring contains as many entries as will fit, rounded down to the nearest + * power of two (so we can mask with (size-1) to loop around). + */ +#define __RING_SIZE(_s, _sz) \ + (__RD32(((_sz) - (long)(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0]))) + +/* + * Macros to make the correct C datatypes for a new kind of ring. + * + * To make a new ring datatype, you need to have two message structures, + * let's say request_t, and response_t already defined. + * + * In a header where you want the ring datatype declared, you then do: + * + * DEFINE_RING_TYPES(mytag, request_t, response_t); + * + * These expand out to give you a set of types, as you can see below. + * The most important of these are: + * + * mytag_sring_t - The shared ring. + * mytag_front_ring_t - The 'front' half of the ring. + * mytag_back_ring_t - The 'back' half of the ring. + * + * To initialize a ring in your code you need to know the location and size + * of the shared memory area (PAGE_SIZE, for instance). 
To initialise + * the front half: + * + * mytag_front_ring_t front_ring; + * SHARED_RING_INIT((mytag_sring_t *)shared_page); + * FRONT_RING_INIT(&front_ring, (mytag_sring_t *)shared_page, PAGE_SIZE); + * + * Initializing the back follows similarly (note that only the front + * initializes the shared ring): + * + * mytag_back_ring_t back_ring; + * BACK_RING_INIT(&back_ring, (mytag_sring_t *)shared_page, PAGE_SIZE); + */ + +#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \ + \ +/* Shared ring entry */ \ +union __name##_sring_entry { \ + __req_t req; \ + __rsp_t rsp; \ +}; \ + \ +/* Shared ring page */ \ +struct __name##_sring { \ + RING_IDX req_prod, req_event; \ + RING_IDX rsp_prod, rsp_event; \ + uint8_t pad[48]; \ + union __name##_sring_entry ring[1]; /* variable-length */ \ +}; \ + \ +/* "Front" end's private variables */ \ +struct __name##_front_ring { \ + RING_IDX req_prod_pvt; \ + RING_IDX rsp_cons; \ + unsigned int nr_ents; \ + struct __name##_sring *sring; \ +}; \ + \ +/* "Back" end's private variables */ \ +struct __name##_back_ring { \ + RING_IDX rsp_prod_pvt; \ + RING_IDX req_cons; \ + unsigned int nr_ents; \ + struct __name##_sring *sring; \ +}; \ + \ +/* Syntactic sugar */ \ +typedef struct __name##_sring __name##_sring_t; \ +typedef struct __name##_front_ring __name##_front_ring_t; \ +typedef struct __name##_back_ring __name##_back_ring_t + +/* + * Macros for manipulating rings. + * + * FRONT_RING_whatever works on the "front end" of a ring: here + * requests are pushed on to the ring and responses taken off it. + * + * BACK_RING_whatever works on the "back end" of a ring: here + * requests are taken off the ring and responses put on. + * + * N.B. these macros do NO INTERLOCKS OR FLOW CONTROL. + * This is OK in 1-for-1 request-response situations where the + * requestor (front end) never has more than RING_SIZE()-1 + * outstanding requests. 
+ */ + +/* Initialising empty rings */ +#define SHARED_RING_INIT(_s) do { \ + (_s)->req_prod = (_s)->rsp_prod = 0; \ + (_s)->req_event = (_s)->rsp_event = 1; \ + (void)memset((_s)->pad, 0, sizeof((_s)->pad)); \ +} while(0) + +#define FRONT_RING_INIT(_r, _s, __size) do { \ + (_r)->req_prod_pvt = 0; \ + (_r)->rsp_cons = 0; \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ + (_r)->sring = (_s); \ +} while (0) + +#define BACK_RING_INIT(_r, _s, __size) do { \ + (_r)->rsp_prod_pvt = 0; \ + (_r)->req_cons = 0; \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ + (_r)->sring = (_s); \ +} while (0) + +/* Initialize to existing shared indexes -- for recovery */ +#define FRONT_RING_ATTACH(_r, _s, __size) do { \ + (_r)->sring = (_s); \ + (_r)->req_prod_pvt = (_s)->req_prod; \ + (_r)->rsp_cons = (_s)->rsp_prod; \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ +} while (0) + +#define BACK_RING_ATTACH(_r, _s, __size) do { \ + (_r)->sring = (_s); \ + (_r)->rsp_prod_pvt = (_s)->rsp_prod; \ + (_r)->req_cons = (_s)->req_prod; \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ +} while (0) + +/* How big is this ring? */ +#define RING_SIZE(_r) \ + ((_r)->nr_ents) + +/* Number of free requests (for use on front side only). */ +#define RING_FREE_REQUESTS(_r) \ + (RING_SIZE(_r) - ((_r)->req_prod_pvt - (_r)->rsp_cons)) + +/* Test if there is an empty slot available on the front ring. + * (This is only meaningful from the front. ) + */ +#define RING_FULL(_r) \ + (RING_FREE_REQUESTS(_r) == 0) + +/* Test if there are outstanding messages to be processed on a ring. */ +#define RING_HAS_UNCONSUMED_RESPONSES(_r) \ + ((_r)->sring->rsp_prod - (_r)->rsp_cons) + +#ifdef __GNUC__ +#define RING_HAS_UNCONSUMED_REQUESTS(_r) ({ \ + unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \ + unsigned int rsp = RING_SIZE(_r) - \ + ((_r)->req_cons - (_r)->rsp_prod_pvt); \ + req < rsp ? req : rsp; \ +}) +#else +/* Same as above, but without the nice GCC ({ ... }) syntax. 
*/ +#define RING_HAS_UNCONSUMED_REQUESTS(_r) \ + ((((_r)->sring->req_prod - (_r)->req_cons) < \ + (RING_SIZE(_r) - ((_r)->req_cons - (_r)->rsp_prod_pvt))) ? \ + ((_r)->sring->req_prod - (_r)->req_cons) : \ + (RING_SIZE(_r) - ((_r)->req_cons - (_r)->rsp_prod_pvt))) +#endif + +/* Direct access to individual ring elements, by index. */ +#define RING_GET_REQUEST(_r, _idx) \ + (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req)) + +#define RING_GET_RESPONSE(_r, _idx) \ + (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) + +/* Loop termination condition: Would the specified index overflow the ring? */ +#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \ + (((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r)) + +#define RING_PUSH_REQUESTS(_r) do { \ + xen_wmb(); /* back sees requests /before/ updated producer index */ \ + (_r)->sring->req_prod = (_r)->req_prod_pvt; \ +} while (0) + +#define RING_PUSH_RESPONSES(_r) do { \ + xen_wmb(); /* front sees resps /before/ updated producer index */ \ + (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \ +} while (0) + +/* + * Notification hold-off (req_event and rsp_event): + * + * When queueing requests or responses on a shared ring, it may not always be + * necessary to notify the remote end. For example, if requests are in flight + * in a backend, the front may be able to queue further requests without + * notifying the back (if the back checks for new requests when it queues + * responses). + * + * When enqueuing requests or responses: + * + * Use RING_PUSH_{REQUESTS,RESPONSES}_AND_CHECK_NOTIFY(). The second argument + * is a boolean return value. True indicates that the receiver requires an + * asynchronous notification. + * + * After dequeuing requests or responses (before sleeping the connection): + * + * Use RING_FINAL_CHECK_FOR_REQUESTS() or RING_FINAL_CHECK_FOR_RESPONSES(). + * The second argument is a boolean return value. 
True indicates that there + * are pending messages on the ring (i.e., the connection should not be put + * to sleep). + * + * These macros will set the req_event/rsp_event field to trigger a + * notification on the very next message that is enqueued. If you want to + * create batches of work (i.e., only receive a notification after several + * messages have been enqueued) then you will need to create a customised + * version of the FINAL_CHECK macro in your own code, which sets the event + * field appropriately. + */ + +#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \ + RING_IDX __old = (_r)->sring->req_prod; \ + RING_IDX __new = (_r)->req_prod_pvt; \ + xen_wmb(); /* back sees requests /before/ updated producer index */ \ + (_r)->sring->req_prod = __new; \ + xen_mb(); /* back sees new requests /before/ we check req_event */ \ + (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \ + (RING_IDX)(__new - __old)); \ +} while (0) + +#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \ + RING_IDX __old = (_r)->sring->rsp_prod; \ + RING_IDX __new = (_r)->rsp_prod_pvt; \ + xen_wmb(); /* front sees resps /before/ updated producer index */ \ + (_r)->sring->rsp_prod = __new; \ + xen_mb(); /* front sees new resps /before/ we check rsp_event */ \ + (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \ + (RING_IDX)(__new - __old)); \ +} while (0) + +#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \ + (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ + if (_work_to_do) break; \ + (_r)->sring->req_event = (_r)->req_cons + 1; \ + xen_mb(); \ + (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ +} while (0) + +#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \ + (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ + if (_work_to_do) break; \ + (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \ + xen_mb(); \ + (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ +} while (0) + +#endif /* __XEN_PUBLIC_IO_RING_H__ */ + +/* + 
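The ring macros above rely on free-running 32-bit producer/consumer indices; correctness under wraparound comes entirely from unsigned subtraction. Below is a minimal standalone sketch (plain C, no shared ring; `ring_free_requests` and `ring_notify_needed` are illustrative names, not part of this header) of the two key computations: the slot count behind RING_FREE_REQUESTS and the hold-off test used by RING_PUSH_REQUESTS_AND_CHECK_NOTIFY.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t RING_IDX;

/* Free request slots on a front ring of nr_ents entries: unsigned
 * subtraction counts in-flight requests correctly even after the
 * 32-bit indices wrap. */
static unsigned int ring_free_requests(RING_IDX req_prod_pvt,
                                       RING_IDX rsp_cons,
                                       unsigned int nr_ents)
{
    return nr_ents - (unsigned int)(req_prod_pvt - rsp_cons);
}

/* The hold-off test from RING_PUSH_REQUESTS_AND_CHECK_NOTIFY: notify
 * only if req_event lies in the half-open window (old, new]. */
static int ring_notify_needed(RING_IDX old, RING_IDX new_prod, RING_IDX event)
{
    return (RING_IDX)(new_prod - event) < (RING_IDX)(new_prod - old);
}
```

With `event = old + 1` (the state SHARED_RING_INIT starts from), pushing any request triggers a notification; once the peer is known to be actively polling, it can advance `req_event` to suppress redundant notifications.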
* Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/tpmif.h b/xen/public/io/tpmif.h new file mode 100644 index 0000000..02ccdab --- /dev/null +++ b/xen/public/io/tpmif.h @@ -0,0 +1,77 @@ +/****************************************************************************** + * tpmif.h + * + * TPM I/O interface for Xen guest OSes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2005, IBM Corporation + * + * Author: Stefan Berger, stefanb@us.ibm.com + * Grant table support: Mahadevan Gomathisankaran + * + * This code has been derived from tools/libxc/xen/io/netif.h + * + * Copyright (c) 2003-2004, Keir Fraser + */ + +#ifndef __XEN_PUBLIC_IO_TPMIF_H__ +#define __XEN_PUBLIC_IO_TPMIF_H__ + +#include "../grant_table.h" + +struct tpmif_tx_request { + unsigned long addr; /* Machine address of packet. 
*/ + grant_ref_t ref; /* grant table access reference */ + uint16_t unused; + uint16_t size; /* Packet size in bytes. */ +}; +typedef struct tpmif_tx_request tpmif_tx_request_t; + +/* + * The TPMIF_TX_RING_SIZE defines the number of pages the + * front-end and backend can exchange (= size of array). + */ +typedef uint32_t TPMIF_RING_IDX; + +#define TPMIF_TX_RING_SIZE 1 + +/* This structure must fit in a memory page. */ + +struct tpmif_ring { + struct tpmif_tx_request req; +}; +typedef struct tpmif_ring tpmif_ring_t; + +struct tpmif_tx_interface { + struct tpmif_ring ring[TPMIF_TX_RING_SIZE]; +}; +typedef struct tpmif_tx_interface tpmif_tx_interface_t; + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/xenbus.h b/xen/public/io/xenbus.h new file mode 100644 index 0000000..4a053df --- /dev/null +++ b/xen/public/io/xenbus.h @@ -0,0 +1,80 @@ +/***************************************************************************** + * xenbus.h + * + * Xenbus protocol details. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
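The comment above requires the whole transmit interface to fit in one memory page. A quick standalone check of that claim (grant_ref_t lives in ../grant_table.h, which is outside this hunk — a 32-bit type is assumed here):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;          /* assumption: as in grant_table.h */

struct tpmif_tx_request {
    unsigned long addr;                /* Machine address of packet. */
    grant_ref_t ref;                   /* Grant table access reference. */
    uint16_t unused;
    uint16_t size;                     /* Packet size in bytes. */
};

#define TPMIF_TX_RING_SIZE 1

struct tpmif_ring { struct tpmif_tx_request req; };
struct tpmif_tx_interface { struct tpmif_ring ring[TPMIF_TX_RING_SIZE]; };
```

With a single-entry ring the interface is a handful of bytes, far under the 4 KiB page-size constraint.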
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (C) 2005 XenSource Ltd. + */ + +#ifndef _XEN_PUBLIC_IO_XENBUS_H +#define _XEN_PUBLIC_IO_XENBUS_H + +/* + * The state of either end of the Xenbus, i.e. the current communication + * status of initialisation across the bus. States here imply nothing about + * the state of the connection between the driver and the kernel's device + * layers. + */ +enum xenbus_state { + XenbusStateUnknown = 0, + + XenbusStateInitialising = 1, + + /* + * InitWait: Finished early initialisation but waiting for information + * from the peer or hotplug scripts. + */ + XenbusStateInitWait = 2, + + /* + * Initialised: Waiting for a connection from the peer. + */ + XenbusStateInitialised = 3, + + XenbusStateConnected = 4, + + /* + * Closing: The device is being closed due to an error or an unplug event. + */ + XenbusStateClosing = 5, + + XenbusStateClosed = 6, + + /* + * Reconfiguring: The device is being reconfigured. + */ + XenbusStateReconfiguring = 7, + + XenbusStateReconfigured = 8 +}; +typedef enum xenbus_state XenbusState; + +#endif /* _XEN_PUBLIC_IO_XENBUS_H */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/io/xs_wire.h b/xen/public/io/xs_wire.h new file mode 100644 index 0000000..dd2d966 --- /dev/null +++ b/xen/public/io/xs_wire.h @@ -0,0 +1,132 @@ +/* + * Details of the "wire" protocol between Xen Store Daemon and client + * library or guest kernel. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (C) 2005 Rusty Russell IBM Corporation + */ + +#ifndef _XS_WIRE_H +#define _XS_WIRE_H + +enum xsd_sockmsg_type +{ + XS_DEBUG, + XS_DIRECTORY, + XS_READ, + XS_GET_PERMS, + XS_WATCH, + XS_UNWATCH, + XS_TRANSACTION_START, + XS_TRANSACTION_END, + XS_INTRODUCE, + XS_RELEASE, + XS_GET_DOMAIN_PATH, + XS_WRITE, + XS_MKDIR, + XS_RM, + XS_SET_PERMS, + XS_WATCH_EVENT, + XS_ERROR, + XS_IS_DOMAIN_INTRODUCED, + XS_RESUME, + XS_SET_TARGET +}; + +#define XS_WRITE_NONE "NONE" +#define XS_WRITE_CREATE "CREATE" +#define XS_WRITE_CREATE_EXCL "CREATE|EXCL" + +#ifdef linux_specific +/* We hand errors as strings, for portability. 
*/ +struct xsd_errors +{ + int errnum; + const char *errstring; +}; +#define XSD_ERROR(x) { x, #x } +/* LINTED: static unused */ +static struct xsd_errors xsd_errors[] +#if defined(__GNUC__) +__attribute__((unused)) +#endif + = { + XSD_ERROR(EINVAL), + XSD_ERROR(EACCES), + XSD_ERROR(EEXIST), + XSD_ERROR(EISDIR), + XSD_ERROR(ENOENT), + XSD_ERROR(ENOMEM), + XSD_ERROR(ENOSPC), + XSD_ERROR(EIO), + XSD_ERROR(ENOTEMPTY), + XSD_ERROR(ENOSYS), + XSD_ERROR(EROFS), + XSD_ERROR(EBUSY), + XSD_ERROR(EAGAIN), + XSD_ERROR(EISCONN) +}; +#endif + +struct xsd_sockmsg +{ + uint32_t type; /* XS_??? */ + uint32_t req_id;/* Request identifier, echoed in daemon's response. */ + uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */ + uint32_t len; /* Length of data following this. */ + + /* Generally followed by nul-terminated string(s). */ +}; + +enum xs_watch_type +{ + XS_WATCH_PATH = 0, + XS_WATCH_TOKEN +}; + +/* Inter-domain shared memory communications. */ +#define XENSTORE_RING_SIZE 1024 +typedef uint32_t XENSTORE_RING_IDX; +#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1)) +struct xenstore_domain_interface { + char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */ + char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */ + XENSTORE_RING_IDX req_cons, req_prod; + XENSTORE_RING_IDX rsp_cons, rsp_prod; +}; + +/* Violating this is very bad. See docs/misc/xenstore.txt. 
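The req/rsp byte rings in struct xenstore_domain_interface use the same free-running index convention as io/ring.h, with MASK_XENSTORE_IDX mapping an index to a ring offset. A standalone sketch of a producer writing across the wrap point (`xs_avail`, `xs_write`, and `xs_wrap_demo` are illustrative helpers, not part of the wire protocol):

```c
#include <assert.h>
#include <stdint.h>

#define XENSTORE_RING_SIZE 1024
typedef uint32_t XENSTORE_RING_IDX;
#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE - 1))

/* Bytes currently unconsumed: valid across index wraparound. */
static XENSTORE_RING_IDX xs_avail(XENSTORE_RING_IDX cons,
                                  XENSTORE_RING_IDX prod)
{
    return prod - cons;
}

/* Copy len bytes into the ring, advancing the free-running producer
 * index; the mask handles the wrap from offset 1023 back to 0. */
static void xs_write(char *ring, XENSTORE_RING_IDX *prod,
                     const char *data, uint32_t len)
{
    uint32_t i;
    for (i = 0; i < len; i++)
        ring[MASK_XENSTORE_IDX((*prod)++)] = data[i];
}

/* Write "hello" starting two bytes before the wrap and verify the
 * bytes land at offsets 1022, 1023, 0, 1, 2. */
static int xs_wrap_demo(void)
{
    char ring[XENSTORE_RING_SIZE];
    XENSTORE_RING_IDX prod = 1022;
    xs_write(ring, &prod, "hello", 5);
    return prod == 1027 && ring[1023] == 'e' && ring[0] == 'l'
        && xs_avail(1022, prod) == 5;
}
```

In the real protocol the producer must additionally check `xs_avail` against XENSTORE_RING_SIZE before writing, and issue the appropriate memory barriers around index updates.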
*/ +#define XENSTORE_PAYLOAD_MAX 4096 + +/* Violating these just gets you an error back */ +#define XENSTORE_ABS_PATH_MAX 3072 +#define XENSTORE_REL_PATH_MAX 2048 + +#endif /* _XS_WIRE_H */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/kexec.h b/xen/public/kexec.h new file mode 100644 index 0000000..fc19f2f --- /dev/null +++ b/xen/public/kexec.h @@ -0,0 +1,189 @@ +/****************************************************************************** + * kexec.h - Public portion + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Xen port written by: + * - Simon 'Horms' Horman <horms@verge.net.au> + * - Magnus Damm <magnus@valinux.co.jp> + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + + +/* This file describes the Kexec / Kdump hypercall interface for Xen. 
+ * + * Kexec under vanilla Linux allows a user to reboot the physical machine + * into a new user-specified kernel. The Xen port extends this idea + * to allow rebooting of the machine from dom0. When kexec for dom0 + * is used to reboot, both the hypervisor and the domains get replaced + * with some other kernel. It is possible to kexec between vanilla + * Linux and Xen and back again. Xen to Xen works well too. + * + * The hypercall interface for kexec can be divided into three main + * types of hypercall operations: + * + * 1) Range information: + * This is used by the dom0 kernel to ask the hypervisor about various + * address information. This information is needed to allow kexec-tools + * to fill in the ELF headers for /proc/vmcore properly. + * + * 2) Load and unload of images: + * There are no big surprises here, the kexec binary from kexec-tools + * runs in userspace in dom0. The tool loads/unloads data into the + * dom0 kernel such as new kernel, initramfs and hypervisor. When + * loaded the dom0 kernel performs a load hypercall operation, and + * before releasing all page references the dom0 kernel calls unload. + * + * 3) Kexec operation: + * This is used to start a previously loaded kernel. + */ + +#include "xen.h" + +#if defined(__i386__) || defined(__x86_64__) +#define KEXEC_XEN_NO_PAGES 17 +#endif + +/* + * Prototype for this hypercall is: + * int kexec_op(int cmd, void *args) + * @cmd == KEXEC_CMD_... + * KEXEC operation to perform + * @args == Operation-specific extra arguments (NULL if none). 
+ */ + +/* + * Kexec supports two types of operation: + * - kexec into a regular kernel, very similar to a standard reboot + * - KEXEC_TYPE_DEFAULT is used to specify this type + * - kexec into a special "crash kernel", aka kexec-on-panic + * - KEXEC_TYPE_CRASH is used to specify this type + * - parts of our system may be broken at kexec-on-panic time + * - the code should be kept as simple and self-contained as possible + */ + +#define KEXEC_TYPE_DEFAULT 0 +#define KEXEC_TYPE_CRASH 1 + + +/* The kexec implementation for Xen allows the user to load two + * types of kernels, KEXEC_TYPE_DEFAULT and KEXEC_TYPE_CRASH. + * All data needed for a kexec reboot is kept in one xen_kexec_image_t + * per "instance". The data mainly consists of machine address lists to pages + * together with destination addresses. The data in xen_kexec_image_t + * is passed to the "code page" which is one page of code that performs + * the final relocations before jumping to the new kernel. + */ + +typedef struct xen_kexec_image { +#if defined(__i386__) || defined(__x86_64__) + unsigned long page_list[KEXEC_XEN_NO_PAGES]; +#endif +#if defined(__ia64__) + unsigned long reboot_code_buffer; +#endif + unsigned long indirection_page; + unsigned long start_address; +} xen_kexec_image_t; + +/* + * Perform kexec having previously loaded a kexec or kdump kernel + * as appropriate. + * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in] + */ +#define KEXEC_CMD_kexec 0 +typedef struct xen_kexec_exec { + int type; +} xen_kexec_exec_t; + +/* + * Load/Unload kernel image for kexec or kdump. 
+ * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in] + * image == relocation information for kexec (ignored for unload) [in] + */ +#define KEXEC_CMD_kexec_load 1 +#define KEXEC_CMD_kexec_unload 2 +typedef struct xen_kexec_load { + int type; + xen_kexec_image_t image; +} xen_kexec_load_t; + +#define KEXEC_RANGE_MA_CRASH 0 /* machine address and size of crash area */ +#define KEXEC_RANGE_MA_XEN 1 /* machine address and size of Xen itself */ +#define KEXEC_RANGE_MA_CPU 2 /* machine address and size of a CPU note */ +#define KEXEC_RANGE_MA_XENHEAP 3 /* machine address and size of xenheap + * Note that although this is adjacent + * to Xen it exists in a separate EFI + * region on ia64, and thus needs to be + * inserted into iomem_machine separately */ +#define KEXEC_RANGE_MA_BOOT_PARAM 4 /* machine address and size of + * the ia64_boot_param */ +#define KEXEC_RANGE_MA_EFI_MEMMAP 5 /* machine address and size of + * of the EFI Memory Map */ +#define KEXEC_RANGE_MA_VMCOREINFO 6 /* machine address and size of vmcoreinfo */ + +/* + * Find the address and size of certain memory areas + * range == KEXEC_RANGE_... [in] + * nr == physical CPU number (starting from 0) if KEXEC_RANGE_MA_CPU [in] + * size == number of bytes reserved in window [out] + * start == address of the first byte in the window [out] + */ +#define KEXEC_CMD_kexec_get_range 3 +typedef struct xen_kexec_range { + int range; + int nr; + unsigned long size; + unsigned long start; +} xen_kexec_range_t; + +/* vmcoreinfo stuff */ +#define VMCOREINFO_BYTES (4096) +#define VMCOREINFO_NOTE_NAME "VMCOREINFO_XEN" +void arch_crash_save_vmcoreinfo(void); +void vmcoreinfo_append_str(const char *fmt, ...) 
+ __attribute__ ((format (printf, 1, 2))); +#define VMCOREINFO_PAGESIZE(value) \ + vmcoreinfo_append_str("PAGESIZE=%ld\n", value) +#define VMCOREINFO_SYMBOL(name) \ + vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name) +#define VMCOREINFO_SYMBOL_ALIAS(alias, name) \ + vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #alias, (unsigned long)&name) +#define VMCOREINFO_STRUCT_SIZE(name) \ + vmcoreinfo_append_str("SIZE(%s)=%zu\n", #name, sizeof(struct name)) +#define VMCOREINFO_OFFSET(name, field) \ + vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \ + (unsigned long)offsetof(struct name, field)) +#define VMCOREINFO_OFFSET_ALIAS(name, field, alias) \ + vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #alias, \ + (unsigned long)offsetof(struct name, field)) + +#endif /* _XEN_PUBLIC_KEXEC_H */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/libelf.h b/xen/public/libelf.h new file mode 100644 index 0000000..d238330 --- /dev/null +++ b/xen/public/libelf.h @@ -0,0 +1,265 @@ +/****************************************************************************** + * libelf.h + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
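The VMCOREINFO_* macros above all reduce to appending textual `KEY=value\n` records through vmcoreinfo_append_str. A standalone model of that record format (the buffer, `demo_append`, and the DEMO_* macro names are stand-ins for illustration, not the real implementation):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char vmcore_buf[4096];

/* Stand-in for vmcoreinfo_append_str: printf-style append into a
 * local buffer instead of the real VMCOREINFO note. */
static void demo_append(const char *fmt, ...)
{
    size_t used = strlen(vmcore_buf);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(vmcore_buf + used, sizeof(vmcore_buf) - used, fmt, ap);
    va_end(ap);
}

#define DEMO_VMCOREINFO_PAGESIZE(value) \
    demo_append("PAGESIZE=%ld\n", (long)(value))
#define DEMO_VMCOREINFO_STRUCT_SIZE(name) \
    demo_append("SIZE(%s)=%zu\n", #name, sizeof(struct name))

struct demo_note { long a; char b; };

static int vmcore_demo(void)
{
    vmcore_buf[0] = '\0';
    DEMO_VMCOREINFO_PAGESIZE(4096);
    DEMO_VMCOREINFO_STRUCT_SIZE(demo_note);
    return strstr(vmcore_buf, "PAGESIZE=4096\n") != NULL
        && strstr(vmcore_buf, "SIZE(demo_note)=") != NULL;
}
```

Tools such as kexec-tools parse these records by key name, which is why the macros stringize the symbol and struct names rather than emitting raw addresses alone.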
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#ifndef __XC_LIBELF__ +#define __XC_LIBELF__ 1 + +#if defined(__i386__) || defined(__x86_64__) || defined(__ia64__) +#define XEN_ELF_LITTLE_ENDIAN +#else +#error define architectural endianness +#endif + +#undef ELFSIZE +#include "elfnote.h" +#include "elfstructs.h" +#include "features.h" + +/* ------------------------------------------------------------------------ */ + +typedef union { + Elf32_Ehdr e32; + Elf64_Ehdr e64; +} elf_ehdr; + +typedef union { + Elf32_Phdr e32; + Elf64_Phdr e64; +} elf_phdr; + +typedef union { + Elf32_Shdr e32; + Elf64_Shdr e64; +} elf_shdr; + +typedef union { + Elf32_Sym e32; + Elf64_Sym e64; +} elf_sym; + +typedef union { + Elf32_Rel e32; + Elf64_Rel e64; +} elf_rel; + +typedef union { + Elf32_Rela e32; + Elf64_Rela e64; +} elf_rela; + +typedef union { + Elf32_Note e32; + Elf64_Note e64; +} elf_note; + +struct elf_binary { + /* elf binary */ + const char *image; + size_t size; + char class; + char data; + + const elf_ehdr *ehdr; + const char *sec_strtab; + const elf_shdr *sym_tab; + const char *sym_strtab; + + /* loaded to */ + char *dest; + uint64_t pstart; + uint64_t pend; + uint64_t reloc_offset; + + uint64_t bsd_symtab_pstart; + uint64_t bsd_symtab_pend; + +#ifndef __XEN__ + /* misc */ + FILE *log; +#endif + int verbose; +}; + +/* ------------------------------------------------------------------------ */ +/* accessing elf header fields */ + +#ifdef XEN_ELF_BIG_ENDIAN +# define NATIVE_ELFDATA ELFDATA2MSB +#else +# define NATIVE_ELFDATA 
ELFDATA2LSB +#endif + +#define elf_32bit(elf) (ELFCLASS32 == (elf)->class) +#define elf_64bit(elf) (ELFCLASS64 == (elf)->class) +#define elf_msb(elf) (ELFDATA2MSB == (elf)->data) +#define elf_lsb(elf) (ELFDATA2LSB == (elf)->data) +#define elf_swap(elf) (NATIVE_ELFDATA != (elf)->data) + +#define elf_uval(elf, str, elem) \ + ((ELFCLASS64 == (elf)->class) \ + ? elf_access_unsigned((elf), (str), \ + offsetof(typeof(*(str)),e64.elem), \ + sizeof((str)->e64.elem)) \ + : elf_access_unsigned((elf), (str), \ + offsetof(typeof(*(str)),e32.elem), \ + sizeof((str)->e32.elem))) + +#define elf_sval(elf, str, elem) \ + ((ELFCLASS64 == (elf)->class) \ + ? elf_access_signed((elf), (str), \ + offsetof(typeof(*(str)),e64.elem), \ + sizeof((str)->e64.elem)) \ + : elf_access_signed((elf), (str), \ + offsetof(typeof(*(str)),e32.elem), \ + sizeof((str)->e32.elem))) + +#define elf_size(elf, str) \ + ((ELFCLASS64 == (elf)->class) \ + ? sizeof((str)->e64) : sizeof((str)->e32)) + +uint64_t elf_access_unsigned(struct elf_binary *elf, const void *ptr, + uint64_t offset, size_t size); +int64_t elf_access_signed(struct elf_binary *elf, const void *ptr, + uint64_t offset, size_t size); + +uint64_t elf_round_up(struct elf_binary *elf, uint64_t addr); + +/* ------------------------------------------------------------------------ */ +/* xc_libelf_tools.c */ + +int elf_shdr_count(struct elf_binary *elf); +int elf_phdr_count(struct elf_binary *elf); + +const elf_shdr *elf_shdr_by_name(struct elf_binary *elf, const char *name); +const elf_shdr *elf_shdr_by_index(struct elf_binary *elf, int index); +const elf_phdr *elf_phdr_by_index(struct elf_binary *elf, int index); + +const char *elf_section_name(struct elf_binary *elf, const elf_shdr * shdr); +const void *elf_section_start(struct elf_binary *elf, const elf_shdr * shdr); +const void *elf_section_end(struct elf_binary *elf, const elf_shdr * shdr); + +const void *elf_segment_start(struct elf_binary *elf, const elf_phdr * phdr); +const void 
*elf_segment_end(struct elf_binary *elf, const elf_phdr * phdr); + +const elf_sym *elf_sym_by_name(struct elf_binary *elf, const char *symbol); +const elf_sym *elf_sym_by_index(struct elf_binary *elf, int index); + +const char *elf_note_name(struct elf_binary *elf, const elf_note * note); +const void *elf_note_desc(struct elf_binary *elf, const elf_note * note); +uint64_t elf_note_numeric(struct elf_binary *elf, const elf_note * note); +const elf_note *elf_note_next(struct elf_binary *elf, const elf_note * note); + +int elf_is_elfbinary(const void *image); +int elf_phdr_is_loadable(struct elf_binary *elf, const elf_phdr * phdr); + +/* ------------------------------------------------------------------------ */ +/* xc_libelf_loader.c */ + +int elf_init(struct elf_binary *elf, const char *image, size_t size); +#ifdef __XEN__ +void elf_set_verbose(struct elf_binary *elf); +#else +void elf_set_logfile(struct elf_binary *elf, FILE * log, int verbose); +#endif + +void elf_parse_binary(struct elf_binary *elf); +void elf_load_binary(struct elf_binary *elf); + +void *elf_get_ptr(struct elf_binary *elf, unsigned long addr); +uint64_t elf_lookup_addr(struct elf_binary *elf, const char *symbol); + +void elf_parse_bsdsyms(struct elf_binary *elf, uint64_t pstart); /* private */ + +/* ------------------------------------------------------------------------ */ +/* xc_libelf_relocate.c */ + +int elf_reloc(struct elf_binary *elf); + +/* ------------------------------------------------------------------------ */ +/* xc_libelf_dominfo.c */ + +#define UNSET_ADDR ((uint64_t)-1) + +enum xen_elfnote_type { + XEN_ENT_NONE = 0, + XEN_ENT_LONG = 1, + XEN_ENT_STR = 2 +}; + +struct xen_elfnote { + enum xen_elfnote_type type; + const char *name; + union { + const char *str; + uint64_t num; + } data; +}; + +struct elf_dom_parms { + /* raw */ + const char *guest_info; + const void *elf_note_start; + const void *elf_note_end; + struct xen_elfnote elf_notes[XEN_ELFNOTE_MAX + 1]; + + /* parsed */ + 
char guest_os[16]; + char guest_ver[16]; + char xen_ver[16]; + char loader[16]; + int pae; + int bsd_symtab; + uint64_t virt_base; + uint64_t virt_entry; + uint64_t virt_hypercall; + uint64_t virt_hv_start_low; + uint64_t elf_paddr_offset; + uint32_t f_supported[XENFEAT_NR_SUBMAPS]; + uint32_t f_required[XENFEAT_NR_SUBMAPS]; + + /* calculated */ + uint64_t virt_offset; + uint64_t virt_kstart; + uint64_t virt_kend; +}; + +static inline void elf_xen_feature_set(int nr, uint32_t * addr) +{ + addr[nr >> 5] |= 1 << (nr & 31); +} +static inline int elf_xen_feature_get(int nr, uint32_t * addr) +{ + return !!(addr[nr >> 5] & (1 << (nr & 31))); +} + +int elf_xen_parse_features(const char *features, + uint32_t *supported, + uint32_t *required); +int elf_xen_parse_note(struct elf_binary *elf, + struct elf_dom_parms *parms, + const elf_note *note); +int elf_xen_parse_guest_info(struct elf_binary *elf, + struct elf_dom_parms *parms); +int elf_xen_parse(struct elf_binary *elf, + struct elf_dom_parms *parms); + +#endif /* __XC_LIBELF__ */ diff --git a/xen/public/memory.h b/xen/public/memory.h new file mode 100644 index 0000000..d7b9fff --- /dev/null +++ b/xen/public/memory.h @@ -0,0 +1,312 @@ +/****************************************************************************** + * memory.h + * + * Memory reservation and information. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
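elf_xen_feature_set/elf_xen_feature_get above pack feature number `nr` into bit `(nr & 31)` of 32-bit submap `(nr >> 5)`. A standalone copy of that logic exercising the word boundary (the shift is made explicitly unsigned here; upstream shifts a plain 1):

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the libelf.h helpers: one bit per feature in an
 * array of 32-bit submaps. */
static void feature_set(int nr, uint32_t *addr)
{
    addr[nr >> 5] |= (uint32_t)1 << (nr & 31);
}

static int feature_get(int nr, const uint32_t *addr)
{
    return !!(addr[nr >> 5] & ((uint32_t)1 << (nr & 31)));
}

static int feature_demo(void)
{
    uint32_t map[2] = { 0, 0 };
    feature_set(0, map);
    feature_set(31, map);   /* last bit of the first submap */
    feature_set(33, map);   /* second submap, bit 1 */
    return feature_get(31, map) && feature_get(33, map)
        && !feature_get(1, map) && map[1] == 0x2;
}
```

This is the representation behind f_supported/f_required in struct elf_dom_parms, filled in from the guest's "features=" note by elf_xen_parse_features.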
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2005, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_MEMORY_H__ +#define __XEN_PUBLIC_MEMORY_H__ + +/* + * Increase or decrease the specified domain's memory reservation. Returns the + * number of extents successfully allocated or freed. + * arg == addr of struct xen_memory_reservation. + */ +#define XENMEM_increase_reservation 0 +#define XENMEM_decrease_reservation 1 +#define XENMEM_populate_physmap 6 + +#if __XEN_INTERFACE_VERSION__ >= 0x00030209 +/* + * Maximum # bits addressable by the user of the allocated region (e.g., I/O + * devices often have a 32-bit limitation even in 64-bit systems). If zero + * then the user has no addressing restriction. This field is not used by + * XENMEM_decrease_reservation. + */ +#define XENMEMF_address_bits(x) (x) +#define XENMEMF_get_address_bits(x) ((x) & 0xffu) +/* NUMA node to allocate from. */ +#define XENMEMF_node(x) (((x) + 1) << 8) +#define XENMEMF_get_node(x) ((((x) >> 8) - 1) & 0xffu) +#endif + +struct xen_memory_reservation { + + /* + * XENMEM_increase_reservation: + * OUT: MFN (*not* GMFN) bases of extents that were allocated + * XENMEM_decrease_reservation: + * IN: GMFN bases of extents to free + * XENMEM_populate_physmap: + * IN: GPFN bases of extents to populate with memory + * OUT: GMFN bases of extents that were allocated + * (NB. 
This command also updates the mach_to_phys translation table) + */ + XEN_GUEST_HANDLE(xen_pfn_t) extent_start; + + /* Number of extents, and size/alignment of each (2^extent_order pages). */ + xen_ulong_t nr_extents; + unsigned int extent_order; + +#if __XEN_INTERFACE_VERSION__ >= 0x00030209 + /* XENMEMF flags. */ + unsigned int mem_flags; +#else + unsigned int address_bits; +#endif + + /* + * Domain whose reservation is being changed. + * Unprivileged domains can specify only DOMID_SELF. + */ + domid_t domid; +}; +typedef struct xen_memory_reservation xen_memory_reservation_t; +DEFINE_XEN_GUEST_HANDLE(xen_memory_reservation_t); + +/* + * An atomic exchange of memory pages. If return code is zero then + * @out.extent_list provides GMFNs of the newly-allocated memory. + * Returns zero on complete success, otherwise a negative error code. + * On complete success then always @nr_exchanged == @in.nr_extents. + * On partial success @nr_exchanged indicates how much work was done. + */ +#define XENMEM_exchange 11 +struct xen_memory_exchange { + /* + * [IN] Details of memory extents to be exchanged (GMFN bases). + * Note that @in.address_bits is ignored and unused. + */ + struct xen_memory_reservation in; + + /* + * [IN/OUT] Details of new memory extents. + * We require that: + * 1. @in.domid == @out.domid + * 2. @in.nr_extents << @in.extent_order == + * @out.nr_extents << @out.extent_order + * 3. @in.extent_start and @out.extent_start lists must not overlap + * 4. @out.extent_start lists GPFN bases to be populated + * 5. @out.extent_start is overwritten with allocated GMFN bases + */ + struct xen_memory_reservation out; + + /* + * [OUT] Number of input extents that were successfully exchanged: + * 1. The first @nr_exchanged input extents were successfully + * deallocated. + * 2. The corresponding first entries in the output extent list correctly + * indicate the GMFNs that were successfully exchanged. + * 3. All other input and output extents are untouched. + * 4. 
If not all input exents are exchanged then the return code of this + * command will be non-zero. + * 5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER! + */ + xen_ulong_t nr_exchanged; +}; +typedef struct xen_memory_exchange xen_memory_exchange_t; +DEFINE_XEN_GUEST_HANDLE(xen_memory_exchange_t); + +/* + * Returns the maximum machine frame number of mapped RAM in this system. + * This command always succeeds (it never returns an error code). + * arg == NULL. + */ +#define XENMEM_maximum_ram_page 2 + +/* + * Returns the current or maximum memory reservation, in pages, of the + * specified domain (may be DOMID_SELF). Returns -ve errcode on failure. + * arg == addr of domid_t. + */ +#define XENMEM_current_reservation 3 +#define XENMEM_maximum_reservation 4 + +/* + * Returns the maximum GPFN in use by the guest, or -ve errcode on failure. + */ +#define XENMEM_maximum_gpfn 14 + +/* + * Returns a list of MFN bases of 2MB extents comprising the machine_to_phys + * mapping table. Architectures which do not have a m2p table do not implement + * this command. + * arg == addr of xen_machphys_mfn_list_t. + */ +#define XENMEM_machphys_mfn_list 5 +struct xen_machphys_mfn_list { + /* + * Size of the 'extent_start' array. Fewer entries will be filled if the + * machphys table is smaller than max_extents * 2MB. + */ + unsigned int max_extents; + + /* + * Pointer to buffer to fill with list of extent starts. If there are + * any large discontiguities in the machine address space, 2MB gaps in + * the machphys table will be represented by an MFN base of zero. + */ + XEN_GUEST_HANDLE(xen_pfn_t) extent_start; + + /* + * Number of extents written to the above array. This will be smaller + * than 'max_extents' if the machphys table is smaller than max_e * 2MB. 
+ */ + unsigned int nr_extents; +}; +typedef struct xen_machphys_mfn_list xen_machphys_mfn_list_t; +DEFINE_XEN_GUEST_HANDLE(xen_machphys_mfn_list_t); + +/* + * Returns the location in virtual address space of the machine_to_phys + * mapping table. Architectures which do not have a m2p table, or which do not + * map it by default into guest address space, do not implement this command. + * arg == addr of xen_machphys_mapping_t. + */ +#define XENMEM_machphys_mapping 12 +struct xen_machphys_mapping { + xen_ulong_t v_start, v_end; /* Start and end virtual addresses. */ + xen_ulong_t max_mfn; /* Maximum MFN that can be looked up. */ +}; +typedef struct xen_machphys_mapping xen_machphys_mapping_t; +DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t); + +/* + * Sets the GPFN at which a particular page appears in the specified guest's + * pseudophysical address space. + * arg == addr of xen_add_to_physmap_t. + */ +#define XENMEM_add_to_physmap 7 +struct xen_add_to_physmap { + /* Which domain to change the mapping for. */ + domid_t domid; + + /* Source mapping space. */ +#define XENMAPSPACE_shared_info 0 /* shared info page */ +#define XENMAPSPACE_grant_table 1 /* grant table page */ +#define XENMAPSPACE_mfn 2 /* usual MFN */ + unsigned int space; + + /* Index into source mapping space. */ + xen_ulong_t idx; + + /* GPFN where the source mapping page should appear. */ + xen_pfn_t gpfn; +}; +typedef struct xen_add_to_physmap xen_add_to_physmap_t; +DEFINE_XEN_GUEST_HANDLE(xen_add_to_physmap_t); + +/* + * Unmaps the page appearing at a particular GPFN from the specified guest's + * pseudophysical address space. + * arg == addr of xen_remove_from_physmap_t. + */ +#define XENMEM_remove_from_physmap 15 +struct xen_remove_from_physmap { + /* Which domain to change the mapping for. */ + domid_t domid; + + /* GPFN of the current mapping of the page. 
*/ + xen_pfn_t gpfn; +}; +typedef struct xen_remove_from_physmap xen_remove_from_physmap_t; +DEFINE_XEN_GUEST_HANDLE(xen_remove_from_physmap_t); + +/* + * Translates a list of domain-specific GPFNs into MFNs. Returns a -ve error + * code on failure. This call only works for auto-translated guests. + */ +#define XENMEM_translate_gpfn_list 8 +struct xen_translate_gpfn_list { + /* Which domain to translate for? */ + domid_t domid; + + /* Length of list. */ + xen_ulong_t nr_gpfns; + + /* List of GPFNs to translate. */ + XEN_GUEST_HANDLE(xen_pfn_t) gpfn_list; + + /* + * Output list to contain MFN translations. May be the same as the input + * list (in which case each input GPFN is overwritten with the output MFN). + */ + XEN_GUEST_HANDLE(xen_pfn_t) mfn_list; +}; +typedef struct xen_translate_gpfn_list xen_translate_gpfn_list_t; +DEFINE_XEN_GUEST_HANDLE(xen_translate_gpfn_list_t); + +/* + * Returns the pseudo-physical memory map as it was when the domain + * was started (specified by XENMEM_set_memory_map). + * arg == addr of xen_memory_map_t. + */ +#define XENMEM_memory_map 9 +struct xen_memory_map { + /* + * On call the number of entries which can be stored in buffer. On + * return the number of entries which have been stored in + * buffer. + */ + unsigned int nr_entries; + + /* + * Entries in the buffer are in the same format as returned by the + * BIOS INT 0x15 EAX=0xE820 call. + */ + XEN_GUEST_HANDLE(void) buffer; +}; +typedef struct xen_memory_map xen_memory_map_t; +DEFINE_XEN_GUEST_HANDLE(xen_memory_map_t); + +/* + * Returns the real physical memory map. Passes the same structure as + * XENMEM_memory_map. + * arg == addr of xen_memory_map_t. + */ +#define XENMEM_machine_memory_map 10 + +/* + * Set the pseudo-physical memory map of a domain, as returned by + * XENMEM_memory_map. + * arg == addr of xen_foreign_memory_map_t. 
+ */ +#define XENMEM_set_memory_map 13 +struct xen_foreign_memory_map { + domid_t domid; + struct xen_memory_map map; +}; +typedef struct xen_foreign_memory_map xen_foreign_memory_map_t; +DEFINE_XEN_GUEST_HANDLE(xen_foreign_memory_map_t); + +#endif /* __XEN_PUBLIC_MEMORY_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/nmi.h b/xen/public/nmi.h new file mode 100644 index 0000000..b2b8401 --- /dev/null +++ b/xen/public/nmi.h @@ -0,0 +1,78 @@ +/****************************************************************************** + * nmi.h + * + * NMI callback registration and reason codes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * Copyright (c) 2005, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_NMI_H__ +#define __XEN_PUBLIC_NMI_H__ + +/* + * NMI reason codes: + * Currently these are x86-specific, stored in arch_shared_info.nmi_reason. + */ + /* I/O-check error reported via ISA port 0x61, bit 6. */ +#define _XEN_NMIREASON_io_error 0 +#define XEN_NMIREASON_io_error (1UL << _XEN_NMIREASON_io_error) + /* Parity error reported via ISA port 0x61, bit 7. */ +#define _XEN_NMIREASON_parity_error 1 +#define XEN_NMIREASON_parity_error (1UL << _XEN_NMIREASON_parity_error) + /* Unknown hardware-generated NMI. */ +#define _XEN_NMIREASON_unknown 2 +#define XEN_NMIREASON_unknown (1UL << _XEN_NMIREASON_unknown) + +/* + * long nmi_op(unsigned int cmd, void *arg) + * NB. All ops return zero on success, else a negative error code. + */ + +/* + * Register NMI callback for this (calling) VCPU. Currently this only makes + * sense for domain 0, vcpu 0. All other callers will be returned EINVAL. + * arg == pointer to xennmi_callback structure. + */ +#define XENNMI_register_callback 0 +struct xennmi_callback { + unsigned long handler_address; + unsigned long pad; +}; +typedef struct xennmi_callback xennmi_callback_t; +DEFINE_XEN_GUEST_HANDLE(xennmi_callback_t); + +/* + * Deregister NMI callback for this (calling) VCPU. + * arg == NULL. 
+ */ +#define XENNMI_unregister_callback 1 + +#endif /* __XEN_PUBLIC_NMI_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/physdev.h b/xen/public/physdev.h new file mode 100644 index 0000000..8057277 --- /dev/null +++ b/xen/public/physdev.h @@ -0,0 +1,219 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#ifndef __XEN_PUBLIC_PHYSDEV_H__ +#define __XEN_PUBLIC_PHYSDEV_H__ + +/* + * Prototype for this hypercall is: + * int physdev_op(int cmd, void *args) + * @cmd == PHYSDEVOP_??? (physdev operation). + * @args == Operation-specific extra arguments (NULL if none). + */ + +/* + * Notify end-of-interrupt (EOI) for the specified IRQ. + * @arg == pointer to physdev_eoi structure. 
+ */ +#define PHYSDEVOP_eoi 12 +struct physdev_eoi { + /* IN */ + uint32_t irq; +}; +typedef struct physdev_eoi physdev_eoi_t; +DEFINE_XEN_GUEST_HANDLE(physdev_eoi_t); + +/* + * Query the status of an IRQ line. + * @arg == pointer to physdev_irq_status_query structure. + */ +#define PHYSDEVOP_irq_status_query 5 +struct physdev_irq_status_query { + /* IN */ + uint32_t irq; + /* OUT */ + uint32_t flags; /* XENIRQSTAT_* */ +}; +typedef struct physdev_irq_status_query physdev_irq_status_query_t; +DEFINE_XEN_GUEST_HANDLE(physdev_irq_status_query_t); + +/* Need to call PHYSDEVOP_eoi when the IRQ has been serviced? */ +#define _XENIRQSTAT_needs_eoi (0) +#define XENIRQSTAT_needs_eoi (1U<<_XENIRQSTAT_needs_eoi) + +/* IRQ shared by multiple guests? */ +#define _XENIRQSTAT_shared (1) +#define XENIRQSTAT_shared (1U<<_XENIRQSTAT_shared) + +/* + * Set the current VCPU's I/O privilege level. + * @arg == pointer to physdev_set_iopl structure. + */ +#define PHYSDEVOP_set_iopl 6 +struct physdev_set_iopl { + /* IN */ + uint32_t iopl; +}; +typedef struct physdev_set_iopl physdev_set_iopl_t; +DEFINE_XEN_GUEST_HANDLE(physdev_set_iopl_t); + +/* + * Set the current VCPU's I/O-port permissions bitmap. + * @arg == pointer to physdev_set_iobitmap structure. + */ +#define PHYSDEVOP_set_iobitmap 7 +struct physdev_set_iobitmap { + /* IN */ +#if __XEN_INTERFACE_VERSION__ >= 0x00030205 + XEN_GUEST_HANDLE(uint8) bitmap; +#else + uint8_t *bitmap; +#endif + uint32_t nr_ports; +}; +typedef struct physdev_set_iobitmap physdev_set_iobitmap_t; +DEFINE_XEN_GUEST_HANDLE(physdev_set_iobitmap_t); + +/* + * Read or write an IO-APIC register. + * @arg == pointer to physdev_apic structure. 
+ */ +#define PHYSDEVOP_apic_read 8 +#define PHYSDEVOP_apic_write 9 +struct physdev_apic { + /* IN */ + unsigned long apic_physbase; + uint32_t reg; + /* IN or OUT */ + uint32_t value; +}; +typedef struct physdev_apic physdev_apic_t; +DEFINE_XEN_GUEST_HANDLE(physdev_apic_t); + +/* + * Allocate or free a physical upcall vector for the specified IRQ line. + * @arg == pointer to physdev_irq structure. + */ +#define PHYSDEVOP_alloc_irq_vector 10 +#define PHYSDEVOP_free_irq_vector 11 +struct physdev_irq { + /* IN */ + uint32_t irq; + /* IN or OUT */ + uint32_t vector; +}; +typedef struct physdev_irq physdev_irq_t; +DEFINE_XEN_GUEST_HANDLE(physdev_irq_t); + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + /* IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int entry_nr; + /* IN */ + uint64_t table_base; +}; +typedef struct physdev_map_pirq physdev_map_pirq_t; +DEFINE_XEN_GUEST_HANDLE(physdev_map_pirq_t); + +#define PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t domid; + /* IN */ + int pirq; +}; + +typedef struct physdev_unmap_pirq physdev_unmap_pirq_t; +DEFINE_XEN_GUEST_HANDLE(physdev_unmap_pirq_t); + +#define PHYSDEVOP_manage_pci_add 15 +#define PHYSDEVOP_manage_pci_remove 16 +struct physdev_manage_pci { + /* IN */ + uint8_t bus; + uint8_t devfn; +}; + +typedef struct physdev_manage_pci physdev_manage_pci_t; +DEFINE_XEN_GUEST_HANDLE(physdev_manage_pci_t); + +/* + * Argument to physdev_op_compat() hypercall. Superseded by new physdev_op() + * hypercall since 0x00030202. 
+ */ +struct physdev_op { + uint32_t cmd; + union { + struct physdev_irq_status_query irq_status_query; + struct physdev_set_iopl set_iopl; + struct physdev_set_iobitmap set_iobitmap; + struct physdev_apic apic_op; + struct physdev_irq irq_op; + } u; +}; +typedef struct physdev_op physdev_op_t; +DEFINE_XEN_GUEST_HANDLE(physdev_op_t); + +/* + * Notify that some PIRQ-bound event channels have been unmasked. + * ** This command is obsolete since interface version 0x00030202 and is ** + * ** unsupported by newer versions of Xen. ** + */ +#define PHYSDEVOP_IRQ_UNMASK_NOTIFY 4 + +/* + * These all-capitals physdev operation names are superseded by the new names + * (defined above) since interface version 0x00030202. + */ +#define PHYSDEVOP_IRQ_STATUS_QUERY PHYSDEVOP_irq_status_query +#define PHYSDEVOP_SET_IOPL PHYSDEVOP_set_iopl +#define PHYSDEVOP_SET_IOBITMAP PHYSDEVOP_set_iobitmap +#define PHYSDEVOP_APIC_READ PHYSDEVOP_apic_read +#define PHYSDEVOP_APIC_WRITE PHYSDEVOP_apic_write +#define PHYSDEVOP_ASSIGN_VECTOR PHYSDEVOP_alloc_irq_vector +#define PHYSDEVOP_FREE_VECTOR PHYSDEVOP_free_irq_vector +#define PHYSDEVOP_IRQ_NEEDS_UNMASK_NOTIFY XENIRQSTAT_needs_eoi +#define PHYSDEVOP_IRQ_SHARED XENIRQSTAT_shared + +#endif /* __XEN_PUBLIC_PHYSDEV_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/platform.h b/xen/public/platform.h new file mode 100644 index 0000000..eee047b --- /dev/null +++ b/xen/public/platform.h @@ -0,0 +1,346 @@ +/****************************************************************************** + * platform.h + * + * Hardware platform operations. Intended for use by domain-0 kernel. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2002-2006, K Fraser + */ + +#ifndef __XEN_PUBLIC_PLATFORM_H__ +#define __XEN_PUBLIC_PLATFORM_H__ + +#include "xen.h" + +#define XENPF_INTERFACE_VERSION 0x03000001 + +/* + * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC, + * 1 January, 1970 if the current system time was <system_time>. + */ +#define XENPF_settime 17 +struct xenpf_settime { + /* IN variables. */ + uint32_t secs; + uint32_t nsecs; + uint64_t system_time; +}; +typedef struct xenpf_settime xenpf_settime_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_settime_t); + +/* + * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type. + * On x86, @type is an architecture-defined MTRR memory type. + * On success, returns the MTRR that was used (@reg) and a handle that can + * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting. + * (x86-specific). 
+ */ +#define XENPF_add_memtype 31 +struct xenpf_add_memtype { + /* IN variables. */ + xen_pfn_t mfn; + uint64_t nr_mfns; + uint32_t type; + /* OUT variables. */ + uint32_t handle; + uint32_t reg; +}; +typedef struct xenpf_add_memtype xenpf_add_memtype_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_add_memtype_t); + +/* + * Tear down an existing memory-range type. If @handle is remembered then it + * should be passed in to accurately tear down the correct setting (in case + * of overlapping memory regions with differing types). If it is not known + * then @handle should be set to zero. In all cases @reg must be set. + * (x86-specific). + */ +#define XENPF_del_memtype 32 +struct xenpf_del_memtype { + /* IN variables. */ + uint32_t handle; + uint32_t reg; +}; +typedef struct xenpf_del_memtype xenpf_del_memtype_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_del_memtype_t); + +/* Read current type of an MTRR (x86-specific). */ +#define XENPF_read_memtype 33 +struct xenpf_read_memtype { + /* IN variables. */ + uint32_t reg; + /* OUT variables. */ + xen_pfn_t mfn; + uint64_t nr_mfns; + uint32_t type; +}; +typedef struct xenpf_read_memtype xenpf_read_memtype_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_read_memtype_t); + +#define XENPF_microcode_update 35 +struct xenpf_microcode_update { + /* IN variables. */ + XEN_GUEST_HANDLE(const_void) data;/* Pointer to microcode data */ + uint32_t length; /* Length of microcode data. */ +}; +typedef struct xenpf_microcode_update xenpf_microcode_update_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_microcode_update_t); + +#define XENPF_platform_quirk 39 +#define QUIRK_NOIRQBALANCING 1 /* Do not restrict IO-APIC RTE targets */ +#define QUIRK_IOAPIC_BAD_REGSEL 2 /* IO-APIC REGSEL forgets its value */ +#define QUIRK_IOAPIC_GOOD_REGSEL 3 /* IO-APIC REGSEL behaves properly */ +struct xenpf_platform_quirk { + /* IN variables. 
*/ + uint32_t quirk_id; +}; +typedef struct xenpf_platform_quirk xenpf_platform_quirk_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_platform_quirk_t); + +#define XENPF_firmware_info 50 +#define XEN_FW_DISK_INFO 1 /* from int 13 AH=08/41/48 */ +#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */ +#define XEN_FW_VBEDDC_INFO 3 /* from int 10 AX=4f15 */ +struct xenpf_firmware_info { + /* IN variables. */ + uint32_t type; + uint32_t index; + /* OUT variables. */ + union { + struct { + /* Int13, Fn48: Check Extensions Present. */ + uint8_t device; /* %dl: bios device number */ + uint8_t version; /* %ah: major version */ + uint16_t interface_support; /* %cx: support bitmap */ + /* Int13, Fn08: Legacy Get Device Parameters. */ + uint16_t legacy_max_cylinder; /* %cl[7:6]:%ch: max cyl # */ + uint8_t legacy_max_head; /* %dh: max head # */ + uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector # */ + /* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */ + /* NB. First uint16_t of buffer must be set to buffer size. */ + XEN_GUEST_HANDLE(void) edd_params; + } disk_info; /* XEN_FW_DISK_INFO */ + struct { + uint8_t device; /* bios device number */ + uint32_t mbr_signature; /* offset 0x1b8 in mbr */ + } disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */ + struct { + /* Int10, AX=4F15: Get EDID info. */ + uint8_t capabilities; + uint8_t edid_transfer_time; + /* must refer to 128-byte buffer */ + XEN_GUEST_HANDLE(uint8) edid; + } vbeddc_info; /* XEN_FW_VBEDDC_INFO */ + } u; +}; +typedef struct xenpf_firmware_info xenpf_firmware_info_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_firmware_info_t); + +#define XENPF_enter_acpi_sleep 51 +struct xenpf_enter_acpi_sleep { + /* IN variables */ + uint16_t pm1a_cnt_val; /* PM1a control value. */ + uint16_t pm1b_cnt_val; /* PM1b control value. */ + uint32_t sleep_state; /* Which state to enter (Sn). */ + uint32_t flags; /* Must be zero. 
*/ +}; +typedef struct xenpf_enter_acpi_sleep xenpf_enter_acpi_sleep_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_enter_acpi_sleep_t); + +#define XENPF_change_freq 52 +struct xenpf_change_freq { + /* IN variables */ + uint32_t flags; /* Must be zero. */ + uint32_t cpu; /* Physical cpu. */ + uint64_t freq; /* New frequency (Hz). */ +}; +typedef struct xenpf_change_freq xenpf_change_freq_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_change_freq_t); + +/* + * Get idle times (nanoseconds since boot) for physical CPUs specified in the + * @cpumap_bitmap with range [0..@cpumap_nr_cpus-1]. The @idletime array is + * indexed by CPU number; only entries with the corresponding @cpumap_bitmap + * bit set are written to. On return, @cpumap_bitmap is modified so that any + * non-existent CPUs are cleared. Such CPUs have their @idletime array entry + * cleared. + */ +#define XENPF_getidletime 53 +struct xenpf_getidletime { + /* IN/OUT variables */ + /* IN: CPUs to interrogate; OUT: subset of IN which are present */ + XEN_GUEST_HANDLE(uint8) cpumap_bitmap; + /* IN variables */ + /* Size of cpumap bitmap. */ + uint32_t cpumap_nr_cpus; + /* Must be indexable for every cpu in cpumap_bitmap. */ + XEN_GUEST_HANDLE(uint64) idletime; + /* OUT variables */ + /* System time when the idletime snapshots were taken. 
*/ + uint64_t now; +}; +typedef struct xenpf_getidletime xenpf_getidletime_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t); + +#define XENPF_set_processor_pminfo 54 + +/* ability bits */ +#define XEN_PROCESSOR_PM_CX 1 +#define XEN_PROCESSOR_PM_PX 2 +#define XEN_PROCESSOR_PM_TX 4 + +/* cmd type */ +#define XEN_PM_CX 0 +#define XEN_PM_PX 1 +#define XEN_PM_TX 2 + +/* Px sub info type */ +#define XEN_PX_PCT 1 +#define XEN_PX_PSS 2 +#define XEN_PX_PPC 4 +#define XEN_PX_PSD 8 + +struct xen_power_register { + uint32_t space_id; + uint32_t bit_width; + uint32_t bit_offset; + uint32_t access_size; + uint64_t address; +}; + +struct xen_processor_csd { + uint32_t domain; /* domain number of one dependent group */ + uint32_t coord_type; /* coordination type */ + uint32_t num; /* number of processors in same domain */ +}; +typedef struct xen_processor_csd xen_processor_csd_t; +DEFINE_XEN_GUEST_HANDLE(xen_processor_csd_t); + +struct xen_processor_cx { + struct xen_power_register reg; /* GAS for Cx trigger register */ + uint8_t type; /* cstate value, c0: 0, c1: 1, ... 
*/ + uint32_t latency; /* worst latency (ms) to enter/exit this cstate */ + uint32_t power; /* average power consumption(mW) */ + uint32_t dpcnt; /* number of dependency entries */ + XEN_GUEST_HANDLE(xen_processor_csd_t) dp; /* NULL if no dependency */ +}; +typedef struct xen_processor_cx xen_processor_cx_t; +DEFINE_XEN_GUEST_HANDLE(xen_processor_cx_t); + +struct xen_processor_flags { + uint32_t bm_control:1; + uint32_t bm_check:1; + uint32_t has_cst:1; + uint32_t power_setup_done:1; + uint32_t bm_rld_set:1; +}; + +struct xen_processor_power { + uint32_t count; /* number of C state entries in array below */ + struct xen_processor_flags flags; /* global flags of this processor */ + XEN_GUEST_HANDLE(xen_processor_cx_t) states; /* supported c states */ +}; + +struct xen_pct_register { + uint8_t descriptor; + uint16_t length; + uint8_t space_id; + uint8_t bit_width; + uint8_t bit_offset; + uint8_t reserved; + uint64_t address; +}; + +struct xen_processor_px { + uint64_t core_frequency; /* megahertz */ + uint64_t power; /* milliWatts */ + uint64_t transition_latency; /* microseconds */ + uint64_t bus_master_latency; /* microseconds */ + uint64_t control; /* control value */ + uint64_t status; /* success indicator */ +}; +typedef struct xen_processor_px xen_processor_px_t; +DEFINE_XEN_GUEST_HANDLE(xen_processor_px_t); + +struct xen_psd_package { + uint64_t num_entries; + uint64_t revision; + uint64_t domain; + uint64_t coord_type; + uint64_t num_processors; +}; + +struct xen_processor_performance { + uint32_t flags; /* flag for Px sub info type */ + uint32_t platform_limit; /* Platform limitation on freq usage */ + struct xen_pct_register control_register; + struct xen_pct_register status_register; + uint32_t state_count; /* total available performance states */ + XEN_GUEST_HANDLE(xen_processor_px_t) states; + struct xen_psd_package domain_info; + uint32_t shared_type; /* coordination type of this processor */ +}; +typedef struct xen_processor_performance 
xen_processor_performance_t; +DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t); + +struct xenpf_set_processor_pminfo { + /* IN variables */ + uint32_t id; /* ACPI CPU ID */ + uint32_t type; /* {XEN_PM_CX, XEN_PM_PX} */ + union { + struct xen_processor_power power;/* Cx: _CST/_CSD */ + struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */ + }; +}; +typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_set_processor_pminfo_t); + +struct xen_platform_op { + uint32_t cmd; + uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ + union { + struct xenpf_settime settime; + struct xenpf_add_memtype add_memtype; + struct xenpf_del_memtype del_memtype; + struct xenpf_read_memtype read_memtype; + struct xenpf_microcode_update microcode; + struct xenpf_platform_quirk platform_quirk; + struct xenpf_firmware_info firmware_info; + struct xenpf_enter_acpi_sleep enter_acpi_sleep; + struct xenpf_change_freq change_freq; + struct xenpf_getidletime getidletime; + struct xenpf_set_processor_pminfo set_pminfo; + uint8_t pad[128]; + } u; +}; +typedef struct xen_platform_op xen_platform_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_platform_op_t); + +#endif /* __XEN_PUBLIC_PLATFORM_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/sched.h b/xen/public/sched.h new file mode 100644 index 0000000..2227a95 --- /dev/null +++ b/xen/public/sched.h @@ -0,0 +1,121 @@ +/****************************************************************************** + * sched.h + * + * Scheduler state interactions + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell 
copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2005, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_SCHED_H__ +#define __XEN_PUBLIC_SCHED_H__ + +#include "event_channel.h" + +/* + * The prototype for this hypercall is: + * long sched_op(int cmd, void *arg) + * @cmd == SCHEDOP_??? (scheduler operation). + * @arg == Operation-specific extra argument(s), as described below. + * + * Versions of Xen prior to 3.0.2 provided only the following legacy version + * of this hypercall, supporting only the commands yield, block and shutdown: + * long sched_op(int cmd, unsigned long arg) + * @cmd == SCHEDOP_??? (scheduler operation). + * @arg == 0 (SCHEDOP_yield and SCHEDOP_block) + * == SHUTDOWN_* code (SCHEDOP_shutdown) + * This legacy version is available to new guests as sched_op_compat(). + */ + +/* + * Voluntarily yield the CPU. + * @arg == NULL. + */ +#define SCHEDOP_yield 0 + +/* + * Block execution of this VCPU until an event is received for processing. + * If called with event upcalls masked, this operation will atomically + * reenable event delivery and check for pending events before blocking the + * VCPU. This avoids a "wakeup waiting" race. + * @arg == NULL. 
+ */ +#define SCHEDOP_block 1 + +/* + * Halt execution of this domain (all VCPUs) and notify the system controller. + * @arg == pointer to sched_shutdown structure. + */ +#define SCHEDOP_shutdown 2 +struct sched_shutdown { + unsigned int reason; /* SHUTDOWN_* */ +}; +typedef struct sched_shutdown sched_shutdown_t; +DEFINE_XEN_GUEST_HANDLE(sched_shutdown_t); + +/* + * Poll a set of event-channel ports. Return when one or more are pending. An + * optional timeout may be specified. + * @arg == pointer to sched_poll structure. + */ +#define SCHEDOP_poll 3 +struct sched_poll { + XEN_GUEST_HANDLE(evtchn_port_t) ports; + unsigned int nr_ports; + uint64_t timeout; +}; +typedef struct sched_poll sched_poll_t; +DEFINE_XEN_GUEST_HANDLE(sched_poll_t); + +/* + * Declare a shutdown for another domain. The main use of this function is + * in interpreting shutdown requests and reasons for fully-virtualized + * domains. A para-virtualized domain may use SCHEDOP_shutdown directly. + * @arg == pointer to sched_remote_shutdown structure. + */ +#define SCHEDOP_remote_shutdown 4 +struct sched_remote_shutdown { + domid_t domain_id; /* Remote domain ID */ + unsigned int reason; /* SHUTDOWN_xxx reason */ +}; +typedef struct sched_remote_shutdown sched_remote_shutdown_t; +DEFINE_XEN_GUEST_HANDLE(sched_remote_shutdown_t); + +/* + * Reason codes for SCHEDOP_shutdown. These may be interpreted by control + * software to determine the appropriate action. For the most part, Xen does + * not care about the shutdown code. + */ +#define SHUTDOWN_poweroff 0 /* Domain exited normally. Clean up and kill. */ +#define SHUTDOWN_reboot 1 /* Clean up, kill, and then restart. */ +#define SHUTDOWN_suspend 2 /* Clean up, save suspend info, kill. */ +#define SHUTDOWN_crash 3 /* Tell controller we've crashed. 
*/ + +#endif /* __XEN_PUBLIC_SCHED_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/sysctl.h b/xen/public/sysctl.h new file mode 100644 index 0000000..6b10954 --- /dev/null +++ b/xen/public/sysctl.h @@ -0,0 +1,308 @@ +/****************************************************************************** + * sysctl.h + * + * System management operations. For use by node control stack. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2002-2006, K Fraser + */ + +#ifndef __XEN_PUBLIC_SYSCTL_H__ +#define __XEN_PUBLIC_SYSCTL_H__ + +#if !defined(__XEN__) && !defined(__XEN_TOOLS__) +#error "sysctl operations are intended for use by node control tools only" +#endif + +#include "xen.h" +#include "domctl.h" + +#define XEN_SYSCTL_INTERFACE_VERSION 0x00000006 + +/* + * Read console content from Xen buffer ring. 
+ */ +#define XEN_SYSCTL_readconsole 1 +struct xen_sysctl_readconsole { + /* IN: Non-zero -> clear after reading. */ + uint8_t clear; + /* IN: Non-zero -> start index specified by @index field. */ + uint8_t incremental; + uint8_t pad0, pad1; + /* + * IN: Start index for consuming from ring buffer (if @incremental); + * OUT: End index after consuming from ring buffer. + */ + uint32_t index; + /* IN: Virtual address to write console data. */ + XEN_GUEST_HANDLE_64(char) buffer; + /* IN: Size of buffer; OUT: Bytes written to buffer. */ + uint32_t count; +}; +typedef struct xen_sysctl_readconsole xen_sysctl_readconsole_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_readconsole_t); + +/* Get trace buffers machine base address */ +#define XEN_SYSCTL_tbuf_op 2 +struct xen_sysctl_tbuf_op { + /* IN variables */ +#define XEN_SYSCTL_TBUFOP_get_info 0 +#define XEN_SYSCTL_TBUFOP_set_cpu_mask 1 +#define XEN_SYSCTL_TBUFOP_set_evt_mask 2 +#define XEN_SYSCTL_TBUFOP_set_size 3 +#define XEN_SYSCTL_TBUFOP_enable 4 +#define XEN_SYSCTL_TBUFOP_disable 5 + uint32_t cmd; + /* IN/OUT variables */ + struct xenctl_cpumap cpu_mask; + uint32_t evt_mask; + /* OUT variables */ + uint64_aligned_t buffer_mfn; + uint32_t size; +}; +typedef struct xen_sysctl_tbuf_op xen_sysctl_tbuf_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tbuf_op_t); + +/* + * Get physical information about the host machine + */ +#define XEN_SYSCTL_physinfo 3 + /* (x86) The platform supports HVM guests. */ +#define _XEN_SYSCTL_PHYSCAP_hvm 0 +#define XEN_SYSCTL_PHYSCAP_hvm (1u<<_XEN_SYSCTL_PHYSCAP_hvm) + /* (x86) The platform supports HVM-guest direct access to I/O devices. 
*/ +#define _XEN_SYSCTL_PHYSCAP_hvm_directio 1 +#define XEN_SYSCTL_PHYSCAP_hvm_directio (1u<<_XEN_SYSCTL_PHYSCAP_hvm_directio) +struct xen_sysctl_physinfo { + uint32_t threads_per_core; + uint32_t cores_per_socket; + uint32_t nr_cpus; + uint32_t nr_nodes; + uint32_t cpu_khz; + uint64_aligned_t total_pages; + uint64_aligned_t free_pages; + uint64_aligned_t scrub_pages; + uint32_t hw_cap[8]; + + /* + * IN: maximum addressable entry in the caller-provided cpu_to_node array. + * OUT: largest cpu identifier in the system. + * If OUT is greater than IN then the cpu_to_node array is truncated! + */ + uint32_t max_cpu_id; + /* + * If not NULL, this array is filled with node identifier for each cpu. + * If a cpu has no node information (e.g., cpu not present) then the + * sentinel value ~0u is written. + * The size of this array is specified by the caller in @max_cpu_id. + * If the actual @max_cpu_id is smaller than the array then the trailing + * elements of the array will not be written by the sysctl. + */ + XEN_GUEST_HANDLE_64(uint32) cpu_to_node; + + /* XEN_SYSCTL_PHYSCAP_??? */ + uint32_t capabilities; +}; +typedef struct xen_sysctl_physinfo xen_sysctl_physinfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_physinfo_t); + +/* + * Get the ID of the current scheduler. + */ +#define XEN_SYSCTL_sched_id 4 +struct xen_sysctl_sched_id { + /* OUT variable */ + uint32_t sched_id; +}; +typedef struct xen_sysctl_sched_id xen_sysctl_sched_id_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_sched_id_t); + +/* Interface for controlling Xen software performance counters. */ +#define XEN_SYSCTL_perfc_op 5 +/* Sub-operations: */ +#define XEN_SYSCTL_PERFCOP_reset 1 /* Reset all counters to zero. */ +#define XEN_SYSCTL_PERFCOP_query 2 /* Get perfctr information. 
*/ +struct xen_sysctl_perfc_desc { + char name[80]; /* name of perf counter */ + uint32_t nr_vals; /* number of values for this counter */ +}; +typedef struct xen_sysctl_perfc_desc xen_sysctl_perfc_desc_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_desc_t); +typedef uint32_t xen_sysctl_perfc_val_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_val_t); + +struct xen_sysctl_perfc_op { + /* IN variables. */ + uint32_t cmd; /* XEN_SYSCTL_PERFCOP_??? */ + /* OUT variables. */ + uint32_t nr_counters; /* number of counters description */ + uint32_t nr_vals; /* number of values */ + /* counter information (or NULL) */ + XEN_GUEST_HANDLE_64(xen_sysctl_perfc_desc_t) desc; + /* counter values (or NULL) */ + XEN_GUEST_HANDLE_64(xen_sysctl_perfc_val_t) val; +}; +typedef struct xen_sysctl_perfc_op xen_sysctl_perfc_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_op_t); + +#define XEN_SYSCTL_getdomaininfolist 6 +struct xen_sysctl_getdomaininfolist { + /* IN variables. */ + domid_t first_domain; + uint32_t max_domains; + XEN_GUEST_HANDLE_64(xen_domctl_getdomaininfo_t) buffer; + /* OUT variables. */ + uint32_t num_domains; +}; +typedef struct xen_sysctl_getdomaininfolist xen_sysctl_getdomaininfolist_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getdomaininfolist_t); + +/* Inject debug keys into Xen. */ +#define XEN_SYSCTL_debug_keys 7 +struct xen_sysctl_debug_keys { + /* IN variables. */ + XEN_GUEST_HANDLE_64(char) keys; + uint32_t nr_keys; +}; +typedef struct xen_sysctl_debug_keys xen_sysctl_debug_keys_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_debug_keys_t); + +/* Get physical CPU information. */ +#define XEN_SYSCTL_getcpuinfo 8 +struct xen_sysctl_cpuinfo { + uint64_aligned_t idletime; +}; +typedef struct xen_sysctl_cpuinfo xen_sysctl_cpuinfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpuinfo_t); +struct xen_sysctl_getcpuinfo { + /* IN variables. */ + uint32_t max_cpus; + XEN_GUEST_HANDLE_64(xen_sysctl_cpuinfo_t) info; + /* OUT variables. 
*/ + uint32_t nr_cpus; +}; +typedef struct xen_sysctl_getcpuinfo xen_sysctl_getcpuinfo_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getcpuinfo_t); + +#define XEN_SYSCTL_availheap 9 +struct xen_sysctl_availheap { + /* IN variables. */ + uint32_t min_bitwidth; /* Smallest address width (zero if don't care). */ + uint32_t max_bitwidth; /* Largest address width (zero if don't care). */ + int32_t node; /* NUMA node of interest (-1 for all nodes). */ + /* OUT variables. */ + uint64_aligned_t avail_bytes;/* Bytes available in the specified region. */ +}; +typedef struct xen_sysctl_availheap xen_sysctl_availheap_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_availheap_t); + +#define XEN_SYSCTL_get_pmstat 10 +struct pm_px_val { + uint64_aligned_t freq; /* Px core frequency */ + uint64_aligned_t residency; /* Px residency time */ + uint64_aligned_t count; /* Px transition count */ +}; +typedef struct pm_px_val pm_px_val_t; +DEFINE_XEN_GUEST_HANDLE(pm_px_val_t); + +struct pm_px_stat { + uint8_t total; /* total Px states */ + uint8_t usable; /* usable Px states */ + uint8_t last; /* last Px state */ + uint8_t cur; /* current Px state */ + XEN_GUEST_HANDLE_64(uint64) trans_pt; /* Px transition table */ + XEN_GUEST_HANDLE_64(pm_px_val_t) pt; +}; +typedef struct pm_px_stat pm_px_stat_t; +DEFINE_XEN_GUEST_HANDLE(pm_px_stat_t); + +struct pm_cx_stat { + uint32_t nr; /* entry nr in triggers & residencies, including C0 */ + uint32_t last; /* last Cx state */ + uint64_aligned_t idle_time; /* idle time from boot */ + XEN_GUEST_HANDLE_64(uint64) triggers; /* Cx trigger counts */ + XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */ +}; + +struct xen_sysctl_get_pmstat { +#define PMSTAT_CATEGORY_MASK 0xf0 +#define PMSTAT_PX 0x10 +#define PMSTAT_CX 0x20 +#define PMSTAT_get_max_px (PMSTAT_PX | 0x1) +#define PMSTAT_get_pxstat (PMSTAT_PX | 0x2) +#define PMSTAT_reset_pxstat (PMSTAT_PX | 0x3) +#define PMSTAT_get_max_cx (PMSTAT_CX | 0x1) +#define PMSTAT_get_cxstat (PMSTAT_CX | 0x2) +#define 
PMSTAT_reset_cxstat (PMSTAT_CX | 0x3) + uint32_t type; + uint32_t cpuid; + union { + struct pm_px_stat getpx; + struct pm_cx_stat getcx; + /* other struct for tx, etc */ + } u; +}; +typedef struct xen_sysctl_get_pmstat xen_sysctl_get_pmstat_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_get_pmstat_t); + +#define XEN_SYSCTL_cpu_hotplug 11 +struct xen_sysctl_cpu_hotplug { + /* IN variables */ + uint32_t cpu; /* Physical cpu. */ +#define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0 +#define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1 + uint32_t op; /* hotplug opcode */ +}; +typedef struct xen_sysctl_cpu_hotplug xen_sysctl_cpu_hotplug_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpu_hotplug_t); + + +struct xen_sysctl { + uint32_t cmd; + uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */ + union { + struct xen_sysctl_readconsole readconsole; + struct xen_sysctl_tbuf_op tbuf_op; + struct xen_sysctl_physinfo physinfo; + struct xen_sysctl_sched_id sched_id; + struct xen_sysctl_perfc_op perfc_op; + struct xen_sysctl_getdomaininfolist getdomaininfolist; + struct xen_sysctl_debug_keys debug_keys; + struct xen_sysctl_getcpuinfo getcpuinfo; + struct xen_sysctl_availheap availheap; + struct xen_sysctl_get_pmstat get_pmstat; + struct xen_sysctl_cpu_hotplug cpu_hotplug; + uint8_t pad[128]; + } u; +}; +typedef struct xen_sysctl xen_sysctl_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_t); + +#endif /* __XEN_PUBLIC_SYSCTL_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/trace.h b/xen/public/trace.h new file mode 100644 index 0000000..0fc864d --- /dev/null +++ b/xen/public/trace.h @@ -0,0 +1,206 @@ +/****************************************************************************** + * include/public/trace.h + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without 
restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Mark Williamson, (C) 2004 Intel Research Cambridge + * Copyright (C) 2005 Bin Ren + */ + +#ifndef __XEN_PUBLIC_TRACE_H__ +#define __XEN_PUBLIC_TRACE_H__ + +#define TRACE_EXTRA_MAX 7 +#define TRACE_EXTRA_SHIFT 28 + +/* Trace classes */ +#define TRC_CLS_SHIFT 16 +#define TRC_GEN 0x0001f000 /* General trace */ +#define TRC_SCHED 0x0002f000 /* Xen Scheduler trace */ +#define TRC_DOM0OP 0x0004f000 /* Xen DOM0 operation trace */ +#define TRC_HVM 0x0008f000 /* Xen HVM trace */ +#define TRC_MEM 0x0010f000 /* Xen memory trace */ +#define TRC_PV 0x0020f000 /* Xen PV traces */ +#define TRC_SHADOW 0x0040f000 /* Xen shadow tracing */ +#define TRC_PM 0x0080f000 /* Xen power management trace */ +#define TRC_ALL 0x0ffff000 +#define TRC_HD_TO_EVENT(x) ((x)&0x0fffffff) +#define TRC_HD_CYCLE_FLAG (1UL<<31) +#define TRC_HD_INCLUDES_CYCLE_COUNT(x) ( !!( (x) & TRC_HD_CYCLE_FLAG ) ) +#define TRC_HD_EXTRA(x) (((x)>>TRACE_EXTRA_SHIFT)&TRACE_EXTRA_MAX) + +/* Trace subclasses */ +#define TRC_SUBCLS_SHIFT 12 + +/* trace subclasses for SVM */ +#define TRC_HVM_ENTRYEXIT 0x00081000 /* VMENTRY and #VMEXIT */ 
+#define TRC_HVM_HANDLER 0x00082000 /* various HVM handlers */ + +#define TRC_SCHED_MIN 0x00021000 /* Just runstate changes */ +#define TRC_SCHED_VERBOSE 0x00028000 /* More inclusive scheduling */ + +/* Trace events per class */ +#define TRC_LOST_RECORDS (TRC_GEN + 1) +#define TRC_TRACE_WRAP_BUFFER (TRC_GEN + 2) +#define TRC_TRACE_CPU_CHANGE (TRC_GEN + 3) + +#define TRC_SCHED_RUNSTATE_CHANGE (TRC_SCHED_MIN + 1) +#define TRC_SCHED_DOM_ADD (TRC_SCHED_VERBOSE + 1) +#define TRC_SCHED_DOM_REM (TRC_SCHED_VERBOSE + 2) +#define TRC_SCHED_SLEEP (TRC_SCHED_VERBOSE + 3) +#define TRC_SCHED_WAKE (TRC_SCHED_VERBOSE + 4) +#define TRC_SCHED_YIELD (TRC_SCHED_VERBOSE + 5) +#define TRC_SCHED_BLOCK (TRC_SCHED_VERBOSE + 6) +#define TRC_SCHED_SHUTDOWN (TRC_SCHED_VERBOSE + 7) +#define TRC_SCHED_CTL (TRC_SCHED_VERBOSE + 8) +#define TRC_SCHED_ADJDOM (TRC_SCHED_VERBOSE + 9) +#define TRC_SCHED_SWITCH (TRC_SCHED_VERBOSE + 10) +#define TRC_SCHED_S_TIMER_FN (TRC_SCHED_VERBOSE + 11) +#define TRC_SCHED_T_TIMER_FN (TRC_SCHED_VERBOSE + 12) +#define TRC_SCHED_DOM_TIMER_FN (TRC_SCHED_VERBOSE + 13) +#define TRC_SCHED_SWITCH_INFPREV (TRC_SCHED_VERBOSE + 14) +#define TRC_SCHED_SWITCH_INFNEXT (TRC_SCHED_VERBOSE + 15) + +#define TRC_MEM_PAGE_GRANT_MAP (TRC_MEM + 1) +#define TRC_MEM_PAGE_GRANT_UNMAP (TRC_MEM + 2) +#define TRC_MEM_PAGE_GRANT_TRANSFER (TRC_MEM + 3) + +#define TRC_PV_HYPERCALL (TRC_PV + 1) +#define TRC_PV_TRAP (TRC_PV + 3) +#define TRC_PV_PAGE_FAULT (TRC_PV + 4) +#define TRC_PV_FORCED_INVALID_OP (TRC_PV + 5) +#define TRC_PV_EMULATE_PRIVOP (TRC_PV + 6) +#define TRC_PV_EMULATE_4GB (TRC_PV + 7) +#define TRC_PV_MATH_STATE_RESTORE (TRC_PV + 8) +#define TRC_PV_PAGING_FIXUP (TRC_PV + 9) +#define TRC_PV_GDT_LDT_MAPPING_FAULT (TRC_PV + 10) +#define TRC_PV_PTWR_EMULATION (TRC_PV + 11) +#define TRC_PV_PTWR_EMULATION_PAE (TRC_PV + 12) + /* Indicates that addresses in trace record are 64 bits */ +#define TRC_64_FLAG (0x100) + +#define TRC_SHADOW_NOT_SHADOW (TRC_SHADOW + 1) +#define 
TRC_SHADOW_FAST_PROPAGATE (TRC_SHADOW + 2) +#define TRC_SHADOW_FAST_MMIO (TRC_SHADOW + 3) +#define TRC_SHADOW_FALSE_FAST_PATH (TRC_SHADOW + 4) +#define TRC_SHADOW_MMIO (TRC_SHADOW + 5) +#define TRC_SHADOW_FIXUP (TRC_SHADOW + 6) +#define TRC_SHADOW_DOMF_DYING (TRC_SHADOW + 7) +#define TRC_SHADOW_EMULATE (TRC_SHADOW + 8) +#define TRC_SHADOW_EMULATE_UNSHADOW_USER (TRC_SHADOW + 9) +#define TRC_SHADOW_EMULATE_UNSHADOW_EVTINJ (TRC_SHADOW + 10) +#define TRC_SHADOW_EMULATE_UNSHADOW_UNHANDLED (TRC_SHADOW + 11) +#define TRC_SHADOW_WRMAP_BF (TRC_SHADOW + 12) +#define TRC_SHADOW_PREALLOC_UNPIN (TRC_SHADOW + 13) +#define TRC_SHADOW_RESYNC_FULL (TRC_SHADOW + 14) +#define TRC_SHADOW_RESYNC_ONLY (TRC_SHADOW + 15) + +/* trace events per subclass */ +#define TRC_HVM_VMENTRY (TRC_HVM_ENTRYEXIT + 0x01) +#define TRC_HVM_VMEXIT (TRC_HVM_ENTRYEXIT + 0x02) +#define TRC_HVM_VMEXIT64 (TRC_HVM_ENTRYEXIT + TRC_64_FLAG + 0x02) +#define TRC_HVM_PF_XEN (TRC_HVM_HANDLER + 0x01) +#define TRC_HVM_PF_XEN64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x01) +#define TRC_HVM_PF_INJECT (TRC_HVM_HANDLER + 0x02) +#define TRC_HVM_PF_INJECT64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x02) +#define TRC_HVM_INJ_EXC (TRC_HVM_HANDLER + 0x03) +#define TRC_HVM_INJ_VIRQ (TRC_HVM_HANDLER + 0x04) +#define TRC_HVM_REINJ_VIRQ (TRC_HVM_HANDLER + 0x05) +#define TRC_HVM_IO_READ (TRC_HVM_HANDLER + 0x06) +#define TRC_HVM_IO_WRITE (TRC_HVM_HANDLER + 0x07) +#define TRC_HVM_CR_READ (TRC_HVM_HANDLER + 0x08) +#define TRC_HVM_CR_READ64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x08) +#define TRC_HVM_CR_WRITE (TRC_HVM_HANDLER + 0x09) +#define TRC_HVM_CR_WRITE64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x09) +#define TRC_HVM_DR_READ (TRC_HVM_HANDLER + 0x0A) +#define TRC_HVM_DR_WRITE (TRC_HVM_HANDLER + 0x0B) +#define TRC_HVM_MSR_READ (TRC_HVM_HANDLER + 0x0C) +#define TRC_HVM_MSR_WRITE (TRC_HVM_HANDLER + 0x0D) +#define TRC_HVM_CPUID (TRC_HVM_HANDLER + 0x0E) +#define TRC_HVM_INTR (TRC_HVM_HANDLER + 0x0F) +#define TRC_HVM_NMI (TRC_HVM_HANDLER + 0x10) +#define 
TRC_HVM_SMI (TRC_HVM_HANDLER + 0x11) +#define TRC_HVM_VMMCALL (TRC_HVM_HANDLER + 0x12) +#define TRC_HVM_HLT (TRC_HVM_HANDLER + 0x13) +#define TRC_HVM_INVLPG (TRC_HVM_HANDLER + 0x14) +#define TRC_HVM_INVLPG64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x14) +#define TRC_HVM_MCE (TRC_HVM_HANDLER + 0x15) +#define TRC_HVM_IO_ASSIST (TRC_HVM_HANDLER + 0x16) +#define TRC_HVM_IO_ASSIST64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x16) +#define TRC_HVM_MMIO_ASSIST (TRC_HVM_HANDLER + 0x17) +#define TRC_HVM_MMIO_ASSIST64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x17) +#define TRC_HVM_CLTS (TRC_HVM_HANDLER + 0x18) +#define TRC_HVM_LMSW (TRC_HVM_HANDLER + 0x19) +#define TRC_HVM_LMSW64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x19) + +/* trace subclasses for power management */ +#define TRC_PM_FREQ 0x00801000 /* xen cpu freq events */ +#define TRC_PM_IDLE 0x00802000 /* xen cpu idle events */ + +/* trace events per class */ +#define TRC_PM_FREQ_CHANGE (TRC_PM_FREQ + 0x01) +#define TRC_PM_IDLE_ENTRY (TRC_PM_IDLE + 0x01) +#define TRC_PM_IDLE_EXIT (TRC_PM_IDLE + 0x02) + +/* This structure represents a single trace buffer record. */ +struct t_rec { + uint32_t event:28; + uint32_t extra_u32:3; /* # entries in trailing extra_u32[] array */ + uint32_t cycles_included:1; /* u.cycles or u.nocycles? */ + union { + struct { + uint32_t cycles_lo, cycles_hi; /* cycle counter timestamp */ + uint32_t extra_u32[7]; /* event data items */ + } cycles; + struct { + uint32_t extra_u32[7]; /* event data items */ + } nocycles; + } u; +}; + +/* + * This structure contains the metadata for a single trace buffer. The cons + * and prod fields index into an array of struct t_rec's. + */ +struct t_buf { + /* Assume the data buffer size is X. X is generally not a power of 2. 
+ * CONS and PROD are incremented modulo (2*X): + * 0 <= cons < 2*X + * 0 <= prod < 2*X + * This is done because addition modulo X breaks at 2^32 when X is not a + * power of 2: + * (((2^32 - 1) % X) + 1) % X != (2^32) % X + */ + uint32_t cons; /* Offset of next item to be consumed by control tools. */ + uint32_t prod; /* Offset of next item to be produced by Xen. */ + /* Records follow immediately after the meta-data header. */ +}; + +#endif /* __XEN_PUBLIC_TRACE_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/vcpu.h b/xen/public/vcpu.h new file mode 100644 index 0000000..ab65493 --- /dev/null +++ b/xen/public/vcpu.h @@ -0,0 +1,213 @@ +/****************************************************************************** + * vcpu.h + * + * VCPU initialisation, query, and hotplug. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2005, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_VCPU_H__ +#define __XEN_PUBLIC_VCPU_H__ + +/* + * Prototype for this hypercall is: + * int vcpu_op(int cmd, int vcpuid, void *extra_args) + * @cmd == VCPUOP_??? (VCPU operation). + * @vcpuid == VCPU to operate on. + * @extra_args == Operation-specific extra arguments (NULL if none). + */ + +/* + * Initialise a VCPU. Each VCPU can be initialised only once. A + * newly-initialised VCPU will not run until it is brought up by VCPUOP_up. + * + * @extra_arg == pointer to vcpu_guest_context structure containing initial + * state for the VCPU. + */ +#define VCPUOP_initialise 0 + +/* + * Bring up a VCPU. This makes the VCPU runnable. This operation will fail + * if the VCPU has not been initialised (VCPUOP_initialise). + */ +#define VCPUOP_up 1 + +/* + * Bring down a VCPU (i.e., make it non-runnable). + * There are a few caveats that callers should observe: + * 1. This operation may return, and VCPU_is_up may return false, before the + * VCPU stops running (i.e., the command is asynchronous). It is a good + * idea to ensure that the VCPU has entered a non-critical loop before + * bringing it down. Alternatively, this operation is guaranteed + * synchronous if invoked by the VCPU itself. + * 2. After a VCPU is initialised, there is currently no way to drop all its + * references to domain memory. Even a VCPU that is down still holds + * memory references via its pagetable base pointer and GDT. It is good + * practise to move a VCPU onto an 'idle' or default page table, LDT and + * GDT before bringing it down. + */ +#define VCPUOP_down 2 + +/* Returns 1 if the given VCPU is up. 
*/ +#define VCPUOP_is_up 3 + +/* + * Return information about the state and running time of a VCPU. + * @extra_arg == pointer to vcpu_runstate_info structure. + */ +#define VCPUOP_get_runstate_info 4 +struct vcpu_runstate_info { + /* VCPU's current state (RUNSTATE_*). */ + int state; + /* When was current state entered (system time, ns)? */ + uint64_t state_entry_time; + /* + * Time spent in each RUNSTATE_* (ns). The sum of these times is + * guaranteed not to drift from system time. + */ + uint64_t time[4]; +}; +typedef struct vcpu_runstate_info vcpu_runstate_info_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_t); + +/* VCPU is currently running on a physical CPU. */ +#define RUNSTATE_running 0 + +/* VCPU is runnable, but not currently scheduled on any physical CPU. */ +#define RUNSTATE_runnable 1 + +/* VCPU is blocked (a.k.a. idle). It is therefore not runnable. */ +#define RUNSTATE_blocked 2 + +/* + * VCPU is not runnable, but it is not blocked. + * This is a 'catch all' state for things like hotplug and pauses by the + * system administrator (or for critical sections in the hypervisor). + * RUNSTATE_blocked dominates this state (it is the preferred state). + */ +#define RUNSTATE_offline 3 + +/* + * Register a shared memory area from which the guest may obtain its own + * runstate information without needing to execute a hypercall. + * Notes: + * 1. The registered address may be virtual or physical or guest handle, + * depending on the platform. Virtual address or guest handle should be + * registered on x86 systems. + * 2. Only one shared area may be registered per VCPU. The shared area is + * updated by the hypervisor each time the VCPU is scheduled. Thus + * runstate.state will always be RUNSTATE_running and + * runstate.state_entry_time will indicate the system time at which the + * VCPU was last scheduled to run. + * @extra_arg == pointer to vcpu_register_runstate_memory_area structure. 
+ */ +#define VCPUOP_register_runstate_memory_area 5 +struct vcpu_register_runstate_memory_area { + union { + XEN_GUEST_HANDLE(vcpu_runstate_info_t) h; + struct vcpu_runstate_info *v; + uint64_t p; + } addr; +}; +typedef struct vcpu_register_runstate_memory_area vcpu_register_runstate_memory_area_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_register_runstate_memory_area_t); + +/* + * Set or stop a VCPU's periodic timer. Every VCPU has one periodic timer + * which can be set via these commands. Periods smaller than one millisecond + * may not be supported. + */ +#define VCPUOP_set_periodic_timer 6 /* arg == vcpu_set_periodic_timer_t */ +#define VCPUOP_stop_periodic_timer 7 /* arg == NULL */ +struct vcpu_set_periodic_timer { + uint64_t period_ns; +}; +typedef struct vcpu_set_periodic_timer vcpu_set_periodic_timer_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_set_periodic_timer_t); + +/* + * Set or stop a VCPU's single-shot timer. Every VCPU has one single-shot + * timer which can be set via these commands. + */ +#define VCPUOP_set_singleshot_timer 8 /* arg == vcpu_set_singleshot_timer_t */ +#define VCPUOP_stop_singleshot_timer 9 /* arg == NULL */ +struct vcpu_set_singleshot_timer { + uint64_t timeout_abs_ns; /* Absolute system time value in nanoseconds. */ + uint32_t flags; /* VCPU_SSHOTTMR_??? */ +}; +typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t); + +/* Flags to VCPUOP_set_singleshot_timer. */ + /* Require the timeout to be in the future (return -ETIME if it's passed). */ +#define _VCPU_SSHOTTMR_future (0) +#define VCPU_SSHOTTMR_future (1U << _VCPU_SSHOTTMR_future) + +/* + * Register a memory location in the guest address space for the + * vcpu_info structure. This allows the guest to place the vcpu_info + * structure in a convenient place, such as in a per-cpu data area. + * The pointer need not be page aligned, but the structure must not + * cross a page boundary. + * + * This may be called only once per vcpu. 
+ */ +#define VCPUOP_register_vcpu_info 10 /* arg == vcpu_register_vcpu_info_t */ +struct vcpu_register_vcpu_info { + uint64_t mfn; /* mfn of page to place vcpu_info */ + uint32_t offset; /* offset within page */ + uint32_t rsvd; /* unused */ +}; +typedef struct vcpu_register_vcpu_info vcpu_register_vcpu_info_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_register_vcpu_info_t); + +/* Send an NMI to the specified VCPU. @extra_arg == NULL. */ +#define VCPUOP_send_nmi 11 + +/* + * Get the physical ID information for a pinned vcpu's underlying physical + * processor. The physical ID information is architecture-specific. + * On x86: id[31:0]=apic_id, id[63:32]=acpi_id, and all values 0xff and + * greater are reserved. + * This command returns -EINVAL if it is not a valid operation for this VCPU. + */ +#define VCPUOP_get_physid 12 /* arg == vcpu_get_physid_t */ +struct vcpu_get_physid { + uint64_t phys_id; +}; +typedef struct vcpu_get_physid vcpu_get_physid_t; +DEFINE_XEN_GUEST_HANDLE(vcpu_get_physid_t); +#define xen_vcpu_physid_to_x86_apicid(physid) \ + ((((uint32_t)(physid)) >= 0xff) ? 0xff : ((uint8_t)(physid))) +#define xen_vcpu_physid_to_x86_acpiid(physid) \ + ((((uint32_t)((physid)>>32)) >= 0xff) ? 0xff : ((uint8_t)((physid)>>32))) + +#endif /* __XEN_PUBLIC_VCPU_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/version.h b/xen/public/version.h new file mode 100644 index 0000000..944ca62 --- /dev/null +++ b/xen/public/version.h @@ -0,0 +1,91 @@ +/****************************************************************************** + * version.h + * + * Xen version, type, and compile information. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2005, Nguyen Anh Quynh <aquynh@gmail.com> + * Copyright (c) 2005, Keir Fraser <keir@xensource.com> + */ + +#ifndef __XEN_PUBLIC_VERSION_H__ +#define __XEN_PUBLIC_VERSION_H__ + +/* NB. All ops return zero on success, except XENVER_{version,pagesize} */ + +/* arg == NULL; returns major:minor (16:16). */ +#define XENVER_version 0 + +/* arg == xen_extraversion_t. */ +#define XENVER_extraversion 1 +typedef char xen_extraversion_t[16]; +#define XEN_EXTRAVERSION_LEN (sizeof(xen_extraversion_t)) + +/* arg == xen_compile_info_t. 
*/ +#define XENVER_compile_info 2 +struct xen_compile_info { + char compiler[64]; + char compile_by[16]; + char compile_domain[32]; + char compile_date[32]; +}; +typedef struct xen_compile_info xen_compile_info_t; + +#define XENVER_capabilities 3 +typedef char xen_capabilities_info_t[1024]; +#define XEN_CAPABILITIES_INFO_LEN (sizeof(xen_capabilities_info_t)) + +#define XENVER_changeset 4 +typedef char xen_changeset_info_t[64]; +#define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t)) + +#define XENVER_platform_parameters 5 +struct xen_platform_parameters { + unsigned long virt_start; +}; +typedef struct xen_platform_parameters xen_platform_parameters_t; + +#define XENVER_get_features 6 +struct xen_feature_info { + unsigned int submap_idx; /* IN: which 32-bit submap to return */ + uint32_t submap; /* OUT: 32-bit submap */ +}; +typedef struct xen_feature_info xen_feature_info_t; + +/* Declares the features reported by XENVER_get_features. */ +#include "features.h" + +/* arg == NULL; returns host memory page size. */ +#define XENVER_pagesize 7 + +/* arg == xen_domain_handle_t. */ +#define XENVER_guest_handle 8 + +#endif /* __XEN_PUBLIC_VERSION_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/xen-compat.h b/xen/public/xen-compat.h new file mode 100644 index 0000000..329be07 --- /dev/null +++ b/xen/public/xen-compat.h @@ -0,0 +1,44 @@ +/****************************************************************************** + * xen-compat.h + * + * Guest OS interface to Xen. Compatibility layer. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2006, Christian Limpach + */ + +#ifndef __XEN_PUBLIC_XEN_COMPAT_H__ +#define __XEN_PUBLIC_XEN_COMPAT_H__ + +#define __XEN_LATEST_INTERFACE_VERSION__ 0x00030209 + +#if defined(__XEN__) || defined(__XEN_TOOLS__) +/* Xen is built with matching headers and implements the latest interface. */ +#define __XEN_INTERFACE_VERSION__ __XEN_LATEST_INTERFACE_VERSION__ +#elif !defined(__XEN_INTERFACE_VERSION__) +/* Guests which do not specify a version get the legacy interface. */ +#define __XEN_INTERFACE_VERSION__ 0x00000000 +#endif + +#if __XEN_INTERFACE_VERSION__ > __XEN_LATEST_INTERFACE_VERSION__ +#error "These header files do not support the requested interface version." 
+#endif + +#endif /* __XEN_PUBLIC_XEN_COMPAT_H__ */ diff --git a/xen/public/xen.h b/xen/public/xen.h new file mode 100644 index 0000000..084bb90 --- /dev/null +++ b/xen/public/xen.h @@ -0,0 +1,657 @@ +/****************************************************************************** + * xen.h + * + * Guest OS interface to Xen. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004, K A Fraser + */ + +#ifndef __XEN_PUBLIC_XEN_H__ +#define __XEN_PUBLIC_XEN_H__ + +#include <sys/types.h> + +#include "xen-compat.h" + +#if defined(__i386__) || defined(__x86_64__) +#include "arch-x86/xen.h" +#elif defined(__ia64__) +#include "arch-ia64.h" +#else +#error "Unsupported architecture" +#endif + +#ifndef __ASSEMBLY__ +/* Guest handles for primitive C types. 
*/ +DEFINE_XEN_GUEST_HANDLE(char); +__DEFINE_XEN_GUEST_HANDLE(uchar, unsigned char); +DEFINE_XEN_GUEST_HANDLE(int); +__DEFINE_XEN_GUEST_HANDLE(uint, unsigned int); +DEFINE_XEN_GUEST_HANDLE(long); +__DEFINE_XEN_GUEST_HANDLE(ulong, unsigned long); +DEFINE_XEN_GUEST_HANDLE(void); + +DEFINE_XEN_GUEST_HANDLE(xen_pfn_t); +#endif + +/* + * HYPERCALLS + */ + +#define __HYPERVISOR_set_trap_table 0 +#define __HYPERVISOR_mmu_update 1 +#define __HYPERVISOR_set_gdt 2 +#define __HYPERVISOR_stack_switch 3 +#define __HYPERVISOR_set_callbacks 4 +#define __HYPERVISOR_fpu_taskswitch 5 +#define __HYPERVISOR_sched_op_compat 6 /* compat since 0x00030101 */ +#define __HYPERVISOR_platform_op 7 +#define __HYPERVISOR_set_debugreg 8 +#define __HYPERVISOR_get_debugreg 9 +#define __HYPERVISOR_update_descriptor 10 +#define __HYPERVISOR_memory_op 12 +#define __HYPERVISOR_multicall 13 +#define __HYPERVISOR_update_va_mapping 14 +#define __HYPERVISOR_set_timer_op 15 +#define __HYPERVISOR_event_channel_op_compat 16 /* compat since 0x00030202 */ +#define __HYPERVISOR_xen_version 17 +#define __HYPERVISOR_console_io 18 +#define __HYPERVISOR_physdev_op_compat 19 /* compat since 0x00030202 */ +#define __HYPERVISOR_grant_table_op 20 +#define __HYPERVISOR_vm_assist 21 +#define __HYPERVISOR_update_va_mapping_otherdomain 22 +#define __HYPERVISOR_iret 23 /* x86 only */ +#define __HYPERVISOR_vcpu_op 24 +#define __HYPERVISOR_set_segment_base 25 /* x86/64 only */ +#define __HYPERVISOR_mmuext_op 26 +#define __HYPERVISOR_xsm_op 27 +#define __HYPERVISOR_nmi_op 28 +#define __HYPERVISOR_sched_op 29 +#define __HYPERVISOR_callback_op 30 +#define __HYPERVISOR_xenoprof_op 31 +#define __HYPERVISOR_event_channel_op 32 +#define __HYPERVISOR_physdev_op 33 +#define __HYPERVISOR_hvm_op 34 +#define __HYPERVISOR_sysctl 35 +#define __HYPERVISOR_domctl 36 +#define __HYPERVISOR_kexec_op 37 + +/* Architecture-specific hypercall definitions. 
*/ +#define __HYPERVISOR_arch_0 48 +#define __HYPERVISOR_arch_1 49 +#define __HYPERVISOR_arch_2 50 +#define __HYPERVISOR_arch_3 51 +#define __HYPERVISOR_arch_4 52 +#define __HYPERVISOR_arch_5 53 +#define __HYPERVISOR_arch_6 54 +#define __HYPERVISOR_arch_7 55 + +/* + * HYPERCALL COMPATIBILITY. + */ + +/* New sched_op hypercall introduced in 0x00030101. */ +#if __XEN_INTERFACE_VERSION__ < 0x00030101 +#undef __HYPERVISOR_sched_op +#define __HYPERVISOR_sched_op __HYPERVISOR_sched_op_compat +#endif + +/* New event-channel and physdev hypercalls introduced in 0x00030202. */ +#if __XEN_INTERFACE_VERSION__ < 0x00030202 +#undef __HYPERVISOR_event_channel_op +#define __HYPERVISOR_event_channel_op __HYPERVISOR_event_channel_op_compat +#undef __HYPERVISOR_physdev_op +#define __HYPERVISOR_physdev_op __HYPERVISOR_physdev_op_compat +#endif + +/* New platform_op hypercall introduced in 0x00030204. */ +#if __XEN_INTERFACE_VERSION__ < 0x00030204 +#define __HYPERVISOR_dom0_op __HYPERVISOR_platform_op +#endif + +/* + * VIRTUAL INTERRUPTS + * + * Virtual interrupts that a guest OS may receive from Xen. + * + * In the side comments, 'V.' denotes a per-VCPU VIRQ while 'G.' denotes a + * global VIRQ. The former can be bound once per VCPU and cannot be re-bound. + * The latter can be allocated only once per guest: they must initially be + * allocated to VCPU0 but can subsequently be re-bound. + */ +#define VIRQ_TIMER 0 /* V. Timebase update, and/or requested timeout. */ +#define VIRQ_DEBUG 1 /* V. Request guest to dump debug info. */ +#define VIRQ_CONSOLE 2 /* G. (DOM0) Bytes received on emergency console. */ +#define VIRQ_DOM_EXC 3 /* G. (DOM0) Exceptional event for some domain. */ +#define VIRQ_TBUF 4 /* G. (DOM0) Trace buffer has records available. */ +#define VIRQ_DEBUGGER 6 /* G. (DOM0) A domain has paused for debugging. */ +#define VIRQ_XENOPROF 7 /* V. XenOprofile interrupt: new sample available */ +#define VIRQ_CON_RING 8 /* G. 
(DOM0) Bytes received on console */ + +/* Architecture-specific VIRQ definitions. */ +#define VIRQ_ARCH_0 16 +#define VIRQ_ARCH_1 17 +#define VIRQ_ARCH_2 18 +#define VIRQ_ARCH_3 19 +#define VIRQ_ARCH_4 20 +#define VIRQ_ARCH_5 21 +#define VIRQ_ARCH_6 22 +#define VIRQ_ARCH_7 23 + +#define NR_VIRQS 24 + +/* + * MMU-UPDATE REQUESTS + * + * HYPERVISOR_mmu_update() accepts a list of (ptr, val) pairs. + * A foreigndom (FD) can be specified (or DOMID_SELF for none). + * Where the FD has some effect, it is described below. + * ptr[1:0] specifies the appropriate MMU_* command. + * + * ptr[1:0] == MMU_NORMAL_PT_UPDATE: + * Updates an entry in a page table. If updating an L1 table, and the new + * table entry is valid/present, the mapped frame must belong to the FD, if + * an FD has been specified. If attempting to map an I/O page then the + * caller assumes the privilege of the FD. + * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller. + * FD == DOMID_XEN: Map restricted areas of Xen's heap space. + * ptr[:2] -- Machine address of the page-table entry to modify. + * val -- Value to write. + * + * ptr[1:0] == MMU_MACHPHYS_UPDATE: + * Updates an entry in the machine->pseudo-physical mapping table. + * ptr[:2] -- Machine address within the frame whose mapping to modify. + * The frame must belong to the FD, if one is specified. + * val -- Value to write into the mapping entry. + * + * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD: + * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed + * with those in @val. + */ +#define MMU_NORMAL_PT_UPDATE 0 /* checked '*ptr = val'. ptr is MA. */ +#define MMU_MACHPHYS_UPDATE 1 /* ptr = MA of frame to modify entry for */ +#define MMU_PT_UPDATE_PRESERVE_AD 2 /* atomically: *ptr = val | (*ptr&(A|D)) */ + +/* + * MMU EXTENDED OPERATIONS + * + * HYPERVISOR_mmuext_op() accepts a list of mmuext_op structures. + * A foreigndom (FD) can be specified (or DOMID_SELF for none). 
+ * Where the FD has some effect, it is described below. + * + * cmd: MMUEXT_(UN)PIN_*_TABLE + * mfn: Machine frame number to be (un)pinned as a p.t. page. + * The frame must belong to the FD, if one is specified. + * + * cmd: MMUEXT_NEW_BASEPTR + * mfn: Machine frame number of new page-table base to install in MMU. + * + * cmd: MMUEXT_NEW_USER_BASEPTR [x86/64 only] + * mfn: Machine frame number of new page-table base to install in MMU + * when in user space. + * + * cmd: MMUEXT_TLB_FLUSH_LOCAL + * No additional arguments. Flushes local TLB. + * + * cmd: MMUEXT_INVLPG_LOCAL + * linear_addr: Linear address to be flushed from the local TLB. + * + * cmd: MMUEXT_TLB_FLUSH_MULTI + * vcpumask: Pointer to bitmap of VCPUs to be flushed. + * + * cmd: MMUEXT_INVLPG_MULTI + * linear_addr: Linear address to be flushed. + * vcpumask: Pointer to bitmap of VCPUs to be flushed. + * + * cmd: MMUEXT_TLB_FLUSH_ALL + * No additional arguments. Flushes all VCPUs' TLBs. + * + * cmd: MMUEXT_INVLPG_ALL + * linear_addr: Linear address to be flushed from all VCPUs' TLBs. + * + * cmd: MMUEXT_FLUSH_CACHE + * No additional arguments. Writes back and flushes cache contents. + * + * cmd: MMUEXT_SET_LDT + * linear_addr: Linear address of LDT base (NB. must be page-aligned). + * nr_ents: Number of entries in LDT. + * + * cmd: MMUEXT_CLEAR_PAGE + * mfn: Machine frame number to be cleared. + * + * cmd: MMUEXT_COPY_PAGE + * mfn: Machine frame number of the destination page. + * src_mfn: Machine frame number of the source page. 
+ */ +#define MMUEXT_PIN_L1_TABLE 0 +#define MMUEXT_PIN_L2_TABLE 1 +#define MMUEXT_PIN_L3_TABLE 2 +#define MMUEXT_PIN_L4_TABLE 3 +#define MMUEXT_UNPIN_TABLE 4 +#define MMUEXT_NEW_BASEPTR 5 +#define MMUEXT_TLB_FLUSH_LOCAL 6 +#define MMUEXT_INVLPG_LOCAL 7 +#define MMUEXT_TLB_FLUSH_MULTI 8 +#define MMUEXT_INVLPG_MULTI 9 +#define MMUEXT_TLB_FLUSH_ALL 10 +#define MMUEXT_INVLPG_ALL 11 +#define MMUEXT_FLUSH_CACHE 12 +#define MMUEXT_SET_LDT 13 +#define MMUEXT_NEW_USER_BASEPTR 15 +#define MMUEXT_CLEAR_PAGE 16 +#define MMUEXT_COPY_PAGE 17 + +#ifndef __ASSEMBLY__ +struct mmuext_op { + unsigned int cmd; + union { + /* [UN]PIN_TABLE, NEW_BASEPTR, NEW_USER_BASEPTR + * CLEAR_PAGE, COPY_PAGE */ + xen_pfn_t mfn; + /* INVLPG_LOCAL, INVLPG_ALL, SET_LDT */ + unsigned long linear_addr; + } arg1; + union { + /* SET_LDT */ + unsigned int nr_ents; + /* TLB_FLUSH_MULTI, INVLPG_MULTI */ +#if __XEN_INTERFACE_VERSION__ >= 0x00030205 + XEN_GUEST_HANDLE(void) vcpumask; +#else + void *vcpumask; +#endif + /* COPY_PAGE */ + xen_pfn_t src_mfn; + } arg2; +}; +typedef struct mmuext_op mmuext_op_t; +DEFINE_XEN_GUEST_HANDLE(mmuext_op_t); +#endif + +/* These are passed as 'flags' to update_va_mapping. They can be ORed. */ +/* When specifying UVMF_MULTI, also OR in a pointer to a CPU bitmap. */ +/* UVMF_LOCAL is merely UVMF_MULTI with a NULL bitmap pointer. */ +#define UVMF_NONE (0UL<<0) /* No flushing at all. */ +#define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s). */ +#define UVMF_INVLPG (2UL<<0) /* Flush only one entry. */ +#define UVMF_FLUSHTYPE_MASK (3UL<<0) +#define UVMF_MULTI (0UL<<2) /* Flush subset of TLBs. */ +#define UVMF_LOCAL (0UL<<2) /* Flush local TLB. */ +#define UVMF_ALL (1UL<<2) /* Flush all TLBs. */ + +/* + * Commands to HYPERVISOR_console_io(). + */ +#define CONSOLEIO_write 0 +#define CONSOLEIO_read 1 + +/* + * Commands to HYPERVISOR_vm_assist(). + */ +#define VMASST_CMD_enable 0 +#define VMASST_CMD_disable 1 + +/* x86/32 guests: simulate full 4GB segment limits. 
*/ +#define VMASST_TYPE_4gb_segments 0 + +/* x86/32 guests: trap (vector 15) whenever above vmassist is used. */ +#define VMASST_TYPE_4gb_segments_notify 1 + +/* + * x86 guests: support writes to bottom-level PTEs. + * NB1. Page-directory entries cannot be written. + * NB2. Guest must continue to remove all writable mappings of PTEs. + */ +#define VMASST_TYPE_writable_pagetables 2 + +/* x86/PAE guests: support PDPTs above 4GB. */ +#define VMASST_TYPE_pae_extended_cr3 3 + +#define MAX_VMASST_TYPE 3 + +#ifndef __ASSEMBLY__ + +typedef uint16_t domid_t; + +/* Domain ids >= DOMID_FIRST_RESERVED cannot be used for ordinary domains. */ +#define DOMID_FIRST_RESERVED (0x7FF0U) + +/* DOMID_SELF is used in certain contexts to refer to oneself. */ +#define DOMID_SELF (0x7FF0U) + +/* + * DOMID_IO is used to restrict page-table updates to mapping I/O memory. + * Although no Foreign Domain need be specified to map I/O pages, DOMID_IO + * is useful to ensure that no mappings to the OS's own heap are accidentally + * installed. (e.g., in Linux this could cause havoc as reference counts + * aren't adjusted on the I/O-mapping code path). + * This only makes sense in MMUEXT_SET_FOREIGNDOM, but in that context can + * be specified by any calling domain. + */ +#define DOMID_IO (0x7FF1U) + +/* + * DOMID_XEN is used to allow privileged domains to map restricted parts of + * Xen's heap space (e.g., the machine_to_phys table). + * This only makes sense in MMUEXT_SET_FOREIGNDOM, and is only permitted if + * the caller is privileged. + */ +#define DOMID_XEN (0x7FF2U) + +/* + * Send an array of these to HYPERVISOR_mmu_update(). + * NB. The fields are natural pointer/address size for this architecture. + */ +struct mmu_update { + uint64_t ptr; /* Machine address of PTE. */ + uint64_t val; /* New contents of PTE. */ +}; +typedef struct mmu_update mmu_update_t; +DEFINE_XEN_GUEST_HANDLE(mmu_update_t); + +/* + * Send an array of these to HYPERVISOR_multicall(). + * NB. 
The fields are natural register size for this architecture. + */ +struct multicall_entry { + unsigned long op, result; + unsigned long args[6]; +}; +typedef struct multicall_entry multicall_entry_t; +DEFINE_XEN_GUEST_HANDLE(multicall_entry_t); + +/* + * Event channel endpoints per domain: + * 1024 if a long is 32 bits; 4096 if a long is 64 bits. + */ +#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64) + +struct vcpu_time_info { + /* + * Updates to the following values are preceded and followed by an + * increment of 'version'. The guest can therefore detect updates by + * looking for changes to 'version'. If the least-significant bit of + * the version number is set then an update is in progress and the guest + * must wait to read a consistent set of values. + * The correct way to interact with the version number is similar to + * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry. + */ + uint32_t version; + uint32_t pad0; + uint64_t tsc_timestamp; /* TSC at last update of time vals. */ + uint64_t system_time; /* Time, in nanosecs, since boot. */ + /* + * Current system time: + * system_time + + * ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32) + * CPU frequency (Hz): + * ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift + */ + uint32_t tsc_to_system_mul; + int8_t tsc_shift; + int8_t pad1[3]; +}; /* 32 bytes */ +typedef struct vcpu_time_info vcpu_time_info_t; + +struct vcpu_info { + /* + * 'evtchn_upcall_pending' is written non-zero by Xen to indicate + * a pending notification for a particular VCPU. It is then cleared + * by the guest OS /before/ checking for pending work, thus avoiding + * a set-and-check race. Note that the mask is only accessed by Xen + * on the CPU that is currently hosting the VCPU. This means that the + * pending and mask flags can be updated by the guest without special + * synchronisation (i.e., no need for the x86 LOCK prefix). 
+ * This may seem suboptimal because if the pending flag is set by + * a different CPU then an IPI may be scheduled even when the mask + * is set. However, note: + * 1. The task of 'interrupt holdoff' is covered by the per-event- + * channel mask bits. A 'noisy' event that is continually being + * triggered can be masked at source at this very precise + * granularity. + * 2. The main purpose of the per-VCPU mask is therefore to restrict + * reentrant execution: whether for concurrency control, or to + * prevent unbounded stack usage. Whatever the purpose, we expect + * that the mask will be asserted only for short periods at a time, + * and so the likelihood of a 'spurious' IPI is suitably small. + * The mask is read before making an event upcall to the guest: a + * non-zero mask therefore guarantees that the VCPU will not receive + * an upcall activation. The mask is cleared when the VCPU requests + * to block: this avoids wakeup-waiting races. + */ + uint8_t evtchn_upcall_pending; + uint8_t evtchn_upcall_mask; + unsigned long evtchn_pending_sel; + struct arch_vcpu_info arch; + struct vcpu_time_info time; +}; /* 64 bytes (x86) */ +#ifndef __XEN__ +typedef struct vcpu_info vcpu_info_t; +#endif + +/* + * Xen/kernel shared data -- pointer provided in start_info. + * + * This structure is defined to be both smaller than a page, and the + * only data on the shared page, but may vary in actual size even within + * compatible Xen versions; guests should not rely on the size + * of this structure remaining constant. + */ +struct shared_info { + struct vcpu_info vcpu_info[MAX_VIRT_CPUS]; + + /* + * A domain can create "event channels" on which it can send and receive + * asynchronous event notifications. There are three classes of event that + * are delivered by this mechanism: + * 1. Bi-directional inter- and intra-domain connections. 
Domains must + * arrange out-of-band to set up a connection (usually by allocating + * an unbound 'listener' port and advertising that via a storage service + * such as xenstore). + * 2. Physical interrupts. A domain with suitable hardware-access + * privileges can bind an event-channel port to a physical interrupt + * source. + * 3. Virtual interrupts ('events'). A domain can bind an event-channel + * port to a virtual interrupt source, such as the virtual-timer + * device or the emergency console. + * + * Event channels are addressed by a "port index". Each channel is + * associated with two bits of information: + * 1. PENDING -- notifies the domain that there is a pending notification + * to be processed. This bit is cleared by the guest. + * 2. MASK -- if this bit is clear then a 0->1 transition of PENDING + * will cause an asynchronous upcall to be scheduled. This bit is only + * updated by the guest. It is read-only within Xen. If a channel + * becomes pending while the channel is masked then the 'edge' is lost + * (i.e., when the channel is unmasked, the guest must manually handle + * pending notifications as no upcall will be scheduled by Xen). + * + * To expedite scanning of pending notifications, any 0->1 pending + * transition on an unmasked channel causes a corresponding bit in a + * per-vcpu selector word to be set. Each bit in the selector covers a + * 'C long' in the PENDING bitfield array. + */ + unsigned long evtchn_pending[sizeof(unsigned long) * 8]; + unsigned long evtchn_mask[sizeof(unsigned long) * 8]; + + /* + * Wallclock time: updated only by control software. Guests should base + * their gettimeofday() syscall on this wallclock-base value. + */ + uint32_t wc_version; /* Version counter: see vcpu_time_info_t. */ + uint32_t wc_sec; /* Secs 00:00:00 UTC, Jan 1, 1970. */ + uint32_t wc_nsec; /* Nsecs 00:00:00 UTC, Jan 1, 1970. 
*/ + + struct arch_shared_info arch; + +}; +#ifndef __XEN__ +typedef struct shared_info shared_info_t; +#endif + +/* + * Start-of-day memory layout: + * 1. The domain is started within a contiguous virtual-memory region. + * 2. The contiguous region ends on an aligned 4MB boundary. + * 3. This is the order of bootstrap elements in the initial virtual region: + * a. relocated kernel image + * b. initial ram disk [mod_start, mod_len] + * c. list of allocated page frames [mfn_list, nr_pages] + * d. start_info_t structure [register ESI (x86)] + * e. bootstrap page tables [pt_base, CR3 (x86)] + * f. bootstrap stack [register ESP (x86)] + * 4. Bootstrap elements are packed together, but each is 4kB-aligned. + * 5. The initial ram disk may be omitted. + * 6. The list of page frames forms a contiguous 'pseudo-physical' memory + * layout for the domain. In particular, the bootstrap virtual-memory + * region is a 1:1 mapping to the first section of the pseudo-physical map. + * 7. All bootstrap elements are mapped read-writable for the guest OS. The + * only exception is the bootstrap page table, which is mapped read-only. + * 8. There is guaranteed to be at least 512kB padding after the final + * bootstrap element. If necessary, the bootstrap virtual region is + * extended by an extra 4MB to ensure this. + */ + +#define MAX_GUEST_CMDLINE 1024 +struct start_info { + /* THE FOLLOWING ARE FILLED IN BOTH ON INITIAL BOOT AND ON RESUME. */ + char magic[32]; /* "xen-<version>-<platform>". */ + unsigned long nr_pages; /* Total pages allocated to this domain. */ + unsigned long shared_info; /* MACHINE address of shared info struct. */ + uint32_t flags; /* SIF_xxx flags. */ + xen_pfn_t store_mfn; /* MACHINE page number of shared page. */ + uint32_t store_evtchn; /* Event channel for store communication. */ + union { + struct { + xen_pfn_t mfn; /* MACHINE page number of console page. */ + uint32_t evtchn; /* Event channel for console page. 
*/ + } domU; + struct { + uint32_t info_off; /* Offset of console_info struct. */ + uint32_t info_size; /* Size of console_info struct from start.*/ + } dom0; + } console; + /* THE FOLLOWING ARE ONLY FILLED IN ON INITIAL BOOT (NOT RESUME). */ + unsigned long pt_base; /* VIRTUAL address of page directory. */ + unsigned long nr_pt_frames; /* Number of bootstrap p.t. frames. */ + unsigned long mfn_list; /* VIRTUAL address of page-frame list. */ + unsigned long mod_start; /* VIRTUAL address of pre-loaded module. */ + unsigned long mod_len; /* Size (bytes) of pre-loaded module. */ + int8_t cmd_line[MAX_GUEST_CMDLINE]; + + /* hackish, for multiboot compatibility */ + unsigned mods_count; +}; +typedef struct start_info start_info_t; + +/* New console union for dom0 introduced in 0x00030203. */ +#if __XEN_INTERFACE_VERSION__ < 0x00030203 +#define console_mfn console.domU.mfn +#define console_evtchn console.domU.evtchn +#endif + +/* These flags are passed in the 'flags' field of start_info_t. */ +#define SIF_PRIVILEGED (1<<0) /* Is the domain privileged? */ +#define SIF_INITDOMAIN (1<<1) /* Is this the initial control domain? */ +#define SIF_MULTIBOOT_MOD (1<<2) /* Is mod_start a multiboot module? */ +#define SIF_PM_MASK (0xFF<<8) /* reserve 1 byte for xen-pm options */ + +typedef struct dom0_vga_console_info { + uint8_t video_type; /* DOM0_VGA_CONSOLE_??? */ +#define XEN_VGATYPE_TEXT_MODE_3 0x03 +#define XEN_VGATYPE_VESA_LFB 0x23 + + union { + struct { + /* Font height, in pixels. */ + uint16_t font_height; + /* Cursor location (column, row). */ + uint16_t cursor_x, cursor_y; + /* Number of rows and columns (dimensions in characters). */ + uint16_t rows, columns; + } text_mode_3; + + struct { + /* Width and height, in pixels. */ + uint16_t width, height; + /* Bytes per scan line. */ + uint16_t bytes_per_line; + /* Bits per pixel. */ + uint16_t bits_per_pixel; + /* LFB physical address, and size (in units of 64kB). 
*/ + uint32_t lfb_base; + uint32_t lfb_size; + /* RGB mask offsets and sizes, as defined by VBE 1.2+ */ + uint8_t red_pos, red_size; + uint8_t green_pos, green_size; + uint8_t blue_pos, blue_size; + uint8_t rsvd_pos, rsvd_size; +#if __XEN_INTERFACE_VERSION__ >= 0x00030206 + /* VESA capabilities (offset 0xa, VESA command 0x4f00). */ + uint32_t gbl_caps; + /* Mode attributes (offset 0x0, VESA command 0x4f01). */ + uint16_t mode_attrs; +#endif + } vesa_lfb; + } u; +} dom0_vga_console_info_t; +#define xen_vga_console_info dom0_vga_console_info +#define xen_vga_console_info_t dom0_vga_console_info_t + +typedef uint8_t xen_domain_handle_t[16]; + +/* Turn a plain number into a C unsigned long constant. */ +#define __mk_unsigned_long(x) x ## UL +#define mk_unsigned_long(x) __mk_unsigned_long(x) + +__DEFINE_XEN_GUEST_HANDLE(uint8, uint8_t); +__DEFINE_XEN_GUEST_HANDLE(uint16, uint16_t); +__DEFINE_XEN_GUEST_HANDLE(uint32, uint32_t); +__DEFINE_XEN_GUEST_HANDLE(uint64, uint64_t); + +#else /* __ASSEMBLY__ */ + +/* In assembly code we cannot use C numeric constant suffixes. */ +#define mk_unsigned_long(x) x + +#endif /* !__ASSEMBLY__ */ + +/* Default definitions for macros used by domctl/sysctl. 
*/ +#if defined(__XEN__) || defined(__XEN_TOOLS__) +#ifndef uint64_aligned_t +#define uint64_aligned_t uint64_t +#endif +#ifndef XEN_GUEST_HANDLE_64 +#define XEN_GUEST_HANDLE_64(name) XEN_GUEST_HANDLE(name) +#endif +#endif + +#endif /* __XEN_PUBLIC_XEN_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/public/xencomm.h b/xen/public/xencomm.h new file mode 100644 index 0000000..ac45e07 --- /dev/null +++ b/xen/public/xencomm.h @@ -0,0 +1,41 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (C) IBM Corp. 2006 + */ + +#ifndef _XEN_XENCOMM_H_ +#define _XEN_XENCOMM_H_ + +/* A xencomm descriptor is a scatter/gather list containing physical + * addresses corresponding to a virtually contiguous memory area. 
The + * hypervisor translates these physical addresses to machine addresses to copy + * to and from the virtually contiguous area. + */ + +#define XENCOMM_MAGIC 0x58434F4D /* 'XCOM' */ +#define XENCOMM_INVALID (~0UL) + +struct xencomm_desc { + uint32_t magic; + uint32_t nr_addrs; /* the number of entries in address[] */ + uint64_t address[0]; +}; + +#endif /* _XEN_XENCOMM_H_ */ diff --git a/xen/public/xenoprof.h b/xen/public/xenoprof.h new file mode 100644 index 0000000..183078d --- /dev/null +++ b/xen/public/xenoprof.h @@ -0,0 +1,138 @@ +/****************************************************************************** + * xenoprof.h + * + * Interface for enabling system wide profiling based on hardware performance + * counters + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (C) 2005 Hewlett-Packard Co. 
+ * Written by Aravind Menon & Jose Renato Santos + */ + +#ifndef __XEN_PUBLIC_XENOPROF_H__ +#define __XEN_PUBLIC_XENOPROF_H__ + +#include "xen.h" + +/* + * Commands to HYPERVISOR_xenoprof_op(). + */ +#define XENOPROF_init 0 +#define XENOPROF_reset_active_list 1 +#define XENOPROF_reset_passive_list 2 +#define XENOPROF_set_active 3 +#define XENOPROF_set_passive 4 +#define XENOPROF_reserve_counters 5 +#define XENOPROF_counter 6 +#define XENOPROF_setup_events 7 +#define XENOPROF_enable_virq 8 +#define XENOPROF_start 9 +#define XENOPROF_stop 10 +#define XENOPROF_disable_virq 11 +#define XENOPROF_release_counters 12 +#define XENOPROF_shutdown 13 +#define XENOPROF_get_buffer 14 +#define XENOPROF_set_backtrace 15 +#define XENOPROF_last_op 15 + +#define MAX_OPROF_EVENTS 32 +#define MAX_OPROF_DOMAINS 25 +#define XENOPROF_CPU_TYPE_SIZE 64 + +/* Xenoprof performance events (not Xen events) */ +struct event_log { + uint64_t eip; + uint8_t mode; + uint8_t event; +}; + +/* PC value that indicates a special code */ +#define XENOPROF_ESCAPE_CODE ~0UL +/* Transient events for the xenoprof->oprofile cpu buf */ +#define XENOPROF_TRACE_BEGIN 1 + +/* Xenoprof buffer shared between Xen and domain - 1 per VCPU */ +struct xenoprof_buf { + uint32_t event_head; + uint32_t event_tail; + uint32_t event_size; + uint32_t vcpu_id; + uint64_t xen_samples; + uint64_t kernel_samples; + uint64_t user_samples; + uint64_t lost_samples; + struct event_log event_log[1]; +}; +#ifndef __XEN__ +typedef struct xenoprof_buf xenoprof_buf_t; +DEFINE_XEN_GUEST_HANDLE(xenoprof_buf_t); +#endif + +struct xenoprof_init { + int32_t num_events; + int32_t is_primary; + char cpu_type[XENOPROF_CPU_TYPE_SIZE]; +}; +typedef struct xenoprof_init xenoprof_init_t; +DEFINE_XEN_GUEST_HANDLE(xenoprof_init_t); + +struct xenoprof_get_buffer { + int32_t max_samples; + int32_t nbuf; + int32_t bufsize; + uint64_t buf_gmaddr; +}; +typedef struct xenoprof_get_buffer xenoprof_get_buffer_t; 
+DEFINE_XEN_GUEST_HANDLE(xenoprof_get_buffer_t); + +struct xenoprof_counter { + uint32_t ind; + uint64_t count; + uint32_t enabled; + uint32_t event; + uint32_t hypervisor; + uint32_t kernel; + uint32_t user; + uint64_t unit_mask; +}; +typedef struct xenoprof_counter xenoprof_counter_t; +DEFINE_XEN_GUEST_HANDLE(xenoprof_counter_t); + +typedef struct xenoprof_passive { + uint16_t domain_id; + int32_t max_samples; + int32_t nbuf; + int32_t bufsize; + uint64_t buf_gmaddr; +} xenoprof_passive_t; +DEFINE_XEN_GUEST_HANDLE(xenoprof_passive_t); + + +#endif /* __XEN_PUBLIC_XENOPROF_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/ring.c b/xen/ring.c new file mode 100644 index 0000000..8644059 --- /dev/null +++ b/xen/ring.c @@ -0,0 +1,61 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <sys/types.h> +#include <string.h> +#include "ring.h" + +/* dest is ring */ +void hyp_ring_store(void *dest, const void *src, size_t size, void *start, void *end) +{ + if (dest + size > end) { + size_t first_size = end - dest; + memcpy(dest, src, first_size); + src += first_size; + dest = start; + size -= first_size; + } + memcpy(dest, src, size); +} + +/* src is ring */ +void hyp_ring_fetch(void *dest, const void *src, size_t size, void *start, void *end) +{ + if (src + size > end) { + size_t first_size = end - src; + memcpy(dest, src, first_size); + dest += first_size; + src = start; + size -= first_size; + } + memcpy(dest, src, size); +} + +size_t hyp_ring_next_word(char **c, void *start, void *end) +{ + size_t n = 0; + + while (**c) { + n++; + if (++(*c) == end) + *c = start; + } + (*c)++; + + return n; +} diff --git a/xen/ring.h b/xen/ring.h new file mode 100644 index 0000000..6ed00ac --- /dev/null +++ b/xen/ring.h @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#ifndef XEN_RING_H +#define XEN_RING_H + +typedef unsigned32_t hyp_ring_pos_t; + +#define hyp_ring_idx(ring, pos) (((unsigned)(pos)) & (sizeof(ring)-1)) +#define hyp_ring_cell(ring, pos) (ring)[hyp_ring_idx((ring), (pos))] +#define hyp_ring_smash(ring, prod, cons) (hyp_ring_idx((ring), (prod) + 1) == \ + hyp_ring_idx((ring), (cons))) +#define hyp_ring_available(ring, prod, cons) hyp_ring_idx((ring), (cons)-(prod)-1) + +void hyp_ring_store(void *dest, const void *src, size_t size, void *start, void *end); +void hyp_ring_fetch(void *dest, const void *src, size_t size, void *start, void *end); +size_t hyp_ring_next_word(char **c, void *start, void *end); + +#endif /* XEN_RING_H */ diff --git a/xen/store.c b/xen/store.c new file mode 100644 index 0000000..3c6baeb --- /dev/null +++ b/xen/store.c @@ -0,0 +1,334 @@ +/* + * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include <sys/types.h> +#include <mach/mig_support.h> +#include <machine/pmap.h> +#include <machine/ipl.h> +#include <stdarg.h> +#include <string.h> +#include <alloca.h> +#include <xen/public/xen.h> +#include <xen/public/io/xs_wire.h> +#include <util/atoi.h> +#include "store.h" +#include "ring.h" +#include "evt.h" +#include "xen.h" + +/* TODO use events instead of just yielding */ + +/* Hypervisor part */ + +decl_simple_lock_data(static, lock); + +static struct xenstore_domain_interface *store; + +struct store_req { + const char *data; + unsigned len; +}; + +/* Send a request */ +static void store_put(hyp_store_transaction_t t, unsigned32_t type, struct store_req *req, unsigned nr_reqs) { + struct xsd_sockmsg head = { + .type = type, + .req_id = 0, + .tx_id = t, + }; + unsigned totlen, len; + unsigned i; + + totlen = 0; + for (i = 0; i < nr_reqs; i++) + totlen += req[i].len; + head.len = totlen; + totlen += sizeof(head); + + if (totlen > sizeof(store->req) - 1) + panic("too big store message %d, max %d", totlen, sizeof(store->req)); + + while (hyp_ring_available(store->req, store->req_prod, store->req_cons) < totlen) + hyp_yield(); + + mb(); + hyp_ring_store(&hyp_ring_cell(store->req, store->req_prod), &head, sizeof(head), store->req, store->req + sizeof(store->req)); + len = sizeof(head); + for (i=0; i<nr_reqs; i++) { + hyp_ring_store(&hyp_ring_cell(store->req, store->req_prod + len), req[i].data, req[i].len, store->req, store->req + sizeof(store->req)); + len += req[i].len; + } + + wmb(); + store->req_prod += totlen; + hyp_event_channel_send(boot_info.store_evtchn); +} + +static const char *errors[] = { + "EINVAL", + "EACCES", + "EEXIST", + "EISDIR", + "ENOENT", + "ENOMEM", + "ENOSPC", + "EIO", + "ENOTEMPTY", + "ENOSYS", + "EROFS", + "EBUSY", + "EAGAIN", + "EISCONN", + NULL, +}; + +/* Send a request and wait for a reply, whose header is put in head, and + * data is returned (beware, that's in the ring !) + * On error, returns NULL. 
+ * Otherwise, takes the lock and returns a pointer to the data;
+ * store_put_wait_end() shall be called after reading it. */
+static struct xsd_sockmsg head;
+const char *hyp_store_error;
+
+static void *store_put_wait(hyp_store_transaction_t t, unsigned32_t type, struct store_req *req, unsigned nr_reqs) {
+	unsigned len;
+	const char **error;
+	void *data;
+
+	simple_lock(&lock);
+	store_put(t, type, req, nr_reqs);
+again:
+	while (store->rsp_prod - store->rsp_cons < sizeof(head))
+		hyp_yield();
+	rmb();
+	hyp_ring_fetch(&head, &hyp_ring_cell(store->rsp, store->rsp_cons), sizeof(head), store->rsp, store->rsp + sizeof(store->rsp));
+	len = sizeof(head) + head.len;
+	while (store->rsp_prod - store->rsp_cons < len)
+		hyp_yield();
+	rmb();
+	if (head.type == XS_WATCH_EVENT) {
+		/* Spurious watch event, drop */
+		store->rsp_cons += sizeof(head) + head.len;
+		hyp_event_channel_send(boot_info.store_evtchn);
+		goto again;
+	}
+	data = &hyp_ring_cell(store->rsp, store->rsp_cons + sizeof(head));
+	if (head.len <= 10) {
+		char c[10];
+		hyp_ring_fetch(c, data, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+		for (error = errors; *error; error++) {
+			if (head.len == strlen(*error) + 1 && !memcmp(*error, c, head.len)) {
+				hyp_store_error = *error;
+				store->rsp_cons += len;
+				hyp_event_channel_send(boot_info.store_evtchn);
+				simple_unlock(&lock);
+				return NULL;
+			}
+		}
+	}
+	return data;
+}
+
+/* Must be called after each store_put_wait.  Releases the lock. */
+static void store_put_wait_end(void) {
+	mb();
+	store->rsp_cons += sizeof(head) + head.len;
+	hyp_event_channel_send(boot_info.store_evtchn);
+	simple_unlock(&lock);
+}
+
+/* Start a transaction.
+ */
+hyp_store_transaction_t hyp_store_transaction_start(void) {
+	struct store_req req = {
+		.data = "",
+		.len = 1,
+	};
+	char *rep;
+	char *s;
+	int i;
+
+	rep = store_put_wait(0, XS_TRANSACTION_START, &req, 1);
+	if (!rep)
+		panic("couldn't start transaction (%s)", hyp_store_error);
+	s = alloca(head.len);
+	hyp_ring_fetch(s, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+	mach_atoi((u_char *) s, &i);
+	if (i == MACH_ATOI_DEFAULT)
+		panic("bogus transaction id len %d '%s'", head.len, s);
+	store_put_wait_end();
+	return i;
+}
+
+/* Stop a transaction. */
+int hyp_store_transaction_stop(hyp_store_transaction_t t) {
+	struct store_req req = {
+		.data = "T",
+		.len = 2,
+	};
+	int ret = 1;
+	void *rep;
+	rep = store_put_wait(t, XS_TRANSACTION_END, &req, 1);
+	if (!rep)
+		return 0;
+	store_put_wait_end();
+	return ret;
+}
+
+/* List a directory: returns an array of file names, terminated by NULL.
+ * Free with kfree. */
+char **hyp_store_ls(hyp_store_transaction_t t, int n, ...) {
+	struct store_req req[n];
+	va_list listp;
+	int i;
+	char *rep;
+	char *c;
+	char **res, **rsp;
+
+	va_start(listp, n);
+	for (i = 0; i < n; i++) {
+		req[i].data = va_arg(listp, char *);
+		req[i].len = strlen(req[i].data);
+	}
+	req[n - 1].len++;
+	va_end(listp);
+
+	rep = store_put_wait(t, XS_DIRECTORY, req, n);
+	if (!rep)
+		return NULL;
+	i = 0;
+	for (c = rep, n = 0;
+	     n < head.len;
+	     n += hyp_ring_next_word(&c, store->rsp, store->rsp + sizeof(store->rsp)) + 1)
+		i++;
+	res = (void *) kalloc((i + 1) * sizeof(char *) + head.len);
+	if (!res)
+		hyp_store_error = "ENOMEM";
+	else {
+		hyp_ring_fetch(res + (i + 1), rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+		rsp = res;
+		for (c = (char *) (res + (i + 1)); i; i--, c += strlen(c) + 1)
+			*rsp++ = c;
+		*rsp = NULL;
+	}
+	store_put_wait_end();
+	return res;
+}
+
+/* Get the value of an entry, va version.
+ */
+static void *hyp_store_read_va(hyp_store_transaction_t t, int n, va_list listp) {
+	struct store_req req[n];
+	int i;
+	void *rep;
+	char *res;
+
+	for (i = 0; i < n; i++) {
+		req[i].data = va_arg(listp, char *);
+		req[i].len = strlen(req[i].data);
+	}
+	req[n - 1].len++;
+
+	rep = store_put_wait(t, XS_READ, req, n);
+	if (!rep)
+		return NULL;
+	res = (void *) kalloc(head.len + 1);
+	if (!res)
+		hyp_store_error = "ENOMEM";
+	else {
+		hyp_ring_fetch(res, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+		res[head.len] = 0;
+	}
+	store_put_wait_end();
+	return res;
+}
+
+/* Get the value of an entry.  Free with kfree. */
+void *hyp_store_read(hyp_store_transaction_t t, int n, ...) {
+	va_list listp;
+	char *res;
+
+	va_start(listp, n);
+	res = hyp_store_read_va(t, n, listp);
+	va_end(listp);
+	return res;
+}
+
+/* Get the integer value of an entry, -1 on error. */
+int hyp_store_read_int(hyp_store_transaction_t t, int n, ...) {
+	va_list listp;
+	char *res;
+	int i;
+
+	va_start(listp, n);
+	res = hyp_store_read_va(t, n, listp);
+	va_end(listp);
+	if (!res)
+		return -1;
+	mach_atoi((u_char *) res, &i);
+	if (i == MACH_ATOI_DEFAULT)
+		printf("bogus integer '%s'\n", res);
+	kfree((vm_offset_t) res, strlen(res) + 1);
+	return i;
+}
+
+/* Set the value of an entry. */
+char *hyp_store_write(hyp_store_transaction_t t, const char *data, int n, ...)
+{
+	struct store_req req[n + 1];
+	va_list listp;
+	int i;
+	void *rep;
+	char *res;
+
+	va_start(listp, n);
+	for (i = 0; i < n; i++) {
+		req[i].data = va_arg(listp, char *);
+		req[i].len = strlen(req[i].data);
+	}
+	req[n - 1].len++;
+	req[n].data = data;
+	req[n].len = strlen(data);
+	va_end(listp);
+
+	rep = store_put_wait(t, XS_WRITE, req, n + 1);
+	if (!rep)
+		return NULL;
+	res = (void *) kalloc(head.len + 1);
+	if (!res)
+		hyp_store_error = "ENOMEM";
+	else {
+		hyp_ring_fetch(res, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+		res[head.len] = 0;
+	}
+	store_put_wait_end();
+	return res;
+}
+
+static void hyp_store_handler(int unit)
+{
+	thread_wakeup(&boot_info.store_evtchn);
+}
+
+/* Map the store's shared page. */
+void hyp_store_init(void)
+{
+	if (store)
+		return;
+	simple_lock_init(&lock);
+	store = (void *) mfn_to_kv(boot_info.store_mfn);
+	pmap_set_page_readwrite(store);
+	/* SPL sched */
+	hyp_evt_handler(boot_info.store_evtchn, hyp_store_handler, 0, SPL7);
+}
diff --git a/xen/store.h b/xen/store.h
new file mode 100644
index 0000000..4b3ee18
--- /dev/null
+++ b/xen/store.h
@@ -0,0 +1,54 @@
+/*
+ *  Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_STORE_H
+#define XEN_STORE_H
+#include <machine/xen.h>
+#include <xen/public/io/xenbus.h>
+
+typedef unsigned32_t hyp_store_transaction_t;
+
+#define hyp_store_state_unknown		"0"
+#define hyp_store_state_initializing	"1"
+#define hyp_store_state_init_wait	"2"
+#define hyp_store_state_initialized	"3"
+#define hyp_store_state_connected	"4"
+#define hyp_store_state_closing		"5"
+#define hyp_store_state_closed		"6"
+
+void hyp_store_init(void);
+
+extern const char *hyp_store_error;
+
+/* Start a transaction. */
+hyp_store_transaction_t hyp_store_transaction_start(void);
+/* Stop a transaction.  Returns 1 if the transaction succeeded, 0 otherwise. */
+int hyp_store_transaction_stop(hyp_store_transaction_t t);
+
+/* List a directory: returns an array of file names, terminated by NULL.
+ * Free with kfree. */
+char **hyp_store_ls(hyp_store_transaction_t t, int n, ...);
+
+/* Get the value of an entry.  Free with kfree. */
+void *hyp_store_read(hyp_store_transaction_t t, int n, ...);
+/* Get the integer value of an entry, -1 on error. */
+int hyp_store_read_int(hyp_store_transaction_t t, int n, ...);
+/* Set the value of an entry. */
+char *hyp_store_write(hyp_store_transaction_t t, const char *data, int n, ...);
+
+#endif /* XEN_STORE_H */
diff --git a/xen/time.c b/xen/time.c
new file mode 100644
index 0000000..4c5cc35
--- /dev/null
+++ b/xen/time.c
@@ -0,0 +1,148 @@
+/*
+ *  Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/mach_types.h>
+#include <kern/mach_clock.h>
+#include <mach/xen.h>
+#include <machine/xen.h>
+#include <machine/spl.h>
+#include <machine/ipl.h>
+#include <mach/machine/eflags.h>
+#include <xen/evt.h>
+#include "time.h"
+#include "store.h"
+
+static unsigned64_t lastnsec;
+
+/* 2^64 nanoseconds ~= 500 years */
+static unsigned64_t hyp_get_stime(void) {
+	unsigned32_t version;
+	unsigned64_t cpu_clock, last_cpu_clock, delta, system_time;
+	unsigned32_t mul;
+	signed8_t shift;
+	volatile struct vcpu_time_info *time = &hyp_shared_info.vcpu_info[0].time;
+
+	do {
+		version = time->version;
+		rmb();
+		cpu_clock = hyp_cpu_clock();
+		last_cpu_clock = time->tsc_timestamp;
+		system_time = time->system_time;
+		mul = time->tsc_to_system_mul;
+		shift = time->tsc_shift;
+		rmb();
+	} while (version != time->version);
+
+	delta = cpu_clock - last_cpu_clock;
+	if (shift < 0)
+		delta >>= -shift;
+	else
+		delta <<= shift;
+	return system_time + ((delta * (unsigned64_t) mul) >> 32);
+}
+
+unsigned64_t hyp_get_time(void) {
+	unsigned32_t version;
+	unsigned32_t sec, nsec;
+
+	do {
+		version = hyp_shared_info.wc_version;
+		rmb();
+		sec = hyp_shared_info.wc_sec;
+		nsec = hyp_shared_info.wc_nsec;
+		rmb();
+	} while (version != hyp_shared_info.wc_version);
+
+	return sec * 1000000000ULL + nsec + hyp_get_stime();
+}
+
+static void hypclock_intr(int unit, int old_ipl, void *ret_addr, struct i386_interrupt_state *regs) {
+	unsigned64_t nsec, delta;
+
+	if (!lastnsec)
+		return;
+
+	nsec = hyp_get_stime();
+	if (nsec < lastnsec) {
+		printf("warning: nsec 0x%08lx%08lx < lastnsec 0x%08lx%08lx\n", (unsigned long) (nsec >> 32), (unsigned long) nsec, (unsigned long) (lastnsec >> 32), (unsigned long) lastnsec);
+		nsec = lastnsec;
+	}
+	delta = nsec - lastnsec;
+
+	lastnsec += (delta / 1000) * 1000;
+	hypclock_machine_intr(old_ipl, ret_addr, regs, delta);
+	/* Rearm the 10ms tick */
+	hyp_do_set_timer_op(hyp_get_stime() + 10 * 1000 * 1000);
+
+#if 0
+	char *c = hyp_store_read(0, 1, "control/shutdown");
+	if (c) {
+		static int go_down = 0;
+		if (!go_down) {
+			printf("uh oh, shutdown: %s\n", c);
+			go_down = 1;
+			/* TODO: somehow send startup_reboot notification to init */
+			if (!strcmp(c, "reboot")) {
+				/* this is just a reboot */
+			}
+		}
+	}
+#endif
+}
+
+extern struct timeval time;
+extern struct timezone tz;
+
+int
+readtodc(tp)
+	u_int	*tp;
+{
+	unsigned64_t t = hyp_get_time();
+	u_int n = t / 1000000000;
+
+#ifndef	MACH_KERNEL
+	n += tz.tz_minuteswest * 60;
+	if (tz.tz_dsttime)
+		n -= 3600;
+#endif	/* MACH_KERNEL */
+	*tp = n;
+
+	return (0);
+}
+
+int
+writetodc()
+{
+	/* Not allowed on Xen */
+	return (-1);
+}
+
+void
+clkstart()
+{
+	evtchn_port_t port = hyp_event_channel_bind_virq(VIRQ_TIMER, 0);
+	hyp_evt_handler(port, hypclock_intr, 0, SPLHI);
+
+	/* first clock tick */
+	clock_interrupt(0, 0, 0);
+	lastnsec = hyp_get_stime();
+
+	/* Arm the 10ms tick */
+	hyp_do_set_timer_op(hyp_get_stime() + 10 * 1000 * 1000);
+}
diff --git a/xen/time.h b/xen/time.h
new file mode 100644
index 0000000..f875588
--- /dev/null
+++ b/xen/time.h
@@ -0,0 +1,25 @@
+/*
+ *  Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_TIME_H
+#define XEN_TIME_H
+
+#include <mach/mach_types.h>
+unsigned64_t hyp_get_time(void);
+
+#endif /* XEN_TIME_H */
diff --git a/xen/xen.c b/xen/xen.c
new file mode 100644
index 0000000..b3acef4
--- /dev/null
+++ b/xen/xen.c
@@ -0,0 +1,87 @@
+/*
+ *  Copyright (C) 2007 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <string.h>
+#include <mach/xen.h>
+#include <machine/xen.h>
+#include <machine/ipl.h>
+#include <xen/block.h>
+#include <xen/console.h>
+#include <xen/grant.h>
+#include <xen/net.h>
+#include <xen/store.h>
+#include <xen/time.h>
+#include "xen.h"
+#include "evt.h"
+
+void hyp_invalidate_pte(pt_entry_t *pte)
+{
+	if (!hyp_mmu_update_pte(kv_to_ma(pte), (*pte) & ~INTEL_PTE_VALID))
+		panic("%s:%d could not set pte %p(%p) to %p(%p)\n", __FILE__, __LINE__, pte, kv_to_ma(pte), *pte, pa_to_ma(*pte));
+	hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL);
+}
+
+void hyp_debug()
+{
+	panic("debug");
+}
+
+void hyp_init(void)
+{
+	hyp_grant_init();
+	hyp_store_init();
+	/* these depend on the above */
+	hyp_block_init();
+	hyp_net_init();
+	evtchn_port_t port = hyp_event_channel_bind_virq(VIRQ_DEBUG, 0);
+	hyp_evt_handler(port, hyp_debug, 0, SPL7);
+}
+
+void _hyp_halt(void)
+{
+	hyp_halt();
+}
+
+void _hyp_todo(unsigned long from)
+{
+	printf("TODO: at %lx\n", from);
+	hyp_halt();
+}
+
+extern int int_mask[];
+void hyp_idle(void)
+{
+	int cpu = 0;
+	hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0xff;
+	barrier();
+	/* Avoid blocking if there are pending events */
+	if (!hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending &&
+	    !hyp_shared_info.evtchn_pending[cpu])
+		hyp_block();
+	while (1) {
+		hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0x00;
+		barrier();
+		if (!hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending &&
+		    !hyp_shared_info.evtchn_pending[cpu])
+			/* Didn't miss any event, can return to threads.
+			 */
+			break;
+		hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0xff;
+		hyp_c_callback(NULL, NULL);
+	}
+}
diff --git a/xen/xen.h b/xen/xen.h
new file mode 100644
index 0000000..87e1256
--- /dev/null
+++ b/xen/xen.h
@@ -0,0 +1,27 @@
+/*
+ *  Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_XEN_H
+#define XEN_XEN_H
+
+void hyp_init(void);
+void hyp_invalidate_pte(pt_entry_t *pte);
+void hyp_idle(void);
+void hyp_p2m_init(void);
+
+#endif /* XEN_XEN_H */