author    Samuel Thibault <samuel.thibault@ens-lyon.org>  2009-12-16 01:11:51 +0100
committer Samuel Thibault <samuel.thibault@ens-lyon.org>  2009-12-16 01:11:51 +0100
commit    55dbc2b5d857d35262ad3116803dfb31b733d031
tree      9b9a7d7952ff74855de2a6c348e9c856a531158d
parent    870925205c78415dc4c594bfae9de8eb31745b81
Add Xen support
2009-03-11  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/i386/vm_param.h (VM_MIN_KERNEL_ADDRESS) [MACH_XEN]: Set to
	0x20000000.
	* i386/i386/i386asm.sym (pfn_list) [VM_MIN_KERNEL_ADDRESS ==
	LINEAR_MIN_KERNEL_ADDRESS]: Define to constant PFN_LIST.

2009-02-27  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/intel/pmap.c [MACH_HYP] (INVALIDATE_TLB): Call hyp_invlpg
	instead of flush_tlb when e - s is compile-time known to be
	PAGE_SIZE.

2008-11-27  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/configfrag.ac (enable_pae): Enable by default on the Xen
	platform.

2007-11-14  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/i386at/model_dep.c (machine_relax): New function.
	(c_boot_entry): Refuse to boot as dom0.

2007-10-17  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/i386/fpu.c [MACH_XEN]: Disable unused fpintr().

2007-08-12  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* Makefile.am (clib_routines): Add _START.
	* i386/xen/xen_boothdr: Use _START for VIRT_BASE and PADDR_OFFSET.
	Add GUEST_VERSION and XEN_ELFNOTE_FEATURES.

2007-06-13  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* i386/i386/user_ldt.h (user_ldt) [MACH_XEN]: Add alloc field.
	* i386/i386/user_ldt.c (i386_set_ldt) [MACH_XEN]: Round allocation
	of LDT to a page, set back LDT pages read/write before freeing
	them.
	(user_ldt_free) [MACH_XEN]: Likewise.

2007-04-18  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	* device/ds_routines.c [MACH_HYP]: Add hypervisor block and net
	devices.

2007-02-19  Thomas Schwinge  <tschwinge@gnu.org>

	* i386/xen/Makefrag.am [PLATFORM_xen] (gnumach_LINKFLAGS): Define.
	* Makefrag.am: Include `xen/Makefrag.am'.
	* configure.ac: Include `xen/configfrag.ac'.
	(--enable-platform): Support the `xen' platform.
	* i386/configfrag.ac: Likewise.
	* i386/Makefrag.am [PLATFORM_xen]: Include `i386/xen/Makefrag.am'.

2007-02-19  Samuel Thibault  <samuel.thibault@ens-lyon.org>
	    Thomas Schwinge  <tschwinge@gnu.org>

	* i386/xen/Makefrag.am: New file.
	* xen/Makefrag.am: Likewise.
	* xen/configfrag.ac: Likewise.

2007-02-11 (and later dates)  Samuel Thibault  <samuel.thibault@ens-lyon.org>

	Xen support

	* Makefile.am (clib_routines): Add _start.
	* Makefrag.am (include_mach_HEADERS): Add include/mach/xen.h.
	* device/cons.c (cnputc): Call hyp_console_write.
	* i386/Makefrag.am (libkernel_a_SOURCES): Move non-Xen source to
	[PLATFORM_at].
	* i386/i386/debug_trace.S: Include <i386/xen.h>.
	* i386/i386/fpu.c [MACH_HYP] (init_fpu): Call set_ts() and
	clear_ts(), do not enable CR0_EM.
	* i386/i386/gdt.c: Include <mach/xen.h> and <intel/pmap.h>.
	[MACH_XEN]: Make gdt array extern.
	[MACH_XEN] (gdt_init): Register gdt with hypervisor.  Request 4gb
	segments assist.  Shift la_shift.
	[MACH_PSEUDO_PHYS] (gdt_init): Shift pfn_list.
	* i386/i386/gdt.h [MACH_XEN]: Don't define KERNEL_LDT and
	LINEAR_DS.
	* i386/i386/i386asm.sym: Include <i386/xen.h>.
	[MACH_XEN]: Remove KERNEL_LDT, add shared_info's CPU_CLI,
	CPU_PENDING, CPU_PENDING_SEL, PENDING, EVTMASK and CR2.
	* i386/i386/idt.c [MACH_HYP] (idt_init): Register trap table with
	hypervisor.
	* i386/i386/idt_inittab.S: Include <i386/i386asm.h>.
	[MACH_XEN]: Set IDT_ENTRY() for hypervisor.  Set trap table
	terminator.
	* i386/i386/ktss.c [MACH_XEN] (ktss_init): Request exception task
	switch from hypervisor.
	* i386/i386/ldt.c: Include <mach/xen.h> and <intel/pmap.h>.
	[MACH_XEN]: Make ldt array extern.
	[MACH_XEN] (ldt_init): Set ldt readwrite.
	[MACH_HYP] (ldt_init): Register ldt with hypervisor.
	* i386/i386/locore.S: Include <i386/xen.h>.  Handle KERNEL_RING ==
	1 case.
	[MACH_XEN]: Read hyp_shared_info's CR2 instead of %cr2.
	[MACH_PSEUDO_PHYS]: Add mfn_to_pfn computation.
	[MACH_HYP]: Drop Cyrix I/O-based detection.  Read cr3 instead of
	%cr3.  Make hypervisor call for pte invalidation.
	* i386/i386/mp_desc.c: Include <mach/xen.h>.
	[MACH_HYP] (mp_desc_init): Panic.
	* i386/i386/pcb.c: Include <mach/xen.h>.
	[MACH_XEN] (switch_ktss): Request stack switch from hypervisor.
	[MACH_HYP] (switch_ktss): Request ldt and gdt switch from
	hypervisor.
	* i386/i386/phys.c: Include <mach/xen.h>.
	[MACH_PSEUDO_PHYS] (kvtophys): Do page translation.
	* i386/i386/proc_reg.h [MACH_HYP] (cr3): New declaration.
	(set_cr3, get_cr3, set_ts, clear_ts): Implement macros.
	* i386/i386/seg.h [MACH_HYP]: Define KERNEL_RING macro.  Include
	<mach/xen.h>.
	[MACH_XEN] (fill_descriptor): Register descriptor with hypervisor.
	* i386/i386/spl.S: Include <i386/xen.h> and <i386/i386/asm.h>.
	[MACH_XEN] (pic_mask): #define to int_mask.
	[MACH_XEN] (SETMASK): Implement.
	* i386/i386/vm_param.h [MACH_XEN] (HYP_VIRT_START): New macro.
	[MACH_XEN]: Set VM_MAX_KERNEL_ADDRESS to HYP_VIRT_START -
	LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS.  Increase
	KERNEL_STACK_SIZE and INTSTACK_SIZE to 4 pages.
	* i386/i386at/conf.c [MACH_HYP]: Remove hardware devices, add
	hypervisor console device.
	* i386/i386at/cons_conf.c [MACH_HYP]: Add hypervisor console
	device.
	* i386/i386at/model_dep.c: Include <sys/types.h>, <mach/xen.h>.
	[MACH_XEN]: Include <xen/console.h>, <xen/store.h>, <xen/evt.h>,
	<xen/xen.h>.
	[MACH_PSEUDO_PHYS]: New boot_info, mfn_list, pfn_list variables.
	[MACH_XEN]: New la_shift variable.
	[MACH_HYP] (avail_next, mem_size_init): Drop BIOS skipping
	mechanism.
	[MACH_HYP] (machine_init): Call hyp_init(), drop hardware
	initialization.
	[MACH_HYP] (machine_idle): Call hyp_idle().
	[MACH_HYP] (halt_cpu): Call hyp_halt().
	[MACH_HYP] (halt_all_cpus): Call hyp_reboot() or hyp_halt().
	[MACH_HYP] (i386at_init): Initialize with hypervisor.
	[MACH_XEN] (c_boot_entry): Add Xen-specific initialization.
	[MACH_HYP] (init_alloc_aligned, pmap_valid_page): Drop zones
	skipping mechanism.
	* i386/intel/pmap.c: Include <mach/xen.h>.
	[MACH_PSEUDO_PHYS] (WRITE_PTE): Do page translation.
	[MACH_HYP] (INVALIDATE_TLB): Request invalidation from hypervisor.
	[MACH_XEN] (pmap_map_bd, pmap_create, pmap_destroy,
	pmap_remove_range, pmap_page_protect, pmap_protect, pmap_enter,
	pmap_change_wiring, pmap_attribute_clear, pmap_unmap_page_zero,
	pmap_collect): Request MMU update from hypervisor.
	[MACH_XEN] (pmap_bootstrap): Request pagetable initialization from
	hypervisor.
	[MACH_XEN] (pmap_set_page_readwrite, pmap_set_page_readonly,
	pmap_set_page_readonly_init, pmap_clear_bootstrap_pagetable,
	pmap_map_mfn): New functions.
	* i386/intel/pmap.h [MACH_XEN] (INTEL_PTE_GLOBAL): Disable global
	page support.
	[MACH_PSEUDO_PHYS] (pte_to_pa): Do page translation.
	[MACH_XEN] (pmap_set_page_readwrite, pmap_set_page_readonly,
	pmap_set_page_readonly_init, pmap_clear_bootstrap_pagetable,
	pmap_map_mfn): Declare functions.
	* i386/i386/xen.h: New file.
	* i386/xen/xen.c: New file.
	* i386/xen/xen_boothdr.S: New file.
	* i386/xen/xen_locore.S: New file.
	* include/mach/xen.h: New file.
	* kern/bootstrap.c [MACH_XEN] (boot_info): Declare variable.
	[MACH_XEN] (bootstrap_create): Rebase multiboot header.
	* kern/debug.c: Include <mach/xen.h>.
	[MACH_HYP] (panic): Call hyp_crash() without delay.
	* linux/dev/include/asm-i386/segment.h [MACH_HYP] (KERNEL_CS)
	(KERNEL_DS): Use ring 1.
	* xen/block.c: New file.
	* xen/block.h: Likewise.
	* xen/console.c: Likewise.
	* xen/console.h: Likewise.
	* xen/evt.c: Likewise.
	* xen/evt.h: Likewise.
	* xen/grant.c: Likewise.
	* xen/grant.h: Likewise.
	* xen/net.c: Likewise.
	* xen/net.h: Likewise.
	* xen/ring.c: Likewise.
	* xen/ring.h: Likewise.
	* xen/store.c: Likewise.
	* xen/store.h: Likewise.
	* xen/time.c: Likewise.
	* xen/time.h: Likewise.
	* xen/xen.c: Likewise.
	* xen/xen.h: Likewise.
	* xen/public/COPYING: Import file from Xen.
	* xen/public/callback.h: Likewise.
	* xen/public/dom0_ops.h: Likewise.
	* xen/public/domctl.h: Likewise.
	* xen/public/elfnote.h: Likewise.
	* xen/public/elfstructs.h: Likewise.
	* xen/public/event_channel.h: Likewise.
	* xen/public/features.h: Likewise.
	* xen/public/grant_table.h: Likewise.
	* xen/public/kexec.h: Likewise.
	* xen/public/libelf.h: Likewise.
	* xen/public/memory.h: Likewise.
	* xen/public/nmi.h: Likewise.
	* xen/public/physdev.h: Likewise.
	* xen/public/platform.h: Likewise.
	* xen/public/sched.h: Likewise.
	* xen/public/sysctl.h: Likewise.
	* xen/public/trace.h: Likewise.
	* xen/public/vcpu.h: Likewise.
	* xen/public/version.h: Likewise.
	* xen/public/xen-compat.h: Likewise.
	* xen/public/xen.h: Likewise.
	* xen/public/xencomm.h: Likewise.
	* xen/public/xenoprof.h: Likewise.
	* xen/public/arch-x86/xen-mca.h: Likewise.
	* xen/public/arch-x86/xen-x86_32.h: Likewise.
	* xen/public/arch-x86/xen-x86_64.h: Likewise.
	* xen/public/arch-x86/xen.h: Likewise.
	* xen/public/arch-x86_32.h: Likewise.
	* xen/public/arch-x86_64.h: Likewise.
	* xen/public/io/blkif.h: Likewise.
	* xen/public/io/console.h: Likewise.
	* xen/public/io/fbif.h: Likewise.
	* xen/public/io/fsif.h: Likewise.
	* xen/public/io/kbdif.h: Likewise.
	* xen/public/io/netif.h: Likewise.
	* xen/public/io/pciif.h: Likewise.
	* xen/public/io/protocols.h: Likewise.
	* xen/public/io/ring.h: Likewise.
	* xen/public/io/tpmif.h: Likewise.
	* xen/public/io/xenbus.h: Likewise.
	* xen/public/io/xs_wire.h: Likewise.
-rw-r--r--  Makefile.am | 2
-rw-r--r--  Makefrag.am | 8
-rw-r--r--  configure.ac | 10
-rw-r--r--  device/cons.c | 9
-rw-r--r--  device/ds_routines.c | 8
-rw-r--r--  doc/mach.texi | 4
-rw-r--r--  i386/Makefrag.am | 47
-rw-r--r--  i386/configfrag.ac | 12
-rw-r--r--  i386/i386/debug_trace.S | 1
-rw-r--r--  i386/i386/fpu.c | 15
-rw-r--r--  i386/i386/gdt.c | 27
-rw-r--r--  i386/i386/gdt.h | 4
-rw-r--r--  i386/i386/i386asm.sym | 15
-rw-r--r--  i386/i386/idt.c | 5
-rw-r--r--  i386/i386/idt_inittab.S | 16
-rw-r--r--  i386/i386/ktss.c | 7
-rw-r--r--  i386/i386/ldt.c | 15
-rw-r--r--  i386/i386/locore.S | 50
-rw-r--r--  i386/i386/mp_desc.c | 5
-rw-r--r--  i386/i386/pcb.c | 31
-rw-r--r--  i386/i386/phys.c | 7
-rw-r--r--  i386/i386/proc_reg.h | 22
-rw-r--r--  i386/i386/seg.h | 18
-rw-r--r--  i386/i386/spl.S | 18
-rw-r--r--  i386/i386/trap.c | 5
-rw-r--r--  i386/i386/user_ldt.c | 27
-rw-r--r--  i386/i386/user_ldt.h | 3
-rw-r--r--  i386/i386/vm_param.h | 25
-rw-r--r--  i386/i386/xen.h | 357
-rw-r--r--  i386/i386at/conf.c | 21
-rw-r--r--  i386/i386at/cons_conf.c | 8
-rw-r--r--  i386/i386at/model_dep.c | 180
-rw-r--r--  i386/intel/pmap.c | 388
-rw-r--r--  i386/intel/pmap.h | 17
-rw-r--r--  i386/xen/Makefrag.am | 33
-rw-r--r--  i386/xen/xen.c | 77
-rw-r--r--  i386/xen/xen_boothdr.S | 167
-rw-r--r--  i386/xen/xen_locore.S | 110
-rw-r--r--  include/mach/xen.h | 80
-rw-r--r--  kern/bootstrap.c | 20
-rw-r--r--  kern/debug.c | 6
-rw-r--r--  linux/dev/include/asm-i386/segment.h | 5
-rw-r--r--  xen/Makefrag.am | 32
-rw-r--r--  xen/block.c | 689
-rw-r--r--  xen/block.h | 24
-rw-r--r--  xen/configfrag.ac | 44
-rw-r--r--  xen/console.c | 234
-rw-r--r--  xen/console.h | 33
-rw-r--r--  xen/evt.c | 109
-rw-r--r--  xen/evt.h | 29
-rw-r--r--  xen/grant.c | 142
-rw-r--r--  xen/grant.h | 33
-rw-r--r--  xen/net.c | 665
-rw-r--r--  xen/net.h | 24
-rw-r--r--  xen/public/COPYING | 38
-rw-r--r--  xen/public/arch-x86/xen-mca.h | 279
-rw-r--r--  xen/public/arch-x86/xen-x86_32.h | 180
-rw-r--r--  xen/public/arch-x86/xen-x86_64.h | 212
-rw-r--r--  xen/public/arch-x86/xen.h | 204
-rw-r--r--  xen/public/arch-x86_32.h | 27
-rw-r--r--  xen/public/arch-x86_64.h | 27
-rw-r--r--  xen/public/callback.h | 121
-rw-r--r--  xen/public/dom0_ops.h | 120
-rw-r--r--  xen/public/domctl.h | 680
-rw-r--r--  xen/public/elfnote.h | 233
-rw-r--r--  xen/public/elfstructs.h | 527
-rw-r--r--  xen/public/event_channel.h | 264
-rw-r--r--  xen/public/features.h | 83
-rw-r--r--  xen/public/grant_table.h | 438
-rw-r--r--  xen/public/io/blkif.h | 141
-rw-r--r--  xen/public/io/console.h | 51
-rw-r--r--  xen/public/io/fbif.h | 176
-rw-r--r--  xen/public/io/fsif.h | 191
-rw-r--r--  xen/public/io/kbdif.h | 132
-rw-r--r--  xen/public/io/netif.h | 205
-rw-r--r--  xen/public/io/pciif.h | 101
-rw-r--r--  xen/public/io/protocols.h | 40
-rw-r--r--  xen/public/io/ring.h | 307
-rw-r--r--  xen/public/io/tpmif.h | 77
-rw-r--r--  xen/public/io/xenbus.h | 80
-rw-r--r--  xen/public/io/xs_wire.h | 132
-rw-r--r--  xen/public/kexec.h | 189
-rw-r--r--  xen/public/libelf.h | 265
-rw-r--r--  xen/public/memory.h | 312
-rw-r--r--  xen/public/nmi.h | 78
-rw-r--r--  xen/public/physdev.h | 219
-rw-r--r--  xen/public/platform.h | 346
-rw-r--r--  xen/public/sched.h | 121
-rw-r--r--  xen/public/sysctl.h | 308
-rw-r--r--  xen/public/trace.h | 206
-rw-r--r--  xen/public/vcpu.h | 213
-rw-r--r--  xen/public/version.h | 91
-rw-r--r--  xen/public/xen-compat.h | 44
-rw-r--r--  xen/public/xen.h | 657
-rw-r--r--  xen/public/xencomm.h | 41
-rw-r--r--  xen/public/xenoprof.h | 138
-rw-r--r--  xen/ring.c | 61
-rw-r--r--  xen/ring.h | 34
-rw-r--r--  xen/store.c | 334
-rw-r--r--  xen/store.h | 54
-rw-r--r--  xen/time.c | 148
-rw-r--r--  xen/time.h | 25
-rw-r--r--  xen/xen.c | 87
-rw-r--r--  xen/xen.h | 27
104 files changed, 12952 insertions, 55 deletions
diff --git a/Makefile.am b/Makefile.am
index 25fd403..2907518 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -137,7 +137,7 @@ clib_routines := memcmp memcpy memmove memset bcopy bzero \
strchr strstr strsep strpbrk strtok \
htonl htons ntohl ntohs \
udivdi3 __udivdi3 \
- etext _edata end _end # actually ld magic, not libc.
+ _START _start etext _edata end _end # actually ld magic, not libc.
gnumach-undef: gnumach.$(OBJEXT)
$(NM) -u $< | sed 's/ *U *//' | sort -u > $@
MOSTLYCLEANFILES += gnumach-undef
diff --git a/Makefrag.am b/Makefrag.am
index 05aaeb4..0377e80 100644
--- a/Makefrag.am
+++ b/Makefrag.am
@@ -375,7 +375,8 @@ include_mach_HEADERS = \
include/mach/vm_param.h \
include/mach/vm_prot.h \
include/mach/vm_statistics.h \
- include/mach/inline.h
+ include/mach/inline.h \
+ include/mach/xen.h
# If we name this `*_execdir', Automake won't add it to `install-data'...
include_mach_eXecdir = $(includedir)/mach/exec
@@ -506,6 +507,11 @@ include linux/Makefrag.am
#
# Platform specific parts.
#
+
+# Xen.
+if PLATFORM_xen
+include xen/Makefrag.am
+endif
#
# Architecture specific parts.
diff --git a/configure.ac b/configure.ac
index 4e37590..fca0dd3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -36,19 +36,22 @@ dnl We require GNU make.
#
# Deduce the architecture we're building for.
#
+# TODO: Should we also support constructs like `i686_xen-pc-gnu' or
+# `i686-pc_xen-gnu'?
AC_CANONICAL_HOST
AC_ARG_ENABLE([platform],
AS_HELP_STRING([--enable-platform=PLATFORM], [specify the platform to build a
- kernel for. Defaults to `at' for `i?86'. No other possibilities.]),
+ kernel for. Defaults to `at' for `i?86'. The other possibility is
+ `xen'.]),
[host_platform=$enable_platform],
[host_platform=default])
[# Supported configurations.
case $host_platform:$host_cpu in
default:i?86)
host_platform=at;;
- at:i?86)
+ at:i?86 | xen:i?86)
:;;
*)]
AC_MSG_ERROR([unsupported combination of cpu type `$host_cpu' and platform
@@ -124,6 +127,9 @@ esac]
# PC AT.
# TODO. Currently handled in `i386/configfrag.ac'.
+# Xen.
+m4_include([xen/configfrag.ac])
+
# Machine-specific configuration.
# ix86.
diff --git a/device/cons.c b/device/cons.c
index fb96d69..e3e95ff 100644
--- a/device/cons.c
+++ b/device/cons.c
@@ -260,6 +260,15 @@ cnputc(c)
kmsg_putchar (c);
#endif
+#if defined(MACH_HYP) && 0
+ {
+ /* Also output on hypervisor's emergency console, for
+ * debugging */
+ unsigned char d = c;
+ hyp_console_write(&d, 1);
+ }
+#endif /* MACH_HYP */
+
if (cn_tab) {
(*cn_tab->cn_putc)(cn_tab->cn_dev, c);
if (c == '\n')
diff --git a/device/ds_routines.c b/device/ds_routines.c
index 5b8fb3e..5b57338 100644
--- a/device/ds_routines.c
+++ b/device/ds_routines.c
@@ -104,6 +104,10 @@ extern struct device_emulation_ops linux_pcmcia_emulation_ops;
#endif
#endif
#endif
+#ifdef MACH_HYP
+extern struct device_emulation_ops hyp_block_emulation_ops;
+extern struct device_emulation_ops hyp_net_emulation_ops;
+#endif
extern struct device_emulation_ops mach_device_emulation_ops;
/* List of emulations. */
@@ -118,6 +122,10 @@ static struct device_emulation_ops *emulation_list[] =
#endif
#endif
#endif
+#ifdef MACH_HYP
+ &hyp_block_emulation_ops,
+ &hyp_net_emulation_ops,
+#endif
&mach_device_emulation_ops,
};
diff --git a/doc/mach.texi b/doc/mach.texi
index bb451e5..858880a 100644
--- a/doc/mach.texi
+++ b/doc/mach.texi
@@ -527,8 +527,8 @@ unpageable memory footprint of the kernel. @xref{Kernel Debugger}.
@table @code
@item --enable-pae
@acronym{PAE, Physical Address Extension} feature (@samp{ix86}-only),
-which is available on modern @samp{ix86} processors; disabled by
-default.
+which is available on modern @samp{ix86} processors; on @samp{ix86-at} disabled
+by default, on @samp{ix86-xen} enabled by default.
@end table
@subsection Turning device drivers on or off
diff --git a/i386/Makefrag.am b/i386/Makefrag.am
index bad0ce9..876761c 100644
--- a/i386/Makefrag.am
+++ b/i386/Makefrag.am
@@ -19,33 +19,37 @@
libkernel_a_SOURCES += \
i386/i386at/autoconf.c \
+ i386/i386at/conf.c \
+ i386/i386at/cons_conf.c \
+ i386/i386at/idt.h \
+ i386/i386at/kd_event.c \
+ i386/i386at/kd_event.h \
+ i386/i386at/kd_queue.c \
+ i386/i386at/kd_queue.h \
+ i386/i386at/model_dep.c \
+ i386/include/mach/sa/stdarg.h
+
+if PLATFORM_at
+libkernel_a_SOURCES += \
i386/i386at/boothdr.S \
i386/i386at/com.c \
i386/i386at/comreg.h \
- i386/i386at/conf.c \
- i386/i386at/cons_conf.c \
i386/i386at/cram.h \
i386/i386at/disk.h \
i386/i386at/i8250.h \
- i386/i386at/idt.h \
i386/i386at/immc.c \
i386/i386at/int_init.c \
i386/i386at/interrupt.S \
i386/i386at/kd.c \
i386/i386at/kd.h \
- i386/i386at/kd_event.c \
- i386/i386at/kd_event.h \
i386/i386at/kd_mouse.c \
i386/i386at/kd_mouse.h \
- i386/i386at/kd_queue.c \
- i386/i386at/kd_queue.h \
i386/i386at/kdasm.S \
i386/i386at/kdsoft.h \
- i386/i386at/model_dep.c \
i386/i386at/pic_isa.c \
i386/i386at/rtc.c \
- i386/i386at/rtc.h \
- i386/include/mach/sa/stdarg.h
+ i386/i386at/rtc.h
+endif
#
# `lpr' device support.
@@ -80,11 +84,9 @@ libkernel_a_SOURCES += \
i386/i386/fpu.h \
i386/i386/gdt.c \
i386/i386/gdt.h \
- i386/i386/hardclock.c \
i386/i386/idt-gen.h \
i386/i386/idt.c \
i386/i386/idt_inittab.S \
- i386/i386/io_map.c \
i386/i386/io_perm.c \
i386/i386/io_perm.h \
i386/i386/ipl.h \
@@ -107,11 +109,7 @@ libkernel_a_SOURCES += \
i386/i386/pcb.c \
i386/i386/pcb.h \
i386/i386/phys.c \
- i386/i386/pic.c \
- i386/i386/pic.h \
i386/i386/pio.h \
- i386/i386/pit.c \
- i386/i386/pit.h \
i386/i386/pmap.h \
i386/i386/proc_reg.h \
i386/i386/sched_param.h \
@@ -139,6 +137,15 @@ libkernel_a_SOURCES += \
EXTRA_DIST += \
i386/i386/mach_i386.srv
+if PLATFORM_at
+libkernel_a_SOURCES += \
+ i386/i386/hardclock.c \
+ i386/i386/io_map.c \
+ i386/i386/pic.c \
+ i386/i386/pic.h \
+ i386/i386/pit.c \
+ i386/i386/pit.h
+endif
#
# KDB support.
@@ -225,3 +232,11 @@ EXTRA_DIST += \
# Instead of listing each file individually...
EXTRA_DIST += \
i386/include
+
+#
+# Platform specific parts.
+#
+
+if PLATFORM_xen
+include i386/xen/Makefrag.am
+endif
diff --git a/i386/configfrag.ac b/i386/configfrag.ac
index f95aa86..1132b69 100644
--- a/i386/configfrag.ac
+++ b/i386/configfrag.ac
@@ -51,6 +51,12 @@ case $host_platform:$host_cpu in
# i386/bogus/platforms.h]
AC_DEFINE([AT386], [1], [AT386])[;;
+ xen:i?86)
+ # TODO. That should probably not be needed.
+ ncom=1
+ # TODO. That should probably not be needed.
+ # i386/bogus/platforms.h]
+ AC_DEFINE([AT386], [1], [AT386])[;;
*)
:;;
esac]
@@ -105,9 +111,11 @@ if [ x"$enable_lpr" = xyes ]; then]
AC_ARG_ENABLE([pae],
- AS_HELP_STRING([--enable-pae], [PAE feature (ix86-only); disabled by
- default]))
+ AS_HELP_STRING([--enable-pae], [PAE support (ix86-only); on ix86-at disabled
+ by default, on ix86-xen enabled by default]))
[case $host_platform:$host_cpu in
+ xen:i?86)
+ enable_pae=${enable_pae-yes};;
*:i?86)
:;;
*)
diff --git a/i386/i386/debug_trace.S b/i386/i386/debug_trace.S
index e741516..f275e1b 100644
--- a/i386/i386/debug_trace.S
+++ b/i386/i386/debug_trace.S
@@ -24,6 +24,7 @@
#ifdef DEBUG
#include <mach/machine/asm.h>
+#include <i386/xen.h>
#include "debug.h"
diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c
index 109d0d7..2a4b9c0 100644
--- a/i386/i386/fpu.c
+++ b/i386/i386/fpu.c
@@ -109,6 +109,10 @@ void
init_fpu()
{
unsigned short status, control;
+
+#ifdef MACH_HYP
+ clear_ts();
+#else /* MACH_HYP */
unsigned int native = 0;
if (machine_slot[cpu_number()].cpu_type >= CPU_TYPE_I486)
@@ -120,6 +124,7 @@ init_fpu()
* the control and status registers.
*/
set_cr0((get_cr0() & ~(CR0_EM|CR0_TS)) | native); /* allow use of FPU */
+#endif /* MACH_HYP */
fninit();
status = fnstsw();
@@ -153,8 +158,10 @@ init_fpu()
struct i386_xfp_save save;
unsigned long mask;
fp_kind = FP_387X;
+#ifndef MACH_HYP
printf("Enabling FXSR\n");
set_cr4(get_cr4() | CR4_OSFXSR);
+#endif /* MACH_HYP */
fxsave(&save);
mask = save.fp_mxcsr_mask;
if (!mask)
@@ -163,10 +170,14 @@ init_fpu()
} else
fp_kind = FP_387;
}
+#ifdef MACH_HYP
+ set_ts();
+#else /* MACH_HYP */
/*
* Trap wait instructions. Turn off FPU for now.
*/
set_cr0(get_cr0() | CR0_TS | CR0_MP);
+#endif /* MACH_HYP */
}
else {
/*
@@ -675,6 +686,7 @@ fpexterrflt()
/*NOTREACHED*/
}
+#ifndef MACH_XEN
/*
* FPU error. Called by AST.
*/
@@ -731,6 +743,7 @@ ASSERT_IPL(SPL0);
thread->pcb->ims.ifps->fp_save_state.fp_status);
/*NOTREACHED*/
}
+#endif /* MACH_XEN */
/*
* Save FPU state.
@@ -846,7 +859,7 @@ fp_state_alloc()
}
}
-#if AT386
+#if AT386 && !defined(MACH_XEN)
/*
* Handle a coprocessor error interrupt on the AT386.
* This comes in on line 5 of the slave PIC at SPL1.
diff --git a/i386/i386/gdt.c b/i386/i386/gdt.c
index 845e7c6..b5fb033 100644
--- a/i386/i386/gdt.c
+++ b/i386/i386/gdt.c
@@ -31,11 +31,18 @@
* Global descriptor table.
*/
#include <mach/machine/vm_types.h>
+#include <mach/xen.h>
+
+#include <intel/pmap.h>
#include "vm_param.h"
#include "seg.h"
#include "gdt.h"
+#ifdef MACH_XEN
+/* It is actually defined in xen_boothdr.S */
+extern
+#endif /* MACH_XEN */
struct real_descriptor gdt[GDTSZ];
void
@@ -50,11 +57,21 @@ gdt_init()
LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS,
LINEAR_MAX_KERNEL_ADDRESS - (LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) - 1,
ACC_PL_K|ACC_DATA_W, SZ_32);
+#ifndef MACH_HYP
fill_gdt_descriptor(LINEAR_DS,
0,
0xffffffff,
ACC_PL_K|ACC_DATA_W, SZ_32);
+#endif /* MACH_HYP */
+#ifdef MACH_XEN
+ unsigned long frame = kv_to_mfn(gdt);
+ pmap_set_page_readonly(gdt);
+ if (hyp_set_gdt(kv_to_la(&frame), GDTSZ))
+ panic("couldn't set gdt\n");
+ if (hyp_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments))
+ panic("couldn't set 4gb segments vm assist");
+#else /* MACH_XEN */
/* Load the new GDT. */
{
struct pseudo_descriptor pdesc;
@@ -63,6 +80,7 @@ gdt_init()
pdesc.linear_base = kvtolin(&gdt);
lgdt(&pdesc);
}
+#endif /* MACH_XEN */
/* Reload all the segment registers from the new GDT.
We must load ds and es with 0 before loading them with KERNEL_DS
@@ -79,5 +97,14 @@ gdt_init()
"movw %w1,%%es\n"
"movw %w1,%%ss\n"
: : "i" (KERNEL_CS), "r" (KERNEL_DS), "r" (0));
+#ifdef MACH_XEN
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+ /* things now get shifted */
+#ifdef MACH_PSEUDO_PHYS
+ pfn_list = (void*) pfn_list + VM_MIN_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS;
+#endif /* MACH_PSEUDO_PHYS */
+ la_shift += LINEAR_MIN_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS;
+#endif
+#endif /* MACH_XEN */
}
diff --git a/i386/i386/gdt.h b/i386/i386/gdt.h
index 50e01e6..41ace79 100644
--- a/i386/i386/gdt.h
+++ b/i386/i386/gdt.h
@@ -40,12 +40,16 @@
*/
#define KERNEL_CS (0x08 | KERNEL_RING) /* kernel code */
#define KERNEL_DS (0x10 | KERNEL_RING) /* kernel data */
+#ifndef MACH_XEN
#define KERNEL_LDT 0x18 /* master LDT */
+#endif /* MACH_XEN */
#define KERNEL_TSS 0x20 /* master TSS (uniprocessor) */
#define USER_LDT 0x28 /* place for per-thread LDT */
#define USER_TSS 0x30 /* place for per-thread TSS
that holds IO bitmap */
+#ifndef MACH_HYP
#define LINEAR_DS 0x38 /* linear mapping */
+#endif /* MACH_HYP */
/* 0x40 was USER_FPREGS, now free */
#define USER_GDT 0x48 /* user-defined GDT entries */
diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 868bf09..b1670e8 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -45,6 +45,7 @@
#include <i386/gdt.h>
#include <i386/ldt.h>
#include <i386/mp_desc.h>
+#include <i386/xen.h>
offset thread th pcb
@@ -90,6 +91,9 @@ expr VM_MIN_ADDRESS
expr VM_MAX_ADDRESS
expr VM_MIN_KERNEL_ADDRESS KERNELBASE
expr KERNEL_STACK_SIZE
+#if VM_MIN_KERNEL_ADDRESS == LINEAR_MIN_KERNEL_ADDRESS
+expr PFN_LIST pfn_list
+#endif
#if PAE
expr PDPSHIFT
@@ -117,7 +121,9 @@ expr KERNEL_RING
expr KERNEL_CS
expr KERNEL_DS
expr KERNEL_TSS
+#ifndef MACH_XEN
expr KERNEL_LDT
+#endif /* MACH_XEN */
expr (VM_MIN_KERNEL_ADDRESS>>PDESHIFT)*sizeof(pt_entry_t) KERNELBASEPDE
@@ -135,3 +141,12 @@ expr TIMER_HIGH_UNIT
offset thread th system_timer
offset thread th user_timer
#endif
+
+#ifdef MACH_XEN
+offset shared_info si vcpu_info[0].evtchn_upcall_mask CPU_CLI
+offset shared_info si vcpu_info[0].evtchn_upcall_pending CPU_PENDING
+offset shared_info si vcpu_info[0].evtchn_pending_sel CPU_PENDING_SEL
+offset shared_info si evtchn_pending PENDING
+offset shared_info si evtchn_mask EVTMASK
+offset shared_info si vcpu_info[0].arch.cr2 CR2
+#endif /* MACH_XEN */
diff --git a/i386/i386/idt.c b/i386/i386/idt.c
index 1a8f917..b5e3d08 100644
--- a/i386/i386/idt.c
+++ b/i386/i386/idt.c
@@ -38,6 +38,10 @@ extern struct idt_init_entry idt_inittab[];
void idt_init()
{
+#ifdef MACH_HYP
+ if (hyp_set_trap_table(kvtolin(idt_inittab)))
+ panic("couldn't set trap table\n");
+#else /* MACH_HYP */
struct idt_init_entry *iie = idt_inittab;
/* Initialize the exception vectors from the idt_inittab. */
@@ -55,5 +59,6 @@ void idt_init()
pdesc.linear_base = kvtolin(&idt);
lidt(&pdesc);
}
+#endif /* MACH_HYP */
}
diff --git a/i386/i386/idt_inittab.S b/i386/i386/idt_inittab.S
index 7718568..4dcad8d 100644
--- a/i386/i386/idt_inittab.S
+++ b/i386/i386/idt_inittab.S
@@ -25,7 +25,8 @@
*/
#include <mach/machine/asm.h>
-#include "seg.h"
+#include <i386/seg.h>
+#include <i386/i386asm.h>
/* We'll be using macros to fill in a table in data hunk 2
@@ -38,12 +39,22 @@ ENTRY(idt_inittab)
/*
* Interrupt descriptor table and code vectors for it.
*/
+#ifdef MACH_XEN
+#define IDT_ENTRY(n,entry,type) \
+ .data 2 ;\
+ .byte n ;\
+ .byte (((type)&ACC_PL)>>5)|((((type)&(ACC_TYPE|ACC_A))==ACC_INTR_GATE)<<2) ;\
+ .word KERNEL_CS ;\
+ .long entry ;\
+ .text
+#else /* MACH_XEN */
#define IDT_ENTRY(n,entry,type) \
.data 2 ;\
.long entry ;\
.word n ;\
.word type ;\
.text
+#endif /* MACH_XEN */
/*
* No error code. Clear error code and push trap number.
@@ -118,4 +129,7 @@ EXCEPTION(0x1f,t_trap_1f)
/* Terminator */
.data 2
.long 0
+#ifdef MACH_XEN
+ .long 0
+#endif /* MACH_XEN */
diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c
index 03d9a04..66432f3 100644
--- a/i386/i386/ktss.c
+++ b/i386/i386/ktss.c
@@ -45,6 +45,12 @@ ktss_init()
/* XXX temporary exception stack */
static int exception_stack[1024];
+#ifdef MACH_XEN
+ /* Xen won't allow us to do any I/O by default anyway, just register
+ * exception stack */
+ if (hyp_stack_switch(KERNEL_DS, (unsigned)(exception_stack+1024)))
+ panic("couldn't register exception stack\n");
+#else /* MACH_XEN */
/* Initialize the master TSS descriptor. */
fill_gdt_descriptor(KERNEL_TSS,
kvtolin(&ktss), sizeof(struct task_tss) - 1,
@@ -59,5 +65,6 @@ ktss_init()
/* Load the TSS. */
ltr(KERNEL_TSS);
+#endif /* MACH_XEN */
}
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index 7299377..0ef7a8c 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -28,6 +28,9 @@
* same LDT.
*/
#include <mach/machine/vm_types.h>
+#include <mach/xen.h>
+
+#include <intel/pmap.h>
#include "vm_param.h"
#include "seg.h"
@@ -36,15 +39,23 @@
extern int syscall();
+#ifdef MACH_XEN
+/* It is actually defined in xen_boothdr.S */
+extern
+#endif /* MACH_XEN */
struct real_descriptor ldt[LDTSZ];
void
ldt_init()
{
+#ifdef MACH_XEN
+ pmap_set_page_readwrite(ldt);
+#else /* MACH_XEN */
/* Initialize the master LDT descriptor in the GDT. */
fill_gdt_descriptor(KERNEL_LDT,
kvtolin(&ldt), sizeof(ldt)-1,
ACC_PL_K|ACC_LDT, 0);
+#endif /* MACH_XEN */
/* Initialize the LDT descriptors. */
fill_ldt_gate(USER_SCALL,
@@ -61,5 +72,9 @@ ldt_init()
ACC_PL_U|ACC_DATA_W, SZ_32);
/* Activate the LDT. */
+#ifdef MACH_HYP
+ hyp_set_ldt(&ldt, LDTSZ);
+#else /* MACH_HYP */
lldt(KERNEL_LDT);
+#endif /* MACH_HYP */
}
diff --git a/i386/i386/locore.S b/i386/i386/locore.S
index 13a44d9..663db43 100644
--- a/i386/i386/locore.S
+++ b/i386/i386/locore.S
@@ -36,6 +36,7 @@
#include <i386/ldt.h>
#include <i386/i386asm.h>
#include <i386/cpu_number.h>
+#include <i386/xen.h>
/*
* Fault recovery.
@@ -323,8 +324,9 @@ ENTRY(t_segnp)
trap_check_kernel_exit:
testl $(EFL_VM),16(%esp) /* is trap from V86 mode? */
jnz EXT(alltraps) /* isn`t kernel trap if so */
- testl $3,12(%esp) /* is trap from kernel mode? */
- jne EXT(alltraps) /* if so: */
+ /* Note: handling KERNEL_RING value by hand */
+ testl $2,12(%esp) /* is trap from kernel mode? */
+ jnz EXT(alltraps) /* if so: */
/* check for the kernel exit sequence */
cmpl $_kret_iret,8(%esp) /* on IRET? */
je fault_iret
@@ -410,7 +412,8 @@ push_segregs:
ENTRY(t_debug)
testl $(EFL_VM),8(%esp) /* is trap from V86 mode? */
jnz 0f /* isn`t kernel trap if so */
- testl $3,4(%esp) /* is trap from kernel mode? */
+ /* Note: handling KERNEL_RING value by hand */
+ testl $2,4(%esp) /* is trap from kernel mode? */
jnz 0f /* if so: */
cmpl $syscall_entry,(%esp) /* system call entry? */
jne 0f /* if so: */
@@ -429,7 +432,11 @@ ENTRY(t_debug)
ENTRY(t_page_fault)
pushl $(T_PAGE_FAULT) /* mark a page fault trap */
pusha /* save the general registers */
+#ifdef MACH_XEN
+ movl %ss:hyp_shared_info+CR2,%eax
+#else /* MACH_XEN */
movl %cr2,%eax /* get the faulting address */
+#endif /* MACH_XEN */
movl %eax,12(%esp) /* save in esp save slot */
jmp trap_push_segs /* continue fault */
@@ -465,7 +472,8 @@ trap_set_segs:
cld /* clear direction flag */
testl $(EFL_VM),R_EFLAGS(%esp) /* in V86 mode? */
jnz trap_from_user /* user mode trap if so */
- testb $3,R_CS(%esp) /* user mode trap? */
+ /* Note: handling KERNEL_RING value by hand */
+ testb $2,R_CS(%esp) /* user mode trap? */
jz trap_from_kernel /* kernel trap if not */
trap_from_user:
@@ -679,7 +687,8 @@ LEXT(return_to_iret) /* ( label for kdb_kintr and hardclock) */
testl $(EFL_VM),I_EFL(%esp) /* if in V86 */
jnz 0f /* or */
- testb $3,I_CS(%esp) /* user mode, */
+ /* Note: handling KERNEL_RING value by hand */
+ testb $2,I_CS(%esp) /* user mode, */
jz 1f /* check for ASTs */
0:
cmpl $0,CX(EXT(need_ast),%edx)
@@ -1156,9 +1165,14 @@ ENTRY(discover_x86_cpu_type)
movl %esp,%ebp /* Save stack pointer */
and $~0x3,%esp /* Align stack pointer */
+#ifdef MACH_HYP
+#warning Assuming not Cyrix CPU
+#else /* MACH_HYP */
inb $0xe8,%al /* Enable ID flag for Cyrix CPU ... */
andb $0x80,%al /* ... in CCR4 reg bit7 */
outb %al,$0xe8
+#endif /* MACH_HYP */
+
pushfl /* Fetch flags ... */
popl %eax /* ... into eax */
movl %eax,%ecx /* Save original flags for return */
@@ -1266,13 +1280,24 @@ Entry(copyoutmsg)
* XXX only have to do this on 386's.
*/
copyout_retry:
+#ifdef MACH_HYP
+ movl cr3,%ecx /* point to page directory */
+#else /* MACH_HYP */
movl %cr3,%ecx /* point to page directory */
+#endif /* MACH_HYP */
#if PAE
movl %edi,%eax /* get page directory pointer bits */
shrl $(PDPSHIFT),%eax /* from user address */
movl KERNELBASE(%ecx,%eax,PTE_SIZE),%ecx
/* get page directory pointer */
+#ifdef MACH_PSEUDO_PHYS
+ shrl $(PTESHIFT),%ecx
+ movl pfn_list,%eax
+ movl (%eax,%ecx,4),%ecx /* mfn_to_pfn */
+ shll $(PTESHIFT),%ecx
+#else /* MACH_PSEUDO_PHYS */
andl $(PTE_PFN),%ecx /* isolate page frame address */
+#endif /* MACH_PSEUDO_PHYS */
#endif /* PAE */
movl %edi,%eax /* get page directory bits */
shrl $(PDESHIFT),%eax /* from user address */
@@ -1283,7 +1308,14 @@ copyout_retry:
/* get page directory pointer */
testl $(PTE_V),%ecx /* present? */
jz 0f /* if not, fault is OK */
+#ifdef MACH_PSEUDO_PHYS
+ shrl $(PTESHIFT),%ecx
+ movl pfn_list,%eax
+ movl (%eax,%ecx,4),%ecx /* mfn_to_pfn */
+ shll $(PTESHIFT),%ecx
+#else /* MACH_PSEUDO_PHYS */
andl $(PTE_PFN),%ecx /* isolate page frame address */
+#endif /* MACH_PSEUDO_PHYS */
movl %edi,%eax /* get page table bits */
shrl $(PTESHIFT),%eax
andl $(PTEMASK),%eax /* from user address */
@@ -1297,9 +1329,17 @@ copyout_retry:
/*
* Not writable - must fake a fault. Turn off access to the page.
*/
+#ifdef MACH_HYP
+ pushl %edx
+ pushl %ecx
+ call hyp_invalidate_pte
+ popl %ecx
+ popl %edx
+#else /* MACH_HYP */
andl $(PTE_INVALID),(%ecx) /* turn off valid bit */
movl %cr3,%eax /* invalidate TLB */
movl %eax,%cr3
+#endif /* MACH_HYP */
0:
/*
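Under MACH_PSEUDO_PHYS the page-table entry fetched in `copyout_retry` holds a machine frame number (MFN), so the added `shrl`/`movl`/`shll` sequence translates it back through `pfn_list` before it can be used as a guest-visible frame address. A rough C model of that sequence — the table contents and names here are illustrative, not the kernel's:

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Hypothetical pfn_list: indexed by machine frame number (MFN), it
   yields the guest's pseudo-physical frame number (PFN).  The table
   contents are made up for the demo. */
static const unsigned long pfn_list_demo[] = { 7, 3, 0, 5 };

/* What the MACH_PSEUDO_PHYS branch does to a page-table entry:
   shift the frame number out, translate it through the list, and
   shift it back -- which also discards the low attribute bits, the
   role the andl $(PTE_PFN) plays in the native branch. */
static unsigned long mfn_pte_to_pfn_addr(unsigned long pte)
{
    unsigned long mfn = pte >> PAGE_SHIFT;
    return pfn_list_demo[mfn] << PAGE_SHIFT;
}
```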
diff --git a/i386/i386/mp_desc.c b/i386/i386/mp_desc.c
index 54660d5..2fd5ec2 100644
--- a/i386/i386/mp_desc.c
+++ b/i386/i386/mp_desc.c
@@ -31,6 +31,7 @@
#include <kern/cpu_number.h>
#include <kern/debug.h>
#include <mach/machine.h>
+#include <mach/xen.h>
#include <vm/vm_kern.h>
#include <i386/mp_desc.h>
@@ -149,6 +150,9 @@ mp_desc_init(mycpu)
* Fix up the entries in the GDT to point to
* this LDT and this TSS.
*/
+#ifdef MACH_HYP
+ panic("TODO %s:%d\n",__FILE__,__LINE__);
+#else /* MACH_HYP */
fill_descriptor(&mpt->gdt[sel_idx(KERNEL_LDT)],
(unsigned)&mpt->ldt,
LDTSZ * sizeof(struct real_descriptor) - 1,
@@ -161,6 +165,7 @@ mp_desc_init(mycpu)
mpt->ktss.tss.ss0 = KERNEL_DS;
mpt->ktss.tss.io_bit_map_offset = IOPB_INVAL;
mpt->ktss.barrier = 0xFF;
+#endif /* MACH_HYP */
return mpt;
}
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 3226195..b9c52dd 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -31,6 +31,7 @@
#include <mach/kern_return.h>
#include <mach/thread_status.h>
#include <mach/exec/exec.h>
+#include <mach/xen.h>
#include "vm_param.h"
#include <kern/counters.h>
@@ -152,7 +153,12 @@ void switch_ktss(pcb)
? (int) (&pcb->iss + 1)
: (int) (&pcb->iss.v86_segs);
+#ifdef MACH_XEN
+ /* No IO mask here */
+ hyp_stack_switch(KERNEL_DS, pcb_stack_top);
+#else /* MACH_XEN */
curr_ktss(mycpu)->tss.esp0 = pcb_stack_top;
+#endif /* MACH_XEN */
}
{
@@ -164,22 +170,47 @@ void switch_ktss(pcb)
/*
* Use system LDT.
*/
+#ifdef MACH_HYP
+ hyp_set_ldt(&ldt, LDTSZ);
+#else /* MACH_HYP */
set_ldt(KERNEL_LDT);
+#endif /* MACH_HYP */
}
else {
/*
* Thread has its own LDT.
*/
+#ifdef MACH_HYP
+ hyp_set_ldt(tldt->ldt,
+ (tldt->desc.limit_low|(tldt->desc.limit_high<<16)) /
+ sizeof(struct real_descriptor));
+#else /* MACH_HYP */
*gdt_desc_p(mycpu,USER_LDT) = tldt->desc;
set_ldt(USER_LDT);
+#endif /* MACH_HYP */
}
}
+#ifdef MACH_XEN
+ {
+ int i;
+ for (i=0; i < USER_GDT_SLOTS; i++) {
+ if (memcmp(gdt_desc_p (mycpu, USER_GDT + (i << 3)),
+ &pcb->ims.user_gdt[i], sizeof pcb->ims.user_gdt[i])) {
+ if (hyp_do_update_descriptor(kv_to_ma(gdt_desc_p (mycpu, USER_GDT + (i << 3))),
+ *(unsigned long long *) &pcb->ims.user_gdt[i]))
+ panic("couldn't set user gdt %d\n",i);
+ }
+ }
+ }
+#else /* MACH_XEN */
+
/* Copy in the per-thread GDT slots. No reloading is necessary
because just restoring the segment registers on the way back to
user mode reloads the shadow registers from the in-memory GDT. */
memcpy (gdt_desc_p (mycpu, USER_GDT),
pcb->ims.user_gdt, sizeof pcb->ims.user_gdt);
+#endif /* MACH_XEN */
/*
* Load the floating-point context, if necessary.
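The MACH_XEN branch above cannot simply `memcpy` into the live GDT: under Xen the descriptor tables are read-only to the guest, so each changed slot must go through `hyp_do_update_descriptor`. A sketch of that compare-then-update loop, with a counter standing in for the hypercall (all names here are made up for the demo):

```c
#include <assert.h>
#include <string.h>

typedef unsigned long long desc_t;   /* an 8-byte segment descriptor */

static int hypercalls_made;          /* counts simulated descriptor updates */

/* Stand-in for hyp_do_update_descriptor(): the hypervisor validates
   and installs the new descriptor on the guest's behalf. */
static int fake_update_descriptor(desc_t *slot, desc_t val)
{
    *slot = val;
    hypercalls_made++;
    return 0;                        /* 0 = success in this sketch */
}

/* The MACH_XEN switch_ktss logic: compare each per-thread slot with
   what is installed and update only the slots that differ, instead
   of the blanket memcpy used by the native branch. */
static void sync_user_gdt(desc_t *live, const desc_t *wanted, int slots)
{
    for (int i = 0; i < slots; i++)
        if (memcmp(&live[i], &wanted[i], sizeof live[i]) != 0)
            fake_update_descriptor(&live[i], wanted[i]);
}
```

The comparison keeps context switches cheap in the common case where a thread's GDT slots have not changed.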
diff --git a/i386/i386/phys.c b/i386/i386/phys.c
index 2c30f17..925593b 100644
--- a/i386/i386/phys.c
+++ b/i386/i386/phys.c
@@ -27,6 +27,7 @@
#include <string.h>
#include <mach/boolean.h>
+#include <mach/xen.h>
#include <kern/task.h>
#include <kern/thread.h>
#include <vm/vm_map.h>
@@ -104,5 +105,9 @@ vm_offset_t addr;
if ((pte = pmap_pte(kernel_pmap, addr)) == PT_ENTRY_NULL)
return 0;
- return i386_trunc_page(*pte) | (addr & INTEL_OFFMASK);
+ return i386_trunc_page(
+#ifdef MACH_PSEUDO_PHYS
+ ma_to_pa
+#endif /* MACH_PSEUDO_PHYS */
+ (*pte)) | (addr & INTEL_OFFMASK);
}
diff --git a/i386/i386/proc_reg.h b/i386/i386/proc_reg.h
index d9f32bc..64d8c43 100644
--- a/i386/i386/proc_reg.h
+++ b/i386/i386/proc_reg.h
@@ -72,8 +72,10 @@
#ifndef __ASSEMBLER__
#ifdef __GNUC__
+#ifndef MACH_HYP
#include <i386/gdt.h>
#include <i386/ldt.h>
+#endif /* MACH_HYP */
static inline unsigned
get_eflags(void)
@@ -122,6 +124,16 @@ set_eflags(unsigned eflags)
_temp__; \
})
+#ifdef MACH_HYP
+extern unsigned long cr3;
+#define get_cr3() (cr3)
+#define set_cr3(value) \
+ ({ \
+ cr3 = (value); \
+ if (!hyp_set_cr3(value)) \
+ panic("set_cr3"); \
+ })
+#else /* MACH_HYP */
#define get_cr3() \
({ \
register unsigned int _temp__; \
@@ -134,9 +146,11 @@ set_eflags(unsigned eflags)
register unsigned int _temp__ = (value); \
asm volatile("mov %0, %%cr3" : : "r" (_temp__)); \
})
+#endif /* MACH_HYP */
#define flush_tlb() set_cr3(get_cr3())
+#ifndef MACH_HYP
#define invlpg(addr) \
({ \
asm volatile("invlpg (%0)" : : "r" (addr)); \
@@ -164,6 +178,7 @@ set_eflags(unsigned eflags)
: "+r" (var) : "r" (end), \
"q" (LINEAR_DS), "q" (KERNEL_DS), "i" (PAGE_SIZE)); \
})
+#endif /* MACH_HYP */
#define get_cr4() \
({ \
@@ -179,11 +194,18 @@ set_eflags(unsigned eflags)
})
+#ifdef MACH_HYP
+#define set_ts() \
+ hyp_fpu_taskswitch(1)
+#define clear_ts() \
+ hyp_fpu_taskswitch(0)
+#else /* MACH_HYP */
#define set_ts() \
set_cr0(get_cr0() | CR0_TS)
#define clear_ts() \
asm volatile("clts")
+#endif /* MACH_HYP */
#define get_tr() \
({ \
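The `proc_reg.h` hunk replaces direct `%cr3` access with a shadowed variable: a paravirtualized guest may not touch the register, so reads come from a kernel-kept copy and writes go through `hyp_set_cr3`. A minimal C sketch of the pattern, with a fake hypercall (the real macro panics on failure):

```c
#include <assert.h>

/* Shadow of the register plus a stand-in for hyp_set_cr3(). */
static unsigned long cr3_shadow;
static unsigned long hypervisor_cr3;

static int fake_hyp_set_cr3(unsigned long value)
{
    hypervisor_cr3 = value;
    return 1;                         /* nonzero = success in this sketch */
}

/* MACH_HYP-style get_cr3/set_cr3: keep the shadow coherent and let
   the hypervisor perform the privileged write. */
static unsigned long get_cr3_sketch(void) { return cr3_shadow; }

static int set_cr3_sketch(unsigned long value)
{
    cr3_shadow = value;
    return fake_hyp_set_cr3(value);   /* real code panics if this fails */
}
```

Note that `flush_tlb()` keeps working unchanged on top of this: `set_cr3(get_cr3())` becomes a hypercall reloading the same base pointer.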
diff --git a/i386/i386/seg.h b/i386/i386/seg.h
index 9a09af5..01b1a2e 100644
--- a/i386/i386/seg.h
+++ b/i386/i386/seg.h
@@ -37,7 +37,12 @@
* i386 segmentation.
*/
+/* Note: the value of KERNEL_RING is handled by hand in locore.S */
+#ifdef MACH_HYP
+#define KERNEL_RING 1
+#else /* MACH_HYP */
#define KERNEL_RING 0
+#endif /* MACH_HYP */
#ifndef __ASSEMBLER__
@@ -118,6 +123,7 @@ struct real_gate {
#ifndef __ASSEMBLER__
#include <mach/inline.h>
+#include <mach/xen.h>
/* Format of a "pseudo-descriptor", used for loading the IDT and GDT. */
@@ -152,9 +158,15 @@ MACH_INLINE void lldt(unsigned short ldt_selector)
/* Fill a segment descriptor. */
MACH_INLINE void
-fill_descriptor(struct real_descriptor *desc, unsigned base, unsigned limit,
+fill_descriptor(struct real_descriptor *_desc, unsigned base, unsigned limit,
unsigned char access, unsigned char sizebits)
{
+	/* TODO: building into a local copy and copying it back even when !MACH_XEN would not actually be simpler */
+#ifdef MACH_XEN
+ struct real_descriptor __desc, *desc = &__desc;
+#else /* MACH_XEN */
+ struct real_descriptor *desc = _desc;
+#endif /* MACH_XEN */
if (limit > 0xfffff)
{
limit >>= 12;
@@ -167,6 +179,10 @@ fill_descriptor(struct real_descriptor *desc, unsigned base, unsigned limit,
desc->limit_high = limit >> 16;
desc->granularity = sizebits;
desc->base_high = base >> 24;
+#ifdef MACH_XEN
+ if (hyp_do_update_descriptor(kv_to_ma(_desc), *(unsigned long long*)desc))
+ panic("couldn't update descriptor(%p to %08lx%08lx)\n", kv_to_ma(_desc), *(((unsigned long*)desc)+1), *(unsigned long *)desc);
+#endif /* MACH_XEN */
}
/* Fill a gate with particular values. */
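Under MACH_XEN, `fill_descriptor` builds the descriptor in a stack-local copy and publishes it with one `update_descriptor` hypercall, since the live table is read-only. The limit encoding it performs is independent of Xen and can be sketched and checked on its own (the bitfield layout and `SZ_G` flag below are simplified stand-ins for the real `real_descriptor`):

```c
#include <assert.h>

#define SZ_G 0x8   /* hypothetical granularity flag within sizebits */

struct sketch_desc {
    unsigned limit_low   : 16;
    unsigned limit_high  : 4;
    unsigned granularity : 4;
};

/* The limit encoding fill_descriptor performs: a limit above 2^20-1
   bytes cannot fit in the 20 available bits, so it is re-expressed
   in 4 KiB pages and the granularity bit is set. */
static void encode_limit(struct sketch_desc *d, unsigned limit,
                         unsigned char sizebits)
{
    if (limit > 0xfffff) {
        limit >>= 12;
        sizebits |= SZ_G;
    }
    d->limit_low   = limit & 0xffff;
    d->limit_high  = limit >> 16;
    d->granularity = sizebits;
}
```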
diff --git a/i386/i386/spl.S b/i386/i386/spl.S
index f77b556..f1d4b45 100644
--- a/i386/i386/spl.S
+++ b/i386/i386/spl.S
@@ -20,6 +20,8 @@
#include <mach/machine/asm.h>
#include <i386/ipl.h>
#include <i386/pic.h>
+#include <i386/i386asm.h>
+#include <i386/xen.h>
/*
* Set IPL to the specified value.
@@ -42,6 +44,7 @@
/*
* Program PICs with mask in %eax.
*/
+#ifndef MACH_XEN
#define SETMASK() \
cmpl EXT(curr_pic_mask),%eax; \
je 9f; \
@@ -50,6 +53,21 @@
movb %ah,%al; \
outb %al,$(PIC_SLAVE_OCW); \
9:
+#else /* MACH_XEN */
+#define pic_mask int_mask
+#define SETMASK() \
+ pushl %ebx; \
+ movl %eax,%ebx; \
+ xchgl %eax,hyp_shared_info+EVTMASK; \
+ notl %ebx; \
+ andl %eax,%ebx; /* Get unmasked events */ \
+ testl hyp_shared_info+PENDING, %ebx; \
+ popl %ebx; \
+ jz 9f; /* Check whether there was some pending */ \
+lock orl $1,hyp_shared_info+CPU_PENDING_SEL; /* Yes, activate it */ \
+ movb $1,hyp_shared_info+CPU_PENDING; \
+9:
+#endif /* MACH_XEN */
ENTRY(spl0)
movl EXT(curr_ipl),%eax /* save current ipl */
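The Xen `SETMASK()` above swaps the new event mask into the shared-info page and then checks whether any event it just unmasked was already pending; if so, it raises the per-CPU pending flags so the event is not lost. The same logic in C, with plain variables standing in for the shared-info fields (names are illustrative):

```c
#include <assert.h>

/* Minimal model of the shared-info fields the spl code touches. */
static unsigned long evt_mask;        /* 1 bit = event masked */
static unsigned long evt_pending;     /* 1 bit = event pending */
static unsigned long cpu_pending_sel; /* per-CPU pending selector word */
static int cpu_pending;               /* per-CPU "something pending" flag */

/* C rendering of the MACH_XEN SETMASK(): install the new mask and,
   if events we just unmasked were already pending, flag them so the
   upcall path will notice. */
static void set_event_mask(unsigned long new_mask)
{
    unsigned long old_mask = evt_mask;   /* the xchgl in the asm */
    evt_mask = new_mask;
    unsigned long unmasked = old_mask & ~new_mask;
    if (evt_pending & unmasked) {
        cpu_pending_sel |= 1;            /* the lock orl in the asm */
        cpu_pending = 1;
    }
}
```

The real assembly uses `xchgl` so the mask swap is a single atomic operation against the hypervisor-shared page.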
diff --git a/i386/i386/trap.c b/i386/i386/trap.c
index 4361fcd..28a9e0c 100644
--- a/i386/i386/trap.c
+++ b/i386/i386/trap.c
@@ -585,6 +585,7 @@ i386_astintr()
int mycpu = cpu_number();
(void) splsched(); /* block interrupts to check reasons */
+#ifndef MACH_XEN
if (need_ast[mycpu] & AST_I386_FP) {
/*
* AST was for delayed floating-point exception -
@@ -596,7 +597,9 @@ i386_astintr()
fpastintr();
}
- else {
+ else
+#endif /* MACH_XEN */
+ {
/*
* Not an FPU trap. Handle the AST.
* Interrupts are still blocked.
diff --git a/i386/i386/user_ldt.c b/i386/i386/user_ldt.c
index 942ad07..dfe6b1e 100644
--- a/i386/i386/user_ldt.c
+++ b/i386/i386/user_ldt.c
@@ -39,6 +39,7 @@
#include <i386/seg.h>
#include <i386/thread.h>
#include <i386/user_ldt.h>
+#include <stddef.h>
#include "ldt.h"
#include "vm_param.h"
@@ -195,9 +196,17 @@ i386_set_ldt(thread, first_selector, desc_list, count, desc_list_inline)
if (new_ldt == 0) {
simple_unlock(&pcb->lock);
+#ifdef MACH_XEN
+ /* LDT needs to be aligned on a page */
+ vm_offset_t alloc = kalloc(ldt_size_needed + PAGE_SIZE + offsetof(struct user_ldt, ldt));
+ new_ldt = (user_ldt_t) (round_page((alloc + offsetof(struct user_ldt, ldt))) - offsetof(struct user_ldt, ldt));
+ new_ldt->alloc = alloc;
+
+#else /* MACH_XEN */
new_ldt = (user_ldt_t)
kalloc(ldt_size_needed
+ sizeof(struct real_descriptor));
+#endif /* MACH_XEN */
/*
* Build a descriptor that describes the
* LDT itself
@@ -263,9 +272,19 @@ i386_set_ldt(thread, first_selector, desc_list, count, desc_list_inline)
simple_unlock(&pcb->lock);
if (new_ldt)
+#ifdef MACH_XEN
+ {
+ int i;
+ for (i=0; i<(new_ldt->desc.limit_low + 1)/sizeof(struct real_descriptor); i+=PAGE_SIZE/sizeof(struct real_descriptor))
+ pmap_set_page_readwrite(&new_ldt->ldt[i]);
+ kfree(new_ldt->alloc, new_ldt->desc.limit_low + 1
+ + PAGE_SIZE + offsetof(struct user_ldt, ldt));
+ }
+#else /* MACH_XEN */
kfree((vm_offset_t)new_ldt,
new_ldt->desc.limit_low + 1
+ sizeof(struct real_descriptor));
+#endif /* MACH_XEN */
/*
* Free the descriptor list, if it was
@@ -398,9 +417,17 @@ void
user_ldt_free(user_ldt)
user_ldt_t user_ldt;
{
+#ifdef MACH_XEN
+ int i;
+ for (i=0; i<(user_ldt->desc.limit_low + 1)/sizeof(struct real_descriptor); i+=PAGE_SIZE/sizeof(struct real_descriptor))
+ pmap_set_page_readwrite(&user_ldt->ldt[i]);
+ kfree(user_ldt->alloc, user_ldt->desc.limit_low + 1
+ + PAGE_SIZE + offsetof(struct user_ldt, ldt));
+#else /* MACH_XEN */
kfree((vm_offset_t)user_ldt,
user_ldt->desc.limit_low + 1
+ sizeof(struct real_descriptor));
+#endif /* MACH_XEN */
}
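The Xen branch of `i386_set_ldt` over-allocates by one page plus the header size, then slides the structure so that the `ldt[]` member itself lands on a page boundary (Xen requires a page-aligned LDT). The placement arithmetic can be sketched and verified in isolation — `sketch_ldt` below is a simplified stand-in for `struct user_ldt`:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define round_page(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

struct sketch_ldt {
    unsigned long alloc;        /* start of the raw allocation */
    unsigned long long desc;    /* descriptor for self */
    unsigned long long ldt[1];  /* table proper: must be page-aligned */
};

/* Given an unaligned allocation address, place the structure so the
   ldt[] member starts exactly on a page boundary, as the MACH_XEN
   branch of i386_set_ldt does. */
static unsigned long place_ldt(unsigned long alloc)
{
    return round_page(alloc + offsetof(struct sketch_ldt, ldt))
           - offsetof(struct sketch_ldt, ldt);
}
```

The extra `PAGE_SIZE` in the allocation guarantees this slide always stays inside the allocated region.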
diff --git a/i386/i386/user_ldt.h b/i386/i386/user_ldt.h
index dd3ad4b..8d16ed8 100644
--- a/i386/i386/user_ldt.h
+++ b/i386/i386/user_ldt.h
@@ -36,6 +36,9 @@
#include <i386/seg.h>
struct user_ldt {
+#ifdef MACH_XEN
+ vm_offset_t alloc; /* allocation before alignment */
+#endif /* MACH_XEN */
struct real_descriptor desc; /* descriptor for self */
struct real_descriptor ldt[1]; /* descriptor table (variable) */
};
diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index 8e92e79..95df604 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -25,10 +25,25 @@
/* XXX use xu/vm_param.h */
#include <mach/vm_param.h>
+#include <xen/public/xen.h>
/* The kernel address space is 1GB, starting at virtual address 0. */
-#define VM_MIN_KERNEL_ADDRESS (0x00000000)
-#define VM_MAX_KERNEL_ADDRESS ((LINEAR_MAX_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS))
+#ifdef MACH_XEN
+#define VM_MIN_KERNEL_ADDRESS 0x20000000UL
+#else /* MACH_XEN */
+#define VM_MIN_KERNEL_ADDRESS 0x00000000UL
+#endif /* MACH_XEN */
+
+#ifdef MACH_XEN
+#if PAE
+#define HYP_VIRT_START HYPERVISOR_VIRT_START_PAE
+#else /* PAE */
+#define HYP_VIRT_START HYPERVISOR_VIRT_START_NONPAE
+#endif /* PAE */
+#define VM_MAX_KERNEL_ADDRESS (HYP_VIRT_START - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS)
+#else /* MACH_XEN */
+#define VM_MAX_KERNEL_ADDRESS (LINEAR_MAX_KERNEL_ADDRESS - LINEAR_MIN_KERNEL_ADDRESS + VM_MIN_KERNEL_ADDRESS)
+#endif /* MACH_XEN */
/* The kernel virtual address space is actually located
at high linear addresses.
@@ -36,8 +51,14 @@
#define LINEAR_MIN_KERNEL_ADDRESS (VM_MAX_ADDRESS)
#define LINEAR_MAX_KERNEL_ADDRESS (0xffffffffUL)
+#ifdef MACH_XEN
+/* need room for mmu updates (2*8bytes) */
+#define KERNEL_STACK_SIZE (4*I386_PGBYTES)
+#define INTSTACK_SIZE (4*I386_PGBYTES)
+#else /* MACH_XEN */
#define KERNEL_STACK_SIZE (1*I386_PGBYTES)
#define INTSTACK_SIZE (1*I386_PGBYTES)
+#endif /* MACH_XEN */
/* interrupt stack size */
/*
diff --git a/i386/i386/xen.h b/i386/i386/xen.h
new file mode 100644
index 0000000..a7fb641
--- /dev/null
+++ b/i386/i386/xen.h
@@ -0,0 +1,357 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_HYPCALL_H
+#define XEN_HYPCALL_H
+
+#ifdef MACH_XEN
+#ifndef __ASSEMBLER__
+#include <kern/printf.h>
+#include <mach/machine/vm_types.h>
+#include <mach/vm_param.h>
+#include <mach/inline.h>
+#include <machine/vm_param.h>
+#include <intel/pmap.h>
+#include <kern/debug.h>
+#include <xen/public/xen.h>
+
+/* TODO: this should be moved in appropriate non-Xen place. */
+#define barrier() __asm__ __volatile__ ("": : :"memory")
+#define mb() __asm__ __volatile__("lock; addl $0,0(%esp)")
+#define rmb() mb()
+#define wmb() mb()
+MACH_INLINE unsigned long xchgl(volatile unsigned long *ptr, unsigned long x)
+{
+ __asm__ __volatile__("xchgl %0, %1"
+ : "=r" (x)
+ : "m" (*(ptr)), "0" (x): "memory");
+ return x;
+}
+#define _TOSTR(x) #x
+#define TOSTR(x) _TOSTR (x)
+
+
+
+/* x86-specific hypercall interface. */
+#define _hypcall0(type, name) \
+MACH_INLINE type hyp_##name(void) \
+{ \
+ long __ret; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret) \
+ : : "memory"); \
+ return __ret; \
+}
+
+#define _hypcall1(type, name, type1, arg1) \
+MACH_INLINE type hyp_##name(type1 arg1) \
+{ \
+ long __ret; \
+ long foo1; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret), \
+ "=b" (foo1) \
+ : "1" ((long)arg1) \
+ : "memory"); \
+ return __ret; \
+}
+
+#define _hypcall2(type, name, type1, arg1, type2, arg2) \
+MACH_INLINE type hyp_##name(type1 arg1, type2 arg2) \
+{ \
+ long __ret; \
+ long foo1, foo2; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret), \
+ "=b" (foo1), \
+ "=c" (foo2) \
+ : "1" ((long)arg1), \
+ "2" ((long)arg2) \
+ : "memory"); \
+ return __ret; \
+}
+
+#define _hypcall3(type, name, type1, arg1, type2, arg2, type3, arg3) \
+MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3) \
+{ \
+ long __ret; \
+ long foo1, foo2, foo3; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret), \
+ "=b" (foo1), \
+ "=c" (foo2), \
+ "=d" (foo3) \
+ : "1" ((long)arg1), \
+ "2" ((long)arg2), \
+ "3" ((long)arg3) \
+ : "memory"); \
+ return __ret; \
+}
+
+#define _hypcall4(type, name, type1, arg1, type2, arg2, type3, arg3, type4, arg4) \
+MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3, type4 arg4) \
+{ \
+ long __ret; \
+ long foo1, foo2, foo3, foo4; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret), \
+ "=b" (foo1), \
+ "=c" (foo2), \
+ "=d" (foo3), \
+ "=S" (foo4) \
+ : "1" ((long)arg1), \
+ "2" ((long)arg2), \
+ "3" ((long)arg3), \
+ "4" ((long)arg4) \
+ : "memory"); \
+ return __ret; \
+}
+
+#define _hypcall5(type, name, type1, arg1, type2, arg2, type3, arg3, type4, arg4, type5, arg5) \
+MACH_INLINE type hyp_##name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5) \
+{ \
+ long __ret; \
+ long foo1, foo2, foo3, foo4, foo5; \
+ asm volatile ("call hypcalls+("TOSTR(__HYPERVISOR_##name)"*32)" \
+ : "=a" (__ret), \
+ "=b" (foo1), \
+ "=c" (foo2), \
+ "=d" (foo3), \
+ "=S" (foo4), \
+ "=D" (foo5) \
+ : "1" ((long)arg1), \
+ "2" ((long)arg2), \
+ "3" ((long)arg3), \
+ "4" ((long)arg4), \
+ "5" ((long)arg5) \
+ : "memory"); \
+ return __ret; \
+}
+
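The hypercall stubs above assemble their call target as the literal string `"call hypcalls+(N*32)"` by stringifying the `__HYPERVISOR_*` number. The two-level `TOSTR`/`_TOSTR` pair is essential: one level would stringify the macro's *name*, two levels let the preprocessor expand it to its numeric value first. A standalone demonstration, using a made-up hypercall number in place of the real Xen constants:

```c
#include <assert.h>
#include <string.h>

#define _TOSTR(x) #x
#define TOSTR(x)  _TOSTR(x)

/* A made-up hypercall number, standing in for the real
   __HYPERVISOR_* constants from the Xen public headers. */
#define __HYPERVISOR_demo_op 7
```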
+/* x86 Hypercalls */
+
+/* Note: since the hypervisor uses a flat memory model, always pass pointers
+ * through kvtolin when handing them to a hypercall to read data at. Use
+ * kv_to_la for pointers that may be used before the GDT is set up. */
+
+_hypcall1(long, set_trap_table, vm_offset_t /* struct trap_info * */, traps);
+
+_hypcall4(int, mmu_update, vm_offset_t /* struct mmu_update * */, req, int, count, vm_offset_t /* int * */, success_count, domid_t, domid)
+MACH_INLINE int hyp_mmu_update_pte(unsigned long pte, unsigned long long val)
+{
+ struct mmu_update update =
+ {
+ .ptr = pte,
+ .val = val,
+ };
+ int count;
+ hyp_mmu_update(kv_to_la(&update), 1, kv_to_la(&count), DOMID_SELF);
+ return count;
+}
+/* Note: make sure this fits in KERNEL_STACK_SIZE */
+#define HYP_BATCH_MMU_UPDATES 256
+
+#define hyp_mmu_update_la(la, val) hyp_mmu_update_pte( \
+ (unsigned long)(((pt_entry_t*)(kernel_pmap->dirbase[lin2pdenum((unsigned long)la)] & INTEL_PTE_PFN)) \
+ + ptenum((unsigned long)la)), val)
+
+_hypcall2(long, set_gdt, vm_offset_t /* unsigned long * */, frame_list, unsigned int, entries)
+
+_hypcall2(long, stack_switch, unsigned long, ss, unsigned long, esp);
+
+_hypcall4(long, set_callbacks, unsigned long, es, void *, ea,
+ unsigned long, fss, void *, fsa);
+_hypcall1(long, fpu_taskswitch, int, set);
+
+_hypcall4(long, update_descriptor, unsigned long, ma_lo, unsigned long, ma_hi, unsigned long, desc_lo, unsigned long, desc_hi);
+#define hyp_do_update_descriptor(ma, desc) ({ \
+ unsigned long long __desc = (desc); \
+ hyp_update_descriptor(ma, 0, __desc, __desc >> 32); \
+})
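`hyp_do_update_descriptor` (and, later, `hyp_do_update_va_mapping` and `hyp_do_set_timer_op`) all follow the same pattern: a 64-bit value is split into two 32-bit register arguments because the 32-bit hypercall ABI passes one word per register. A minimal sketch of the split and its inverse:

```c
#include <assert.h>

/* Splitting a 64-bit value into the two 32-bit register arguments
   the 32-bit hypercall ABI expects, as the hyp_do_* wrappers do. */
static void split64(unsigned long long v, unsigned long *lo, unsigned long *hi)
{
    *lo = (unsigned long)(v & 0xffffffffUL);
    *hi = (unsigned long)(v >> 32);
}

/* Reassembly, useful for checking the split round-trips. */
static unsigned long long join64(unsigned long lo, unsigned long hi)
{
    return ((unsigned long long)hi << 32) | lo;
}
```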
+
+#include <xen/public/memory.h>
+_hypcall2(long, memory_op, unsigned long, cmd, vm_offset_t /* void * */, arg);
+MACH_INLINE void hyp_free_mfn(unsigned long mfn)
+{
+ struct xen_memory_reservation reservation;
+ reservation.extent_start = (void*) kvtolin(&mfn);
+ reservation.nr_extents = 1;
+ reservation.extent_order = 0;
+ reservation.address_bits = 0;
+ reservation.domid = DOMID_SELF;
+ if (hyp_memory_op(XENMEM_decrease_reservation, kvtolin(&reservation)) != 1)
+ panic("couldn't free page %lu\n", mfn);
+}
+
+_hypcall4(int, update_va_mapping, unsigned long, va, unsigned long, val_lo, unsigned long, val_hi, unsigned long, flags);
+#define hyp_do_update_va_mapping(va, val, flags) ({ \
+ unsigned long long __val = (val); \
+ hyp_update_va_mapping(va, __val & 0xffffffffU, __val >> 32, flags); \
+})
+
+MACH_INLINE void hyp_free_page(unsigned long pfn, void *va)
+{
+ /* save mfn */
+ unsigned long mfn = pfn_to_mfn(pfn);
+
+ /* remove from mappings */
+ if (hyp_do_update_va_mapping(kvtolin(va), 0, UVMF_INVLPG|UVMF_ALL))
+ panic("couldn't clear page %lu at %p\n", pfn, va);
+
+#ifdef MACH_PSEUDO_PHYS
+ /* drop machine page */
+ mfn_list[pfn] = ~0;
+#endif /* MACH_PSEUDO_PHYS */
+
+ /* and free from Xen */
+ hyp_free_mfn(mfn);
+}
+
+_hypcall4(int, mmuext_op, vm_offset_t /* struct mmuext_op * */, op, int, count, vm_offset_t /* int * */, success_count, domid_t, domid);
+MACH_INLINE int hyp_mmuext_op_void(unsigned int cmd)
+{
+ struct mmuext_op op = {
+ .cmd = cmd,
+ };
+ int count;
+ hyp_mmuext_op(kv_to_la(&op), 1, kv_to_la(&count), DOMID_SELF);
+ return count;
+}
+MACH_INLINE int hyp_mmuext_op_mfn(unsigned int cmd, unsigned long mfn)
+{
+ struct mmuext_op op = {
+ .cmd = cmd,
+ .arg1.mfn = mfn,
+ };
+ int count;
+ hyp_mmuext_op(kv_to_la(&op), 1, kv_to_la(&count), DOMID_SELF);
+ return count;
+}
+MACH_INLINE void hyp_set_ldt(void *ldt, unsigned long nbentries) {
+ struct mmuext_op op = {
+ .cmd = MMUEXT_SET_LDT,
+ .arg1.linear_addr = kvtolin(ldt),
+ .arg2.nr_ents = nbentries,
+ };
+ int count;
+ if (((unsigned long)ldt) & PAGE_MASK)
+ panic("ldt %p is not aligned on a page\n", ldt);
+ for (count=0; count<nbentries; count+= PAGE_SIZE/8)
+ pmap_set_page_readonly(ldt+count*8);
+ hyp_mmuext_op(kvtolin(&op), 1, kvtolin(&count), DOMID_SELF);
+ if (!count)
+ panic("couldn't set LDT\n");
+}
+/* TODO: use xen_pfn_to_cr3/xen_cr3_to_pfn to cope with pdp above 4GB */
+#define hyp_set_cr3(value) hyp_mmuext_op_mfn(MMUEXT_NEW_BASEPTR, pa_to_mfn(value))
+MACH_INLINE void hyp_invlpg(vm_offset_t lin) {
+ struct mmuext_op ops;
+ int n;
+ ops.cmd = MMUEXT_INVLPG_ALL;
+ ops.arg1.linear_addr = lin;
+ hyp_mmuext_op(kvtolin(&ops), 1, kvtolin(&n), DOMID_SELF);
+ if (n < 1)
+ panic("couldn't invlpg\n");
+}
+
+_hypcall2(long, set_timer_op, unsigned long, absolute_lo, unsigned long, absolute_hi);
+#define hyp_do_set_timer_op(absolute_nsec) ({ \
+ unsigned long long __absolute = (absolute_nsec); \
+ hyp_set_timer_op(__absolute, __absolute >> 32); \
+})
+
+#include <xen/public/event_channel.h>
+_hypcall1(int, event_channel_op, vm_offset_t /* evtchn_op_t * */, op);
+MACH_INLINE int hyp_event_channel_send(evtchn_port_t port) {
+ evtchn_op_t op = {
+ .cmd = EVTCHNOP_send,
+ .u.send.port = port,
+ };
+ return hyp_event_channel_op(kvtolin(&op));
+}
+MACH_INLINE evtchn_port_t hyp_event_channel_alloc(domid_t domid) {
+ evtchn_op_t op = {
+ .cmd = EVTCHNOP_alloc_unbound,
+ .u.alloc_unbound.dom = DOMID_SELF,
+ .u.alloc_unbound.remote_dom = domid,
+ };
+ if (hyp_event_channel_op(kvtolin(&op)))
+ panic("couldn't allocate event channel");
+ return op.u.alloc_unbound.port;
+}
+MACH_INLINE evtchn_port_t hyp_event_channel_bind_virq(uint32_t virq, uint32_t vcpu) {
+ evtchn_op_t op = { .cmd = EVTCHNOP_bind_virq, .u.bind_virq = { .virq = virq, .vcpu = vcpu }};
+ if (hyp_event_channel_op(kvtolin(&op)))
+ panic("can't bind virq %d\n",virq);
+ return op.u.bind_virq.port;
+}
+
+_hypcall3(int, console_io, int, cmd, int, count, vm_offset_t /* const char * */, buffer);
+
+_hypcall3(long, grant_table_op, unsigned int, cmd, vm_offset_t /* void * */, uop, unsigned int, count);
+
+_hypcall2(long, vm_assist, unsigned int, cmd, unsigned int, type);
+
+_hypcall0(long, iret);
+
+#include <xen/public/sched.h>
+_hypcall2(long, sched_op, int, cmd, vm_offset_t /* void* */, arg)
+#define hyp_yield() hyp_sched_op(SCHEDOP_yield, 0)
+#define hyp_block() hyp_sched_op(SCHEDOP_block, 0)
+MACH_INLINE void __attribute__((noreturn)) hyp_crash(void)
+{
+ unsigned int shut = SHUTDOWN_crash;
+ hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut));
+ /* really shouldn't return */
+ printf("uh, shutdown returned?!\n");
+ for(;;);
+}
+
+MACH_INLINE void __attribute__((noreturn)) hyp_halt(void)
+{
+ unsigned int shut = SHUTDOWN_poweroff;
+ hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut));
+ /* really shouldn't return */
+ printf("uh, shutdown returned?!\n");
+ for(;;);
+}
+
+MACH_INLINE void __attribute__((noreturn)) hyp_reboot(void)
+{
+ unsigned int shut = SHUTDOWN_reboot;
+ hyp_sched_op(SCHEDOP_shutdown, kvtolin(&shut));
+ /* really shouldn't return */
+ printf("uh, reboot returned?!\n");
+ for(;;);
+}
+
+/* x86-specific */
+MACH_INLINE unsigned64_t hyp_cpu_clock(void) {
+ unsigned64_t tsc;
+ asm volatile("rdtsc":"=A"(tsc));
+ return tsc;
+}
+
+#else /* __ASSEMBLER__ */
+/* TODO: SMP */
+#define cli movb $0xff,hyp_shared_info+CPU_CLI
+#define sti call hyp_sti
+#endif /* __ASSEMBLER__ */
+#endif /* MACH_XEN */
+
+#endif /* XEN_HYPCALL_H */
diff --git a/i386/i386at/conf.c b/i386/i386at/conf.c
index 23c2a6f..f5ab36c 100644
--- a/i386/i386at/conf.c
+++ b/i386/i386at/conf.c
@@ -34,6 +34,7 @@ extern int timeopen(), timeclose();
extern vm_offset_t timemmap();
#define timename "time"
+#ifndef MACH_HYP
extern int kdopen(), kdclose(), kdread(), kdwrite();
extern int kdgetstat(), kdsetstat(), kdportdeath();
extern vm_offset_t kdmmap();
@@ -50,17 +51,26 @@ extern int lpropen(), lprclose(), lprread(), lprwrite();
extern int lprgetstat(), lprsetstat(), lprportdeath();
#define lprname "lpr"
#endif /* NLPR > 0 */
+#endif /* MACH_HYP */
extern int kbdopen(), kbdclose(), kbdread();
extern int kbdgetstat(), kbdsetstat();
#define kbdname "kbd"
+#ifndef MACH_HYP
extern int mouseopen(), mouseclose(), mouseread(), mousegetstat();
#define mousename "mouse"
+#endif /* MACH_HYP */
extern int kmsgopen(), kmsgclose(), kmsgread(), kmsggetstat();
#define kmsgname "kmsg"
+#ifdef MACH_HYP
+extern int hypcnopen(), hypcnclose(), hypcnread(), hypcnwrite();
+extern int hypcngetstat(), hypcnsetstat(), hypcnportdeath();
+#define hypcnname "hyp"
+#endif /* MACH_HYP */
+
/*
* List of devices - console must be at slot 0
*/
@@ -79,16 +89,19 @@ struct dev_ops dev_name_list[] =
nodev, nulldev, nulldev, 0,
nodev },
+#ifndef MACH_HYP
{ kdname, kdopen, kdclose, kdread,
kdwrite, kdgetstat, kdsetstat, kdmmap,
nodev, nulldev, kdportdeath, 0,
nodev },
+#endif /* MACH_HYP */
{ timename, timeopen, timeclose, nulldev,
nulldev, nulldev, nulldev, timemmap,
nodev, nulldev, nulldev, 0,
nodev },
+#ifndef MACH_HYP
#if NCOM > 0
{ comname, comopen, comclose, comread,
comwrite, comgetstat, comsetstat, nomap,
@@ -107,6 +120,7 @@ struct dev_ops dev_name_list[] =
nodev, mousegetstat, nulldev, nomap,
nodev, nulldev, nulldev, 0,
nodev },
+#endif /* MACH_HYP */
{ kbdname, kbdopen, kbdclose, kbdread,
nodev, kbdgetstat, kbdsetstat, nomap,
@@ -120,6 +134,13 @@ struct dev_ops dev_name_list[] =
nodev },
#endif
+#ifdef MACH_HYP
+ { hypcnname, hypcnopen, hypcnclose, hypcnread,
+ hypcnwrite, hypcngetstat, hypcnsetstat, nomap,
+ nodev, nulldev, hypcnportdeath, 0,
+ nodev },
+#endif /* MACH_HYP */
+
};
int dev_name_count = sizeof(dev_name_list)/sizeof(dev_name_list[0]);
diff --git a/i386/i386at/cons_conf.c b/i386/i386at/cons_conf.c
index 8784ed9..ea8ccb5 100644
--- a/i386/i386at/cons_conf.c
+++ b/i386/i386at/cons_conf.c
@@ -30,19 +30,27 @@
#include <sys/types.h>
#include <device/cons.h>
+#ifdef MACH_HYP
+extern int hypcnprobe(), hypcninit(), hypcngetc(), hypcnputc();
+#else /* MACH_HYP */
extern int kdcnprobe(), kdcninit(), kdcngetc(), kdcnputc();
#if NCOM > 0 && RCLINE >= 0
extern int comcnprobe(), comcninit(), comcngetc(), comcnputc();
#endif
+#endif /* MACH_HYP */
/*
* The rest of the consdev fields are filled in by the respective
* cnprobe routine.
*/
struct consdev constab[] = {
+#ifdef MACH_HYP
+ {"hyp", hypcnprobe, hypcninit, hypcngetc, hypcnputc},
+#else /* MACH_HYP */
{"kd", kdcnprobe, kdcninit, kdcngetc, kdcnputc},
#if NCOM > 0 && RCLINE >= 0 && 1
{"com", comcnprobe, comcninit, comcngetc, comcnputc},
#endif
+#endif /* MACH_HYP */
{0}
};
diff --git a/i386/i386at/model_dep.c b/i386/i386at/model_dep.c
index 3ebe2e6..61605a1 100644
--- a/i386/i386at/model_dep.c
+++ b/i386/i386at/model_dep.c
@@ -40,6 +40,7 @@
#include <mach/vm_prot.h>
#include <mach/machine.h>
#include <mach/machine/multiboot.h>
+#include <mach/xen.h>
#include <i386/vm_param.h>
#include <kern/assert.h>
@@ -48,6 +49,7 @@
#include <kern/mach_clock.h>
#include <kern/printf.h>
#include <sys/time.h>
+#include <sys/types.h>
#include <vm/vm_page.h>
#include <i386/fpu.h>
#include <i386/gdt.h>
@@ -65,6 +67,12 @@
#include <i386at/int_init.h>
#include <i386at/kd.h>
#include <i386at/rtc.h>
+#ifdef MACH_XEN
+#include <xen/console.h>
+#include <xen/store.h>
+#include <xen/evt.h>
+#include <xen/xen.h>
+#endif /* MACH_XEN */
/* Location of the kernel's symbol table.
Both of these are 0 if none is available. */
@@ -81,7 +89,20 @@ vm_offset_t phys_first_addr = 0;
vm_offset_t phys_last_addr;
/* A copy of the multiboot info structure passed by the boot loader. */
+#ifdef MACH_XEN
+struct start_info boot_info;
+#ifdef MACH_PSEUDO_PHYS
+unsigned long *mfn_list;
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+unsigned long *pfn_list = (void*) PFN_LIST;
+#endif
+#endif /* MACH_PSEUDO_PHYS */
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+unsigned long la_shift = VM_MIN_KERNEL_ADDRESS;
+#endif
+#else /* MACH_XEN */
struct multiboot_info boot_info;
+#endif /* MACH_XEN */
/* Command line supplied to kernel. */
char *kernel_cmdline = "";
@@ -90,7 +111,11 @@ char *kernel_cmdline = "";
it gets bumped up through physical memory
that exists and is not occupied by boot gunk.
It is not necessarily page-aligned. */
-static vm_offset_t avail_next = 0x1000; /* XX end of BIOS data area */
+static vm_offset_t avail_next
+#ifndef MACH_HYP
+ = 0x1000 /* XX end of BIOS data area */
+#endif /* MACH_HYP */
+ ;
/* Possibly overestimated amount of available memory
still remaining to be handed to the VM system. */
@@ -135,6 +160,9 @@ void machine_init(void)
*/
init_fpu();
+#ifdef MACH_HYP
+ hyp_init();
+#else /* MACH_HYP */
#ifdef LINUX_DEV
/*
* Initialize Linux drivers.
@@ -146,16 +174,19 @@ void machine_init(void)
* Find the devices
*/
probeio();
+#endif /* MACH_HYP */
/*
* Get the time
*/
inittodr();
+#ifndef MACH_HYP
/*
* Tell the BIOS not to clear and test memory.
*/
*(unsigned short *)phystokv(0x472) = 0x1234;
+#endif /* MACH_HYP */
/*
* Unmap page 0 to trap NULL references.
@@ -166,8 +197,17 @@ void machine_init(void)
/* Conserve power on processor CPU. */
void machine_idle (int cpu)
{
+#ifdef MACH_HYP
+ hyp_idle();
+#else /* MACH_HYP */
assert (cpu == cpu_number ());
asm volatile ("hlt" : : : "memory");
+#endif /* MACH_HYP */
+}
+
+void machine_relax ()
+{
+ asm volatile ("rep; nop" : : : "memory");
}
/*
@@ -175,9 +215,13 @@ void machine_idle (int cpu)
*/
void halt_cpu(void)
{
+#ifdef MACH_HYP
+ hyp_halt();
+#else /* MACH_HYP */
asm volatile("cli");
while (TRUE)
machine_idle (cpu_number ());
+#endif /* MACH_HYP */
}
/*
@@ -187,10 +231,16 @@ void halt_all_cpus(reboot)
boolean_t reboot;
{
if (reboot) {
+#ifdef MACH_HYP
+ hyp_reboot();
+#endif /* MACH_HYP */
kdreboot();
}
else {
rebootflag = 1;
+#ifdef MACH_HYP
+ hyp_halt();
+#endif /* MACH_HYP */
printf("In tight loop: hit ctl-alt-del to reboot\n");
(void) spl0();
}
@@ -215,22 +265,26 @@ void db_reset_cpu(void)
void
mem_size_init(void)
{
- vm_size_t phys_last_kb;
-
/* Physical memory on all PCs starts at physical address 0.
XX make it a constant. */
phys_first_addr = 0;
- phys_last_kb = 0x400 + boot_info.mem_upper;
+#ifdef MACH_HYP
+ if (boot_info.nr_pages >= 0x100000) {
+ printf("Truncating memory size to 4GiB\n");
+ phys_last_addr = 0xffffffffU;
+ } else
+ phys_last_addr = boot_info.nr_pages * 0x1000;
+#else /* MACH_HYP */
+ /* TODO: support mmap */
+ vm_size_t phys_last_kb = 0x400 + boot_info.mem_upper;
/* Avoid 4GiB overflow. */
if (phys_last_kb < 0x400 || phys_last_kb >= 0x400000) {
printf("Truncating memory size to 4GiB\n");
- phys_last_kb = 0x400000 - 1;
- }
-
- /* TODO: support mmap */
-
- phys_last_addr = phys_last_kb * 0x400;
+ phys_last_addr = 0xffffffffU;
+ } else
+ phys_last_addr = phys_last_kb * 0x400;
+#endif /* MACH_HYP */
printf("AT386 boot: physical memory from 0x%x to 0x%x\n",
phys_first_addr, phys_last_addr);
@@ -240,14 +294,20 @@ mem_size_init(void)
if (phys_last_addr > ((VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 6) * 5) {
phys_last_addr = ((VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 6) * 5;
printf("Truncating memory size to %dMiB\n", (phys_last_addr - phys_first_addr) / (1024 * 1024));
+ /* TODO Xen: free lost memory */
}
phys_first_addr = round_page(phys_first_addr);
phys_last_addr = trunc_page(phys_last_addr);
+#ifdef MACH_HYP
+ /* Memory is just contiguous */
+ avail_remaining = phys_last_addr;
+#else /* MACH_HYP */
avail_remaining
= phys_last_addr - (0x100000 - (boot_info.mem_lower * 0x400)
- 0x1000);
+#endif /* MACH_HYP */
}
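In the MACH_HYP branch of `mem_size_init`, Xen reports memory as a contiguous page count in `start_info.nr_pages`; with 4 KiB pages, 2^20 pages is exactly 4 GiB, so anything at or above that would overflow a 32-bit byte address and is clamped. The computation as a checkable function (names are illustrative):

```c
#include <assert.h>

#define PAGE_BYTES 0x1000UL

/* The MACH_HYP memory-size computation: Xen hands the guest a page
   count, and 2^20 pages * 4 KiB would wrap a 32-bit byte address,
   so the size is clamped to just under 4 GiB. */
static unsigned long phys_last_from_pages(unsigned long nr_pages)
{
    if (nr_pages >= 0x100000UL)
        return 0xffffffffUL;
    return nr_pages * PAGE_BYTES;
}
```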
/*
@@ -263,13 +323,20 @@ i386at_init(void)
/*
* Initialize the PIC prior to any possible call to an spl.
*/
+#ifndef MACH_HYP
picinit();
+#else /* MACH_HYP */
+ hyp_intrinit();
+#endif /* MACH_HYP */
/*
* Find memory size parameters.
*/
mem_size_init();
+#ifdef MACH_XEN
+ kernel_cmdline = (char*) boot_info.cmd_line;
+#else /* MACH_XEN */
/* Copy content pointed by boot_info before losing access to it when it
* is too far in physical memory. */
if (boot_info.flags & MULTIBOOT_CMDLINE) {
@@ -304,6 +371,7 @@ i386at_init(void)
m[i].string = addr;
}
}
+#endif /* MACH_XEN */
/*
* Initialize kernel physical map, mapping the
@@ -325,19 +393,42 @@ i386at_init(void)
kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)] =
kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS)];
#if PAE
+ /* Under PAE a page directory entry covers only 2MB, so mirror more entries */
kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1] =
kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS) + 1];
+ kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2] =
+ kernel_page_dir[lin2pdenum(LINEAR_MIN_KERNEL_ADDRESS) + 2];
+#endif /* PAE */
+#ifdef MACH_XEN
+ {
+ int i;
+ for (i = 0; i < PDPNUM; i++)
+ pmap_set_page_readonly_init((void*) kernel_page_dir + i * INTEL_PGBYTES);
+#if PAE
+ pmap_set_page_readonly_init(kernel_pmap->pdpbase);
+#endif /* PAE */
+ }
+#endif /* MACH_XEN */
+#if PAE
set_cr3((unsigned)_kvtophys(kernel_pmap->pdpbase));
+#ifndef MACH_HYP
if (!CPU_HAS_FEATURE(CPU_FEATURE_PAE))
panic("CPU doesn't have support for PAE.");
set_cr4(get_cr4() | CR4_PAE);
+#endif /* MACH_HYP */
#else
set_cr3((unsigned)_kvtophys(kernel_page_dir));
#endif /* PAE */
+#ifndef MACH_HYP
if (CPU_HAS_FEATURE(CPU_FEATURE_PGE))
set_cr4(get_cr4() | CR4_PGE);
+ /* Under MACH_HYP these bits are already set by the hypervisor. */
set_cr0(get_cr0() | CR0_PG | CR0_WP);
+#endif /* MACH_HYP */
flush_instr_queue();
+#ifdef MACH_XEN
+ pmap_clear_bootstrap_pagetable((void *)boot_info.pt_base);
+#endif /* MACH_XEN */
/* Interrupt stacks are allocated in physical memory,
while kernel stacks are allocated in kernel virtual memory,
@@ -349,18 +440,47 @@ i386at_init(void)
*/
gdt_init();
idt_init();
+#ifndef MACH_HYP
int_init();
+#endif /* MACH_HYP */
ldt_init();
ktss_init();
/* Get rid of the temporary direct mapping and flush it out of the TLB. */
+#ifdef MACH_XEN
+#ifdef MACH_PSEUDO_PHYS
+ if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)]), 0))
+#else /* MACH_PSEUDO_PHYS */
+ if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS, 0, UVMF_INVLPG | UVMF_ALL))
+#endif /* MACH_PSEUDO_PHYS */
+ printf("couldn't unmap frame 0\n");
+#if PAE
+#ifdef MACH_PSEUDO_PHYS
+ if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1]), 0))
+#else /* MACH_PSEUDO_PHYS */
+ if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS + INTEL_PGBYTES, 0, UVMF_INVLPG | UVMF_ALL))
+#endif /* MACH_PSEUDO_PHYS */
+ printf("couldn't unmap frame 1\n");
+#ifdef MACH_PSEUDO_PHYS
+ if (!hyp_mmu_update_pte(kv_to_ma(&kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2]), 0))
+#else /* MACH_PSEUDO_PHYS */
+ if (hyp_do_update_va_mapping(VM_MIN_KERNEL_ADDRESS + 2*INTEL_PGBYTES, 0, UVMF_INVLPG | UVMF_ALL))
+#endif /* MACH_PSEUDO_PHYS */
+ printf("couldn't unmap frame 2\n");
+#endif /* PAE */
+ hyp_free_page(0, (void*) VM_MIN_KERNEL_ADDRESS);
+#else /* MACH_XEN */
kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS)] = 0;
#if PAE
kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 1] = 0;
+ kernel_page_dir[lin2pdenum(VM_MIN_KERNEL_ADDRESS) + 2] = 0;
#endif /* PAE */
+#endif /* MACH_XEN */
flush_tlb();
-
+#ifdef MACH_XEN
+ hyp_p2m_init();
+#endif /* MACH_XEN */
/* XXX We'll just use the initialization stack we're already running on
as the interrupt stack for now. Later this will have to change,
@@ -384,6 +504,15 @@ void c_boot_entry(vm_offset_t bi)
printf(version);
printf("\n");
+#ifdef MACH_XEN
+ printf("Running on %s.\n", boot_info.magic);
+ if (boot_info.flags & SIF_PRIVILEGED)
+ panic("Mach can't run as dom0.");
+#ifdef MACH_PSEUDO_PHYS
+ mfn_list = (void*)boot_info.mfn_list;
+#endif
+#else /* MACH_XEN */
+
#if MACH_KDB
/*
* Locate the kernel's symbol table, if the boot loader provided it.
@@ -405,6 +534,7 @@ void c_boot_entry(vm_offset_t bi)
symtab_size, strtab_size);
}
#endif /* MACH_KDB */
+#endif /* MACH_XEN */
cpu_type = discover_x86_cpu_type ();
@@ -525,6 +655,12 @@ boolean_t
init_alloc_aligned(vm_size_t size, vm_offset_t *addrp)
{
vm_offset_t addr;
+
+#ifdef MACH_HYP
+ /* No reserved BIOS/ROM ranges under Xen; start allocating right after the bootstrap page tables. */
+ if (!avail_next)
+ avail_next = _kvtophys(boot_info.pt_base) + (boot_info.nr_pt_frames + 3) * 0x1000;
+#else /* MACH_HYP */
extern char start[], end[];
int i;
static int wrapped = 0;
@@ -543,11 +679,14 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp)
: 0;
retry:
+#endif /* MACH_HYP */
/* Page-align the start address. */
avail_next = round_page(avail_next);
+#ifndef MACH_HYP
/* Start with memory above 16MB, reserving the low memory for later. */
+ /* Don't care on Xen */
if (!wrapped && phys_last_addr > 16 * 1024*1024)
{
if (avail_next < 16 * 1024*1024)
@@ -563,9 +702,15 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp)
wrapped = 1;
}
}
+#endif /* MACH_HYP */
/* Check if we have reached the end of memory. */
- if (avail_next == (wrapped ? 16 * 1024*1024 : phys_last_addr))
+ if (avail_next ==
+ (
+#ifndef MACH_HYP
+ wrapped ? 16 * 1024*1024 :
+#endif /* MACH_HYP */
+ phys_last_addr))
return FALSE;
/* Tentatively assign the current location to the caller. */
@@ -575,6 +720,7 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp)
and see where that puts us. */
avail_next += size;
+#ifndef MACH_HYP
/* Skip past the I/O and ROM area. */
if ((avail_next > (boot_info.mem_lower * 0x400)) && (addr < 0x100000))
{
@@ -620,6 +766,7 @@ init_alloc_aligned(vm_size_t size, vm_offset_t *addrp)
/* XXX string */
}
}
+#endif /* MACH_HYP */
avail_remaining -= size;
@@ -649,6 +796,11 @@ boolean_t pmap_valid_page(x)
vm_offset_t x;
{
/* XXX is this OK? What does it matter for? */
- return (((phys_first_addr <= x) && (x < phys_last_addr)) &&
- !(((boot_info.mem_lower * 1024) <= x) && (x < 1024*1024)));
+ return (((phys_first_addr <= x) && (x < phys_last_addr))
+#ifndef MACH_HYP
+ && !(
+ ((boot_info.mem_lower * 1024) <= x) &&
+ (x < 1024*1024))
+#endif /* MACH_HYP */
+ );
}
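The `mem_size_init()` hunk above works in KiB precisely so that the 4GiB overflow check fits in 32 bits: `mem_upper` from the bootloader is in KiB, and 0x400000 KiB is exactly 4GiB, the point where scaling by 0x400 would wrap. A minimal standalone C sketch of the same clamping logic (the function name `clamp_phys_last` is illustrative, not from the tree):

```c
#include <stdint.h>

/* Clamp a bootloader-reported memory size (KiB of memory above 1 MiB)
 * to what a 32-bit physical address can express, mirroring the
 * mem_size_init() hunk above. */
static uint32_t clamp_phys_last(uint32_t mem_upper_kb)
{
    uint32_t phys_last_kb = 0x400 + mem_upper_kb;  /* first MiB + upper memory */

    /* 0x400000 KiB == 4 GiB: scaling by 0x400 would overflow 32 bits.
     * A result below 0x400 means the addition itself wrapped. */
    if (phys_last_kb < 0x400 || phys_last_kb >= 0x400000)
        return 0xffffffffU;                        /* truncate to 4 GiB - 1 */
    return phys_last_kb * 0x400;                   /* back to bytes */
}
```

With 0x3ff000 KiB of upper memory this yields 0xffd00000 bytes; anything at or past the 4GiB mark clamps to 0xffffffff, matching the `Truncating memory size to 4GiB` branch.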
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index c633fd9..ee19c4b 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -77,13 +77,18 @@
#include <vm/vm_user.h>
#include <mach/machine/vm_param.h>
+#include <mach/xen.h>
#include <machine/thread.h>
#include <i386/cpu_number.h>
#include <i386/proc_reg.h>
#include <i386/locore.h>
#include <i386/model_dep.h>
+#ifdef MACH_PSEUDO_PHYS
+#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry) ? pa_to_ma(pte_entry) : 0;
+#else /* MACH_PSEUDO_PHYS */
#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry);
+#endif /* MACH_PSEUDO_PHYS */
/*
* Private data structures.
@@ -325,6 +330,19 @@ lock_data_t pmap_system_lock;
#define MAX_TBIS_SIZE 32 /* > this -> TBIA */ /* XXX */
+#ifdef MACH_HYP
+#if 1
+#define INVALIDATE_TLB(pmap, s, e) hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL)
+#else
+#define INVALIDATE_TLB(pmap, s, e) do { \
+ if (__builtin_constant_p((e) - (s)) \
+ && (e) - (s) == PAGE_SIZE) \
+ hyp_invlpg((pmap) == kernel_pmap ? kvtolin(s) : (s)); \
+ else \
+ hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL); \
+} while(0)
+#endif
+#else /* MACH_HYP */
#if 0
/* It is hard to know when a TLB flush becomes less expensive than a bunch of
* invlpgs. But it surely is more expensive than just one invlpg. */
@@ -338,6 +356,7 @@ lock_data_t pmap_system_lock;
#else
#define INVALIDATE_TLB(pmap, s, e) flush_tlb()
#endif
+#endif /* MACH_HYP */
#if NCPUS > 1
@@ -507,6 +526,10 @@ vm_offset_t pmap_map_bd(virt, start, end, prot)
register pt_entry_t template;
register pt_entry_t *pte;
int spl;
+#ifdef MACH_XEN
+ int n, i = 0;
+ struct mmu_update update[HYP_BATCH_MMU_UPDATES];
+#endif /* MACH_XEN */
template = pa_to_pte(start)
| INTEL_PTE_NCACHE|INTEL_PTE_WTHRU
@@ -521,11 +544,30 @@ vm_offset_t pmap_map_bd(virt, start, end, prot)
pte = pmap_pte(kernel_pmap, virt);
if (pte == PT_ENTRY_NULL)
panic("pmap_map_bd: Invalid kernel address\n");
+#ifdef MACH_XEN
+ update[i].ptr = kv_to_ma(pte);
+ update[i].val = pa_to_ma(template);
+ i++;
+ if (i == HYP_BATCH_MMU_UPDATES) {
+ hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF);
+ if (n != i)
+ panic("couldn't pmap_map_bd\n");
+ i = 0;
+ }
+#else /* MACH_XEN */
WRITE_PTE(pte, template)
+#endif /* MACH_XEN */
pte_increment_pa(template);
virt += PAGE_SIZE;
start += PAGE_SIZE;
}
+#ifdef MACH_XEN
+ if (i > HYP_BATCH_MMU_UPDATES)
+ panic("overflowed array in pmap_map_bd");
+ hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF);
+ if (n != i)
+ panic("couldn't pmap_map_bd\n");
+#endif /* MACH_XEN */
PMAP_READ_UNLOCK(pmap, spl);
return(virt);
}
@@ -583,6 +625,8 @@ void pmap_bootstrap()
/*
* Allocate and clear a kernel page directory.
*/
+ /* Note: the initial Xen mapping provides at least 512kB of free
+ * mapped pages; we use them for directly building our linear mapping. */
#if PAE
{
vm_offset_t addr;
@@ -604,6 +648,53 @@ void pmap_bootstrap()
kernel_pmap->dirbase[i] = 0;
}
+#ifdef MACH_XEN
+ /*
+ * Xen may provide as little as 512KB of extra bootstrap linear memory,
+ * which is far from enough to map all available memory, so we need to
+ * map more bootstrap linear memory. Here we map 1 more L1 table (resp.
+ * 4 for PAE), i.e. 4MiB (resp. 8MiB) of extra mapping, which is
+ * enough for a pagetable mapping 4GiB.
+ */
+#ifdef PAE
+#define NSUP_L1 4
+#else
+#define NSUP_L1 1
+#endif
+ pt_entry_t *l1_map[NSUP_L1];
+ {
+ pt_entry_t *base = (pt_entry_t*) boot_info.pt_base;
+ int i;
+ int n_l1map;
+#ifdef PAE
+ pt_entry_t *l2_map = (pt_entry_t*) phystokv(pte_to_pa(base[0]));
+#else /* PAE */
+ pt_entry_t *l2_map = base;
+#endif /* PAE */
+ for (n_l1map = 0, i = lin2pdenum(VM_MIN_KERNEL_ADDRESS); i < NPTES; i++) {
+ if (!(l2_map[i] & INTEL_PTE_VALID)) {
+ struct mmu_update update;
+ int j, n;
+
+ l1_map[n_l1map] = (pt_entry_t*) phystokv(pmap_grab_page());
+ for (j = 0; j < NPTES; j++)
+ l1_map[n_l1map][j] = intel_ptob(pfn_to_mfn((i - lin2pdenum(VM_MIN_KERNEL_ADDRESS)) * NPTES + j)) | INTEL_PTE_VALID | INTEL_PTE_WRITE;
+ pmap_set_page_readonly_init(l1_map[n_l1map]);
+ if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn (l1_map[n_l1map])))
+ panic("couldn't pin page %p(%p)", l1_map[n_l1map], kv_to_ma (l1_map[n_l1map]));
+ update.ptr = kv_to_ma(&l2_map[i]);
+ update.val = kv_to_ma(l1_map[n_l1map]) | INTEL_PTE_VALID | INTEL_PTE_WRITE;
+ hyp_mmu_update(kv_to_la(&update), 1, kv_to_la(&n), DOMID_SELF);
+ if (n != 1)
+ panic("couldn't complete bootstrap map");
+ /* added the last L1 table, can stop */
+ if (++n_l1map >= NSUP_L1)
+ break;
+ }
+ }
+ }
+#endif /* MACH_XEN */
+
/*
* Allocate and set up the kernel page tables.
*/
@@ -640,19 +731,42 @@ void pmap_bootstrap()
WRITE_PTE(pte, 0);
}
else
+#ifdef MACH_XEN
+ if (va == (vm_offset_t) &hyp_shared_info)
+ {
+ *pte = boot_info.shared_info | INTEL_PTE_VALID | INTEL_PTE_WRITE;
+ va += INTEL_PGBYTES;
+ }
+ else
+#endif /* MACH_XEN */
{
extern char _start[], etext[];
- if ((va >= (vm_offset_t)_start)
+ if (((va >= (vm_offset_t) _start)
&& (va + INTEL_PGBYTES <= (vm_offset_t)etext))
+#ifdef MACH_XEN
+ || (va >= (vm_offset_t) boot_info.pt_base
+ && (va + INTEL_PGBYTES <=
+ (vm_offset_t) ptable + INTEL_PGBYTES))
+#endif /* MACH_XEN */
+ )
{
WRITE_PTE(pte, pa_to_pte(_kvtophys(va))
| INTEL_PTE_VALID | global);
}
else
{
- WRITE_PTE(pte, pa_to_pte(_kvtophys(va))
- | INTEL_PTE_VALID | INTEL_PTE_WRITE | global);
+#ifdef MACH_XEN
+ int i;
+ for (i = 0; i < NSUP_L1; i++)
+ if (va == (vm_offset_t) l1_map[i]) {
+ /* Bootstrap L1 tables must stay read-only to remain pinnable. */
+ WRITE_PTE(pte, pa_to_pte(_kvtophys(va))
+ | INTEL_PTE_VALID | global);
+ break;
+ }
+ if (i == NSUP_L1)
+#endif /* MACH_XEN */
+ WRITE_PTE(pte, pa_to_pte(_kvtophys(va))
+ | INTEL_PTE_VALID | INTEL_PTE_WRITE | global)
}
va += INTEL_PGBYTES;
}
@@ -662,6 +776,11 @@ void pmap_bootstrap()
WRITE_PTE(pte, 0);
va += INTEL_PGBYTES;
}
+#ifdef MACH_XEN
+ pmap_set_page_readonly_init(ptable);
+ if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn (ptable)))
+ panic("couldn't pin page %p(%p)\n", ptable, kv_to_ma (ptable));
+#endif /* MACH_XEN */
}
}
@@ -669,6 +788,100 @@ void pmap_bootstrap()
soon after we return from here. */
}
+#ifdef MACH_XEN
+/* These are only required because of Xen security policies */
+
+/* Set back a page read write */
+void pmap_set_page_readwrite(void *_vaddr) {
+ vm_offset_t vaddr = (vm_offset_t) _vaddr;
+ vm_offset_t paddr = kvtophys(vaddr);
+ vm_offset_t canon_vaddr = phystokv(paddr);
+ if (hyp_do_update_va_mapping (kvtolin(vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID | INTEL_PTE_WRITE, UVMF_NONE))
+ panic("couldn't set hiMMU readwrite for addr %p(%p)\n", vaddr, pa_to_ma (paddr));
+ if (canon_vaddr != vaddr)
+ if (hyp_do_update_va_mapping (kvtolin(canon_vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID | INTEL_PTE_WRITE, UVMF_NONE))
+ panic("couldn't set hiMMU readwrite for paddr %p(%p)\n", canon_vaddr, pa_to_ma (paddr));
+}
+
+/* Set a page read only (so as to pin it for instance) */
+void pmap_set_page_readonly(void *_vaddr) {
+ vm_offset_t vaddr = (vm_offset_t) _vaddr;
+ vm_offset_t paddr = kvtophys(vaddr);
+ vm_offset_t canon_vaddr = phystokv(paddr);
+ if (*pmap_pde(kernel_pmap, vaddr) & INTEL_PTE_VALID) {
+ if (hyp_do_update_va_mapping (kvtolin(vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID, UVMF_NONE))
+ panic("couldn't set hiMMU readonly for vaddr %p(%p)\n", vaddr, pa_to_ma (paddr));
+ }
+ if (canon_vaddr != vaddr &&
+ *pmap_pde(kernel_pmap, canon_vaddr) & INTEL_PTE_VALID) {
+ if (hyp_do_update_va_mapping (kvtolin(canon_vaddr), pa_to_pte (pa_to_ma(paddr)) | INTEL_PTE_VALID, UVMF_NONE))
+ panic("couldn't set hiMMU readonly for vaddr %p canon_vaddr %p paddr %p (%p)\n", vaddr, canon_vaddr, paddr, pa_to_ma (paddr));
+ }
+}
+
+/* This needs to be called instead of pmap_set_page_readonly as long as CR3
+ * still points to the bootstrap dirbase. */
+void pmap_set_page_readonly_init(void *_vaddr) {
+ vm_offset_t vaddr = (vm_offset_t) _vaddr;
+#if PAE
+ pt_entry_t *pdpbase = (void*) boot_info.pt_base;
+ vm_offset_t dirbase = ptetokv(pdpbase[0]);
+#else
+ vm_offset_t dirbase = boot_info.pt_base;
+#endif
+ struct pmap linear_pmap = {
+ .dirbase = (void*) dirbase,
+ };
+ /* Modify our future kernel map (can't use update_va_mapping for this)... */
+ if (*pmap_pde(kernel_pmap, vaddr) & INTEL_PTE_VALID)
+ if (!hyp_mmu_update_la (kvtolin(vaddr), pa_to_pte (kv_to_ma(vaddr)) | INTEL_PTE_VALID))
+ panic("couldn't set hiMMU readonly for vaddr %p(%p)\n", vaddr, kv_to_ma (vaddr));
+ /* ... and the bootstrap map. */
+ if (*pmap_pde(&linear_pmap, vaddr) & INTEL_PTE_VALID)
+ if (hyp_do_update_va_mapping (vaddr, pa_to_pte (kv_to_ma(vaddr)) | INTEL_PTE_VALID, UVMF_NONE))
+ panic("couldn't set MMU readonly for vaddr %p(%p)\n", vaddr, kv_to_ma (vaddr));
+}
+
+void pmap_clear_bootstrap_pagetable(pt_entry_t *base) {
+ int i;
+ pt_entry_t *dir;
+ vm_offset_t va = 0;
+#if PAE
+ int j;
+#endif /* PAE */
+ if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, kv_to_mfn(base)))
+ panic("pmap_clear_bootstrap_pagetable: couldn't unpin page %p(%p)\n", base, kv_to_ma(base));
+#if PAE
+ for (j = 0; j < PDPNUM; j++)
+ {
+ pt_entry_t pdpe = base[j];
+ if (pdpe & INTEL_PTE_VALID) {
+ dir = (pt_entry_t *) phystokv(pte_to_pa(pdpe));
+#else /* PAE */
+ dir = base;
+#endif /* PAE */
+ for (i = 0; i < NPTES; i++) {
+ pt_entry_t pde = dir[i];
+ unsigned long pfn = mfn_to_pfn(atop(pde));
+ void *pgt = (void*) phystokv(ptoa(pfn));
+ if (pde & INTEL_PTE_VALID)
+ hyp_free_page(pfn, pgt);
+ va += NPTES * INTEL_PGBYTES;
+ if (va >= HYP_VIRT_START)
+ break;
+ }
+#if PAE
+ hyp_free_page(atop(_kvtophys(dir)), dir);
+ } else
+ va += NPTES * NPTES * INTEL_PGBYTES;
+ if (va >= HYP_VIRT_START)
+ break;
+ }
+#endif /* PAE */
+ hyp_free_page(atop(_kvtophys(base)), base);
+}
+#endif /* MACH_XEN */
+
void pmap_virtual_space(startp, endp)
vm_offset_t *startp;
vm_offset_t *endp;
@@ -823,6 +1036,29 @@ pmap_page_table_page_alloc()
return pa;
}
+#ifdef MACH_XEN
+void pmap_map_mfn(void *_addr, unsigned long mfn) {
+ vm_offset_t addr = (vm_offset_t) _addr;
+ pt_entry_t *pte, *pdp;
+ vm_offset_t ptp;
+ if ((pte = pmap_pte(kernel_pmap, addr)) == PT_ENTRY_NULL) {
+ ptp = phystokv(pmap_page_table_page_alloc());
+ pmap_set_page_readonly((void*) ptp);
+ if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, pa_to_mfn(ptp)))
+ panic("couldn't pin page %p(%p)\n",ptp,kv_to_ma(ptp));
+ pdp = pmap_pde(kernel_pmap, addr);
+ if (!hyp_mmu_update_pte(kv_to_ma(pdp),
+ pa_to_pte(kv_to_ma(ptp)) | INTEL_PTE_VALID
+ | INTEL_PTE_USER
+ | INTEL_PTE_WRITE))
+ panic("%s:%d could not set pde %p(%p) to %p(%p)\n",__FILE__,__LINE__,kvtophys((vm_offset_t)pdp),kv_to_ma(pdp), ptp, pa_to_ma(ptp));
+ pte = pmap_pte(kernel_pmap, addr);
+ }
+ if (!hyp_mmu_update_pte(kv_to_ma(pte), ptoa(mfn) | INTEL_PTE_VALID | INTEL_PTE_WRITE))
+ panic("%s:%d could not set pte %p(%p) to %p(%p)\n",__FILE__,__LINE__,pte,kv_to_ma(pte), ptoa(mfn), pa_to_ma(ptoa(mfn)));
+}
+#endif /* MACH_XEN */
+
/*
* Deallocate a page-table page.
* The page-table page must have all mappings removed,
@@ -884,6 +1120,13 @@ pmap_t pmap_create(size)
panic("pmap_create");
memcpy(p->dirbase, kernel_page_dir, PDPNUM * INTEL_PGBYTES);
+#ifdef MACH_XEN
+ {
+ int i;
+ for (i = 0; i < PDPNUM; i++)
+ pmap_set_page_readonly((void*) p->dirbase + i * INTEL_PGBYTES);
+ }
+#endif /* MACH_XEN */
#if PAE
if (kmem_alloc_wired(kernel_map,
@@ -895,6 +1138,9 @@ pmap_t pmap_create(size)
for (i = 0; i < PDPNUM; i++)
WRITE_PTE(&p->pdpbase[i], pa_to_pte(kvtophys((vm_offset_t) p->dirbase + i * INTEL_PGBYTES)) | INTEL_PTE_VALID);
}
+#ifdef MACH_XEN
+ pmap_set_page_readonly(p->pdpbase);
+#endif /* MACH_XEN */
#endif /* PAE */
p->ref_count = 1;
@@ -954,14 +1200,29 @@ void pmap_destroy(p)
if (m == VM_PAGE_NULL)
panic("pmap_destroy: pte page not in object");
vm_page_lock_queues();
+#ifdef MACH_XEN
+ if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, pa_to_mfn(pa)))
+ panic("pmap_destroy: couldn't unpin page %p(%p)\n", pa, pa_to_ma(pa));
+ pmap_set_page_readwrite((void*) phystokv(pa));
+#endif /* MACH_XEN */
vm_page_free(m);
inuse_ptepages_count--;
vm_page_unlock_queues();
vm_object_unlock(pmap_object);
}
}
+#ifdef MACH_XEN
+ {
+ int i;
+ for (i = 0; i < PDPNUM; i++)
+ pmap_set_page_readwrite((void*) p->dirbase + i * INTEL_PGBYTES);
+ }
+#endif /* MACH_XEN */
kmem_free(kernel_map, (vm_offset_t)p->dirbase, PDPNUM * INTEL_PGBYTES);
#if PAE
+#ifdef MACH_XEN
+ pmap_set_page_readwrite(p->pdpbase);
+#endif /* MACH_XEN */
kmem_free(kernel_map, (vm_offset_t)p->pdpbase, INTEL_PGBYTES);
#endif /* PAE */
zfree(pmap_zone, (vm_offset_t) p);
@@ -1007,6 +1268,10 @@ void pmap_remove_range(pmap, va, spte, epte)
int num_removed, num_unwired;
int pai;
vm_offset_t pa;
+#ifdef MACH_XEN
+ int n, ii = 0;
+ struct mmu_update update[HYP_BATCH_MMU_UPDATES];
+#endif /* MACH_XEN */
#if DEBUG_PTE_PAGE
if (pmap != kernel_pmap)
@@ -1035,7 +1300,19 @@ void pmap_remove_range(pmap, va, spte, epte)
register int i = ptes_per_vm_page;
register pt_entry_t *lpte = cpte;
do {
+#ifdef MACH_XEN
+ update[ii].ptr = kv_to_ma(lpte);
+ update[ii].val = 0;
+ ii++;
+ if (ii == HYP_BATCH_MMU_UPDATES) {
+ hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF);
+ if (n != ii)
+ panic("couldn't pmap_remove_range\n");
+ ii = 0;
+ }
+#else /* MACH_XEN */
*lpte = 0;
+#endif /* MACH_XEN */
lpte++;
} while (--i > 0);
continue;
@@ -1056,7 +1333,19 @@ void pmap_remove_range(pmap, va, spte, epte)
do {
pmap_phys_attributes[pai] |=
*lpte & (PHYS_MODIFIED|PHYS_REFERENCED);
+#ifdef MACH_XEN
+ update[ii].ptr = kv_to_ma(lpte);
+ update[ii].val = 0;
+ ii++;
+ if (ii == HYP_BATCH_MMU_UPDATES) {
+ hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF);
+ if (n != ii)
+ panic("couldn't pmap_remove_range\n");
+ ii = 0;
+ }
+#else /* MACH_XEN */
*lpte = 0;
+#endif /* MACH_XEN */
lpte++;
} while (--i > 0);
}
@@ -1102,6 +1391,14 @@ void pmap_remove_range(pmap, va, spte, epte)
}
}
+#ifdef MACH_XEN
+ if (ii > HYP_BATCH_MMU_UPDATES)
+ panic("overflowed array in pmap_remove_range");
+ hyp_mmu_update(kvtolin(&update), ii, kvtolin(&n), DOMID_SELF);
+ if (n != ii)
+ panic("couldn't pmap_remove_range\n");
+#endif /* MACH_XEN */
+
/*
* Update the counts
*/
@@ -1246,7 +1543,12 @@ void pmap_page_protect(phys, prot)
do {
pmap_phys_attributes[pai] |=
*pte & (PHYS_MODIFIED|PHYS_REFERENCED);
+#ifdef MACH_XEN
+ if (!hyp_mmu_update_pte(kv_to_ma(pte++), 0))
+ panic("%s:%d could not clear pte %p\n",__FILE__,__LINE__,pte-1);
+#else /* MACH_XEN */
*pte++ = 0;
+#endif /* MACH_XEN */
} while (--i > 0);
}
@@ -1276,7 +1578,12 @@ void pmap_page_protect(phys, prot)
register int i = ptes_per_vm_page;
do {
+#ifdef MACH_XEN
+ if (!hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~INTEL_PTE_WRITE))
+ panic("%s:%d could not disable write on pte %p\n",__FILE__,__LINE__,pte);
+#else /* MACH_XEN */
*pte &= ~INTEL_PTE_WRITE;
+#endif /* MACH_XEN */
pte++;
} while (--i > 0);
@@ -1365,11 +1672,36 @@ void pmap_protect(map, s, e, prot)
spte = &spte[ptenum(s)];
epte = &spte[intel_btop(l-s)];
+#ifdef MACH_XEN
+ int n, i = 0;
+ struct mmu_update update[HYP_BATCH_MMU_UPDATES];
+#endif /* MACH_XEN */
+
while (spte < epte) {
- if (*spte & INTEL_PTE_VALID)
+ if (*spte & INTEL_PTE_VALID) {
+#ifdef MACH_XEN
+ update[i].ptr = kv_to_ma(spte);
+ update[i].val = *spte & ~INTEL_PTE_WRITE;
+ i++;
+ if (i == HYP_BATCH_MMU_UPDATES) {
+ hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF);
+ if (n != i)
+ panic("couldn't pmap_protect\n");
+ i = 0;
+ }
+#else /* MACH_XEN */
*spte &= ~INTEL_PTE_WRITE;
+#endif /* MACH_XEN */
+ }
spte++;
}
+#ifdef MACH_XEN
+ if (i > HYP_BATCH_MMU_UPDATES)
+ panic("overflowed array in pmap_protect");
+ hyp_mmu_update(kvtolin(&update), i, kvtolin(&n), DOMID_SELF);
+ if (n != i)
+ panic("couldn't pmap_protect\n");
+#endif /* MACH_XEN */
}
s = l;
pde++;
@@ -1412,6 +1744,8 @@ if (pmap_debug) printf("pmap(%x, %x)\n", v, pa);
if (pmap == PMAP_NULL)
return;
+ if (pmap == kernel_pmap && (v < kernel_virtual_start || v >= kernel_virtual_end))
+ panic("pmap_enter(%p, %p) falls in physical memory area!\n", v, pa);
if (pmap == kernel_pmap && (prot & VM_PROT_WRITE) == 0
&& !wired /* hack for io_wire */ ) {
/*
@@ -1502,9 +1836,20 @@ Retry:
/*XX pdp = &pmap->dirbase[pdenum(v) & ~(i-1)];*/
pdp = pmap_pde(pmap, v);
do {
+#ifdef MACH_XEN
+ pmap_set_page_readonly((void *) ptp);
+ if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn(ptp)))
+ panic("couldn't pin page %p(%p)\n",ptp,kv_to_ma(ptp));
+ if (!hyp_mmu_update_pte(pa_to_ma(kvtophys((vm_offset_t)pdp)),
+ pa_to_pte(pa_to_ma(kvtophys(ptp))) | INTEL_PTE_VALID
+ | INTEL_PTE_USER
+ | INTEL_PTE_WRITE))
+ panic("%s:%d could not set pde %p(%p,%p) to %p(%p,%p) %p\n",__FILE__,__LINE__, pdp, kvtophys((vm_offset_t)pdp), pa_to_ma(kvtophys((vm_offset_t)pdp)), ptp, kvtophys(ptp), pa_to_ma(kvtophys(ptp)), pa_to_pte(kv_to_ma(ptp)));
+#else /* MACH_XEN */
*pdp = pa_to_pte(ptp) | INTEL_PTE_VALID
| INTEL_PTE_USER
| INTEL_PTE_WRITE;
+#endif /* MACH_XEN */
pdp++;
ptp += INTEL_PGBYTES;
} while (--i > 0);
@@ -1544,7 +1889,12 @@ Retry:
do {
if (*pte & INTEL_PTE_MOD)
template |= INTEL_PTE_MOD;
+#ifdef MACH_XEN
+ if (!hyp_mmu_update_pte(kv_to_ma(pte), pa_to_ma(template)))
+ panic("%s:%d could not set pte %p to %p\n",__FILE__,__LINE__,pte,template);
+#else /* MACH_XEN */
WRITE_PTE(pte, template)
+#endif /* MACH_XEN */
pte++;
pte_increment_pa(template);
} while (--i > 0);
@@ -1649,7 +1999,12 @@ Retry:
template |= INTEL_PTE_WIRED;
i = ptes_per_vm_page;
do {
+#ifdef MACH_XEN
+ if (!(hyp_mmu_update_pte(kv_to_ma(pte), pa_to_ma(template))))
+ panic("%s:%d could not set pte %p to %p\n",__FILE__,__LINE__,pte,template);
+#else /* MACH_XEN */
WRITE_PTE(pte, template)
+#endif /* MACH_XEN */
pte++;
pte_increment_pa(template);
} while (--i > 0);
@@ -1704,7 +2059,12 @@ void pmap_change_wiring(map, v, wired)
map->stats.wired_count--;
i = ptes_per_vm_page;
do {
+#ifdef MACH_XEN
+ if (!(hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~INTEL_PTE_WIRED)))
+ panic("%s:%d could not unwire pte %p\n",__FILE__,__LINE__,pte);
+#else /* MACH_XEN */
*pte &= ~INTEL_PTE_WIRED;
+#endif /* MACH_XEN */
pte++;
} while (--i > 0);
}
@@ -1835,7 +2195,17 @@ void pmap_collect(p)
register int i = ptes_per_vm_page;
register pt_entry_t *pdep = pdp;
do {
+#ifdef MACH_XEN
+ unsigned long pte = *pdep;
+ void *ptable = (void*) ptetokv(pte);
+ if (!(hyp_mmu_update_pte(pa_to_ma(kvtophys((vm_offset_t)pdep++)), 0)))
+ panic("%s:%d could not clear pde %p\n",__FILE__,__LINE__,pdep-1);
+ if (!hyp_mmuext_op_mfn (MMUEXT_UNPIN_TABLE, kv_to_mfn(ptable)))
+ panic("couldn't unpin page %p(%p)\n", ptable, pa_to_ma(kvtophys((vm_offset_t)ptable)));
+ pmap_set_page_readwrite(ptable);
+#else /* MACH_XEN */
*pdep++ = 0;
+#endif /* MACH_XEN */
} while (--i > 0);
}
@@ -2052,7 +2422,12 @@ phys_attribute_clear(phys, bits)
{
register int i = ptes_per_vm_page;
do {
+#ifdef MACH_XEN
+ if (!(hyp_mmu_update_pte(kv_to_ma(pte), *pte & ~bits)))
+ panic("%s:%d could not clear bits %lx from pte %p\n",__FILE__,__LINE__,bits,pte);
+#else /* MACH_XEN */
*pte &= ~bits;
+#endif /* MACH_XEN */
} while (--i > 0);
}
PMAP_UPDATE_TLBS(pmap, va, va + PAGE_SIZE);
@@ -2413,7 +2788,12 @@ pmap_unmap_page_zero ()
if (!pte)
return;
assert (pte);
+#ifdef MACH_XEN
+ if (!hyp_mmu_update_pte(kv_to_ma(pte), 0))
+ printf("couldn't unmap page 0\n");
+#else /* MACH_XEN */
*pte = 0;
INVALIDATE_TLB(kernel_pmap, 0, PAGE_SIZE);
+#endif /* MACH_XEN */
}
#endif /* i386 */
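Several of the pmap.c hunks above (`pmap_map_bd`, `pmap_remove_range`, `pmap_protect`) repeat one pattern: PTE writes are queued into an `update[HYP_BATCH_MMU_UPDATES]` array and flushed with a single `hyp_mmu_update` hypercall whenever the array fills, plus a final flush after the loop. A standalone C sketch of that accumulate-and-flush shape; all names (`batch_push`, `mock_flush`, `BATCH_MAX`) are illustrative stand-ins, and the mock callback stands in for the hypercall:

```c
#define BATCH_MAX 4   /* stands in for HYP_BATCH_MMU_UPDATES */

struct upd { unsigned long ptr, val; };

static int applied;   /* how many updates the "hypervisor" has seen */

/* Mock hypercall: returns the number of updates applied, like the
 * count hyp_mmu_update() stores through its n pointer. */
static int mock_flush(struct upd *q, int n)
{
    (void) q;
    applied += n;
    return n;
}

/* Queue one update; flush the batch through cb() when it fills.
 * Returns -1 on a short flush (where the kernel code panics). */
static int batch_push(struct upd *q, int *count,
                      struct upd u, int (*cb)(struct upd *, int))
{
    q[(*count)++] = u;
    if (*count == BATCH_MAX) {
        if (cb(q, *count) != *count)
            return -1;
        *count = 0;
    }
    return 0;
}

/* Final flush after the loop, mirroring the tail hyp_mmu_update call. */
static int batch_finish(struct upd *q, int *count,
                        int (*cb)(struct upd *, int))
{
    if (*count && cb(q, *count) != *count)
        return -1;
    *count = 0;
    return 0;
}
```

One small difference by design: this sketch skips the tail flush when the batch is empty, whereas the kernel hunks issue the final `hyp_mmu_update` unconditionally (a zero-length batch is harmless there).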
diff --git a/i386/intel/pmap.h b/i386/intel/pmap.h
index 7354a0f..a2b6442 100644
--- a/i386/intel/pmap.h
+++ b/i386/intel/pmap.h
@@ -126,12 +126,21 @@ typedef unsigned int pt_entry_t;
#define INTEL_PTE_NCACHE 0x00000010
#define INTEL_PTE_REF 0x00000020
#define INTEL_PTE_MOD 0x00000040
+#ifdef MACH_XEN
+/* Not supported */
+#define INTEL_PTE_GLOBAL 0x00000000
+#else /* MACH_XEN */
#define INTEL_PTE_GLOBAL 0x00000100
+#endif /* MACH_XEN */
#define INTEL_PTE_WIRED 0x00000200
#define INTEL_PTE_PFN 0xfffff000
#define pa_to_pte(a) ((a) & INTEL_PTE_PFN)
+#ifdef MACH_PSEUDO_PHYS
+#define pte_to_pa(p) ma_to_pa((p) & INTEL_PTE_PFN)
+#else /* MACH_PSEUDO_PHYS */
#define pte_to_pa(p) ((p) & INTEL_PTE_PFN)
+#endif /* MACH_PSEUDO_PHYS */
#define pte_increment_pa(p) ((p) += INTEL_OFFMASK+1)
/*
@@ -159,6 +168,14 @@ typedef struct pmap *pmap_t;
#define PMAP_NULL ((pmap_t) 0)
+#ifdef MACH_XEN
+extern void pmap_set_page_readwrite(void *addr);
+extern void pmap_set_page_readonly(void *addr);
+extern void pmap_set_page_readonly_init(void *addr);
+extern void pmap_map_mfn(void *addr, unsigned long mfn);
+extern void pmap_clear_bootstrap_pagetable(pt_entry_t *addr);
+#endif /* MACH_XEN */
+
#if PAE
#define set_pmap(pmap) set_cr3(kvtophys((vm_offset_t)(pmap)->pdpbase))
#else /* PAE */
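The pmap.h hunk above makes `pte_to_pa` go through `ma_to_pa` under `MACH_PSEUDO_PHYS`: a PTE then holds a *machine* frame that must be translated back to a guest-physical address. A toy C sketch of that mask-and-translate arithmetic, with a hypothetical 4-entry machine-to-physical table standing in for the Xen-provided one:

```c
#include <stdint.h>

#define PTE_PFN_MASK 0xfffff000u   /* INTEL_PTE_PFN */
#define PTE_VALID    0x00000001u
#define PTE_WRITE    0x00000002u

/* Toy machine-to-physical table (mfn -> pfn); illustrative only. */
static const uint32_t toy_m2p[4] = { 2, 0, 3, 1 };

/* pa_to_pte just masks off the flag bits, as in the header. */
static uint32_t toy_pa_to_pte(uint32_t pa)
{
    return pa & PTE_PFN_MASK;
}

/* Under pseudo-physical mode, extract the machine frame number and map
 * it back to a guest-physical address, as pte_to_pa does via ma_to_pa. */
static uint32_t toy_pte_to_pa_pseudo(uint32_t pte)
{
    uint32_t mfn = (pte & PTE_PFN_MASK) >> 12;
    return toy_m2p[mfn] << 12;
}
```

So a PTE naming machine frame 2 resolves to guest-physical page 3 here; without `MACH_PSEUDO_PHYS` the translation collapses to the plain mask.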
diff --git a/i386/xen/Makefrag.am b/i386/xen/Makefrag.am
new file mode 100644
index 0000000..b15b7db
--- /dev/null
+++ b/i386/xen/Makefrag.am
@@ -0,0 +1,33 @@
+# Makefile fragment for the ix86 specific part of the Xen platform.
+
+# Copyright (C) 2007 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2, or (at your option) any later
+# version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+#
+# Xen support.
+#
+
+libkernel_a_SOURCES += \
+ i386/xen/xen.c \
+ i386/xen/xen_locore.S \
+ i386/xen/xen_boothdr.S
+
+
+if PLATFORM_xen
+gnumach_LINKFLAGS += \
+ --defsym _START=0x20000000 \
+ -T '$(srcdir)'/i386/ldscript
+endif
diff --git a/i386/xen/xen.c b/i386/xen/xen.c
new file mode 100644
index 0000000..aa3c2cc
--- /dev/null
+++ b/i386/xen/xen.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <kern/printf.h>
+#include <kern/debug.h>
+
+#include <mach/machine/eflags.h>
+#include <machine/thread.h>
+#include <machine/ipl.h>
+
+#include <machine/model_dep.h>
+
+unsigned long cr3;
+
+struct failsafe_callback_regs {
+ unsigned int ds;
+ unsigned int es;
+ unsigned int fs;
+ unsigned int gs;
+ unsigned int ip;
+ unsigned int cs_and_mask;
+ unsigned int flags;
+};
+
+void hyp_failsafe_c_callback(struct failsafe_callback_regs *regs) {
+ printf("Fail-Safe callback!\n");
+ printf("IP: %08X CS: %4X DS: %4X ES: %4X FS: %4X GS: %4X FLAGS %08X MASK %04X\n", regs->ip, regs->cs_and_mask & 0xffff, regs->ds, regs->es, regs->fs, regs->gs, regs->flags, regs->cs_and_mask >> 16);
+ panic("failsafe");
+}
+
+extern void clock_interrupt();
+extern void return_to_iret;
+
+void hypclock_machine_intr(int old_ipl, void *ret_addr, struct i386_interrupt_state *regs, unsigned64_t delta) {
+ if (ret_addr == &return_to_iret) {
+ clock_interrupt(delta/1000, /* usec per tick */
+ (regs->efl & EFL_VM) || /* user mode */
+ ((regs->cs & 0x02) != 0), /* user mode */
+ old_ipl == SPL0); /* base priority */
+ } else
+ clock_interrupt(delta/1000, FALSE, FALSE);
+}
+
+void hyp_p2m_init(void) {
+ unsigned long nb_pfns = atop(phys_last_addr);
+#ifdef MACH_PSEUDO_PHYS
+#define P2M_PAGE_ENTRIES (PAGE_SIZE / sizeof(unsigned long))
+ unsigned long *l3 = (unsigned long *)phystokv(pmap_grab_page()), *l2 = NULL;
+ unsigned long i;
+
+ for (i = 0; i < (nb_pfns + P2M_PAGE_ENTRIES) / P2M_PAGE_ENTRIES; i++) {
+ if (!(i % P2M_PAGE_ENTRIES)) {
+ l2 = (unsigned long *) phystokv(pmap_grab_page());
+ l3[i / P2M_PAGE_ENTRIES] = kv_to_mfn(l2);
+ }
+ l2[i % P2M_PAGE_ENTRIES] = kv_to_mfn(&mfn_list[i * P2M_PAGE_ENTRIES]);
+ }
+
+ hyp_shared_info.arch.pfn_to_mfn_frame_list_list = kv_to_mfn(l3);
+#endif
+ hyp_shared_info.arch.max_pfn = nb_pfns;
+}
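`hyp_p2m_init()` above publishes the pfn-to-mfn table as a two-level frame list: `l3[i / P2M_PAGE_ENTRIES]` names an l2 page, and each l2 entry names one page of `mfn_list`. A small C sketch of just the index arithmetic (assuming 32-bit longs, so 1024 entries per page; function names are illustrative):

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define P2M_PAGE_ENTRIES (PAGE_SIZE / sizeof(uint32_t))  /* 1024 here */

/* For mfn_list frame i, compute where hyp_p2m_init() records it:
 * l3[*l3_idx] names the l2 page, l2[*l2_idx] names the mfn_list page. */
static void p2m_slot(unsigned long i,
                     unsigned long *l3_idx, unsigned long *l2_idx)
{
    *l3_idx = i / P2M_PAGE_ENTRIES;
    *l2_idx = i % P2M_PAGE_ENTRIES;
}

/* Loop bound from the hunk above: (nb_pfns + ENTRIES) / ENTRIES.
 * Note it allocates one spare frame when nb_pfns is an exact multiple,
 * exactly as written in hyp_p2m_init(). */
static unsigned long p2m_frames(unsigned long nb_pfns)
{
    return (nb_pfns + P2M_PAGE_ENTRIES) / P2M_PAGE_ENTRIES;
}
```

With one level of 1024-entry pages per l2 page, a single l3 page covers 1024 × 1024 frames, comfortably more than the 4GiB the rest of the patch supports.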
diff --git a/i386/xen/xen_boothdr.S b/i386/xen/xen_boothdr.S
new file mode 100644
index 0000000..3d84e0c
--- /dev/null
+++ b/i386/xen/xen_boothdr.S
@@ -0,0 +1,167 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <xen/public/elfnote.h>
+
+.section __xen_guest
+ .ascii "GUEST_OS=GNU Mach"
+ .ascii ",GUEST_VERSION=1.3"
+ .ascii ",XEN_VER=xen-3.0"
+ .ascii ",VIRT_BASE=0x20000000"
+ .ascii ",ELF_PADDR_OFFSET=0x20000000"
+ .ascii ",HYPERCALL_PAGE=0x2"
+#if PAE
+ .ascii ",PAE=yes"
+#else
+ .ascii ",PAE=no"
+#endif
+ .ascii ",LOADER=generic"
+#ifndef MACH_PSEUDO_PHYS
+ .ascii ",FEATURES=!auto_translated_physmap"
+#endif
+ .byte 0
+
+/* Macro taken from linux/include/linux/elfnote.h */
+#define ELFNOTE(name, type, desctype, descdata) \
+.pushsection .note.name ; \
+ .align 4 ; \
+ .long 2f - 1f /* namesz */ ; \
+ .long 4f - 3f /* descsz */ ; \
+ .long type ; \
+1:.asciz "name" ; \
+2:.align 4 ; \
+3:desctype descdata ; \
+4:.align 4 ; \
+.popsection ;
+
+ ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "GNU Mach")
+ ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "1.3")
+ ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
+ ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .long, _START)
+ ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .long, _START)
+ ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .long, start)
+ ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .long, hypcalls)
+#if PAE
+ ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
+#else
+ ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "no")
+#endif
+ ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
+ ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, ""
+#ifndef MACH_PSEUDO_PHYS
+ "!auto_translated_physmap"
+#endif
+ )
+
+#include <mach/machine/asm.h>
+
+#include <i386/i386/i386asm.h>
+
+ .text
+ .globl gdt, ldt
+ .globl start, _start, gdt
+start:
+_start:
+
+ /* Switch to our own interrupt stack. */
+ movl $(_intstack+INTSTACK_SIZE),%eax
+ movl %eax,%esp
+
+ /* Reset EFLAGS to a known state. */
+ pushl $0
+ popf
+
+ /* Push the start_info pointer to be the second argument. */
+ subl $KERNELBASE,%esi
+ pushl %esi
+
+ /* Jump into C code. */
+ call EXT(c_boot_entry)
+
+/* Those need to be aligned on page boundaries. */
+.global hyp_shared_info, hypcalls
+
+ .org (start + 0x1000)
+hyp_shared_info:
+ .org hyp_shared_info + 0x1000
+
+/* Labels just for debuggers */
+#define hypcall(name, n) \
+ .org hypcalls + n*32 ; \
+__hyp_##name:
+
+hypcalls:
+ hypcall(set_trap_table, 0)
+ hypcall(mmu_update, 1)
+ hypcall(set_gdt, 2)
+ hypcall(stack_switch, 3)
+ hypcall(set_callbacks, 4)
+ hypcall(fpu_taskswitch, 5)
+ hypcall(sched_op_compat, 6)
+ hypcall(platform_op, 7)
+ hypcall(set_debugreg, 8)
+ hypcall(get_debugreg, 9)
+ hypcall(update_descriptor, 10)
+ hypcall(memory_op, 12)
+ hypcall(multicall, 13)
+ hypcall(update_va_mapping, 14)
+ hypcall(set_timer_op, 15)
+ hypcall(event_channel_op_compat, 16)
+ hypcall(xen_version, 17)
+ hypcall(console_io, 18)
+ hypcall(physdev_op_compat, 19)
+ hypcall(grant_table_op, 20)
+ hypcall(vm_assist, 21)
+ hypcall(update_va_mapping_otherdomain, 22)
+ hypcall(iret, 23)
+ hypcall(vcpu_op, 24)
+ hypcall(set_segment_base, 25)
+ hypcall(mmuext_op, 26)
+ hypcall(acm_op, 27)
+ hypcall(nmi_op, 28)
+ hypcall(sched_op, 29)
+ hypcall(callback_op, 30)
+ hypcall(xenoprof_op, 31)
+ hypcall(event_channel_op, 32)
+ hypcall(physdev_op, 33)
+ hypcall(hvm_op, 34)
+ hypcall(sysctl, 35)
+ hypcall(domctl, 36)
+ hypcall(kexec_op, 37)
+
+ hypcall(arch_0, 48)
+ hypcall(arch_1, 49)
+ hypcall(arch_2, 50)
+ hypcall(arch_3, 51)
+ hypcall(arch_4, 52)
+ hypcall(arch_5, 53)
+ hypcall(arch_6, 54)
+ hypcall(arch_7, 55)
+
+ .org hypcalls + 0x1000
+
+gdt:
+ .org gdt + 0x1000
+
+ldt:
+ .org ldt + 0x1000
+
+stack:
+ .long _intstack+INTSTACK_SIZE,0xe021
+ .comm _intstack,INTSTACK_SIZE
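The `ELFNOTE` macro above hand-assembles standard ELF note records: a 12-byte header (namesz, descsz, type) followed by the name and descriptor, each padded to 4-byte alignment by the `.align 4` directives. A minimal C sketch of the same serialization, useful for sanity-checking the padding; the struct and helper names here are illustrative, not taken from this patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Header layout shared by all ELF notes (same fields as Elf32_Nhdr). */
struct nhdr {
    uint32_t namesz;   /* length of name, including the NUL */
    uint32_t descsz;   /* length of descriptor */
    uint32_t type;     /* note type, e.g. a XEN_ELFNOTE_* value */
};

#define ALIGN4(x) (((x) + 3u) & ~3u)

/* Serialize one note into buf, returning the number of bytes written. */
static size_t emit_note(uint8_t *buf, const char *name, uint32_t type,
                        const void *desc, uint32_t descsz)
{
    struct nhdr h = { (uint32_t)strlen(name) + 1, descsz, type };
    size_t off = 0;
    memcpy(buf + off, &h, sizeof h);
    off += sizeof h;
    memcpy(buf + off, name, h.namesz);
    off += ALIGN4(h.namesz);          /* pad name to 4 bytes, as .align 4 does */
    memcpy(buf + off, desc, descsz);
    off += ALIGN4(descsz);            /* pad descriptor likewise */
    return off;
}
```

So a note with name `"Xen"` and descriptor `"GNU Mach"` (with its NUL) occupies 12 + 4 + 12 = 28 bytes, matching what the `1:`/`2:`/`3:`/`4:` labels in the macro measure.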
+
diff --git a/i386/xen/xen_locore.S b/i386/xen/xen_locore.S
new file mode 100644
index 0000000..51f823f
--- /dev/null
+++ b/i386/xen/xen_locore.S
@@ -0,0 +1,110 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <mach/machine/asm.h>
+
+#include <i386/i386asm.h>
+#include <i386/cpu_number.h>
+#include <i386/xen.h>
+
+ .data 2
+int_active:
+ .long 0
+
+
+ .text
+ .globl hyp_callback, hyp_failsafe_callback
+ P2ALIGN(TEXT_ALIGN)
+hyp_callback:
+ pushl %eax
+ jmp EXT(all_intrs)
+
+ENTRY(interrupt)
+ incl int_active /* currently handling interrupts */
+ call EXT(hyp_c_callback) /* call generic interrupt routine */
+ decl int_active /* stopped handling interrupts */
+ sti
+ ret
+
+/* FIXME: if we're _very_ unlucky, we may be re-interrupted, filling stack
+ *
+ * Far from trivial, see mini-os. That said, maybe we could just, before popping
+ * everything (which is _not_ destructive), save sp into a known place and use
+ * it+jmp back?
+ *
+ * Mmm, there seems to be an iret hypcall that does exactly what we want:
+ * perform iret, and if IF is set, clear the interrupt mask.
+ */
+
+/* Pfff, we have to check pending interrupts ourselves. Some other DomUs just make a hypercall to retrigger the irq. Not sure it's really easier/faster */
+ENTRY(hyp_sti)
+ pushl %ebp
+ movl %esp, %ebp
+_hyp_sti:
+ movb $0,hyp_shared_info+CPU_CLI /* Enable interrupts */
+ cmpl $0,int_active /* Check whether we were already checking pending interrupts */
+ jz 0f
+ popl %ebp
+ ret /* Already active, just return */
+0:
+ /* Not active, check pending interrupts by hand */
+ /* no memory barrier needed on x86 */
+ cmpb $0,hyp_shared_info+CPU_PENDING
+ jne 0f
+ popl %ebp
+ ret
+0:
+ movb $0xff,hyp_shared_info+CPU_CLI
+1:
+ pushl %eax
+ pushl %ecx
+ pushl %edx
+ incl int_active /* currently handling interrupts */
+
+ pushl $0
+ pushl $0
+ call EXT(hyp_c_callback)
+ popl %edx
+ popl %edx
+
+ popl %edx
+ popl %ecx
+ popl %eax
+ decl int_active /* stopped handling interrupts */
+ cmpb $0,hyp_shared_info+CPU_PENDING
+ jne 1b
+ jmp _hyp_sti
+
+/* Hypervisor failed to reload segments. Dump them. */
+hyp_failsafe_callback:
+#if 1
+ /* load sane segments */
+ mov %ss, %ax
+ mov %ax, %ds
+ mov %ax, %es
+ mov %ax, %fs
+ mov %ax, %gs
+ push %esp
+ call EXT(hyp_failsafe_c_callback)
+#else
+ popl %ds
+ popl %es
+ popl %fs
+ popl %gs
+ iret
+#endif
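The `hyp_sti` loop above re-enables event delivery, then polls the shared-info pending flag by hand, masking again and calling the C callback until nothing is pending. A simplified single-threaded C model of that control flow (it ignores the `int_active` recursion guard and register saving; the flag and function names are made up for the sketch):

```c
#include <assert.h>

/* Toy stand-ins for the CPU_CLI / CPU_PENDING bytes in the shared-info page. */
static volatile unsigned char cli;       /* nonzero = events masked */
static volatile unsigned char pending;   /* nonzero = an event is waiting */
static int handled;

static void c_callback(void) { pending = 0; handled++; }

/* Re-enable events, then drain any that arrived while masked. */
static void toy_hyp_sti(void)
{
    for (;;) {
        cli = 0;                 /* enable delivery */
        if (!pending)
            return;              /* nothing arrived while we were masked */
        cli = 1;                 /* mask again while handling */
        while (pending)
            c_callback();        /* handle until the flag stays clear */
    }
}
```

The re-check after clearing `cli` is the point: an event raised while masked sets only `pending`, so simply unmasking would lose it.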
diff --git a/include/mach/xen.h b/include/mach/xen.h
new file mode 100644
index 0000000..f954701
--- /dev/null
+++ b/include/mach/xen.h
@@ -0,0 +1,80 @@
+
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _MACH_XEN_H
+#define _MACH_XEN_H
+#ifdef MACH_XEN
+#include <sys/types.h>
+#include <xen/public/xen.h>
+#include <i386/vm_param.h>
+
+extern struct start_info boot_info;
+
+extern volatile struct shared_info hyp_shared_info;
+
+/* Memory translations */
+
+/* pa are physical addresses, from 0 to size of memory */
+/* ma are machine addresses, i.e. _real_ hardware addresses */
+/* la are linear addresses, i.e. without segmentation */
+
+/* This might also be useful outside of Xen */
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+extern unsigned long la_shift;
+#else
+#define la_shift LINEAR_MIN_KERNEL_ADDRESS
+#endif
+#define la_to_pa(a) ((vm_offset_t)(((vm_offset_t)(a)) - la_shift))
+#define pa_to_la(a) ((vm_offset_t)(((vm_offset_t)(a)) + la_shift))
+
+#define kv_to_la(a) pa_to_la(_kvtophys(a))
+#define la_to_kv(a) phystokv(la_to_pa(a))
+
+#ifdef MACH_PSEUDO_PHYS
+#if PAE
+#define PFN_LIST MACH2PHYS_VIRT_START_PAE
+#else
+#define PFN_LIST MACH2PHYS_VIRT_START_NONPAE
+#endif
+#if VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+extern unsigned long *pfn_list;
+#else
+#define pfn_list ((unsigned long *) PFN_LIST)
+#endif
+#define mfn_to_pfn(n) (pfn_list[n])
+
+extern unsigned long *mfn_list;
+#define pfn_to_mfn(n) (mfn_list[n])
+#else
+#define mfn_to_pfn(n) (n)
+#define pfn_to_mfn(n) (n)
+#endif /* MACH_PSEUDO_PHYS */
+
+#define pa_to_mfn(a) (pfn_to_mfn(atop(a)))
+#define pa_to_ma(a) ({ vm_offset_t __a = (vm_offset_t) (a); ptoa(pa_to_mfn(__a)) | (__a & PAGE_MASK); })
+#define ma_to_pa(a) ({ vm_offset_t __a = (vm_offset_t) (a); (mfn_to_pfn(atop((vm_offset_t)(__a))) << PAGE_SHIFT) | (__a & PAGE_MASK); })
+
+#define kv_to_mfn(a) pa_to_mfn(_kvtophys(a))
+#define kv_to_ma(a) pa_to_ma(_kvtophys(a))
+#define mfn_to_kv(mfn) (phystokv(ma_to_pa(ptoa(mfn))))
+
+#include <machine/xen.h>
+
+#endif /* MACH_XEN */
+#endif /* _MACH_XEN_H */
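The pseudo-physical macros above index two mirror tables: `pfn_list` maps machine frames back to pseudo-physical ones, `mfn_list` the reverse, and `pa_to_ma`/`ma_to_pa` splice the in-page offset back in. A toy C round trip with made-up four-entry tables (the frame numbers are illustrative, not values Xen would hand out):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)
#define atop(a)    ((a) >> PAGE_SHIFT)
#define ptoa(p)    ((unsigned long)(p) << PAGE_SHIFT)

/* Toy translation tables: pfns 0..3 scattered over machine frames. */
static unsigned long mfn_list[] = { 7, 2, 9, 4 };   /* pfn -> mfn */
static unsigned long pfn_list[10];                  /* mfn -> pfn */

#define pfn_to_mfn(n) (mfn_list[n])
#define mfn_to_pfn(n) (pfn_list[n])

/* Same shape as the macros in include/mach/xen.h. */
#define pa_to_ma(a) (ptoa(pfn_to_mfn(atop(a))) | ((a) & PAGE_MASK))
#define ma_to_pa(a) (ptoa(mfn_to_pfn(atop(a))) | ((a) & PAGE_MASK))

static void init_tables(void)
{
    for (unsigned long p = 0; p < 4; p++)
        pfn_list[mfn_list[p]] = p;   /* build the inverse mapping */
}
```

With `MACH_PSEUDO_PHYS` disabled both translations collapse to the identity, which is exactly what the `#else` branch's `mfn_to_pfn(n) (n)` does.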
diff --git a/kern/bootstrap.c b/kern/bootstrap.c
index b7db2df..cf10d67 100644
--- a/kern/bootstrap.c
+++ b/kern/bootstrap.c
@@ -63,7 +63,12 @@
#else
#include <mach/machine/multiboot.h>
#include <mach/exec/exec.h>
+#ifdef MACH_XEN
+#include <mach/xen.h>
+extern struct start_info boot_info; /* XXX put this in a header! */
+#else /* MACH_XEN */
extern struct multiboot_info boot_info; /* XXX put this in a header! */
+#endif /* MACH_XEN */
#endif
#include "boot_script.h"
@@ -101,10 +106,23 @@ task_insert_send_right(
void bootstrap_create()
{
+ int compat;
+#ifdef MACH_XEN
+ struct multiboot_module *bmods = ((struct multiboot_module *)
+ boot_info.mod_start);
+ int n;
+ for (n = 0; bmods[n].mod_start; n++) {
+ bmods[n].mod_start = kvtophys(bmods[n].mod_start + (vm_offset_t) bmods);
+ bmods[n].mod_end = kvtophys(bmods[n].mod_end + (vm_offset_t) bmods);
+ bmods[n].string = kvtophys(bmods[n].string + (vm_offset_t) bmods);
+ }
+ boot_info.mods_count = n;
+ boot_info.flags |= MULTIBOOT_MODS;
+#else /* MACH_XEN */
struct multiboot_module *bmods = ((struct multiboot_module *)
phystokv(boot_info.mods_addr));
- int compat;
+#endif /* MACH_XEN */
if (!(boot_info.flags & MULTIBOOT_MODS)
|| (boot_info.mods_count == 0))
panic ("No bootstrap code loaded with the kernel!");
diff --git a/kern/debug.c b/kern/debug.c
index 67b04ad..57a8126 100644
--- a/kern/debug.c
+++ b/kern/debug.c
@@ -24,6 +24,8 @@
* the rights to redistribute these changes.
*/
+#include <mach/xen.h>
+
#include <kern/printf.h>
#include <stdarg.h>
@@ -164,6 +166,9 @@ panic(const char *s, ...)
#if MACH_KDB
Debugger("panic");
#else
+# ifdef MACH_HYP
+ hyp_crash();
+# else
/* Give the user time to see the message */
{
int i = 1000; /* seconds */
@@ -172,6 +177,7 @@ panic(const char *s, ...)
}
halt_all_cpus (reboot_on_panic);
+# endif /* MACH_HYP */
#endif
}
diff --git a/linux/dev/include/asm-i386/segment.h b/linux/dev/include/asm-i386/segment.h
index c7b3ff5..300ba53 100644
--- a/linux/dev/include/asm-i386/segment.h
+++ b/linux/dev/include/asm-i386/segment.h
@@ -3,8 +3,13 @@
#ifdef MACH
+#ifdef MACH_HYP
+#define KERNEL_CS 0x09
+#define KERNEL_DS 0x11
+#else /* MACH_HYP */
#define KERNEL_CS 0x08
#define KERNEL_DS 0x10
+#endif /* MACH_HYP */
#define USER_CS 0x17
#define USER_DS 0x1F
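The selector bump above is the visible effect of running the kernel at ring 1 under Xen: an x86 segment selector packs a descriptor index, a table-indicator bit, and a 2-bit requested privilege level, so changing `KERNEL_CS` from 0x08 to 0x09 keeps the same GDT entry but sets RPL 1. A small decoding sketch (field names are the usual x86 ones, not from this patch):

```c
#include <assert.h>

/* Split an x86 segment selector into its three fields. */
struct selector {
    unsigned index;  /* GDT/LDT descriptor index (bits 15..3) */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;    /* requested privilege level, 0..3 (bits 1..0) */
};

static struct selector decode(unsigned sel)
{
    struct selector s = { sel >> 3, (sel >> 2) & 1, sel & 3 };
    return s;
}
```

So 0x08 and 0x09 name the same descriptor at two privilege levels, while `USER_CS` 0x17 is an RPL-3 selector as before.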
diff --git a/xen/Makefrag.am b/xen/Makefrag.am
new file mode 100644
index 0000000..c5792a9
--- /dev/null
+++ b/xen/Makefrag.am
@@ -0,0 +1,32 @@
+# Makefile fragment for the Xen platform.
+
+# Copyright (C) 2007 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2, or (at your option) any later
+# version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+#
+# Xen support.
+#
+
+libkernel_a_SOURCES += \
+ xen/block.c \
+ xen/console.c \
+ xen/evt.c \
+ xen/grant.c \
+ xen/net.c \
+ xen/ring.c \
+ xen/store.c \
+ xen/time.c \
+ xen/xen.c
diff --git a/xen/block.c b/xen/block.c
new file mode 100644
index 0000000..3c188bf
--- /dev/null
+++ b/xen/block.c
@@ -0,0 +1,689 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/mig_errors.h>
+#include <ipc/ipc_port.h>
+#include <ipc/ipc_space.h>
+#include <vm/vm_kern.h>
+#include <vm/vm_user.h>
+#include <device/device_types.h>
+#include <device/device_port.h>
+#include <device/disk_status.h>
+#include <device/device_reply.user.h>
+#include <device/device_emul.h>
+#include <device/ds_routines.h>
+#include <xen/public/io/blkif.h>
+#include <xen/evt.h>
+#include <string.h>
+#include <util/atoi.h>
+#include "store.h"
+#include "block.h"
+#include "grant.h"
+#include "ring.h"
+#include "xen.h"
+
+/* Hypervisor part */
+
+struct block_data {
+ struct device device;
+ char *name;
+ int open_count;
+ char *backend;
+ domid_t domid;
+ char *vbd;
+ int handle;
+ unsigned info;
+ dev_mode_t mode;
+ unsigned sector_size;
+ unsigned long nr_sectors;
+ ipc_port_t port;
+ blkif_front_ring_t ring;
+ evtchn_port_t evt;
+ simple_lock_data_t lock;
+ simple_lock_data_t pushlock;
+};
+
+static int n_vbds;
+static struct block_data *vbd_data;
+
+struct device_emulation_ops hyp_block_emulation_ops;
+
+static void hyp_block_intr(int unit) {
+ struct block_data *bd = &vbd_data[unit];
+ blkif_response_t *rsp;
+ int more;
+ io_return_t *err;
+
+ simple_lock(&bd->lock);
+ more = RING_HAS_UNCONSUMED_RESPONSES(&bd->ring);
+ while (more) {
+ rmb(); /* make sure we see responses */
+ rsp = RING_GET_RESPONSE(&bd->ring, bd->ring.rsp_cons++);
+ err = (void *) (unsigned long) rsp->id;
+ switch (rsp->status) {
+ case BLKIF_RSP_ERROR:
+ *err = D_IO_ERROR;
+ break;
+ case BLKIF_RSP_OKAY:
+ break;
+ default:
+ printf("Unrecognized blkif status %d\n", rsp->status);
+ goto drop;
+ }
+ thread_wakeup(err);
+drop:
+ thread_wakeup_one(bd);
+ RING_FINAL_CHECK_FOR_RESPONSES(&bd->ring, more);
+ }
+ simple_unlock(&bd->lock);
+}
+
+#define VBD_PATH "device/vbd"
+void hyp_block_init(void) {
+ char **vbds, **vbd;
+ char *c;
+ int i, disk, partition;
+ int n;
+ int grant;
+ char port_name[10];
+ char *prefix;
+ char device_name[32];
+ domid_t domid;
+ evtchn_port_t evt;
+ hyp_store_transaction_t t;
+ vm_offset_t addr;
+ struct block_data *bd;
+ blkif_sring_t *ring;
+
+ vbds = hyp_store_ls(0, 1, VBD_PATH);
+ if (!vbds) {
+ printf("hd: No block device (%s). Hoping you don't need any\n", hyp_store_error);
+ n_vbds = 0;
+ return;
+ }
+
+ n = 0;
+ for (vbd = vbds; *vbd; vbd++)
+ n++;
+
+ vbd_data = (void*) kalloc(n * sizeof(*vbd_data));
+ if (!vbd_data) {
+ printf("hd: not enough memory for VBDs\n");
+ n_vbds = 0;
+ return;
+ }
+ n_vbds = n;
+
+ for (n = 0; n < n_vbds; n++) {
+ bd = &vbd_data[n];
+ mach_atoi((u_char *) vbds[n], &bd->handle);
+ if (bd->handle == MACH_ATOI_DEFAULT)
+ continue;
+
+ bd->open_count = -2;
+ bd->vbd = vbds[n];
+
+ /* Get virtual number. */
+ i = hyp_store_read_int(0, 5, VBD_PATH, "/", vbds[n], "/", "virtual-device");
+ if (i == -1)
+ panic("hd: couldn't read virtual device of VBD %s\n",vbds[n]);
+ if ((i >> 28) == 1) {
+ /* xvd, new format */
+ prefix = "xvd";
+ disk = (i >> 8) & ((1 << 20) - 1);
+ partition = i & ((1 << 8) - 1);
+ } else if ((i >> 8) == 202) {
+ /* xvd, old format */
+ prefix = "xvd";
+ disk = (i >> 4) & ((1 << 4) - 1);
+ partition = i & ((1 << 4) - 1);
+ } else if ((i >> 8) == 8) {
+ /* SCSI */
+ prefix = "sd";
+ disk = (i >> 4) & ((1 << 4) - 1);
+ partition = i & ((1 << 4) - 1);
+ } else if ((i >> 8) == 22) {
+ /* IDE secondary */
+ prefix = "hd";
+ disk = ((i >> 6) & ((1 << 2) - 1)) + 2;
+ partition = i & ((1 << 6) - 1);
+ } else if ((i >> 8) == 3) {
+ /* IDE primary */
+ prefix = "hd";
+ disk = (i >> 6) & ((1 << 2) - 1);
+ partition = i & ((1 << 6) - 1);
+ } else {
+ /* Unknown virtual-device encoding: don't use prefix/disk/partition uninitialized. */
+ panic("hd: unknown device number %d for VBD %s", i, vbds[n]);
+ }
+ if (partition)
+ sprintf(device_name, "%s%us%u", prefix, disk, partition);
+ else
+ sprintf(device_name, "%s%u", prefix, disk);
+ bd->name = (char*) kalloc(strlen(device_name) + 1);
+ strcpy(bd->name, device_name);
+
+ /* Get domain id of backend driver. */
+ i = hyp_store_read_int(0, 5, VBD_PATH, "/", vbds[n], "/", "backend-id");
+ if (i == -1)
+ panic("%s: couldn't read backend domid (%s)", device_name, hyp_store_error);
+ bd->domid = domid = i;
+
+ do {
+ t = hyp_store_transaction_start();
+
+ /* Get a page for ring */
+ if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS)
+ panic("%s: couldn't allocate space for store ring\n", device_name);
+ ring = (void*) addr;
+ SHARED_RING_INIT(ring);
+ FRONT_RING_INIT(&bd->ring, ring, PAGE_SIZE);
+ grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0);
+
+ /* and give it to backend. */
+ i = sprintf(port_name, "%u", grant);
+ c = hyp_store_write(t, port_name, 5, VBD_PATH, "/", vbds[n], "/", "ring-ref");
+ if (!c)
+ panic("%s: couldn't store ring reference (%s)", device_name, hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ /* Allocate an event channel and give it to backend. */
+ bd->evt = evt = hyp_event_channel_alloc(domid);
+ hyp_evt_handler(evt, hyp_block_intr, n, SPL7);
+ i = sprintf(port_name, "%lu", evt);
+ c = hyp_store_write(t, port_name, 5, VBD_PATH, "/", vbds[n], "/", "event-channel");
+ if (!c)
+ panic("%s: couldn't store event channel (%s)", device_name, hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+ c = hyp_store_write(t, hyp_store_state_initialized, 5, VBD_PATH, "/", vbds[n], "/", "state");
+ if (!c)
+ panic("%s: couldn't store state (%s)", device_name, hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+ } while (!hyp_store_transaction_stop(t));
+
+ c = hyp_store_read(0, 5, VBD_PATH, "/", vbds[n], "/", "backend");
+ if (!c)
+ panic("%s: couldn't get path to backend (%s)", device_name, hyp_store_error);
+ bd->backend = c;
+
+ while(1) {
+ i = hyp_store_read_int(0, 3, bd->backend, "/", "state");
+ if (i == MACH_ATOI_DEFAULT)
+ panic("can't read state from %s", bd->backend);
+ if (i == XenbusStateConnected)
+ break;
+ hyp_yield();
+ }
+
+ i = hyp_store_read_int(0, 3, bd->backend, "/", "sectors");
+ if (i == -1)
+ panic("%s: couldn't get number of sectors (%s)", device_name, hyp_store_error);
+ bd->nr_sectors = i;
+
+ i = hyp_store_read_int(0, 3, bd->backend, "/", "sector-size");
+ if (i == -1)
+ panic("%s: couldn't get sector size (%s)", device_name, hyp_store_error);
+ if (i & ~(2*(i-1)+1))
+ panic("sector size %d is not a power of 2\n", i);
+ if (i > PAGE_SIZE || PAGE_SIZE % i != 0)
+ panic("%s: couldn't handle sector size %d with pages of size %d\n", device_name, i, PAGE_SIZE);
+ bd->sector_size = i;
+
+ i = hyp_store_read_int(0, 3, bd->backend, "/", "info");
+ if (i == -1)
+ panic("%s: couldn't get info (%s)", device_name, hyp_store_error);
+ bd->info = i;
+
+ c = hyp_store_read(0, 3, bd->backend, "/", "mode");
+ if (!c)
+ panic("%s: couldn't get backend's mode (%s)", device_name, hyp_store_error);
+ if ((c[0] == 'w') && !(bd->info & VDISK_READONLY))
+ bd->mode = D_READ|D_WRITE;
+ else
+ bd->mode = D_READ;
+
+ c = hyp_store_read(0, 3, bd->backend, "/", "params");
+ if (!c)
+ panic("%s: couldn't get backend's real device (%s)", device_name, hyp_store_error);
+
+ /* TODO: change suffix */
+ printf("%s: dom%d's VBD %s (%s,%c%s) %ldMB\n", device_name, domid,
+ vbds[n], c, bd->mode & D_WRITE ? 'w' : 'r',
+ bd->info & VDISK_CDROM ? ", cdrom" : "",
+ bd->nr_sectors / ((1<<20) / 512));
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ c = hyp_store_write(0, hyp_store_state_connected, 5, VBD_PATH, "/", bd->vbd, "/", "state");
+ if (!c)
+ panic("couldn't store state for %s (%s)", device_name, hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ bd->open_count = -1;
+ bd->device.emul_ops = &hyp_block_emulation_ops;
+ bd->device.emul_data = bd;
+ simple_lock_init(&bd->lock);
+ simple_lock_init(&bd->pushlock);
+ }
+}
+
+static ipc_port_t
+dev_to_port(void *d)
+{
+ struct block_data *b = d;
+ if (!d)
+ return IP_NULL;
+ return ipc_port_make_send(b->port);
+}
+
+static int
+device_close(void *devp)
+{
+ struct block_data *bd = devp;
+ if (--bd->open_count < 0)
+ panic("too many closes on %s", bd->name);
+ printf("close, %s count %d\n", bd->name, bd->open_count);
+ if (bd->open_count)
+ return 0;
+ ipc_kobject_set(bd->port, IKO_NULL, IKOT_NONE);
+ ipc_port_dealloc_kernel(bd->port);
+ return 0;
+}
+
+static io_return_t
+device_open (ipc_port_t reply_port, mach_msg_type_name_t reply_port_type,
+ dev_mode_t mode, char *name, device_t *devp /* out */)
+{
+ int i, err = 0;
+ ipc_port_t port, notify;
+ struct block_data *bd;
+
+ for (i = 0; i < n_vbds; i++)
+ if (!strcmp(name, vbd_data[i].name))
+ break;
+
+ if (i == n_vbds)
+ return D_NO_SUCH_DEVICE;
+
+ bd = &vbd_data[i];
+ if (bd->open_count == -2)
+ /* couldn't be initialized */
+ return D_NO_SUCH_DEVICE;
+
+ if ((mode & D_WRITE) && !(bd->mode & D_WRITE))
+ return D_READ_ONLY;
+
+ if (bd->open_count >= 0) {
+ *devp = &bd->device;
+ bd->open_count++;
+ printf("re-open, %s count %d\n", bd->name, bd->open_count);
+ return D_SUCCESS;
+ }
+
+ bd->open_count = 1;
+ printf("open, %s count %d\n", bd->name, bd->open_count);
+
+ port = ipc_port_alloc_kernel();
+ if (port == IP_NULL) {
+ err = KERN_RESOURCE_SHORTAGE;
+ goto out;
+ }
+ bd->port = port;
+
+ *devp = &bd->device;
+
+ ipc_kobject_set (port, (ipc_kobject_t) &bd->device, IKOT_DEVICE);
+
+ notify = ipc_port_make_sonce (bd->port);
+ ip_lock (bd->port);
+ ipc_port_nsrequest (bd->port, 1, notify, &notify);
+ assert (notify == IP_NULL);
+
+out:
+ if (IP_VALID (reply_port))
+ ds_device_open_reply (reply_port, reply_port_type, D_SUCCESS, port);
+ else
+ device_close(bd);
+ return MIG_NO_REPLY;
+}
+
+static io_return_t
+device_read (void *d, ipc_port_t reply_port,
+ mach_msg_type_name_t reply_port_type, dev_mode_t mode,
+ recnum_t bn, int count, io_buf_ptr_t *data,
+ unsigned *bytes_read)
+{
+ int resid, amt;
+ io_return_t err = 0;
+ vm_page_t pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+ grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+ int nbpages;
+ vm_map_copy_t copy;
+ vm_offset_t offset, alloc_offset, o;
+ vm_object_t object;
+ vm_page_t m;
+ vm_size_t len, size;
+ struct block_data *bd = d;
+ struct blkif_request *req;
+
+ *data = 0;
+ *bytes_read = 0;
+
+ if (count < 0)
+ return D_INVALID_SIZE;
+ if (count == 0)
+ return 0;
+
+ /* Allocate an object to hold the data. */
+ size = round_page (count);
+ object = vm_object_allocate (size);
+ if (! object)
+ {
+ err = D_NO_MEMORY;
+ goto out;
+ }
+ alloc_offset = offset = 0;
+ resid = count;
+
+ while (resid && !err)
+ {
+ unsigned reqn;
+ int i;
+ int last_sect;
+
+ nbpages = 0;
+
+ /* Determine size of I/O this time around. */
+ len = round_page(offset + resid) - offset;
+ if (len > PAGE_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST)
+ len = PAGE_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+ /* Allocate pages. */
+ while (alloc_offset < offset + len)
+ {
+ while ((m = vm_page_grab (FALSE)) == 0)
+ VM_PAGE_WAIT (0);
+ assert (! m->active && ! m->inactive);
+ m->busy = TRUE;
+ assert(nbpages < BLKIF_MAX_SEGMENTS_PER_REQUEST);
+ pages[nbpages++] = m;
+ alloc_offset += PAGE_SIZE;
+ }
+
+ /* Do the read. */
+ amt = len;
+ if (amt > resid)
+ amt = resid;
+
+ /* allocate a request */
+ spl_t spl = splsched();
+ while(1) {
+ simple_lock(&bd->lock);
+ if (!RING_FULL(&bd->ring))
+ break;
+ thread_sleep(bd, &bd->lock, FALSE);
+ }
+ mb();
+ reqn = bd->ring.req_prod_pvt++;
+ simple_lock(&bd->pushlock);
+ simple_unlock(&bd->lock);
+ (void) splx(spl);
+
+ req = RING_GET_REQUEST(&bd->ring, reqn);
+ req->operation = BLKIF_OP_READ;
+ req->nr_segments = nbpages;
+ req->handle = bd->handle;
+ req->id = (unsigned64_t) (unsigned long) &err; /* pointer on the stack */
+ req->sector_number = bn + offset / 512;
+ for (i = 0; i < nbpages; i++) {
+ req->seg[i].gref = gref[i] = hyp_grant_give(bd->domid, atop(pages[i]->phys_addr), 0);
+ req->seg[i].first_sect = 0;
+ req->seg[i].last_sect = PAGE_SIZE/512 - 1;
+ }
+ last_sect = ((amt - 1) & PAGE_MASK) / 512;
+ req->seg[nbpages-1].last_sect = last_sect;
+
+ memset((void*) phystokv(pages[nbpages-1]->phys_addr
+ + (last_sect + 1) * 512),
+ 0, PAGE_SIZE - (last_sect + 1) * 512);
+
+ /* no need for a lock: as long as the request is not pushed, the event won't be triggered */
+ assert_wait((event_t) &err, FALSE);
+
+ int notify;
+ wmb(); /* make sure it sees requests */
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bd->ring, notify);
+ if (notify)
+ hyp_event_channel_send(bd->evt);
+ simple_unlock(&bd->pushlock);
+
+ thread_block(NULL);
+
+ if (err)
+ printf("error reading %d bytes at sector %d\n", amt,
+ bn + offset / 512);
+
+ for (i = 0; i < nbpages; i++)
+ hyp_grant_takeback(gref[i]);
+
+ /* Compute number of pages to insert in object. */
+ o = offset;
+
+ resid -= amt;
+ if (resid == 0)
+ offset = o + len;
+ else
+ offset += amt;
+
+ /* Add pages to the object. */
+ vm_object_lock (object);
+ for (i = 0; i < nbpages; i++)
+ {
+ m = pages[i];
+ assert (m->busy);
+ vm_page_lock_queues ();
+ PAGE_WAKEUP_DONE (m);
+ m->dirty = TRUE;
+ vm_page_insert (m, object, o);
+ vm_page_unlock_queues ();
+ o += PAGE_SIZE;
+ }
+ vm_object_unlock (object);
+ }
+
+out:
+ if (! err)
+ err = vm_map_copyin_object (object, 0, round_page (count), &copy);
+ if (! err)
+ {
+ *data = (io_buf_ptr_t) copy;
+ *bytes_read = count - resid;
+ }
+ else
+ vm_object_deallocate (object);
+ return err;
+}
+
+static io_return_t
+device_write(void *d, ipc_port_t reply_port,
+ mach_msg_type_name_t reply_port_type, dev_mode_t mode,
+ recnum_t bn, io_buf_ptr_t data, unsigned int count,
+ int *bytes_written)
+{
+ io_return_t err = 0;
+ vm_map_copy_t copy = (vm_map_copy_t) data;
+ vm_offset_t aligned_buffer = 0;
+ int copy_npages = atop(round_page(count));
+ vm_offset_t phys_addrs[copy_npages];
+ struct block_data *bd = d;
+ blkif_request_t *req;
+ grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+ unsigned reqn, size;
+ int i, nbpages, j;
+
+ if (!(bd->mode & D_WRITE))
+ return D_READ_ONLY;
+
+ if (count == 0) {
+ vm_map_copy_discard(copy);
+ return 0;
+ }
+
+ if (count % bd->sector_size)
+ return D_INVALID_SIZE;
+
+ if (count > copy->size)
+ return D_INVALID_SIZE;
+
+ if (copy->type != VM_MAP_COPY_PAGE_LIST || copy->offset & PAGE_MASK) {
+ /* Unaligned write: we have to copy the data before passing it to the backend. */
+ kern_return_t kr;
+ vm_offset_t buffer;
+
+ kr = kmem_alloc(device_io_map, &aligned_buffer, count);
+ if (kr != KERN_SUCCESS)
+ return kr;
+
+ kr = vm_map_copyout(device_io_map, &buffer, vm_map_copy_copy(copy));
+ if (kr != KERN_SUCCESS) {
+ kmem_free(device_io_map, aligned_buffer, count);
+ return kr;
+ }
+
+ memcpy((void*) aligned_buffer, (void*) buffer, count);
+
+ vm_deallocate (device_io_map, buffer, count);
+
+ for (i = 0; i < copy_npages; i++)
+ phys_addrs[i] = kvtophys(aligned_buffer + ptoa(i));
+ } else {
+ for (i = 0; i < copy_npages; i++)
+ phys_addrs[i] = copy->cpy_page_list[i]->phys_addr;
+ }
+
+ for (i=0; i<copy_npages; i+=nbpages) {
+
+ nbpages = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+ if (nbpages > copy_npages-i)
+ nbpages = copy_npages-i;
+
+ /* allocate a request */
+ spl_t spl = splsched();
+ while(1) {
+ simple_lock(&bd->lock);
+ if (!RING_FULL(&bd->ring))
+ break;
+ thread_sleep(bd, &bd->lock, FALSE);
+ }
+ mb();
+ reqn = bd->ring.req_prod_pvt++;
+ simple_lock(&bd->pushlock);
+ simple_unlock(&bd->lock);
+ (void) splx(spl);
+
+ req = RING_GET_REQUEST(&bd->ring, reqn);
+ req->operation = BLKIF_OP_WRITE;
+ req->nr_segments = nbpages;
+ req->handle = bd->handle;
+ req->id = (unsigned64_t) (unsigned long) &err; /* pointer on the stack */
+ req->sector_number = bn + i*PAGE_SIZE / 512;
+
+ for (j = 0; j < nbpages; j++) {
+ req->seg[j].gref = gref[j] = hyp_grant_give(bd->domid, atop(phys_addrs[i + j]), 1);
+ req->seg[j].first_sect = 0;
+ size = PAGE_SIZE;
+ if ((i + j + 1) * PAGE_SIZE > count)
+ size = count - (i + j) * PAGE_SIZE;
+ req->seg[j].last_sect = size/512 - 1;
+ }
+
+ /* no need for a lock: as long as the request is not pushed, the event won't be triggered */
+ assert_wait((event_t) &err, FALSE);
+
+ int notify;
+ wmb(); /* make sure it sees requests */
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bd->ring, notify);
+ if (notify)
+ hyp_event_channel_send(bd->evt);
+ simple_unlock(&bd->pushlock);
+
+ thread_block(NULL);
+
+ for (j = 0; j < nbpages; j++)
+ hyp_grant_takeback(gref[j]);
+
+ if (err) {
+ printf("error writing %d bytes at sector %d\n", count, bn);
+ break;
+ }
+ }
+
+ if (aligned_buffer)
+ kmem_free(device_io_map, aligned_buffer, count);
+
+ vm_map_copy_discard (copy);
+
+ if (!err)
+ *bytes_written = count;
+
+ if (IP_VALID(reply_port))
+ ds_device_write_reply (reply_port, reply_port_type, err, count);
+
+ return MIG_NO_REPLY;
+}
+
+static io_return_t
+device_get_status(void *d, dev_flavor_t flavor, dev_status_t status,
+ mach_msg_type_number_t *status_count)
+{
+ struct block_data *bd = d;
+
+ switch (flavor)
+ {
+ case DEV_GET_SIZE:
+ status[DEV_GET_SIZE_DEVICE_SIZE] = (unsigned long long) bd->nr_sectors * 512;
+ status[DEV_GET_SIZE_RECORD_SIZE] = bd->sector_size;
+ *status_count = DEV_GET_SIZE_COUNT;
+ break;
+ case DEV_GET_RECORDS:
+ status[DEV_GET_RECORDS_DEVICE_RECORDS] = ((unsigned long long) bd->nr_sectors * 512) / bd->sector_size;
+ status[DEV_GET_RECORDS_RECORD_SIZE] = bd->sector_size;
+ *status_count = DEV_GET_RECORDS_COUNT;
+ break;
+ default:
+ printf("TODO: block_%s(%d)\n", __func__, flavor);
+ return D_INVALID_OPERATION;
+ }
+ return D_SUCCESS;
+}
+
+struct device_emulation_ops hyp_block_emulation_ops = {
+ NULL, /* dereference */
+ NULL, /* deallocate */
+ dev_to_port,
+ device_open,
+ device_close,
+ device_write,
+ NULL, /* write_inband */
+ device_read,
+ NULL, /* read_inband */
+ NULL, /* set_status */
+ device_get_status,
+ NULL, /* set_filter */
+ NULL, /* map */
+ NULL, /* no_senders */
+ NULL, /* write_trap */
+ NULL, /* writev_trap */
+};
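`hyp_block_init` validates the backend's sector size with the compact test `i & ~(2*(i-1)+1)`, which is zero exactly when `i` is a power of two — the same condition as the more familiar `(i & (i-1)) == 0` (for `i >= 1`, `2*(i-1)+1` equals `2*i - 1`, whose low bits cover all of `i`'s bits only when `i` has a single bit set). A small sketch checking that equivalence over a range; this is a demonstration, not code from the patch:

```c
#include <assert.h>

/* The patch's test: nonzero when the sector size is not a power of two. */
static int patch_check(unsigned i)
{
    return (i & ~(2*(i-1)+1)) != 0;
}

/* The textbook test for the same property. */
static int classic_check(unsigned i)
{
    return (i & (i-1)) != 0;
}
```

Either form rejects e.g. a 513-byte sector size while accepting 512, which is what the subsequent `PAGE_SIZE % i` check relies on.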
diff --git a/xen/block.h b/xen/block.h
new file mode 100644
index 0000000..5955968
--- /dev/null
+++ b/xen/block.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_BLOCK_H
+#define XEN_BLOCK_H
+
+void hyp_block_init(void);
+
+#endif /* XEN_BLOCK_H */
diff --git a/xen/configfrag.ac b/xen/configfrag.ac
new file mode 100644
index 0000000..eb68996
--- /dev/null
+++ b/xen/configfrag.ac
@@ -0,0 +1,44 @@
+dnl Configure fragment for the Xen platform.
+
+dnl Copyright (C) 2007 Free Software Foundation, Inc.
+
+dnl This program is free software; you can redistribute it and/or modify it
+dnl under the terms of the GNU General Public License as published by the
+dnl Free Software Foundation; either version 2, or (at your option) any later
+dnl version.
+dnl
+dnl This program is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+dnl or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+dnl for more details.
+dnl
+dnl You should have received a copy of the GNU General Public License along
+dnl with this program; if not, write to the Free Software Foundation, Inc.,
+dnl 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+#
+# Xen platform.
+#
+
+[if [ "$host_platform" = xen ]; then]
+ AC_DEFINE([MACH_XEN], [], [build a MachXen kernel])
+ AC_DEFINE([MACH_HYP], [], [be a hypervisor guest])
+ AM_CONDITIONAL([PLATFORM_xen], [true])
+
+ AC_ARG_ENABLE([pseudo-phys],
+ AS_HELP_STRING([--enable-pseudo-phys], [Pseudo physical support]))
+ [if [ x"$enable_pseudo_phys" = xno ]; then]
+ AM_CONDITIONAL([enable_pseudo_phys], [false])
+ [else]
+ AC_DEFINE([MACH_PSEUDO_PHYS], [], [Enable pseudo physical memory support])
+ AM_CONDITIONAL([enable_pseudo_phys], [true])
+ [fi]
+
+[else]
+ AM_CONDITIONAL([PLATFORM_xen], [false])
+ AM_CONDITIONAL([enable_pseudo_phys], [false])
+[fi]
+
+dnl Local Variables:
+dnl mode: autoconf
+dnl End:
diff --git a/xen/console.c b/xen/console.c
new file mode 100644
index 0000000..c65e6d2
--- /dev/null
+++ b/xen/console.c
@@ -0,0 +1,234 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <device/tty.h>
+#include <device/cons.h>
+#include <machine/pmap.h>
+#include <machine/machspl.h>
+#include <xen/public/io/console.h>
+#include "console.h"
+#include "ring.h"
+#include "evt.h"
+
+/* Hypervisor part */
+
+decl_simple_lock_data(static, outlock);
+decl_simple_lock_data(static, inlock);
+static struct xencons_interface *console;
+static int kd_pollc;
+int kb_mode; /* XXX: actually don't care. */
+
+#undef hyp_console_write
+void hyp_console_write(const char *str, int len)
+{
+ hyp_console_io (CONSOLEIO_write, len, kvtolin(str));
+}
+
+int hypputc(int c)
+{
+ if (!console) {
+ char d = c;
+ hyp_console_io(CONSOLEIO_write, 1, kvtolin(&d));
+ } else {
+ spl_t spl = splhigh();
+ simple_lock(&outlock);
+ while (hyp_ring_smash(console->out, console->out_prod, console->out_cons)) {
+ hyp_console_put("ring smash\n");
+ /* TODO: are we allowed to sleep in putc? */
+ hyp_yield();
+ }
+ hyp_ring_cell(console->out, console->out_prod) = c;
+ wmb();
+ console->out_prod++;
+ hyp_event_channel_send(boot_info.console_evtchn);
+ simple_unlock(&outlock);
+ splx(spl);
+ }
+ return 0;
+}
+
+int hypcnputc(dev_t dev, int c)
+{
+ return hypputc(c);
+}
+
+/* get char by polling, used by debugger */
+int hypcngetc(dev_t dev, int wait)
+{
+ int ret;
+ if (wait)
+ while (console->in_prod == console->in_cons)
+ hyp_yield();
+ else
+ if (console->in_prod == console->in_cons)
+ return -1;
+ ret = hyp_ring_cell(console->in, console->in_cons);
+ mb();
+ console->in_cons++;
+ hyp_event_channel_send(boot_info.console_evtchn);
+ return ret;
+}
+
+void cnpollc(boolean_t on) {
+ if (on) {
+ kd_pollc++;
+ } else {
+ --kd_pollc;
+ }
+}
+
+void kd_setleds1(u_char val)
+{
+ /* Can't do this. */
+}
+
+/* Mach part */
+
+struct tty hypcn_tty;
+
+static void hypcnintr(int unit, spl_t spl, void *ret_addr, void *regs) {
+ struct tty *tp = &hypcn_tty;
+ if (kd_pollc)
+ return;
+ simple_lock(&inlock);
+ while (console->in_prod != console->in_cons) {
+ int c = hyp_ring_cell(console->in, console->in_cons);
+ mb();
+ console->in_cons++;
+#ifdef MACH_KDB
+ if (c == (char)'£')
+ panic("£ pressed");
+#endif /* MACH_KDB */
+ if ((tp->t_state & (TS_ISOPEN|TS_WOPEN)))
+ (*linesw[tp->t_line].l_rint)(c, tp);
+ }
+ hyp_event_channel_send(boot_info.console_evtchn);
+ simple_unlock(&inlock);
+}
+
+int hypcnread(int dev, io_req_t ior)
+{
+ struct tty *tp = &hypcn_tty;
+ tp->t_state |= TS_CARR_ON;
+ return char_read(tp, ior);
+}
+
+int hypcnwrite(int dev, io_req_t ior)
+{
+ return char_write(&hypcn_tty, ior);
+}
+
+void hypcnstart(struct tty *tp)
+{
+ spl_t o_pri;
+ int ch;
+ unsigned char c;
+
+ if (tp->t_state & TS_TTSTOP)
+ return;
+ while (1) {
+ tp->t_state &= ~TS_BUSY;
+ if (tp->t_state & TS_TTSTOP)
+ break;
+ if ((tp->t_outq.c_cc <= 0) || (ch = getc(&tp->t_outq)) == -1)
+ break;
+ c = ch;
+ o_pri = splsoftclock();
+ hypputc(c);
+ splx(o_pri);
+ }
+ if (tp->t_outq.c_cc <= TTLOWAT(tp)) {
+ tt_write_wakeup(tp);
+ }
+}
+
+void hypcnstop()
+{
+}
+
+io_return_t hypcngetstat(dev_t dev, int flavor, int *data, unsigned int *count)
+{
+ return tty_get_status(&hypcn_tty, flavor, data, count);
+}
+
+io_return_t hypcnsetstat(dev_t dev, int flavor, int *data, unsigned int count)
+{
+ return tty_set_status(&hypcn_tty, flavor, data, count);
+}
+
+int hypcnportdeath(dev_t dev, mach_port_t port)
+{
+ return tty_portdeath(&hypcn_tty, (ipc_port_t) port);
+}
+
+int hypcnopen(dev_t dev, int flag, io_req_t ior)
+{
+ struct tty *tp = &hypcn_tty;
+ spl_t o_pri;
+
+ o_pri = spltty();
+ simple_lock(&tp->t_lock);
+ if (!(tp->t_state & (TS_ISOPEN|TS_WOPEN))) {
+ /* XXX ttychars allocates memory */
+ simple_unlock(&tp->t_lock);
+ ttychars(tp);
+ simple_lock(&tp->t_lock);
+ tp->t_oproc = hypcnstart;
+ tp->t_stop = hypcnstop;
+ tp->t_ospeed = tp->t_ispeed = B9600;
+ tp->t_flags = ODDP|EVENP|ECHO|CRMOD|XTABS;
+ }
+ tp->t_state |= TS_CARR_ON;
+ simple_unlock(&tp->t_lock);
+ splx(o_pri);
+ return (char_open(dev, tp, flag, ior));
+}
+
+int hypcnclose(int dev, int flag)
+{
+ struct tty *tp = &hypcn_tty;
+ spl_t s = spltty();
+ simple_lock(&tp->t_lock);
+ ttyclose(tp);
+ simple_unlock(&tp->t_lock);
+ splx(s);
+ return 0;
+}
+
+int hypcnprobe(struct consdev *cp)
+{
+ struct xencons_interface *my_console;
+ my_console = (void*) mfn_to_kv(boot_info.console_mfn);
+
+ cp->cn_dev = makedev(0, 0);
+ cp->cn_pri = CN_INTERNAL;
+ return 0;
+}
+
+int hypcninit(struct consdev *cp)
+{
+ if (console)
+ return 0;
+ simple_lock_init(&outlock);
+ simple_lock_init(&inlock);
+ console = (void*) mfn_to_kv(boot_info.console_mfn);
+ pmap_set_page_readwrite(console);
+ hyp_evt_handler(boot_info.console_evtchn, hypcnintr, 0, SPL6);
+ return 0;
+}
diff --git a/xen/console.h b/xen/console.h
new file mode 100644
index 0000000..fa13dc0
--- /dev/null
+++ b/xen/console.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_CONSOLE_H
+#define XEN_CONSOLE_H
+#include <machine/xen.h>
+#include <string.h>
+
+#define hyp_console_write(str, len) hyp_console_io (CONSOLEIO_write, (len), kvtolin(str))
+
+#define hyp_console_put(str) ({ \
+ const char *__str = (void*) (str); \
+ hyp_console_write (__str, strlen (__str)); \
+})
+
+extern void hyp_console_init(void);
+
+#endif /* XEN_CONSOLE_H */
diff --git a/xen/evt.c b/xen/evt.c
new file mode 100644
index 0000000..345e1d0
--- /dev/null
+++ b/xen/evt.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2007 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <string.h>
+#include <mach/xen.h>
+#include <machine/xen.h>
+#include <machine/ipl.h>
+#include <machine/gdt.h>
+#include <xen/console.h>
+#include "evt.h"
+
+#define NEVNT (sizeof(unsigned long) * 8 * sizeof(unsigned long) * 8)
+int int_mask[NSPL];
+
+spl_t curr_ipl;
+
+void (*ivect[NEVNT])();
+int intpri[NEVNT];
+int iunit[NEVNT];
+
+void hyp_c_callback(void *ret_addr, void *regs)
+{
+ int i, j, n;
+ int cpu = 0;
+ unsigned long pending_sel;
+
+ hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending = 0;
+ /* no need for a barrier on x86, xchg is already one */
+#if !(defined(__i386__) || defined(__x86_64__))
+ wmb();
+#endif
+ while ((pending_sel = xchgl(&hyp_shared_info.vcpu_info[cpu].evtchn_pending_sel, 0))) {
+
+ for (i = 0; pending_sel; i++, pending_sel >>= 1) {
+ unsigned long pending;
+
+ if (!(pending_sel & 1))
+ continue;
+
+ while ((pending = (hyp_shared_info.evtchn_pending[i] & ~hyp_shared_info.evtchn_mask[i]))) {
+
+ n = i * sizeof(unsigned long) * 8;
+ for (j = 0; pending; j++, n++, pending >>= 1) {
+ if (!(pending & 1))
+ continue;
+
+ if (ivect[n]) {
+ spl_t spl = splx(intpri[n]);
+ asm ("lock; andl %1,%0":"=m"(hyp_shared_info.evtchn_pending[i]):"r"(~(1<<j)));
+ ivect[n](iunit[n], spl, ret_addr, regs);
+ splx_cli(spl);
+ } else {
+ printf("warning: lost unbound event %d\n", n);
+ asm ("lock; andl %1,%0":"=m"(hyp_shared_info.evtchn_pending[i]):"r"(~(1<<j)));
+ }
+ }
+ }
+ }
+ }
+}
+
+void form_int_mask(void)
+{
+ unsigned int i, j, bit, mask;
+
+ for (i=SPL0; i < NSPL; i++) {
+ for (j=0x00, bit=0x01, mask = 0; j < NEVNT; j++, bit<<=1)
+ if (intpri[j] <= i)
+ mask |= bit;
+ int_mask[i] = mask;
+ }
+}
+
+extern void hyp_callback(void);
+extern void hyp_failsafe_callback(void);
+
+void hyp_intrinit() {
+ form_int_mask();
+ curr_ipl = SPLHI;
+ hyp_shared_info.evtchn_mask[0] = int_mask[SPLHI];
+ hyp_set_callbacks(KERNEL_CS, hyp_callback,
+ KERNEL_CS, hyp_failsafe_callback);
+}
+
+void hyp_evt_handler(evtchn_port_t port, void (*handler)(), int unit, spl_t spl) {
+ if (port >= NEVNT)
+ panic("event channel port %d >= %d not supported\n", port, NEVNT);
+ intpri[port] = spl;
+ iunit[port] = unit;
+ form_int_mask();
+ wmb();
+ ivect[port] = handler;
+}
diff --git a/xen/evt.h b/xen/evt.h
new file mode 100644
index 0000000..a583977
--- /dev/null
+++ b/xen/evt.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_EVT_H
+#define XEN_EVT_H
+
+#include <machine/spl.h>
+
+void hyp_intrinit(void);
+void form_int_mask(void);
+void hyp_evt_handler(evtchn_port_t port, void (*handler)(), int unit, spl_t spl);
+void hyp_c_callback(void *ret_addr, void *regs);
+
+#endif /* XEN_EVT_H */
diff --git a/xen/grant.c b/xen/grant.c
new file mode 100644
index 0000000..505d202
--- /dev/null
+++ b/xen/grant.c
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/vm_param.h>
+#include <machine/spl.h>
+#include <vm/pmap.h>
+#include <vm/vm_map.h>
+#include <vm/vm_kern.h>
+#include "grant.h"
+
+#define NR_RESERVED_ENTRIES 8
+#define NR_GRANT_PAGES 4
+
+decl_simple_lock_data(static,lock);
+static struct grant_entry *grants;
+static vm_map_entry_t grants_map_entry;
+static int last_grant = NR_RESERVED_ENTRIES;
+
+static grant_ref_t free_grants = -1;
+
+static grant_ref_t grant_alloc(void) {
+ grant_ref_t grant;
+ if (free_grants != -1) {
+ grant = free_grants;
+ free_grants = grants[grant].frame;
+ } else {
+ grant = last_grant++;
+ if (grant == (NR_GRANT_PAGES * PAGE_SIZE)/sizeof(*grants))
+ panic("not enough grant entries, increase NR_GRANT_PAGES");
+ }
+ return grant;
+}
+
+static void grant_free(grant_ref_t grant) {
+ grants[grant].frame = free_grants;
+ free_grants = grant;
+}
+
+static grant_ref_t grant_set(domid_t domid, unsigned long mfn, uint16_t flags) {
+ spl_t spl = splhigh();
+ simple_lock(&lock);
+
+ grant_ref_t grant = grant_alloc();
+ grants[grant].domid = domid;
+ grants[grant].frame = mfn;
+ wmb();
+ grants[grant].flags = flags;
+
+ simple_unlock(&lock);
+ splx(spl);
+ return grant;
+}
+
+grant_ref_t hyp_grant_give(domid_t domid, unsigned long frame, int readonly) {
+ return grant_set(domid, pfn_to_mfn(frame),
+ GTF_permit_access | (readonly ? GTF_readonly : 0));
+}
+
+grant_ref_t hyp_grant_accept_transfer(domid_t domid, unsigned long frame) {
+ return grant_set(domid, frame, GTF_accept_transfer);
+}
+
+unsigned long hyp_grant_finish_transfer(grant_ref_t grant) {
+ unsigned long frame;
+ spl_t spl = splhigh();
+ simple_lock(&lock);
+
+ if (!(grants[grant].flags & GTF_transfer_committed))
+ panic("grant transfer %x not committed\n", grant);
+ while (!(grants[grant].flags & GTF_transfer_completed))
+ machine_relax();
+ rmb();
+ frame = grants[grant].frame;
+ grant_free(grant);
+
+ simple_unlock(&lock);
+ splx(spl);
+ return frame;
+}
+
+void hyp_grant_takeback(grant_ref_t grant) {
+ spl_t spl = splhigh();
+ simple_lock(&lock);
+
+ if (grants[grant].flags & (GTF_reading|GTF_writing))
+ panic("grant %d still in use (%lx)\n", grant, grants[grant].flags);
+
+ /* Note: this is not safe, a cmpxchg is needed, see grant_table.h */
+ grants[grant].flags = 0;
+ wmb();
+
+ grant_free(grant);
+
+ simple_unlock(&lock);
+ splx(spl);
+}
+
+void *hyp_grant_address(grant_ref_t grant) {
+ return &grants[grant];
+}
+
+void hyp_grant_init(void) {
+ struct gnttab_setup_table setup;
+ unsigned long frame[NR_GRANT_PAGES];
+ long ret;
+ int i;
+ vm_offset_t addr;
+
+ setup.dom = DOMID_SELF;
+ setup.nr_frames = NR_GRANT_PAGES;
+ setup.frame_list = (void*) kvtolin(frame);
+
+ ret = hyp_grant_table_op(GNTTABOP_setup_table, kvtolin(&setup), 1);
+ if (ret)
+ panic("setup grant table error %d", ret);
+ if (setup.status)
+ panic("setup grant table: %d\n", setup.status);
+
+ simple_lock_init(&lock);
+ vm_map_find_entry(kernel_map, &addr, NR_GRANT_PAGES * PAGE_SIZE,
+ (vm_offset_t) 0, kernel_object, &grants_map_entry);
+ grants = (void*) addr;
+
+ for (i = 0; i < NR_GRANT_PAGES; i++)
+ pmap_map_mfn((void *)grants + i * PAGE_SIZE, frame[i]);
+}
diff --git a/xen/grant.h b/xen/grant.h
new file mode 100644
index 0000000..ff8617d
--- /dev/null
+++ b/xen/grant.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_GRANT_H
+#define XEN_GRANT_H
+#include <sys/types.h>
+#include <machine/xen.h>
+#include <xen/public/xen.h>
+#include <xen/public/grant_table.h>
+
+void hyp_grant_init(void);
+grant_ref_t hyp_grant_give(domid_t domid, unsigned long frame_nr, int readonly);
+void hyp_grant_takeback(grant_ref_t grant);
+grant_ref_t hyp_grant_accept_transfer(domid_t domid, unsigned long frame_nr);
+unsigned long hyp_grant_finish_transfer(grant_ref_t grant);
+void *hyp_grant_address(grant_ref_t grant);
+
+#endif /* XEN_GRANT_H */
diff --git a/xen/net.c b/xen/net.c
new file mode 100644
index 0000000..1bb217b
--- /dev/null
+++ b/xen/net.c
@@ -0,0 +1,665 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/mig_errors.h>
+#include <ipc/ipc_port.h>
+#include <ipc/ipc_space.h>
+#include <vm/vm_kern.h>
+#include <device/device_types.h>
+#include <device/device_port.h>
+#include <device/if_hdr.h>
+#include <device/if_ether.h>
+#include <device/net_io.h>
+#include <device/device_reply.user.h>
+#include <device/device_emul.h>
+#include <intel/pmap.h>
+#include <xen/public/io/netif.h>
+#include <xen/public/memory.h>
+#include <string.h>
+#include <util/atoi.h>
+#include "evt.h"
+#include "store.h"
+#include "net.h"
+#include "grant.h"
+#include "ring.h"
+#include "time.h"
+#include "xen.h"
+
+/* Hypervisor part */
+
+#define ADDRESS_SIZE 6
+#define WINDOW __RING_SIZE((netif_rx_sring_t*)0, PAGE_SIZE)
+
+/* TODO: use rx-copy instead, since we're memcpying anyway */
+
+/* Are we paranoid enough to not leak anything to backend? */
+static const int paranoia = 0;
+
+struct net_data {
+ struct device device;
+ struct ifnet ifnet;
+ int open_count;
+ char *backend;
+ domid_t domid;
+ char *vif;
+ u_char address[ADDRESS_SIZE];
+ int handle;
+ ipc_port_t port;
+ netif_tx_front_ring_t tx;
+ netif_rx_front_ring_t rx;
+ void *rx_buf[WINDOW];
+ grant_ref_t rx_buf_gnt[WINDOW];
+ unsigned long rx_buf_pfn[WINDOW];
+ evtchn_port_t evt;
+ simple_lock_data_t lock;
+ simple_lock_data_t pushlock;
+};
+
+static int n_vifs;
+static struct net_data *vif_data;
+
+struct device_emulation_ops hyp_net_emulation_ops;
+
+int hextoi(char *cp, int *nump)
+{
+ int number;
+ char *original;
+ char c;
+
+ original = cp;
+ for (number = 0, c = *cp | 0x20; (('0' <= c) && (c <= '9')) || (('a' <= c) && (c <= 'f')); c = *(++cp) | 0x20) {
+ number *= 16;
+ if (c <= '9')
+ number += c - '0';
+ else
+ number += c - 'a' + 10;
+ }
+ if (original == cp)
+ *nump = -1;
+ else
+ *nump = number;
+ return(cp - original);
+}
+
+static void enqueue_rx_buf(struct net_data *nd, int number) {
+ unsigned reqn = nd->rx.req_prod_pvt++;
+ netif_rx_request_t *req = RING_GET_REQUEST(&nd->rx, reqn);
+
+ assert(number < WINDOW);
+
+ req->id = number;
+ req->gref = nd->rx_buf_gnt[number] = hyp_grant_accept_transfer(nd->domid, nd->rx_buf_pfn[number]);
+
+ /* give back page */
+ hyp_free_page(nd->rx_buf_pfn[number], nd->rx_buf[number]);
+}
+
+static void hyp_net_intr(int unit) {
+ ipc_kmsg_t kmsg;
+ struct ether_header *eh;
+ struct packet_header *ph;
+ netif_rx_response_t *rx_rsp;
+ netif_tx_response_t *tx_rsp;
+ void *data;
+ int len, more;
+ struct net_data *nd = &vif_data[unit];
+
+ simple_lock(&nd->lock);
+ if ((nd->rx.sring->rsp_prod - nd->rx.rsp_cons) >= (WINDOW*3)/4)
+ printf("window %ld is a bit small!\n", WINDOW);
+
+ more = RING_HAS_UNCONSUMED_RESPONSES(&nd->rx);
+ while (more) {
+ rmb(); /* make sure we see responses */
+ rx_rsp = RING_GET_RESPONSE(&nd->rx, nd->rx.rsp_cons++);
+
+ unsigned number = rx_rsp->id;
+ assert(number < WINDOW);
+ unsigned long mfn = hyp_grant_finish_transfer(nd->rx_buf_gnt[number]);
+
+#ifdef MACH_PSEUDO_PHYS
+ mfn_list[nd->rx_buf_pfn[number]] = mfn;
+#endif /* MACH_PSEUDO_PHYS */
+ pmap_map_mfn(nd->rx_buf[number], mfn);
+
+ kmsg = net_kmsg_get();
+ if (!kmsg)
+ /* gasp! Drop */
+ goto drop;
+
+ if (rx_rsp->status <= 0)
+ switch (rx_rsp->status) {
+ case NETIF_RSP_DROPPED:
+ printf("Packet dropped\n");
+ goto drop;
+ case NETIF_RSP_ERROR:
+ panic("Packet error");
+ case 0:
+ printf("nul packet\n");
+ goto drop;
+ default:
+ printf("Unknown error %d\n", rx_rsp->status);
+ goto drop;
+ }
+
+ data = nd->rx_buf[number] + rx_rsp->offset;
+ len = rx_rsp->status;
+
+ eh = (void*) (net_kmsg(kmsg)->header);
+ ph = (void*) (net_kmsg(kmsg)->packet);
+ memcpy(eh, data, sizeof (struct ether_header));
+ memcpy(ph + 1, data + sizeof (struct ether_header), len - sizeof(struct ether_header));
+ RING_FINAL_CHECK_FOR_RESPONSES(&nd->rx, more);
+ enqueue_rx_buf(nd, number);
+ ph->type = eh->ether_type;
+ ph->length = len - sizeof(struct ether_header) + sizeof (struct packet_header);
+
+ net_kmsg(kmsg)->sent = FALSE; /* Mark packet as received. */
+
+ net_packet(&nd->ifnet, kmsg, ph->length, ethernet_priority(kmsg));
+ continue;
+
+drop:
+ RING_FINAL_CHECK_FOR_RESPONSES(&nd->rx, more);
+ enqueue_rx_buf(nd, number);
+ }
+
+ /* commit new requests */
+ int notify;
+ wmb(); /* make sure it sees requests */
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->rx, notify);
+ if (notify)
+ hyp_event_channel_send(nd->evt);
+
+ /* Now the tx side */
+ more = RING_HAS_UNCONSUMED_RESPONSES(&nd->tx);
+ spl_t s = splsched ();
+ while (more) {
+ rmb(); /* make sure we see responses */
+ tx_rsp = RING_GET_RESPONSE(&nd->tx, nd->tx.rsp_cons++);
+ switch (tx_rsp->status) {
+ case NETIF_RSP_DROPPED:
+ printf("Packet dropped\n");
+ break;
+ case NETIF_RSP_ERROR:
+ panic("Packet error");
+ case NETIF_RSP_OKAY:
+ break;
+ default:
+ printf("Unknown error %d\n", tx_rsp->status);
+ goto drop_tx;
+ }
+ thread_wakeup((event_t) hyp_grant_address(tx_rsp->id));
+drop_tx:
+ thread_wakeup_one(nd);
+ RING_FINAL_CHECK_FOR_RESPONSES(&nd->tx, more);
+ }
+ splx(s);
+
+ simple_unlock(&nd->lock);
+}
+
+#define VIF_PATH "device/vif"
+void hyp_net_init(void) {
+ char **vifs, **vif;
+ char *c;
+ int i;
+ int n;
+ int grant;
+ char port_name[10];
+ domid_t domid;
+ evtchn_port_t evt;
+ hyp_store_transaction_t t;
+ vm_offset_t addr;
+ struct net_data *nd;
+ struct ifnet *ifp;
+ netif_tx_sring_t *tx_ring;
+ netif_rx_sring_t *rx_ring;
+
+ vifs = hyp_store_ls(0, 1, VIF_PATH);
+ if (!vifs) {
+ printf("eth: No net device (%s). Hoping you don't need any\n", hyp_store_error);
+ n_vifs = 0;
+ return;
+ }
+
+ n = 0;
+ for (vif = vifs; *vif; vif++)
+ n++;
+
+ vif_data = (void*) kalloc(n * sizeof(*vif_data));
+ if (!vif_data) {
+ printf("eth: No memory room for VIF\n");
+ n_vifs = 0;
+ return;
+ }
+ n_vifs = n;
+
+ for (n = 0; n < n_vifs; n++) {
+ nd = &vif_data[n];
+ mach_atoi((u_char *) vifs[n], &nd->handle);
+ if (nd->handle == MACH_ATOI_DEFAULT)
+ continue;
+
+ nd->open_count = -2;
+ nd->vif = vifs[n];
+
+ /* Get domain id of frontend driver. */
+ i = hyp_store_read_int(0, 5, VIF_PATH, "/", vifs[n], "/", "backend-id");
+ if (i == -1)
+ panic("eth: couldn't read frontend domid of VIF %s (%s)",vifs[n], hyp_store_error);
+ nd->domid = domid = i;
+
+ do {
+ t = hyp_store_transaction_start();
+
+ /* Get a page for tx_ring */
+ if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS)
+ panic("eth: couldn't allocate space for tx_ring");
+ tx_ring = (void*) addr;
+ SHARED_RING_INIT(tx_ring);
+ FRONT_RING_INIT(&nd->tx, tx_ring, PAGE_SIZE);
+ grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0);
+
+ /* and give it to backend. */
+ i = sprintf(port_name, "%u", grant);
+ c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "tx-ring-ref");
+ if (!c)
+ panic("eth: couldn't store tx_ring reference for VIF %s (%s)", vifs[n], hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ /* Get a page for rx_ring */
+ if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS)
+ panic("eth: couldn't allocate space for rx_ring");
+ rx_ring = (void*) addr;
+ SHARED_RING_INIT(rx_ring);
+ FRONT_RING_INIT(&nd->rx, rx_ring, PAGE_SIZE);
+ grant = hyp_grant_give(domid, atop(kvtophys(addr)), 0);
+
+ /* and give it to backend. */
+ i = sprintf(port_name, "%u", grant);
+ c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "rx-ring-ref");
+ if (!c)
+ panic("eth: couldn't store rx_ring reference for VIF %s (%s)", vifs[n], hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ /* Tell the backend we need full checksums (no offload). */
+ c = hyp_store_write(t, "1", 5, VIF_PATH, "/", vifs[n], "/", "feature-no-csum-offload");
+ if (!c)
+ panic("eth: couldn't store feature-no-csum-offload reference for VIF %s (%s)", vifs[n], hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ /* Allocate an event channel and give it to backend. */
+ nd->evt = evt = hyp_event_channel_alloc(domid);
+ i = sprintf(port_name, "%lu", evt);
+ c = hyp_store_write(t, port_name, 5, VIF_PATH, "/", vifs[n], "/", "event-channel");
+ if (!c)
+ panic("eth: couldn't store event channel for VIF %s (%s)", vifs[n], hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+ c = hyp_store_write(t, hyp_store_state_initialized, 5, VIF_PATH, "/", vifs[n], "/", "state");
+ if (!c)
+ panic("eth: couldn't store state for VIF %s (%s)", vifs[n], hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+ } while ((!hyp_store_transaction_stop(t)));
+ /* TODO randomly wait? */
+
+ c = hyp_store_read(0, 5, VIF_PATH, "/", vifs[n], "/", "backend");
+ if (!c)
+ panic("eth: couldn't get path to VIF %s backend (%s)", vifs[n], hyp_store_error);
+ nd->backend = c;
+
+ while(1) {
+ i = hyp_store_read_int(0, 3, nd->backend, "/", "state");
+ if (i == MACH_ATOI_DEFAULT)
+ panic("can't read state from %s", nd->backend);
+ if (i == XenbusStateInitWait)
+ break;
+ hyp_yield();
+ }
+
+ c = hyp_store_read(0, 3, nd->backend, "/", "mac");
+ if (!c)
+ panic("eth: couldn't get VIF %s's mac (%s)", vifs[n], hyp_store_error);
+
+ for (i=0; ; i++) {
+ int val;
+ hextoi(&c[3*i], &val);
+ if (val == -1)
+ panic("eth: couldn't understand %dth number of VIF %s's mac %s", i, vifs[n], c);
+ nd->address[i] = val;
+ if (i==ADDRESS_SIZE-1)
+ break;
+ if (c[3*i+2] != ':')
+ panic("eth: couldn't understand %dth separator of VIF %s's mac %s", i, vifs[n], c);
+ }
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ printf("eth%d: dom%d's VIF %s ", n, domid, vifs[n]);
+ for (i=0; ; i++) {
+ printf("%02x", nd->address[i]);
+ if (i==ADDRESS_SIZE-1)
+ break;
+ printf(":");
+ }
+ printf("\n");
+
+ c = hyp_store_write(0, hyp_store_state_connected, 5, VIF_PATH, "/", nd->vif, "/", "state");
+ if (!c)
+ panic("couldn't store state for eth%d (%s)", nd - vif_data, hyp_store_error);
+ kfree((vm_offset_t) c, strlen(c)+1);
+
+ /* Get a page for packet reception */
+ for (i= 0; i<WINDOW; i++) {
+ if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS)
+ panic("eth: couldn't allocate space for rx buffer");
+ nd->rx_buf[i] = (void*)phystokv(kvtophys(addr));
+ nd->rx_buf_pfn[i] = atop(kvtophys((vm_offset_t)nd->rx_buf[i]));
+ if (hyp_do_update_va_mapping(kvtolin(addr), 0, UVMF_INVLPG|UVMF_ALL))
+ panic("eth: couldn't clear rx kv buf %d at %p", i, addr);
+ /* and enqueue it to backend. */
+ enqueue_rx_buf(nd, i);
+ }
+ int notify;
+ wmb(); /* make sure it sees requests */
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->rx, notify);
+ if (notify)
+ hyp_event_channel_send(nd->evt);
+
+
+ nd->open_count = -1;
+ nd->device.emul_ops = &hyp_net_emulation_ops;
+ nd->device.emul_data = nd;
+ simple_lock_init(&nd->lock);
+ simple_lock_init(&nd->pushlock);
+
+ ifp = &nd->ifnet;
+ ifp->if_unit = n;
+ ifp->if_flags = IFF_UP | IFF_RUNNING;
+ ifp->if_header_size = 14;
+ ifp->if_header_format = HDR_ETHERNET;
+ /* Set to the maximum that we can handle in device_write. */
+ ifp->if_mtu = PAGE_SIZE - ifp->if_header_size;
+ ifp->if_address_size = ADDRESS_SIZE;
+ ifp->if_address = (void*) nd->address;
+ if_init_queues (ifp);
+
+ /* Now we can start receiving */
+ hyp_evt_handler(evt, hyp_net_intr, n, SPL6);
+ }
+}
+
+static ipc_port_t
+dev_to_port(void *d)
+{
+ struct net_data *b = d;
+ if (!d)
+ return IP_NULL;
+ return ipc_port_make_send(b->port);
+}
+
+static int
+device_close(void *devp)
+{
+ struct net_data *nd = devp;
+ if (--nd->open_count < 0)
+ panic("too many closes on eth%d", nd - vif_data);
+ printf("close, eth%d count %d\n",nd-vif_data,nd->open_count);
+ if (nd->open_count)
+ return 0;
+ ipc_kobject_set(nd->port, IKO_NULL, IKOT_NONE);
+ ipc_port_dealloc_kernel(nd->port);
+ return 0;
+}
+
+static io_return_t
+device_open (ipc_port_t reply_port, mach_msg_type_name_t reply_port_type,
+ dev_mode_t mode, char *name, device_t *devp /* out */)
+{
+ int i, n, err = 0;
+ ipc_port_t port, notify;
+ struct net_data *nd;
+
+ if (name[0] != 'e' || name[1] != 't' || name[2] != 'h' || name[3] < '0' || name[3] > '9')
+ return D_NO_SUCH_DEVICE;
+ i = mach_atoi((u_char *) &name[3], &n);
+ if (n == MACH_ATOI_DEFAULT)
+ return D_NO_SUCH_DEVICE;
+ if (name[3 + i])
+ return D_NO_SUCH_DEVICE;
+ if (n >= n_vifs)
+ return D_NO_SUCH_DEVICE;
+ nd = &vif_data[n];
+ if (nd->open_count == -2)
+ /* couldn't be initialized */
+ return D_NO_SUCH_DEVICE;
+
+ if (nd->open_count >= 0) {
+ *devp = &nd->device ;
+ nd->open_count++ ;
+ printf("re-open, eth%d count %d\n",nd-vif_data,nd->open_count);
+ return D_SUCCESS;
+ }
+
+ nd->open_count = 1;
+ printf("eth%d count %d\n",nd-vif_data,nd->open_count);
+
+ port = ipc_port_alloc_kernel();
+ if (port == IP_NULL) {
+ err = KERN_RESOURCE_SHORTAGE;
+ goto out;
+ }
+ nd->port = port;
+
+ *devp = &nd->device;
+
+ ipc_kobject_set (port, (ipc_kobject_t) &nd->device, IKOT_DEVICE);
+
+ notify = ipc_port_make_sonce (nd->port);
+ ip_lock (nd->port);
+ ipc_port_nsrequest (nd->port, 1, notify, &notify);
+ assert (notify == IP_NULL);
+
+out:
+ if (IP_VALID (reply_port))
+ ds_device_open_reply (reply_port, reply_port_type, D_SUCCESS, dev_to_port(nd));
+ else
+ device_close(nd);
+ return MIG_NO_REPLY;
+}
+
+static io_return_t
+device_write(void *d, ipc_port_t reply_port,
+ mach_msg_type_name_t reply_port_type, dev_mode_t mode,
+ recnum_t bn, io_buf_ptr_t data, unsigned int count,
+ int *bytes_written)
+{
+ vm_map_copy_t copy = (vm_map_copy_t) data;
+ grant_ref_t gref;
+ struct net_data *nd = d;
+ struct ifnet *ifp = &nd->ifnet;
+ netif_tx_request_t *req;
+ unsigned reqn;
+ vm_offset_t offset;
+ vm_page_t m;
+ vm_size_t size;
+
+ /* The maximum that we can handle. */
+ assert(ifp->if_header_size + ifp->if_mtu <= PAGE_SIZE);
+
+ if (count < ifp->if_header_size ||
+ count > ifp->if_header_size + ifp->if_mtu)
+ return D_INVALID_SIZE;
+
+ assert(copy->type == VM_MAP_COPY_PAGE_LIST);
+
+ assert(copy->cpy_npages <= 2);
+ assert(copy->cpy_npages >= 1);
+
+ offset = copy->offset & PAGE_MASK;
+ if (paranoia || copy->cpy_npages == 2) {
+ /* have to copy :/ */
+ while ((m = vm_page_grab(FALSE)) == 0)
+ VM_PAGE_WAIT (0);
+ assert (! m->active && ! m->inactive);
+ m->busy = TRUE;
+
+ if (copy->cpy_npages == 1)
+ size = count;
+ else
+ size = PAGE_SIZE - offset;
+
+ memcpy((void*)phystokv(m->phys_addr), (void*)phystokv(copy->cpy_page_list[0]->phys_addr + offset), size);
+ if (copy->cpy_npages == 2)
+ memcpy((void*)phystokv(m->phys_addr + size), (void*)phystokv(copy->cpy_page_list[1]->phys_addr), count - size);
+
+ offset = 0;
+ } else
+ m = copy->cpy_page_list[0];
+
+ /* allocate a request */
+ spl_t spl = splimp();
+ while (1) {
+ simple_lock(&nd->lock);
+ if (!RING_FULL(&nd->tx))
+ break;
+ thread_sleep(nd, &nd->lock, FALSE);
+ }
+ mb();
+ reqn = nd->tx.req_prod_pvt++;
+ simple_lock(&nd->pushlock);
+ simple_unlock(&nd->lock);
+ (void) splx(spl);
+
+ req = RING_GET_REQUEST(&nd->tx, reqn);
+ req->gref = gref = hyp_grant_give(nd->domid, atop(m->phys_addr), 1);
+ req->offset = offset;
+ req->flags = 0;
+ req->id = gref;
+ req->size = count;
+
+ assert_wait(hyp_grant_address(gref), FALSE);
+
+ int notify;
+ wmb(); /* make sure it sees requests */
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&nd->tx, notify);
+ if (notify)
+ hyp_event_channel_send(nd->evt);
+ simple_unlock(&nd->pushlock);
+
+ thread_block(NULL);
+
+ hyp_grant_takeback(gref);
+
+ /* Send packet to filters. */
+ {
+ struct packet_header *packet;
+ struct ether_header *header;
+ ipc_kmsg_t kmsg;
+
+ kmsg = net_kmsg_get ();
+
+ if (kmsg != IKM_NULL)
+ {
+ /* Suitable for Ethernet only. */
+ header = (struct ether_header *) (net_kmsg (kmsg)->header);
+ packet = (struct packet_header *) (net_kmsg (kmsg)->packet);
+ memcpy (header, (void*)phystokv(m->phys_addr + offset), sizeof (struct ether_header));
+
+ /* packet is prefixed with a struct packet_header,
+ see include/device/net_status.h. */
+ memcpy (packet + 1, (void*)phystokv(m->phys_addr + offset + sizeof (struct ether_header)),
+ count - sizeof (struct ether_header));
+ packet->length = count - sizeof (struct ether_header)
+ + sizeof (struct packet_header);
+ packet->type = header->ether_type;
+ net_kmsg (kmsg)->sent = TRUE; /* Mark packet as sent. */
+ spl_t s = splimp ();
+ net_packet (&nd->ifnet, kmsg, packet->length,
+ ethernet_priority (kmsg));
+ splx (s);
+ }
+ }
+
+ if (paranoia || copy->cpy_npages == 2)
+ VM_PAGE_FREE(m);
+
+ vm_map_copy_discard (copy);
+
+ *bytes_written = count;
+
+ if (IP_VALID(reply_port))
+ ds_device_write_reply (reply_port, reply_port_type, 0, count);
+
+ return MIG_NO_REPLY;
+}
+
+static io_return_t
+device_get_status(void *d, dev_flavor_t flavor, dev_status_t status,
+ mach_msg_type_number_t *status_count)
+{
+ struct net_data *nd = d;
+
+ return net_getstat (&nd->ifnet, flavor, status, status_count);
+}
+
+static io_return_t
+device_set_status(void *d, dev_flavor_t flavor, dev_status_t status,
+ mach_msg_type_number_t count)
+{
+ struct net_data *nd = d;
+
+ switch (flavor)
+ {
+ default:
+ printf("TODO: net_%s(%p, 0x%x)\n", __func__, nd, flavor);
+ return D_INVALID_OPERATION;
+ }
+ return D_SUCCESS;
+}
+
+static io_return_t
+device_set_filter(void *d, ipc_port_t port, int priority,
+ filter_t * filter, unsigned filter_count)
+{
+ struct net_data *nd = d;
+
+ if (!nd)
+ return D_NO_SUCH_DEVICE;
+
+ return net_set_filter (&nd->ifnet, port, priority, filter, filter_count);
+}
+
+struct device_emulation_ops hyp_net_emulation_ops = {
+ NULL, /* dereference */
+ NULL, /* deallocate */
+ dev_to_port,
+ device_open,
+ device_close,
+ device_write,
+ NULL, /* write_inband */
+ NULL,
+ NULL, /* read_inband */
+ device_set_status, /* set_status */
+ device_get_status,
+ device_set_filter, /* set_filter */
+ NULL, /* map */
+ NULL, /* no_senders */
+ NULL, /* write_trap */
+ NULL, /* writev_trap */
+};
diff --git a/xen/net.h b/xen/net.h
new file mode 100644
index 0000000..6683870
--- /dev/null
+++ b/xen/net.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_NET_H
+#define XEN_NET_H
+
+void hyp_net_init(void);
+
+#endif /* XEN_NET_H */
diff --git a/xen/public/COPYING b/xen/public/COPYING
new file mode 100644
index 0000000..ffc6d61
--- /dev/null
+++ b/xen/public/COPYING
@@ -0,0 +1,38 @@
+XEN NOTICE
+==========
+
+This copyright applies to all files within this subdirectory and its
+subdirectories:
+ include/public/*.h
+ include/public/hvm/*.h
+ include/public/io/*.h
+
+The intention is that these files can be freely copied into the source
+tree of an operating system when porting that OS to run on Xen. Doing
+so does *not* cause the OS to become subject to the terms of the GPL.
+
+All other files in the Xen source distribution are covered by version
+2 of the GNU General Public License except where explicitly stated
+otherwise within individual source files.
+
+ -- Keir Fraser (on behalf of the Xen team)
+
+=====================================================================
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
diff --git a/xen/public/arch-x86/xen-mca.h b/xen/public/arch-x86/xen-mca.h
new file mode 100644
index 0000000..103d41f
--- /dev/null
+++ b/xen/public/arch-x86/xen-mca.h
@@ -0,0 +1,279 @@
+/******************************************************************************
+ * arch-x86/mca.h
+ *
+ * Contributed by Advanced Micro Devices, Inc.
+ * Author: Christoph Egger <Christoph.Egger@amd.com>
+ *
+ * Guest OS machine check interface to x86 Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/* Full MCA functionality has the following use cases from the guest side:
+ *
+ * Must have's:
+ * 1. Dom0 and DomU register machine check trap callback handlers
+ * (already done via "set_trap_table" hypercall)
+ * 2. Dom0 registers machine check event callback handler
+ * (doable via EVTCHNOP_bind_virq)
+ * 3. Dom0 and DomU fetches machine check data
+ * 4. Dom0 wants Xen to notify a DomU
+ * 5. Dom0 gets DomU ID from physical address
+ * 6. Dom0 wants Xen to kill DomU (already done for "xm destroy")
+ *
+ * Nice to have's:
+ * 7. Dom0 wants Xen to deactivate a physical CPU
+ * This is better done as separate task, physical CPU hotplugging,
+ * and hypercall(s) should be sysctl's
+ * 8. Page migration proposed from Xen NUMA work, where Dom0 can tell Xen to
+ * move a DomU (or Dom0 itself) away from a malicious page
+ * producing correctable errors.
+ * 9. offlining physical page:
+ * Xen frees and never reuses a certain physical page.
+ * 10. Test facility: Allow Dom0 to write values into machine check MSR's
+ * and tell Xen to trigger a machine check
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_MCA_H__
+#define __XEN_PUBLIC_ARCH_X86_MCA_H__
+
+/* Hypercall */
+#define __HYPERVISOR_mca __HYPERVISOR_arch_0
+
+#define XEN_MCA_INTERFACE_VERSION 0x03000001
+
+/* IN: Dom0 calls hypercall from MC event handler. */
+#define XEN_MC_CORRECTABLE 0x0
+/* IN: Dom0/DomU calls hypercall from MC trap handler. */
+#define XEN_MC_TRAP 0x1
+/* XEN_MC_CORRECTABLE and XEN_MC_TRAP are mutually exclusive. */
+
+/* OUT: All is ok */
+#define XEN_MC_OK 0x0
+/* OUT: Domain could not fetch data. */
+#define XEN_MC_FETCHFAILED 0x1
+/* OUT: There was no machine check data to fetch. */
+#define XEN_MC_NODATA 0x2
+/* OUT: Between notification time and this hypercall another
+ * (most likely correctable) error happened. The fetched data
+ * does not match the original machine check data. */
+#define XEN_MC_NOMATCH 0x4
+
+/* OUT: DomU did not register MC NMI handler. Try something else. */
+#define XEN_MC_CANNOTHANDLE 0x8
+/* OUT: Notifying DomU failed. Retry later or try something else. */
+#define XEN_MC_NOTDELIVERED 0x10
+/* Note, XEN_MC_CANNOTHANDLE and XEN_MC_NOTDELIVERED are mutually exclusive. */
+
+
+#ifndef __ASSEMBLY__
+
+#define VIRQ_MCA VIRQ_ARCH_0 /* G. (DOM0) Machine Check Architecture */
+
+/*
+ * Machine Check Architecture:
+ * structs are read-only and used to report all kinds of
+ * correctable and uncorrectable errors detected by the HW.
+ * Dom0 and DomU: register a handler to get notified.
+ * Dom0 only: Correctable errors are reported via VIRQ_MCA
+ * Dom0 and DomU: Uncorrectable errors are reported via nmi handlers
+ */
+#define MC_TYPE_GLOBAL 0
+#define MC_TYPE_BANK 1
+#define MC_TYPE_EXTENDED 2
+
+struct mcinfo_common {
+ uint16_t type; /* structure type */
+ uint16_t size; /* size of this struct in bytes */
+};
+
+
+#define MC_FLAG_CORRECTABLE (1 << 0)
+#define MC_FLAG_UNCORRECTABLE (1 << 1)
+
+/* contains global x86 mc information */
+struct mcinfo_global {
+ struct mcinfo_common common;
+
+ /* running domain at the time in error (most likely the impacted one) */
+ uint16_t mc_domid;
+ uint32_t mc_socketid; /* physical socket of the physical core */
+ uint16_t mc_coreid; /* physical impacted core */
+ uint16_t mc_core_threadid; /* core thread of physical core */
+ uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */
+ uint64_t mc_gstatus; /* global status */
+ uint32_t mc_flags;
+};
+
+/* contains bank local x86 mc information */
+struct mcinfo_bank {
+ struct mcinfo_common common;
+
+ uint16_t mc_bank; /* bank nr */
+ uint16_t mc_domid; /* Use case 5: domain referenced by mc_addr,
+ * if mc_addr is valid. Never valid on DomU. */
+ uint64_t mc_status; /* bank status */
+ uint64_t mc_addr; /* bank address, only valid
+ * if addr bit is set in mc_status */
+ uint64_t mc_misc;
+};
+
+
+struct mcinfo_msr {
+ uint64_t reg; /* MSR */
+ uint64_t value; /* MSR value */
+};
+
+/* contains mc information from other
+ * or additional mc MSRs */
+struct mcinfo_extended {
+ struct mcinfo_common common;
+
+ /* You can fill up to five registers.
+ * If you need more, then use this structure
+ * multiple times. */
+
+ uint32_t mc_msrs; /* Number of MSRs with valid values. */
+ struct mcinfo_msr mc_msr[5];
+};
+
+#define MCINFO_HYPERCALLSIZE 1024
+#define MCINFO_MAXSIZE 768
+
+struct mc_info {
+ /* Number of mcinfo_* entries in mi_data */
+ uint32_t mi_nentries;
+
+ uint8_t mi_data[MCINFO_MAXSIZE - sizeof(uint32_t)];
+};
+typedef struct mc_info mc_info_t;
+
+
+
+/*
+ * OS's should use these instead of writing their own lookup function
+ * each with its own bugs and drawbacks.
+ * We use macros instead of static inline functions to allow guests
+ * to include this header in assembly files (*.S).
+ */
+/* Prototype:
+ * uint32_t x86_mcinfo_nentries(struct mc_info *mi);
+ */
+#define x86_mcinfo_nentries(_mi) \
+ (_mi)->mi_nentries
+/* Prototype:
+ * struct mcinfo_common *x86_mcinfo_first(struct mc_info *mi);
+ */
+#define x86_mcinfo_first(_mi) \
+ (struct mcinfo_common *)((_mi)->mi_data)
+/* Prototype:
+ * struct mcinfo_common *x86_mcinfo_next(struct mcinfo_common *mic);
+ */
+#define x86_mcinfo_next(_mic) \
+ (struct mcinfo_common *)((uint8_t *)(_mic) + (_mic)->size)
+
+/* Prototype:
+ * void x86_mcinfo_lookup(void *ret, struct mc_info *mi, uint16_t type);
+ */
+#define x86_mcinfo_lookup(_ret, _mi, _type) \
+ do { \
+ uint32_t found, i; \
+ struct mcinfo_common *_mic; \
+ \
+ found = 0; \
+ (_ret) = NULL; \
+ if (_mi == NULL) break; \
+ _mic = x86_mcinfo_first(_mi); \
+ for (i = 0; i < x86_mcinfo_nentries(_mi); i++) { \
+ if (_mic->type == (_type)) { \
+ found = 1; \
+ break; \
+ } \
+ _mic = x86_mcinfo_next(_mic); \
+ } \
+ (_ret) = found ? _mic : NULL; \
+ } while (0)
+
+
+/* Use case 1
+ * Register machine check trap callback handler
+ * (already done via "set_trap_table" hypercall)
+ */
+
+/* Use case 2
+ * Dom0 registers machine check event callback handler
+ * done by EVTCHNOP_bind_virq
+ */
+
+/* Use case 3
+ * Fetch machine check data from hypervisor.
+ * Note, this hypercall is special, because both Dom0 and DomU must use this.
+ */
+#define XEN_MC_fetch 1
+struct xen_mc_fetch {
+ /* IN/OUT variables. */
+ uint32_t flags;
+
+/* IN: XEN_MC_CORRECTABLE, XEN_MC_TRAP */
+/* OUT: XEN_MC_OK, XEN_MC_FETCHFAILED, XEN_MC_NODATA, XEN_MC_NOMATCH */
+
+ /* OUT variables. */
+ uint32_t fetch_idx; /* only useful for Dom0 for the notify hypercall */
+ struct mc_info mc_info;
+};
+typedef struct xen_mc_fetch xen_mc_fetch_t;
+DEFINE_XEN_GUEST_HANDLE(xen_mc_fetch_t);
+
+
+/* Use case 4
+ * This tells the hypervisor to notify a DomU about the machine check error
+ */
+#define XEN_MC_notifydomain 2
+struct xen_mc_notifydomain {
+ /* IN variables. */
+ uint16_t mc_domid; /* The unprivileged domain to notify. */
+ uint16_t mc_vcpuid; /* The vcpu in mc_domid to notify.
+ * Usually the echoed value from the fetch hypercall. */
+ uint32_t fetch_idx; /* Echoed value from the fetch hypercall. */
+
+ /* IN/OUT variables. */
+ uint32_t flags;
+
+/* IN: XEN_MC_CORRECTABLE, XEN_MC_TRAP */
+/* OUT: XEN_MC_OK, XEN_MC_CANNOTHANDLE, XEN_MC_NOTDELIVERED, XEN_MC_NOMATCH */
+};
+typedef struct xen_mc_notifydomain xen_mc_notifydomain_t;
+DEFINE_XEN_GUEST_HANDLE(xen_mc_notifydomain_t);
+
+
+struct xen_mc {
+ uint32_t cmd;
+ uint32_t interface_version; /* XEN_MCA_INTERFACE_VERSION */
+ union {
+ struct xen_mc_fetch mc_fetch;
+ struct xen_mc_notifydomain mc_notifydomain;
+ uint8_t pad[MCINFO_HYPERCALLSIZE];
+ } u;
+};
+typedef struct xen_mc xen_mc_t;
+DEFINE_XEN_GUEST_HANDLE(xen_mc_t);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __XEN_PUBLIC_ARCH_X86_MCA_H__ */
diff --git a/xen/public/arch-x86/xen-x86_32.h b/xen/public/arch-x86/xen-x86_32.h
new file mode 100644
index 0000000..7cb6a01
--- /dev/null
+++ b/xen/public/arch-x86/xen-x86_32.h
@@ -0,0 +1,180 @@
+/******************************************************************************
+ * xen-x86_32.h
+ *
+ * Guest OS interface to x86 32-bit Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2007, K A Fraser
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__
+#define __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__
+
+/*
+ * Hypercall interface:
+ * Input: %ebx, %ecx, %edx, %esi, %edi (arguments 1-5)
+ * Output: %eax
+ * Access is via hypercall page (set up by guest loader or via a Xen MSR):
+ * call hypercall_page + hypercall-number * 32
+ * Clobbered: Argument registers (e.g., 2-arg hypercall clobbers %ebx,%ecx)
+ */
+
+#if __XEN_INTERFACE_VERSION__ < 0x00030203
+/*
+ * Legacy hypercall interface:
+ * As above, except the entry sequence to the hypervisor is:
+ * mov $hypercall-number*32,%eax ; int $0x82
+ */
+#define TRAP_INSTR "int $0x82"
+#endif
+
+/*
+ * These flat segments are in the Xen-private section of every GDT. Since these
+ * are also present in the initial GDT, many OSes will be able to avoid
+ * installing their own GDT.
+ */
+#define FLAT_RING1_CS 0xe019 /* GDT index 259 */
+#define FLAT_RING1_DS 0xe021 /* GDT index 260 */
+#define FLAT_RING1_SS 0xe021 /* GDT index 260 */
+#define FLAT_RING3_CS 0xe02b /* GDT index 261 */
+#define FLAT_RING3_DS 0xe033 /* GDT index 262 */
+#define FLAT_RING3_SS 0xe033 /* GDT index 262 */
+
+#define FLAT_KERNEL_CS FLAT_RING1_CS
+#define FLAT_KERNEL_DS FLAT_RING1_DS
+#define FLAT_KERNEL_SS FLAT_RING1_SS
+#define FLAT_USER_CS FLAT_RING3_CS
+#define FLAT_USER_DS FLAT_RING3_DS
+#define FLAT_USER_SS FLAT_RING3_SS
+
+#define __HYPERVISOR_VIRT_START_PAE 0xF5800000
+#define __MACH2PHYS_VIRT_START_PAE 0xF5800000
+#define __MACH2PHYS_VIRT_END_PAE 0xF6800000
+#define HYPERVISOR_VIRT_START_PAE \
+ mk_unsigned_long(__HYPERVISOR_VIRT_START_PAE)
+#define MACH2PHYS_VIRT_START_PAE \
+ mk_unsigned_long(__MACH2PHYS_VIRT_START_PAE)
+#define MACH2PHYS_VIRT_END_PAE \
+ mk_unsigned_long(__MACH2PHYS_VIRT_END_PAE)
+
+/* Non-PAE bounds are obsolete. */
+#define __HYPERVISOR_VIRT_START_NONPAE 0xFC000000
+#define __MACH2PHYS_VIRT_START_NONPAE 0xFC000000
+#define __MACH2PHYS_VIRT_END_NONPAE 0xFC400000
+#define HYPERVISOR_VIRT_START_NONPAE \
+ mk_unsigned_long(__HYPERVISOR_VIRT_START_NONPAE)
+#define MACH2PHYS_VIRT_START_NONPAE \
+ mk_unsigned_long(__MACH2PHYS_VIRT_START_NONPAE)
+#define MACH2PHYS_VIRT_END_NONPAE \
+ mk_unsigned_long(__MACH2PHYS_VIRT_END_NONPAE)
+
+#define __HYPERVISOR_VIRT_START __HYPERVISOR_VIRT_START_PAE
+#define __MACH2PHYS_VIRT_START __MACH2PHYS_VIRT_START_PAE
+#define __MACH2PHYS_VIRT_END __MACH2PHYS_VIRT_END_PAE
+
+#ifndef HYPERVISOR_VIRT_START
+#define HYPERVISOR_VIRT_START mk_unsigned_long(__HYPERVISOR_VIRT_START)
+#endif
+
+#define MACH2PHYS_VIRT_START mk_unsigned_long(__MACH2PHYS_VIRT_START)
+#define MACH2PHYS_VIRT_END mk_unsigned_long(__MACH2PHYS_VIRT_END)
+#define MACH2PHYS_NR_ENTRIES ((MACH2PHYS_VIRT_END-MACH2PHYS_VIRT_START)>>2)
+#ifndef machine_to_phys_mapping
+#define machine_to_phys_mapping ((unsigned long *)MACH2PHYS_VIRT_START)
+#endif
+
+/* 32-/64-bit invariability for control interfaces (domctl/sysctl). */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+#undef ___DEFINE_XEN_GUEST_HANDLE
+#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
+ typedef struct { type *p; } \
+ __guest_handle_ ## name; \
+ typedef struct { union { type *p; uint64_aligned_t q; }; } \
+ __guest_handle_64_ ## name
+#undef set_xen_guest_handle
+#define set_xen_guest_handle(hnd, val) \
+ do { if ( sizeof(hnd) == 8 ) *(uint64_t *)&(hnd) = 0; \
+ (hnd).p = val; \
+ } while ( 0 )
+#define uint64_aligned_t uint64_t __attribute__((aligned(8)))
+#define __XEN_GUEST_HANDLE_64(name) __guest_handle_64_ ## name
+#define XEN_GUEST_HANDLE_64(name) __XEN_GUEST_HANDLE_64(name)
+#endif
+
+#ifndef __ASSEMBLY__
+
+struct cpu_user_regs {
+ uint32_t ebx;
+ uint32_t ecx;
+ uint32_t edx;
+ uint32_t esi;
+ uint32_t edi;
+ uint32_t ebp;
+ uint32_t eax;
+ uint16_t error_code; /* private */
+ uint16_t entry_vector; /* private */
+ uint32_t eip;
+ uint16_t cs;
+ uint8_t saved_upcall_mask;
+ uint8_t _pad0;
+ uint32_t eflags; /* eflags.IF == !saved_upcall_mask */
+ uint32_t esp;
+ uint16_t ss, _pad1;
+ uint16_t es, _pad2;
+ uint16_t ds, _pad3;
+ uint16_t fs, _pad4;
+ uint16_t gs, _pad5;
+};
+typedef struct cpu_user_regs cpu_user_regs_t;
+DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
+
+/*
+ * Page-directory addresses above 4GB do not fit into architectural %cr3.
+ * When accessing %cr3, or equivalent field in vcpu_guest_context, guests
+ * must use the following accessor macros to pack/unpack valid MFNs.
+ */
+#define xen_pfn_to_cr3(pfn) (((unsigned)(pfn) << 12) | ((unsigned)(pfn) >> 20))
+#define xen_cr3_to_pfn(cr3) (((unsigned)(cr3) >> 12) | ((unsigned)(cr3) << 20))
+
+struct arch_vcpu_info {
+ unsigned long cr2;
+ unsigned long pad[5]; /* sizeof(vcpu_info_t) == 64 */
+};
+typedef struct arch_vcpu_info arch_vcpu_info_t;
+
+struct xen_callback {
+ unsigned long cs;
+ unsigned long eip;
+};
+typedef struct xen_callback xen_callback_t;
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __XEN_PUBLIC_ARCH_X86_XEN_X86_32_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/arch-x86/xen-x86_64.h b/xen/public/arch-x86/xen-x86_64.h
new file mode 100644
index 0000000..1e54cf9
--- /dev/null
+++ b/xen/public/arch-x86/xen-x86_64.h
@@ -0,0 +1,212 @@
+/******************************************************************************
+ * xen-x86_64.h
+ *
+ * Guest OS interface to x86 64-bit Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2006, K A Fraser
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__
+#define __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__
+
+/*
+ * Hypercall interface:
+ * Input: %rdi, %rsi, %rdx, %r10, %r8 (arguments 1-5)
+ * Output: %rax
+ * Access is via hypercall page (set up by guest loader or via a Xen MSR):
+ * call hypercall_page + hypercall-number * 32
+ * Clobbered: argument registers (e.g., 2-arg hypercall clobbers %rdi,%rsi)
+ */
+
+#if __XEN_INTERFACE_VERSION__ < 0x00030203
+/*
+ * Legacy hypercall interface:
+ * As above, except the entry sequence to the hypervisor is:
+ * mov $hypercall-number*32,%eax ; syscall
+ * Clobbered: %rcx, %r11, argument registers (as above)
+ */
+#define TRAP_INSTR "syscall"
+#endif
+
+/*
+ * 64-bit segment selectors
+ * These flat segments are in the Xen-private section of every GDT. Since these
+ * are also present in the initial GDT, many OSes will be able to avoid
+ * installing their own GDT.
+ */
+
+#define FLAT_RING3_CS32 0xe023 /* GDT index 260 */
+#define FLAT_RING3_CS64 0xe033 /* GDT index 261 */
+#define FLAT_RING3_DS32 0xe02b /* GDT index 262 */
+#define FLAT_RING3_DS64 0x0000 /* NULL selector */
+#define FLAT_RING3_SS32 0xe02b /* GDT index 262 */
+#define FLAT_RING3_SS64 0xe02b /* GDT index 262 */
+
+#define FLAT_KERNEL_DS64 FLAT_RING3_DS64
+#define FLAT_KERNEL_DS32 FLAT_RING3_DS32
+#define FLAT_KERNEL_DS FLAT_KERNEL_DS64
+#define FLAT_KERNEL_CS64 FLAT_RING3_CS64
+#define FLAT_KERNEL_CS32 FLAT_RING3_CS32
+#define FLAT_KERNEL_CS FLAT_KERNEL_CS64
+#define FLAT_KERNEL_SS64 FLAT_RING3_SS64
+#define FLAT_KERNEL_SS32 FLAT_RING3_SS32
+#define FLAT_KERNEL_SS FLAT_KERNEL_SS64
+
+#define FLAT_USER_DS64 FLAT_RING3_DS64
+#define FLAT_USER_DS32 FLAT_RING3_DS32
+#define FLAT_USER_DS FLAT_USER_DS64
+#define FLAT_USER_CS64 FLAT_RING3_CS64
+#define FLAT_USER_CS32 FLAT_RING3_CS32
+#define FLAT_USER_CS FLAT_USER_CS64
+#define FLAT_USER_SS64 FLAT_RING3_SS64
+#define FLAT_USER_SS32 FLAT_RING3_SS32
+#define FLAT_USER_SS FLAT_USER_SS64
+
+#define __HYPERVISOR_VIRT_START 0xFFFF800000000000
+#define __HYPERVISOR_VIRT_END 0xFFFF880000000000
+#define __MACH2PHYS_VIRT_START 0xFFFF800000000000
+#define __MACH2PHYS_VIRT_END 0xFFFF804000000000
+
+#ifndef HYPERVISOR_VIRT_START
+#define HYPERVISOR_VIRT_START mk_unsigned_long(__HYPERVISOR_VIRT_START)
+#define HYPERVISOR_VIRT_END mk_unsigned_long(__HYPERVISOR_VIRT_END)
+#endif
+
+#define MACH2PHYS_VIRT_START mk_unsigned_long(__MACH2PHYS_VIRT_START)
+#define MACH2PHYS_VIRT_END mk_unsigned_long(__MACH2PHYS_VIRT_END)
+#define MACH2PHYS_NR_ENTRIES ((MACH2PHYS_VIRT_END-MACH2PHYS_VIRT_START)>>3)
+#ifndef machine_to_phys_mapping
+#define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)
+#endif
+
+/*
+ * int HYPERVISOR_set_segment_base(unsigned int which, unsigned long base)
+ * @which == SEGBASE_* ; @base == 64-bit base address
+ * Returns 0 on success.
+ */
+#define SEGBASE_FS 0
+#define SEGBASE_GS_USER 1
+#define SEGBASE_GS_KERNEL 2
+#define SEGBASE_GS_USER_SEL 3 /* Set user %gs specified in base[15:0] */
+
+/*
+ * int HYPERVISOR_iret(void)
+ * All arguments are on the kernel stack, in the following format.
+ * Never returns if successful. Current kernel context is lost.
+ * The saved CS is mapped as follows:
+ * RING0 -> RING3 kernel mode.
+ * RING1 -> RING3 kernel mode.
+ * RING2 -> RING3 kernel mode.
+ * RING3 -> RING3 user mode.
+ * However RING0 indicates that the guest kernel should return to itself
+ * directly with
+ * orb $3,1*8(%rsp)
+ * iretq
+ * If flags contains VGCF_in_syscall:
+ * Restore RAX, RIP, RFLAGS, RSP.
+ * Discard R11, RCX, CS, SS.
+ * Otherwise:
+ * Restore RAX, R11, RCX, CS:RIP, RFLAGS, SS:RSP.
+ * All other registers are saved on hypercall entry and restored to user.
+ */
+/* Guest exited in SYSCALL context? Return to guest with SYSRET? */
+#define _VGCF_in_syscall 8
+#define VGCF_in_syscall (1<<_VGCF_in_syscall)
+#define VGCF_IN_SYSCALL VGCF_in_syscall
+
+#ifndef __ASSEMBLY__
+
+struct iret_context {
+ /* Top of stack (%rsp at point of hypercall). */
+ uint64_t rax, r11, rcx, flags, rip, cs, rflags, rsp, ss;
+ /* Bottom of iret stack frame. */
+};
+
+#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
+/* Anonymous union includes both 32- and 64-bit names (e.g., eax/rax). */
+#define __DECL_REG(name) union { \
+ uint64_t r ## name, e ## name; \
+ uint32_t _e ## name; \
+}
+#else
+/* Non-gcc sources must always use the proper 64-bit name (e.g., rax). */
+#define __DECL_REG(name) uint64_t r ## name
+#endif
+
+struct cpu_user_regs {
+ uint64_t r15;
+ uint64_t r14;
+ uint64_t r13;
+ uint64_t r12;
+ __DECL_REG(bp);
+ __DECL_REG(bx);
+ uint64_t r11;
+ uint64_t r10;
+ uint64_t r9;
+ uint64_t r8;
+ __DECL_REG(ax);
+ __DECL_REG(cx);
+ __DECL_REG(dx);
+ __DECL_REG(si);
+ __DECL_REG(di);
+ uint32_t error_code; /* private */
+ uint32_t entry_vector; /* private */
+ __DECL_REG(ip);
+ uint16_t cs, _pad0[1];
+ uint8_t saved_upcall_mask;
+ uint8_t _pad1[3];
+ __DECL_REG(flags); /* rflags.IF == !saved_upcall_mask */
+ __DECL_REG(sp);
+ uint16_t ss, _pad2[3];
+ uint16_t es, _pad3[3];
+ uint16_t ds, _pad4[3];
+ uint16_t fs, _pad5[3]; /* Non-zero => takes precedence over fs_base. */
+ uint16_t gs, _pad6[3]; /* Non-zero => takes precedence over gs_base_usr. */
+};
+typedef struct cpu_user_regs cpu_user_regs_t;
+DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
+
+#undef __DECL_REG
+
+#define xen_pfn_to_cr3(pfn) ((unsigned long)(pfn) << 12)
+#define xen_cr3_to_pfn(cr3) ((unsigned long)(cr3) >> 12)
+
+struct arch_vcpu_info {
+ unsigned long cr2;
+ unsigned long pad; /* sizeof(vcpu_info_t) == 64 */
+};
+typedef struct arch_vcpu_info arch_vcpu_info_t;
+
+typedef unsigned long xen_callback_t;
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __XEN_PUBLIC_ARCH_X86_XEN_X86_64_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/arch-x86/xen.h b/xen/public/arch-x86/xen.h
new file mode 100644
index 0000000..084348f
--- /dev/null
+++ b/xen/public/arch-x86/xen.h
@@ -0,0 +1,204 @@
+/******************************************************************************
+ * arch-x86/xen.h
+ *
+ * Guest OS interface to x86 Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2006, K A Fraser
+ */
+
+#include "../xen.h"
+
+#ifndef __XEN_PUBLIC_ARCH_X86_XEN_H__
+#define __XEN_PUBLIC_ARCH_X86_XEN_H__
+
+/* Structural guest handles introduced in 0x00030201. */
+#if __XEN_INTERFACE_VERSION__ >= 0x00030201
+#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
+ typedef struct { type *p; } __guest_handle_ ## name
+#else
+#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
+ typedef type * __guest_handle_ ## name
+#endif
+
+#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
+ ___DEFINE_XEN_GUEST_HANDLE(name, type); \
+ ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
+#define DEFINE_XEN_GUEST_HANDLE(name) __DEFINE_XEN_GUEST_HANDLE(name, name)
+#define __XEN_GUEST_HANDLE(name) __guest_handle_ ## name
+#define XEN_GUEST_HANDLE(name) __XEN_GUEST_HANDLE(name)
+#define set_xen_guest_handle(hnd, val) do { (hnd).p = val; } while (0)
+#ifdef __XEN_TOOLS__
+#define get_xen_guest_handle(val, hnd) do { val = (hnd).p; } while (0)
+#endif
+
+#if defined(__i386__)
+#include "xen-x86_32.h"
+#elif defined(__x86_64__)
+#include "xen-x86_64.h"
+#endif
+
+#ifndef __ASSEMBLY__
+typedef unsigned long xen_pfn_t;
+#define PRI_xen_pfn "lx"
+#endif
+
+/*
+ * SEGMENT DESCRIPTOR TABLES
+ */
+/*
+ * A number of GDT entries are reserved by Xen. These are not situated at the
+ * start of the GDT because some stupid OSes export hard-coded selector values
+ * in their ABI. These hard-coded values are always near the start of the GDT,
+ * so Xen places itself out of the way, at the far end of the GDT.
+ */
+#define FIRST_RESERVED_GDT_PAGE 14
+#define FIRST_RESERVED_GDT_BYTE (FIRST_RESERVED_GDT_PAGE * 4096)
+#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8)
+
+/* Maximum number of virtual CPUs in multi-processor guests. */
+#define MAX_VIRT_CPUS 32
+
+
+/* Machine check support */
+#include "xen-mca.h"
+
+#ifndef __ASSEMBLY__
+
+typedef unsigned long xen_ulong_t;
+
+/*
+ * Send an array of these to HYPERVISOR_set_trap_table().
+ * The privilege level specifies which modes may enter a trap via a software
+ * interrupt. On x86/64, since rings 1 and 2 are unavailable, we allocate
+ * privilege levels as follows:
+ * Level == 0: No one may enter
+ * Level == 1: Kernel may enter
+ * Level == 2: Kernel may enter
+ * Level == 3: Everyone may enter
+ */
+#define TI_GET_DPL(_ti) ((_ti)->flags & 3)
+#define TI_GET_IF(_ti) ((_ti)->flags & 4)
+#define TI_SET_DPL(_ti,_dpl) ((_ti)->flags |= (_dpl))
+#define TI_SET_IF(_ti,_if) ((_ti)->flags |= ((!!(_if))<<2))
+struct trap_info {
+ uint8_t vector; /* exception vector */
+ uint8_t flags; /* 0-3: privilege level; 4: clear event enable? */
+ uint16_t cs; /* code selector */
+ unsigned long address; /* code offset */
+};
+typedef struct trap_info trap_info_t;
+DEFINE_XEN_GUEST_HANDLE(trap_info_t);
+
+typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */
+
+/*
+ * The following is all CPU context. Note that the fpu_ctxt block is filled
+ * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used.
+ */
+struct vcpu_guest_context {
+ /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. */
+ struct { char x[512]; } fpu_ctxt; /* User-level FPU registers */
+#define VGCF_I387_VALID (1<<0)
+#define VGCF_IN_KERNEL (1<<2)
+#define _VGCF_i387_valid 0
+#define VGCF_i387_valid (1<<_VGCF_i387_valid)
+#define _VGCF_in_kernel 2
+#define VGCF_in_kernel (1<<_VGCF_in_kernel)
+#define _VGCF_failsafe_disables_events 3
+#define VGCF_failsafe_disables_events (1<<_VGCF_failsafe_disables_events)
+#define _VGCF_syscall_disables_events 4
+#define VGCF_syscall_disables_events (1<<_VGCF_syscall_disables_events)
+#define _VGCF_online 5
+#define VGCF_online (1<<_VGCF_online)
+ unsigned long flags; /* VGCF_* flags */
+ struct cpu_user_regs user_regs; /* User-level CPU registers */
+ struct trap_info trap_ctxt[256]; /* Virtual IDT */
+ unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */
+ unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+ unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */
+ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
+ unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */
+ unsigned long debugreg[8]; /* DB0-DB7 (debug registers) */
+#ifdef __i386__
+ unsigned long event_callback_cs; /* CS:EIP of event callback */
+ unsigned long event_callback_eip;
+ unsigned long failsafe_callback_cs; /* CS:EIP of failsafe callback */
+ unsigned long failsafe_callback_eip;
+#else
+ unsigned long event_callback_eip;
+ unsigned long failsafe_callback_eip;
+#ifdef __XEN__
+ union {
+ unsigned long syscall_callback_eip;
+ struct {
+ unsigned int event_callback_cs; /* compat CS of event cb */
+ unsigned int failsafe_callback_cs; /* compat CS of failsafe cb */
+ };
+ };
+#else
+ unsigned long syscall_callback_eip;
+#endif
+#endif
+ unsigned long vm_assist; /* VMASST_TYPE_* bitmap */
+#ifdef __x86_64__
+ /* Segment base addresses. */
+ uint64_t fs_base;
+ uint64_t gs_base_kernel;
+ uint64_t gs_base_user;
+#endif
+};
+typedef struct vcpu_guest_context vcpu_guest_context_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
+
+struct arch_shared_info {
+ unsigned long max_pfn; /* max pfn that appears in table */
+ /* Frame containing list of mfns containing list of mfns containing p2m. */
+ xen_pfn_t pfn_to_mfn_frame_list_list;
+ unsigned long nmi_reason;
+ uint64_t pad[32];
+};
+typedef struct arch_shared_info arch_shared_info_t;
+
+#endif /* !__ASSEMBLY__ */
+
+/*
+ * Prefix forces emulation of some non-trapping instructions.
+ * Currently only CPUID.
+ */
+#ifdef __ASSEMBLY__
+#define XEN_EMULATE_PREFIX .byte 0x0f,0x0b,0x78,0x65,0x6e ;
+#define XEN_CPUID XEN_EMULATE_PREFIX cpuid
+#else
+#define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; "
+#define XEN_CPUID XEN_EMULATE_PREFIX "cpuid"
+#endif
+
+#endif /* __XEN_PUBLIC_ARCH_X86_XEN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/arch-x86_32.h b/xen/public/arch-x86_32.h
new file mode 100644
index 0000000..45842b2
--- /dev/null
+++ b/xen/public/arch-x86_32.h
@@ -0,0 +1,27 @@
+/******************************************************************************
+ * arch-x86_32.h
+ *
+ * Guest OS interface to x86 32-bit Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2006, K A Fraser
+ */
+
+#include "arch-x86/xen.h"
diff --git a/xen/public/arch-x86_64.h b/xen/public/arch-x86_64.h
new file mode 100644
index 0000000..fbb2639
--- /dev/null
+++ b/xen/public/arch-x86_64.h
@@ -0,0 +1,27 @@
+/******************************************************************************
+ * arch-x86_64.h
+ *
+ * Guest OS interface to x86 64-bit Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2006, K A Fraser
+ */
+
+#include "arch-x86/xen.h"
diff --git a/xen/public/callback.h b/xen/public/callback.h
new file mode 100644
index 0000000..f4962f6
--- /dev/null
+++ b/xen/public/callback.h
@@ -0,0 +1,121 @@
+/******************************************************************************
+ * callback.h
+ *
+ * Register guest OS callbacks with Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2006, Ian Campbell
+ */
+
+#ifndef __XEN_PUBLIC_CALLBACK_H__
+#define __XEN_PUBLIC_CALLBACK_H__
+
+#include "xen.h"
+
+/*
+ * Prototype for this hypercall is:
+ * long callback_op(int cmd, void *extra_args)
+ * @cmd == CALLBACKOP_??? (callback operation).
+ * @extra_args == Operation-specific extra arguments (NULL if none).
+ */
+
+/* ia64, x86: Callback for event delivery. */
+#define CALLBACKTYPE_event 0
+
+/* x86: Failsafe callback when guest state cannot be restored by Xen. */
+#define CALLBACKTYPE_failsafe 1
+
+/* x86/64 hypervisor: Syscall by 64-bit guest app ('64-on-64-on-64'). */
+#define CALLBACKTYPE_syscall 2
+
+/*
+ * x86/32 hypervisor: Only available on x86/32 when supervisor_mode_kernel
+ * feature is enabled. Do not use this callback type in new code.
+ */
+#define CALLBACKTYPE_sysenter_deprecated 3
+
+/* x86: Callback for NMI delivery. */
+#define CALLBACKTYPE_nmi 4
+
+/*
+ * x86: sysenter is only available as follows:
+ * - 32-bit hypervisor: with the supervisor_mode_kernel feature enabled
+ * - 64-bit hypervisor: 32-bit guest applications on Intel CPUs
+ * ('32-on-32-on-64', '32-on-64-on-64')
+ * [nb. also 64-bit guest applications on Intel CPUs
+ * ('64-on-64-on-64'), but syscall is preferred]
+ */
+#define CALLBACKTYPE_sysenter 5
+
+/*
+ * x86/64 hypervisor: Syscall by 32-bit guest app on AMD CPUs
+ * ('32-on-32-on-64', '32-on-64-on-64')
+ */
+#define CALLBACKTYPE_syscall32 7
+
+/*
+ * Disable event delivery during the callback? This flag is ignored for event and
+ * NMI callbacks: event delivery is unconditionally disabled.
+ */
+#define _CALLBACKF_mask_events 0
+#define CALLBACKF_mask_events (1U << _CALLBACKF_mask_events)
+
+/*
+ * Register a callback.
+ */
+#define CALLBACKOP_register 0
+struct callback_register {
+ uint16_t type;
+ uint16_t flags;
+ xen_callback_t address;
+};
+typedef struct callback_register callback_register_t;
+DEFINE_XEN_GUEST_HANDLE(callback_register_t);
+
+/*
+ * Unregister a callback.
+ *
+ * Not all callbacks can be unregistered. -EINVAL will be returned if
+ * you attempt to unregister such a callback.
+ */
+#define CALLBACKOP_unregister 1
+struct callback_unregister {
+ uint16_t type;
+ uint16_t _unused;
+};
+typedef struct callback_unregister callback_unregister_t;
+DEFINE_XEN_GUEST_HANDLE(callback_unregister_t);
+
+#if __XEN_INTERFACE_VERSION__ < 0x00030207
+#undef CALLBACKTYPE_sysenter
+#define CALLBACKTYPE_sysenter CALLBACKTYPE_sysenter_deprecated
+#endif
+
+#endif /* __XEN_PUBLIC_CALLBACK_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/dom0_ops.h b/xen/public/dom0_ops.h
new file mode 100644
index 0000000..5d2b324
--- /dev/null
+++ b/xen/public/dom0_ops.h
@@ -0,0 +1,120 @@
+/******************************************************************************
+ * dom0_ops.h
+ *
+ * Process command requests from domain-0 guest OS.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2002-2003, B Dragovic
+ * Copyright (c) 2002-2006, K Fraser
+ */
+
+#ifndef __XEN_PUBLIC_DOM0_OPS_H__
+#define __XEN_PUBLIC_DOM0_OPS_H__
+
+#include "xen.h"
+#include "platform.h"
+
+#if __XEN_INTERFACE_VERSION__ >= 0x00030204
+#error "dom0_ops.h is a compatibility interface only"
+#endif
+
+#define DOM0_INTERFACE_VERSION XENPF_INTERFACE_VERSION
+
+#define DOM0_SETTIME XENPF_settime
+#define dom0_settime xenpf_settime
+#define dom0_settime_t xenpf_settime_t
+
+#define DOM0_ADD_MEMTYPE XENPF_add_memtype
+#define dom0_add_memtype xenpf_add_memtype
+#define dom0_add_memtype_t xenpf_add_memtype_t
+
+#define DOM0_DEL_MEMTYPE XENPF_del_memtype
+#define dom0_del_memtype xenpf_del_memtype
+#define dom0_del_memtype_t xenpf_del_memtype_t
+
+#define DOM0_READ_MEMTYPE XENPF_read_memtype
+#define dom0_read_memtype xenpf_read_memtype
+#define dom0_read_memtype_t xenpf_read_memtype_t
+
+#define DOM0_MICROCODE XENPF_microcode_update
+#define dom0_microcode xenpf_microcode_update
+#define dom0_microcode_t xenpf_microcode_update_t
+
+#define DOM0_PLATFORM_QUIRK XENPF_platform_quirk
+#define dom0_platform_quirk xenpf_platform_quirk
+#define dom0_platform_quirk_t xenpf_platform_quirk_t
+
+typedef uint64_t cpumap_t;
+
+/* Unsupported legacy operation -- defined for API compatibility. */
+#define DOM0_MSR 15
+struct dom0_msr {
+ /* IN variables. */
+ uint32_t write;
+ cpumap_t cpu_mask;
+ uint32_t msr;
+ uint32_t in1;
+ uint32_t in2;
+ /* OUT variables. */
+ uint32_t out1;
+ uint32_t out2;
+};
+typedef struct dom0_msr dom0_msr_t;
+DEFINE_XEN_GUEST_HANDLE(dom0_msr_t);
+
+/* Unsupported legacy operation -- defined for API compatibility. */
+#define DOM0_PHYSICAL_MEMORY_MAP 40
+struct dom0_memory_map_entry {
+ uint64_t start, end;
+ uint32_t flags; /* reserved */
+ uint8_t is_ram;
+};
+typedef struct dom0_memory_map_entry dom0_memory_map_entry_t;
+DEFINE_XEN_GUEST_HANDLE(dom0_memory_map_entry_t);
+
+struct dom0_op {
+ uint32_t cmd;
+ uint32_t interface_version; /* DOM0_INTERFACE_VERSION */
+ union {
+ struct dom0_msr msr;
+ struct dom0_settime settime;
+ struct dom0_add_memtype add_memtype;
+ struct dom0_del_memtype del_memtype;
+ struct dom0_read_memtype read_memtype;
+ struct dom0_microcode microcode;
+ struct dom0_platform_quirk platform_quirk;
+ struct dom0_memory_map_entry physical_memory_map;
+ uint8_t pad[128];
+ } u;
+};
+typedef struct dom0_op dom0_op_t;
+DEFINE_XEN_GUEST_HANDLE(dom0_op_t);
+
+#endif /* __XEN_PUBLIC_DOM0_OPS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/domctl.h b/xen/public/domctl.h
new file mode 100644
index 0000000..b7075ac
--- /dev/null
+++ b/xen/public/domctl.h
@@ -0,0 +1,680 @@
+/******************************************************************************
+ * domctl.h
+ *
+ * Domain management operations. For use by node control stack.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2002-2003, B Dragovic
+ * Copyright (c) 2002-2006, K Fraser
+ */
+
+#ifndef __XEN_PUBLIC_DOMCTL_H__
+#define __XEN_PUBLIC_DOMCTL_H__
+
+#if !defined(__XEN__) && !defined(__XEN_TOOLS__)
+#error "domctl operations are intended for use by node control tools only"
+#endif
+
+#include "xen.h"
+
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000005
+
+struct xenctl_cpumap {
+ XEN_GUEST_HANDLE_64(uint8) bitmap;
+ uint32_t nr_cpus;
+};
+
+/*
+ * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
+ * If it is specified as zero, an id is auto-allocated and returned.
+ */
+#define XEN_DOMCTL_createdomain 1
+struct xen_domctl_createdomain {
+ /* IN parameters */
+ uint32_t ssidref;
+ xen_domain_handle_t handle;
+ /* Is this an HVM guest (as opposed to a PV guest)? */
+#define _XEN_DOMCTL_CDF_hvm_guest 0
+#define XEN_DOMCTL_CDF_hvm_guest (1U<<_XEN_DOMCTL_CDF_hvm_guest)
+ /* Use hardware-assisted paging if available? */
+#define _XEN_DOMCTL_CDF_hap 1
+#define XEN_DOMCTL_CDF_hap (1U<<_XEN_DOMCTL_CDF_hap)
+ uint32_t flags;
+};
+typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_createdomain_t);
+
+#define XEN_DOMCTL_destroydomain 2
+#define XEN_DOMCTL_pausedomain 3
+#define XEN_DOMCTL_unpausedomain 4
+#define XEN_DOMCTL_resumedomain 27
+
+#define XEN_DOMCTL_getdomaininfo 5
+struct xen_domctl_getdomaininfo {
+ /* OUT variables. */
+ domid_t domain; /* Also echoed in domctl.domain */
+ /* Domain is scheduled to die. */
+#define _XEN_DOMINF_dying 0
+#define XEN_DOMINF_dying (1U<<_XEN_DOMINF_dying)
+ /* Domain is an HVM guest (as opposed to a PV guest). */
+#define _XEN_DOMINF_hvm_guest 1
+#define XEN_DOMINF_hvm_guest (1U<<_XEN_DOMINF_hvm_guest)
+ /* The guest OS has shut down. */
+#define _XEN_DOMINF_shutdown 2
+#define XEN_DOMINF_shutdown (1U<<_XEN_DOMINF_shutdown)
+ /* Currently paused by control software. */
+#define _XEN_DOMINF_paused 3
+#define XEN_DOMINF_paused (1U<<_XEN_DOMINF_paused)
+ /* Currently blocked pending an event. */
+#define _XEN_DOMINF_blocked 4
+#define XEN_DOMINF_blocked (1U<<_XEN_DOMINF_blocked)
+ /* Domain is currently running. */
+#define _XEN_DOMINF_running 5
+#define XEN_DOMINF_running (1U<<_XEN_DOMINF_running)
+ /* Being debugged. */
+#define _XEN_DOMINF_debugged 6
+#define XEN_DOMINF_debugged (1U<<_XEN_DOMINF_debugged)
+ /* CPU to which this domain is bound. */
+#define XEN_DOMINF_cpumask 255
+#define XEN_DOMINF_cpushift 8
+ /* XEN_DOMINF_shutdown guest-supplied code. */
+#define XEN_DOMINF_shutdownmask 255
+#define XEN_DOMINF_shutdownshift 16
+ uint32_t flags; /* XEN_DOMINF_* */
+ uint64_aligned_t tot_pages;
+ uint64_aligned_t max_pages;
+ uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
+ uint64_aligned_t cpu_time;
+ uint32_t nr_online_vcpus; /* Number of VCPUs currently online. */
+ uint32_t max_vcpu_id; /* Maximum VCPUID in use by this domain. */
+ uint32_t ssidref;
+ xen_domain_handle_t handle;
+};
+typedef struct xen_domctl_getdomaininfo xen_domctl_getdomaininfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
+
+
+#define XEN_DOMCTL_getmemlist 6
+struct xen_domctl_getmemlist {
+ /* IN variables. */
+ /* Max entries to write to output buffer. */
+ uint64_aligned_t max_pfns;
+ /* Start index in guest's page list. */
+ uint64_aligned_t start_pfn;
+ XEN_GUEST_HANDLE_64(uint64) buffer;
+ /* OUT variables. */
+ uint64_aligned_t num_pfns;
+};
+typedef struct xen_domctl_getmemlist xen_domctl_getmemlist_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_getmemlist_t);
+
+
+#define XEN_DOMCTL_getpageframeinfo 7
+
+#define XEN_DOMCTL_PFINFO_LTAB_SHIFT 28
+#define XEN_DOMCTL_PFINFO_NOTAB (0x0U<<28)
+#define XEN_DOMCTL_PFINFO_L1TAB (0x1U<<28)
+#define XEN_DOMCTL_PFINFO_L2TAB (0x2U<<28)
+#define XEN_DOMCTL_PFINFO_L3TAB (0x3U<<28)
+#define XEN_DOMCTL_PFINFO_L4TAB (0x4U<<28)
+#define XEN_DOMCTL_PFINFO_LTABTYPE_MASK (0x7U<<28)
+#define XEN_DOMCTL_PFINFO_LPINTAB (0x1U<<31)
+#define XEN_DOMCTL_PFINFO_XTAB (0xfU<<28) /* invalid page */
+#define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
+
+struct xen_domctl_getpageframeinfo {
+ /* IN variables. */
+ uint64_aligned_t gmfn; /* GMFN to query */
+ /* OUT variables. */
+ /* Is the page PINNED to a type? */
+ uint32_t type; /* see above type defs */
+};
+typedef struct xen_domctl_getpageframeinfo xen_domctl_getpageframeinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_getpageframeinfo_t);
+
+
+#define XEN_DOMCTL_getpageframeinfo2 8
+struct xen_domctl_getpageframeinfo2 {
+ /* IN variables. */
+ uint64_aligned_t num;
+ /* IN/OUT variables. */
+ XEN_GUEST_HANDLE_64(uint32) array;
+};
+typedef struct xen_domctl_getpageframeinfo2 xen_domctl_getpageframeinfo2_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_getpageframeinfo2_t);
+
+
+/*
+ * Control shadow pagetables operation
+ */
+#define XEN_DOMCTL_shadow_op 10
+
+/* Disable shadow mode. */
+#define XEN_DOMCTL_SHADOW_OP_OFF 0
+
+/* Enable shadow mode (mode contains ORed XEN_DOMCTL_SHADOW_ENABLE_* flags). */
+#define XEN_DOMCTL_SHADOW_OP_ENABLE 32
+
+/* Log-dirty bitmap operations. */
+ /* Return the bitmap and clean internal copy for next round. */
+#define XEN_DOMCTL_SHADOW_OP_CLEAN 11
+ /* Return the bitmap but do not modify internal copy. */
+#define XEN_DOMCTL_SHADOW_OP_PEEK 12
+
+/* Memory allocation accessors. */
+#define XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION 30
+#define XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION 31
+
+/* Legacy enable operations. */
+ /* Equiv. to ENABLE with no mode flags. */
+#define XEN_DOMCTL_SHADOW_OP_ENABLE_TEST 1
+ /* Equiv. to ENABLE with mode flag ENABLE_LOG_DIRTY. */
+#define XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY 2
+ /* Equiv. to ENABLE with mode flags ENABLE_REFCOUNT and ENABLE_TRANSLATE. */
+#define XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE 3
+
+/* Mode flags for XEN_DOMCTL_SHADOW_OP_ENABLE. */
+ /*
+ * Shadow pagetables are refcounted: guest does not use explicit mmu
+ * operations nor write-protect its pagetables.
+ */
+#define XEN_DOMCTL_SHADOW_ENABLE_REFCOUNT (1 << 1)
+ /*
+ * Log pages in a bitmap as they are dirtied.
+ * Used for live relocation to determine which pages must be re-sent.
+ */
+#define XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY (1 << 2)
+ /*
+ * Automatically translate GPFNs into MFNs.
+ */
+#define XEN_DOMCTL_SHADOW_ENABLE_TRANSLATE (1 << 3)
+ /*
+ * Xen does not steal virtual address space from the guest.
+ * Requires HVM support.
+ */
+#define XEN_DOMCTL_SHADOW_ENABLE_EXTERNAL (1 << 4)
+
+struct xen_domctl_shadow_op_stats {
+ uint32_t fault_count;
+ uint32_t dirty_count;
+};
+typedef struct xen_domctl_shadow_op_stats xen_domctl_shadow_op_stats_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_shadow_op_stats_t);
+
+struct xen_domctl_shadow_op {
+ /* IN variables. */
+ uint32_t op; /* XEN_DOMCTL_SHADOW_OP_* */
+
+ /* OP_ENABLE */
+ uint32_t mode; /* XEN_DOMCTL_SHADOW_ENABLE_* */
+
+ /* OP_GET_ALLOCATION / OP_SET_ALLOCATION */
+ uint32_t mb; /* Shadow memory allocation in MB */
+
+ /* OP_PEEK / OP_CLEAN */
+ XEN_GUEST_HANDLE_64(uint8) dirty_bitmap;
+ uint64_aligned_t pages; /* Size of buffer. Updated with actual size. */
+ struct xen_domctl_shadow_op_stats stats;
+};
+typedef struct xen_domctl_shadow_op xen_domctl_shadow_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_shadow_op_t);
+
+
+#define XEN_DOMCTL_max_mem 11
+struct xen_domctl_max_mem {
+ /* IN variables. */
+ uint64_aligned_t max_memkb;
+};
+typedef struct xen_domctl_max_mem xen_domctl_max_mem_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_mem_t);
+
+
+#define XEN_DOMCTL_setvcpucontext 12
+#define XEN_DOMCTL_getvcpucontext 13
+struct xen_domctl_vcpucontext {
+ uint32_t vcpu; /* IN */
+ XEN_GUEST_HANDLE_64(vcpu_guest_context_t) ctxt; /* IN/OUT */
+};
+typedef struct xen_domctl_vcpucontext xen_domctl_vcpucontext_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpucontext_t);
+
+
+#define XEN_DOMCTL_getvcpuinfo 14
+struct xen_domctl_getvcpuinfo {
+ /* IN variables. */
+ uint32_t vcpu;
+ /* OUT variables. */
+ uint8_t online; /* currently online (not hotplugged)? */
+ uint8_t blocked; /* blocked waiting for an event? */
+ uint8_t running; /* currently scheduled on its CPU? */
+ uint64_aligned_t cpu_time; /* total cpu time consumed (ns) */
+ uint32_t cpu; /* current mapping */
+};
+typedef struct xen_domctl_getvcpuinfo xen_domctl_getvcpuinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t);
+
+
+/* Get/set which physical cpus a vcpu can execute on. */
+#define XEN_DOMCTL_setvcpuaffinity 9
+#define XEN_DOMCTL_getvcpuaffinity 25
+struct xen_domctl_vcpuaffinity {
+ uint32_t vcpu; /* IN */
+ struct xenctl_cpumap cpumap; /* IN/OUT */
+};
+typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t);
+
+
+#define XEN_DOMCTL_max_vcpus 15
+struct xen_domctl_max_vcpus {
+ uint32_t max; /* maximum number of vcpus */
+};
+typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
+
+
+#define XEN_DOMCTL_scheduler_op 16
+/* Scheduler types. */
+#define XEN_SCHEDULER_SEDF 4
+#define XEN_SCHEDULER_CREDIT 5
+/* Set or get info? */
+#define XEN_DOMCTL_SCHEDOP_putinfo 0
+#define XEN_DOMCTL_SCHEDOP_getinfo 1
+struct xen_domctl_scheduler_op {
+ uint32_t sched_id; /* XEN_SCHEDULER_* */
+ uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */
+ union {
+ struct xen_domctl_sched_sedf {
+ uint64_aligned_t period;
+ uint64_aligned_t slice;
+ uint64_aligned_t latency;
+ uint32_t extratime;
+ uint32_t weight;
+ } sedf;
+ struct xen_domctl_sched_credit {
+ uint16_t weight;
+ uint16_t cap;
+ } credit;
+ } u;
+};
+typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_scheduler_op_t);
+
+
+#define XEN_DOMCTL_setdomainhandle 17
+struct xen_domctl_setdomainhandle {
+ xen_domain_handle_t handle;
+};
+typedef struct xen_domctl_setdomainhandle xen_domctl_setdomainhandle_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_setdomainhandle_t);
+
+
+#define XEN_DOMCTL_setdebugging 18
+struct xen_domctl_setdebugging {
+ uint8_t enable;
+};
+typedef struct xen_domctl_setdebugging xen_domctl_setdebugging_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_setdebugging_t);
+
+
+#define XEN_DOMCTL_irq_permission 19
+struct xen_domctl_irq_permission {
+ uint8_t pirq;
+ uint8_t allow_access; /* flag to specify enable/disable of IRQ access */
+};
+typedef struct xen_domctl_irq_permission xen_domctl_irq_permission_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_irq_permission_t);
+
+
+#define XEN_DOMCTL_iomem_permission 20
+struct xen_domctl_iomem_permission {
+ uint64_aligned_t first_mfn;/* first page (physical page number) in range */
+ uint64_aligned_t nr_mfns; /* number of pages in range (>0) */
+ uint8_t allow_access; /* allow (!0) or deny (0) access to range? */
+};
+typedef struct xen_domctl_iomem_permission xen_domctl_iomem_permission_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_iomem_permission_t);
+
+
+#define XEN_DOMCTL_ioport_permission 21
+struct xen_domctl_ioport_permission {
+ uint32_t first_port; /* first port in range */
+ uint32_t nr_ports; /* size of port range */
+ uint8_t allow_access; /* allow or deny access to range? */
+};
+typedef struct xen_domctl_ioport_permission xen_domctl_ioport_permission_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_ioport_permission_t);
+
+
+#define XEN_DOMCTL_hypercall_init 22
+struct xen_domctl_hypercall_init {
+ uint64_aligned_t gmfn; /* GMFN to be initialised */
+};
+typedef struct xen_domctl_hypercall_init xen_domctl_hypercall_init_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_hypercall_init_t);
+
+
+#define XEN_DOMCTL_arch_setup 23
+#define _XEN_DOMAINSETUP_hvm_guest 0
+#define XEN_DOMAINSETUP_hvm_guest (1UL<<_XEN_DOMAINSETUP_hvm_guest)
+#define _XEN_DOMAINSETUP_query 1 /* Get parameters (for save) */
+#define XEN_DOMAINSETUP_query (1UL<<_XEN_DOMAINSETUP_query)
+#define _XEN_DOMAINSETUP_sioemu_guest 2
+#define XEN_DOMAINSETUP_sioemu_guest (1UL<<_XEN_DOMAINSETUP_sioemu_guest)
+typedef struct xen_domctl_arch_setup {
+ uint64_aligned_t flags; /* XEN_DOMAINSETUP_* */
+#ifdef __ia64__
+ uint64_aligned_t bp; /* mpaddr of boot param area */
+ uint64_aligned_t maxmem; /* Highest memory address for MDT. */
+ uint64_aligned_t xsi_va; /* Xen shared_info area virtual address. */
+ uint32_t hypercall_imm; /* Break imm for Xen hypercalls. */
+ int8_t vhpt_size_log2; /* Log2 of VHPT size. */
+#endif
+} xen_domctl_arch_setup_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_arch_setup_t);
+
+
+#define XEN_DOMCTL_settimeoffset 24
+struct xen_domctl_settimeoffset {
+ int32_t time_offset_seconds; /* applied to domain wallclock time */
+};
+typedef struct xen_domctl_settimeoffset xen_domctl_settimeoffset_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_settimeoffset_t);
+
+
+#define XEN_DOMCTL_gethvmcontext 33
+#define XEN_DOMCTL_sethvmcontext 34
+typedef struct xen_domctl_hvmcontext {
+ uint32_t size; /* IN/OUT: size of buffer / bytes filled */
+ XEN_GUEST_HANDLE_64(uint8) buffer; /* IN/OUT: data, or call
+ * gethvmcontext with NULL
+ * buffer to get size req'd */
+} xen_domctl_hvmcontext_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_hvmcontext_t);
+
+
+#define XEN_DOMCTL_set_address_size 35
+#define XEN_DOMCTL_get_address_size 36
+typedef struct xen_domctl_address_size {
+ uint32_t size;
+} xen_domctl_address_size_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_address_size_t);
+
+
+#define XEN_DOMCTL_real_mode_area 26
+struct xen_domctl_real_mode_area {
+ uint32_t log; /* log2 of Real Mode Area size */
+};
+typedef struct xen_domctl_real_mode_area xen_domctl_real_mode_area_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_real_mode_area_t);
+
+
+#define XEN_DOMCTL_sendtrigger 28
+#define XEN_DOMCTL_SENDTRIGGER_NMI 0
+#define XEN_DOMCTL_SENDTRIGGER_RESET 1
+#define XEN_DOMCTL_SENDTRIGGER_INIT 2
+struct xen_domctl_sendtrigger {
+ uint32_t trigger; /* IN */
+ uint32_t vcpu; /* IN */
+};
+typedef struct xen_domctl_sendtrigger xen_domctl_sendtrigger_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_sendtrigger_t);
+
+
+/* Assign PCI device to HVM guest. Sets up IOMMU structures. */
+#define XEN_DOMCTL_assign_device 37
+#define XEN_DOMCTL_test_assign_device 45
+#define XEN_DOMCTL_deassign_device 47
+struct xen_domctl_assign_device {
+ uint32_t machine_bdf; /* machine PCI ID of assigned device */
+};
+typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
+
+/* Retrieve sibling device information for machine_bdf */
+#define XEN_DOMCTL_get_device_group 50
+struct xen_domctl_get_device_group {
+ uint32_t machine_bdf; /* IN */
+ uint32_t max_sdevs; /* IN */
+ uint32_t num_sdevs; /* OUT */
+ XEN_GUEST_HANDLE_64(uint32) sdev_array; /* OUT */
+};
+typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
+
+/* Pass-through interrupts: bind real irq -> hvm devfn. */
+#define XEN_DOMCTL_bind_pt_irq 38
+#define XEN_DOMCTL_unbind_pt_irq 48
+typedef enum pt_irq_type_e {
+ PT_IRQ_TYPE_PCI,
+ PT_IRQ_TYPE_ISA,
+ PT_IRQ_TYPE_MSI,
+} pt_irq_type_t;
+struct xen_domctl_bind_pt_irq {
+ uint32_t machine_irq;
+ pt_irq_type_t irq_type;
+ uint32_t hvm_domid;
+
+ union {
+ struct {
+ uint8_t isa_irq;
+ } isa;
+ struct {
+ uint8_t bus;
+ uint8_t device;
+ uint8_t intx;
+ } pci;
+ struct {
+ uint8_t gvec;
+ uint32_t gflags;
+ } msi;
+ } u;
+};
+typedef struct xen_domctl_bind_pt_irq xen_domctl_bind_pt_irq_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_bind_pt_irq_t);
+
+
+/* Bind machine I/O address range -> HVM address range. */
+#define XEN_DOMCTL_memory_mapping 39
+#define DPCI_ADD_MAPPING 1
+#define DPCI_REMOVE_MAPPING 0
+struct xen_domctl_memory_mapping {
+ uint64_aligned_t first_gfn; /* first page (hvm guest phys page) in range */
+ uint64_aligned_t first_mfn; /* first page (machine page) in range */
+ uint64_aligned_t nr_mfns; /* number of pages in range (>0) */
+ uint32_t add_mapping; /* add or remove mapping */
+ uint32_t padding; /* padding for 64-bit aligned structure */
+};
+typedef struct xen_domctl_memory_mapping xen_domctl_memory_mapping_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_memory_mapping_t);
+
+
+/* Bind machine I/O port range -> HVM I/O port range. */
+#define XEN_DOMCTL_ioport_mapping 40
+struct xen_domctl_ioport_mapping {
+ uint32_t first_gport; /* first guest IO port */
+ uint32_t first_mport; /* first machine IO port */
+ uint32_t nr_ports; /* size of port range */
+ uint32_t add_mapping; /* add or remove mapping */
+};
+typedef struct xen_domctl_ioport_mapping xen_domctl_ioport_mapping_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_ioport_mapping_t);
+
+
+/*
+ * Pin caching type of RAM space for x86 HVM domU.
+ */
+#define XEN_DOMCTL_pin_mem_cacheattr 41
+/* Caching types: these happen to be the same as x86 MTRR/PAT type codes. */
+#define XEN_DOMCTL_MEM_CACHEATTR_UC 0
+#define XEN_DOMCTL_MEM_CACHEATTR_WC 1
+#define XEN_DOMCTL_MEM_CACHEATTR_WT 4
+#define XEN_DOMCTL_MEM_CACHEATTR_WP 5
+#define XEN_DOMCTL_MEM_CACHEATTR_WB 6
+#define XEN_DOMCTL_MEM_CACHEATTR_UCM 7
+struct xen_domctl_pin_mem_cacheattr {
+ uint64_aligned_t start, end;
+ unsigned int type; /* XEN_DOMCTL_MEM_CACHEATTR_* */
+};
+typedef struct xen_domctl_pin_mem_cacheattr xen_domctl_pin_mem_cacheattr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_pin_mem_cacheattr_t);
+
+
+#define XEN_DOMCTL_set_ext_vcpucontext 42
+#define XEN_DOMCTL_get_ext_vcpucontext 43
+struct xen_domctl_ext_vcpucontext {
+ /* IN: VCPU that this call applies to. */
+ uint32_t vcpu;
+ /*
+ * SET: Size of struct (IN)
+ * GET: Size of struct (OUT)
+ */
+ uint32_t size;
+#if defined(__i386__) || defined(__x86_64__)
+ /* SYSCALL from 32-bit mode and SYSENTER callback information. */
+ /* NB. SYSCALL from 64-bit mode is contained in vcpu_guest_context_t */
+ uint64_aligned_t syscall32_callback_eip;
+ uint64_aligned_t sysenter_callback_eip;
+ uint16_t syscall32_callback_cs;
+ uint16_t sysenter_callback_cs;
+ uint8_t syscall32_disables_events;
+ uint8_t sysenter_disables_events;
+#endif
+};
+typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_ext_vcpucontext_t);
+
+/*
+ * Set optimization features for a domain.
+ */
+#define XEN_DOMCTL_set_opt_feature 44
+struct xen_domctl_set_opt_feature {
+#if defined(__ia64__)
+ struct xen_ia64_opt_feature optf;
+#else
+ /* Make struct non-empty: do not depend on this field name! */
+ uint64_t dummy;
+#endif
+};
+typedef struct xen_domctl_set_opt_feature xen_domctl_set_opt_feature_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_opt_feature_t);
+
+/*
+ * Set the target domain for a domain
+ */
+#define XEN_DOMCTL_set_target 46
+struct xen_domctl_set_target {
+ domid_t target;
+};
+typedef struct xen_domctl_set_target xen_domctl_set_target_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_target_t);
+
+#if defined(__i386__) || defined(__x86_64__)
+# define XEN_CPUID_INPUT_UNUSED 0xFFFFFFFF
+# define XEN_DOMCTL_set_cpuid 49
+struct xen_domctl_cpuid {
+ unsigned int input[2];
+ unsigned int eax;
+ unsigned int ebx;
+ unsigned int ecx;
+ unsigned int edx;
+};
+typedef struct xen_domctl_cpuid xen_domctl_cpuid_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_cpuid_t);
+#endif
+
+#define XEN_DOMCTL_subscribe 29
+struct xen_domctl_subscribe {
+ uint32_t port; /* IN */
+};
+typedef struct xen_domctl_subscribe xen_domctl_subscribe_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_subscribe_t);
+
+/*
+ * Define the maximum machine address size which should be allocated
+ * to a guest.
+ */
+#define XEN_DOMCTL_set_machine_address_size 51
+#define XEN_DOMCTL_get_machine_address_size 52
+
+/*
+ * Do not inject spurious page faults into this domain.
+ */
+#define XEN_DOMCTL_suppress_spurious_page_faults 53
+
+struct xen_domctl {
+ uint32_t cmd;
+ uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */
+ domid_t domain;
+ union {
+ struct xen_domctl_createdomain createdomain;
+ struct xen_domctl_getdomaininfo getdomaininfo;
+ struct xen_domctl_getmemlist getmemlist;
+ struct xen_domctl_getpageframeinfo getpageframeinfo;
+ struct xen_domctl_getpageframeinfo2 getpageframeinfo2;
+ struct xen_domctl_vcpuaffinity vcpuaffinity;
+ struct xen_domctl_shadow_op shadow_op;
+ struct xen_domctl_max_mem max_mem;
+ struct xen_domctl_vcpucontext vcpucontext;
+ struct xen_domctl_getvcpuinfo getvcpuinfo;
+ struct xen_domctl_max_vcpus max_vcpus;
+ struct xen_domctl_scheduler_op scheduler_op;
+ struct xen_domctl_setdomainhandle setdomainhandle;
+ struct xen_domctl_setdebugging setdebugging;
+ struct xen_domctl_irq_permission irq_permission;
+ struct xen_domctl_iomem_permission iomem_permission;
+ struct xen_domctl_ioport_permission ioport_permission;
+ struct xen_domctl_hypercall_init hypercall_init;
+ struct xen_domctl_arch_setup arch_setup;
+ struct xen_domctl_settimeoffset settimeoffset;
+ struct xen_domctl_real_mode_area real_mode_area;
+ struct xen_domctl_hvmcontext hvmcontext;
+ struct xen_domctl_address_size address_size;
+ struct xen_domctl_sendtrigger sendtrigger;
+ struct xen_domctl_get_device_group get_device_group;
+ struct xen_domctl_assign_device assign_device;
+ struct xen_domctl_bind_pt_irq bind_pt_irq;
+ struct xen_domctl_memory_mapping memory_mapping;
+ struct xen_domctl_ioport_mapping ioport_mapping;
+ struct xen_domctl_pin_mem_cacheattr pin_mem_cacheattr;
+ struct xen_domctl_ext_vcpucontext ext_vcpucontext;
+ struct xen_domctl_set_opt_feature set_opt_feature;
+ struct xen_domctl_set_target set_target;
+ struct xen_domctl_subscribe subscribe;
+#if defined(__i386__) || defined(__x86_64__)
+ struct xen_domctl_cpuid cpuid;
+#endif
+ uint8_t pad[128];
+ } u;
+};
+typedef struct xen_domctl xen_domctl_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_t);
+
+#endif /* __XEN_PUBLIC_DOMCTL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/elfnote.h b/xen/public/elfnote.h
new file mode 100644
index 0000000..77be41b
--- /dev/null
+++ b/xen/public/elfnote.h
@@ -0,0 +1,233 @@
+/******************************************************************************
+ * elfnote.h
+ *
+ * Definitions used for the Xen ELF notes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2006, Ian Campbell, XenSource Ltd.
+ */
+
+#ifndef __XEN_PUBLIC_ELFNOTE_H__
+#define __XEN_PUBLIC_ELFNOTE_H__
+
+/*
+ * The notes should live in a PT_NOTE segment and have "Xen" in the
+ * name field.
+ *
+ * Numeric types are either 4 or 8 bytes depending on the content of
+ * the desc field.
+ *
+ * LEGACY indicates the field in the legacy __xen_guest string which
+ * this note type replaces.
+ */
+
+/*
+ * NAME=VALUE pair (string).
+ */
+#define XEN_ELFNOTE_INFO 0
+
+/*
+ * The virtual address of the entry point (numeric).
+ *
+ * LEGACY: VIRT_ENTRY
+ */
+#define XEN_ELFNOTE_ENTRY 1
+
+/* The virtual address of the hypercall transfer page (numeric).
+ *
+ * LEGACY: HYPERCALL_PAGE. (n.b. legacy value is a physical page
+ * number not a virtual address)
+ */
+#define XEN_ELFNOTE_HYPERCALL_PAGE 2
+
+/* The virtual address where the kernel image should be mapped (numeric).
+ *
+ * Defaults to 0.
+ *
+ * LEGACY: VIRT_BASE
+ */
+#define XEN_ELFNOTE_VIRT_BASE 3
+
+/*
+ * The offset of the ELF paddr field from the actual required
+ * pseudo-physical address (numeric).
+ *
+ * This is used to maintain backwards compatibility with older kernels
+ * which wrote __PAGE_OFFSET into that field. This field defaults to 0
+ * if not present.
+ *
+ * LEGACY: ELF_PADDR_OFFSET. (n.b. legacy default is VIRT_BASE)
+ */
+#define XEN_ELFNOTE_PADDR_OFFSET 4
+
+/*
+ * The version of Xen that we work with (string).
+ *
+ * LEGACY: XEN_VER
+ */
+#define XEN_ELFNOTE_XEN_VERSION 5
+
+/*
+ * The name of the guest operating system (string).
+ *
+ * LEGACY: GUEST_OS
+ */
+#define XEN_ELFNOTE_GUEST_OS 6
+
+/*
+ * The version of the guest operating system (string).
+ *
+ * LEGACY: GUEST_VER
+ */
+#define XEN_ELFNOTE_GUEST_VERSION 7
+
+/*
+ * The loader type (string).
+ *
+ * LEGACY: LOADER
+ */
+#define XEN_ELFNOTE_LOADER 8
+
+/*
+ * The kernel supports PAE (x86/32 only, string = "yes", "no" or
+ * "bimodal").
+ *
+ * For compatibility with Xen 3.0.3 and earlier the "bimodal" setting
+ * may be given as "yes,bimodal" which will cause older Xen to treat
+ * this kernel as PAE.
+ *
+ * LEGACY: PAE (n.b. The legacy interface included a provision to
+ * indicate 'extended-cr3' support allowing L3 page tables to be
+ * placed above 4G. It is assumed that any kernel new enough to use
+ * these ELF notes will include this and therefore "yes" here is
+ * equivalent to "yes[extended-cr3]" in the __xen_guest interface.)
+ */
+#define XEN_ELFNOTE_PAE_MODE 9
+
+/*
+ * The features supported/required by this kernel (string).
+ *
+ * The string must consist of a list of feature names (as given in
+ * features.h, without the "XENFEAT_" prefix) separated by '|'
+ * characters. If a feature is required for the kernel to function
+ * then the feature name must be preceded by a '!' character.
+ *
+ * LEGACY: FEATURES
+ */
+#define XEN_ELFNOTE_FEATURES 10
+
+/*
+ * The kernel requires the symbol table to be loaded (string = "yes" or "no").
+ *
+ * LEGACY: BSD_SYMTAB (n.b. The legacy interface treated the presence
+ * or absence of this string as a boolean flag rather than requiring
+ * "yes" or "no".)
+ */
+#define XEN_ELFNOTE_BSD_SYMTAB 11
+
+/*
+ * The lowest address the hypervisor hole can begin at (numeric).
+ *
+ * This must not be set higher than HYPERVISOR_VIRT_START. Its presence
+ * also indicates to the hypervisor that the kernel can deal with the
+ * hole starting at a higher address.
+ */
+#define XEN_ELFNOTE_HV_START_LOW 12
+
+/*
+ * List of maddr_t-sized mask/value pairs describing how to recognize
+ * (non-present) L1 page table entries carrying valid MFNs (numeric).
+ */
+#define XEN_ELFNOTE_L1_MFN_VALID 13
+
+/*
+ * Whether or not the guest supports cooperative suspend cancellation.
+ */
+#define XEN_ELFNOTE_SUSPEND_CANCEL 14
+
+/*
+ * The number of the highest elfnote defined.
+ */
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_SUSPEND_CANCEL
+
+/*
+ * System information exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_INFO
+ * note in case of a system crash. This note will contain various
+ * information about the system, see xen/include/xen/elfcore.h.
+ */
+#define XEN_ELFNOTE_CRASH_INFO 0x1000001
+
+/*
+ * System registers exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_REGS
+ * note per cpu in case of a system crash. This note is architecture
+ * specific and will contain registers not saved in the "CORE" note.
+ * See xen/include/xen/elfcore.h for more information.
+ */
+#define XEN_ELFNOTE_CRASH_REGS 0x1000002
+
+
+/*
+ * xen dump-core none note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_NONE
+ * in its dump file to indicate that the file is a xen dump-core
+ * file. This note doesn't have any other information.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_NONE 0x2000000
+
+/*
+ * xen dump-core header note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_HEADER
+ * in its dump file.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_HEADER 0x2000001
+
+/*
+ * xen dump-core xen version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_XEN_VERSION
+ * in its dump file. It contains the xen version obtained via the
+ * XENVER hypercall.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_XEN_VERSION 0x2000002
+
+/*
+ * xen dump-core format version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION
+ * in its dump file. It contains a format version identifier.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION 0x2000003
+
+#endif /* __XEN_PUBLIC_ELFNOTE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/elfstructs.h b/xen/public/elfstructs.h
new file mode 100644
index 0000000..77362f3
--- /dev/null
+++ b/xen/public/elfstructs.h
@@ -0,0 +1,527 @@
+#ifndef __XEN_PUBLIC_ELFSTRUCTS_H__
+#define __XEN_PUBLIC_ELFSTRUCTS_H__ 1
+/*
+ * Copyright (c) 1995, 1996 Erik Theisen. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+typedef uint8_t Elf_Byte;
+
+typedef uint32_t Elf32_Addr; /* Unsigned program address */
+typedef uint32_t Elf32_Off; /* Unsigned file offset */
+typedef int32_t Elf32_Sword; /* Signed large integer */
+typedef uint32_t Elf32_Word; /* Unsigned large integer */
+typedef uint16_t Elf32_Half; /* Unsigned medium integer */
+
+typedef uint64_t Elf64_Addr;
+typedef uint64_t Elf64_Off;
+typedef int32_t Elf64_Shalf;
+
+typedef int32_t Elf64_Sword;
+typedef uint32_t Elf64_Word;
+
+typedef int64_t Elf64_Sxword;
+typedef uint64_t Elf64_Xword;
+
+typedef uint32_t Elf64_Half;
+typedef uint16_t Elf64_Quarter;
+
+/*
+ * e_ident[] identification indexes
+ * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html
+ */
+#define EI_MAG0 0 /* file ID */
+#define EI_MAG1 1 /* file ID */
+#define EI_MAG2 2 /* file ID */
+#define EI_MAG3 3 /* file ID */
+#define EI_CLASS 4 /* file class */
+#define EI_DATA 5 /* data encoding */
+#define EI_VERSION 6 /* ELF header version */
+#define EI_OSABI 7 /* OS/ABI ID */
+#define EI_ABIVERSION 8 /* ABI version */
+#define EI_PAD 9 /* start of pad bytes */
+#define EI_NIDENT 16 /* Size of e_ident[] */
+
+/* e_ident[] magic number */
+#define ELFMAG0 0x7f /* e_ident[EI_MAG0] */
+#define ELFMAG1 'E' /* e_ident[EI_MAG1] */
+#define ELFMAG2 'L' /* e_ident[EI_MAG2] */
+#define ELFMAG3 'F' /* e_ident[EI_MAG3] */
+#define ELFMAG "\177ELF" /* magic */
+#define SELFMAG 4 /* size of magic */
+
+/* e_ident[] file class */
+#define ELFCLASSNONE 0 /* invalid */
+#define ELFCLASS32 1 /* 32-bit objs */
+#define ELFCLASS64 2 /* 64-bit objs */
+#define ELFCLASSNUM 3 /* number of classes */
+
+/* e_ident[] data encoding */
+#define ELFDATANONE 0 /* invalid */
+#define ELFDATA2LSB 1 /* Little-Endian */
+#define ELFDATA2MSB 2 /* Big-Endian */
+#define ELFDATANUM 3 /* number of data encode defines */
+
+/* e_ident[] Operating System/ABI */
+#define ELFOSABI_SYSV 0 /* UNIX System V ABI */
+#define ELFOSABI_HPUX 1 /* HP-UX operating system */
+#define ELFOSABI_NETBSD 2 /* NetBSD */
+#define ELFOSABI_LINUX 3 /* GNU/Linux */
+#define ELFOSABI_HURD 4 /* GNU/Hurd */
+#define ELFOSABI_86OPEN 5 /* 86Open common IA32 ABI */
+#define ELFOSABI_SOLARIS 6 /* Solaris */
+#define ELFOSABI_MONTEREY 7 /* Monterey */
+#define ELFOSABI_IRIX 8 /* IRIX */
+#define ELFOSABI_FREEBSD 9 /* FreeBSD */
+#define ELFOSABI_TRU64 10 /* TRU64 UNIX */
+#define ELFOSABI_MODESTO 11 /* Novell Modesto */
+#define ELFOSABI_OPENBSD 12 /* OpenBSD */
+#define ELFOSABI_ARM 97 /* ARM */
+#define ELFOSABI_STANDALONE 255 /* Standalone (embedded) application */
+
+/* e_ident */
+#define IS_ELF(ehdr) ((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \
+ (ehdr).e_ident[EI_MAG1] == ELFMAG1 && \
+ (ehdr).e_ident[EI_MAG2] == ELFMAG2 && \
+ (ehdr).e_ident[EI_MAG3] == ELFMAG3)
+
+/* ELF Header */
+typedef struct elfhdr {
+ unsigned char e_ident[EI_NIDENT]; /* ELF Identification */
+ Elf32_Half e_type; /* object file type */
+ Elf32_Half e_machine; /* machine */
+ Elf32_Word e_version; /* object file version */
+ Elf32_Addr e_entry; /* virtual entry point */
+ Elf32_Off e_phoff; /* program header table offset */
+ Elf32_Off e_shoff; /* section header table offset */
+ Elf32_Word e_flags; /* processor-specific flags */
+ Elf32_Half e_ehsize; /* ELF header size */
+ Elf32_Half e_phentsize; /* program header entry size */
+ Elf32_Half e_phnum; /* number of program header entries */
+ Elf32_Half e_shentsize; /* section header entry size */
+ Elf32_Half e_shnum; /* number of section header entries */
+ Elf32_Half e_shstrndx; /* section header table's "section
+ header string table" entry offset */
+} Elf32_Ehdr;
+
+typedef struct {
+ unsigned char e_ident[EI_NIDENT]; /* Id bytes */
+ Elf64_Quarter e_type; /* file type */
+ Elf64_Quarter e_machine; /* machine type */
+ Elf64_Half e_version; /* version number */
+ Elf64_Addr e_entry; /* entry point */
+ Elf64_Off e_phoff; /* Program hdr offset */
+ Elf64_Off e_shoff; /* Section hdr offset */
+ Elf64_Half e_flags; /* Processor flags */
+ Elf64_Quarter e_ehsize; /* sizeof ehdr */
+ Elf64_Quarter e_phentsize; /* Program header entry size */
+ Elf64_Quarter e_phnum; /* Number of program headers */
+ Elf64_Quarter e_shentsize; /* Section header entry size */
+ Elf64_Quarter e_shnum; /* Number of section headers */
+ Elf64_Quarter e_shstrndx; /* String table index */
+} Elf64_Ehdr;
+
+/* e_type */
+#define ET_NONE 0 /* No file type */
+#define ET_REL 1 /* relocatable file */
+#define ET_EXEC 2 /* executable file */
+#define ET_DYN 3 /* shared object file */
+#define ET_CORE 4 /* core file */
+#define ET_NUM 5 /* number of types */
+#define ET_LOPROC 0xff00 /* reserved range for processor */
+#define ET_HIPROC 0xffff /* specific e_type */
+
+/* e_machine */
+#define EM_NONE 0 /* No Machine */
+#define EM_M32 1 /* AT&T WE 32100 */
+#define EM_SPARC 2 /* SPARC */
+#define EM_386 3 /* Intel 80386 */
+#define EM_68K 4 /* Motorola 68000 */
+#define EM_88K 5 /* Motorola 88000 */
+#define EM_486 6 /* Intel 80486 - unused? */
+#define EM_860 7 /* Intel 80860 */
+#define EM_MIPS 8 /* MIPS R3000 Big-Endian only */
+/*
+ * Don't know if EM_MIPS_RS4_BE,
+ * EM_SPARC64, EM_PARISC,
+ * or EM_PPC are ABI compliant
+ */
+#define EM_MIPS_RS4_BE 10 /* MIPS R4000 Big-Endian */
+#define EM_SPARC64 11 /* SPARC v9 64-bit unofficial */
+#define EM_PARISC 15 /* HPPA */
+#define EM_SPARC32PLUS 18 /* Enhanced instruction set SPARC */
+#define EM_PPC 20 /* PowerPC */
+#define EM_PPC64 21 /* PowerPC 64-bit */
+#define EM_ARM 40 /* Advanced RISC Machines ARM */
+#define EM_ALPHA 41 /* DEC ALPHA */
+#define EM_SPARCV9 43 /* SPARC version 9 */
+#define EM_ALPHA_EXP 0x9026 /* DEC ALPHA */
+#define EM_IA_64 50 /* Intel Merced */
+#define EM_X86_64 62 /* AMD x86-64 architecture */
+#define EM_VAX 75 /* DEC VAX */
+
+/* Version */
+#define EV_NONE 0 /* Invalid */
+#define EV_CURRENT 1 /* Current */
+#define EV_NUM 2 /* number of versions */
+
+/* Section Header */
+typedef struct {
+ Elf32_Word sh_name; /* name - index into section header
+ string table section */
+ Elf32_Word sh_type; /* type */
+ Elf32_Word sh_flags; /* flags */
+ Elf32_Addr sh_addr; /* address */
+ Elf32_Off sh_offset; /* file offset */
+ Elf32_Word sh_size; /* section size */
+ Elf32_Word sh_link; /* section header table index link */
+ Elf32_Word sh_info; /* extra information */
+ Elf32_Word sh_addralign; /* address alignment */
+ Elf32_Word sh_entsize; /* section entry size */
+} Elf32_Shdr;
+
+typedef struct {
+ Elf64_Half sh_name; /* section name */
+ Elf64_Half sh_type; /* section type */
+ Elf64_Xword sh_flags; /* section flags */
+ Elf64_Addr sh_addr; /* virtual address */
+ Elf64_Off sh_offset; /* file offset */
+ Elf64_Xword sh_size; /* section size */
+ Elf64_Half sh_link; /* link to another */
+ Elf64_Half sh_info; /* misc info */
+ Elf64_Xword sh_addralign; /* memory alignment */
+ Elf64_Xword sh_entsize; /* table entry size */
+} Elf64_Shdr;
+
+/* Special Section Indexes */
+#define SHN_UNDEF 0 /* undefined */
+#define SHN_LORESERVE 0xff00 /* lower bounds of reserved indexes */
+#define SHN_LOPROC 0xff00 /* reserved range for processor */
+#define SHN_HIPROC 0xff1f /* specific section indexes */
+#define SHN_ABS 0xfff1 /* absolute value */
+#define SHN_COMMON 0xfff2 /* common symbol */
+#define SHN_HIRESERVE 0xffff /* upper bounds of reserved indexes */
+
+/* sh_type */
+#define SHT_NULL 0 /* inactive */
+#define SHT_PROGBITS 1 /* program defined information */
+#define SHT_SYMTAB 2 /* symbol table section */
+#define SHT_STRTAB 3 /* string table section */
+#define SHT_RELA 4 /* relocation section with addends*/
+#define SHT_HASH 5 /* symbol hash table section */
+#define SHT_DYNAMIC 6 /* dynamic section */
+#define SHT_NOTE 7 /* note section */
+#define SHT_NOBITS 8 /* no space section */
+#define SHT_REL 9 /* relocation section without addends */
+#define SHT_SHLIB 10 /* reserved - purpose unknown */
+#define SHT_DYNSYM 11 /* dynamic symbol table section */
+#define SHT_NUM 12 /* number of section types */
+#define SHT_LOPROC 0x70000000 /* reserved range for processor */
+#define SHT_HIPROC 0x7fffffff /* specific section header types */
+#define SHT_LOUSER 0x80000000 /* reserved range for application */
+#define SHT_HIUSER 0xffffffff /* specific indexes */
+
+/* Section names */
+#define ELF_BSS ".bss" /* uninitialized data */
+#define ELF_DATA ".data" /* initialized data */
+#define ELF_DEBUG ".debug" /* debug */
+#define ELF_DYNAMIC ".dynamic" /* dynamic linking information */
+#define ELF_DYNSTR ".dynstr" /* dynamic string table */
+#define ELF_DYNSYM ".dynsym" /* dynamic symbol table */
+#define ELF_FINI ".fini" /* termination code */
+#define ELF_GOT ".got" /* global offset table */
+#define ELF_HASH ".hash" /* symbol hash table */
+#define ELF_INIT ".init" /* initialization code */
+#define ELF_REL_DATA ".rel.data" /* relocation data */
+#define ELF_REL_FINI ".rel.fini" /* relocation termination code */
+#define ELF_REL_INIT ".rel.init" /* relocation initialization code */
+#define ELF_REL_DYN ".rel.dyn" /* relocation dynamic link info */
+#define ELF_REL_RODATA ".rel.rodata" /* relocation read-only data */
+#define ELF_REL_TEXT ".rel.text" /* relocation code */
+#define ELF_RODATA ".rodata" /* read-only data */
+#define ELF_SHSTRTAB ".shstrtab" /* section header string table */
+#define ELF_STRTAB ".strtab" /* string table */
+#define ELF_SYMTAB ".symtab" /* symbol table */
+#define ELF_TEXT ".text" /* code */
+
+
+/* Section Attribute Flags - sh_flags */
+#define SHF_WRITE 0x1 /* Writable */
+#define SHF_ALLOC 0x2 /* occupies memory */
+#define SHF_EXECINSTR 0x4 /* executable */
+#define SHF_MASKPROC 0xf0000000 /* reserved bits for processor */
+ /* specific section attributes */
+
+/* Symbol Table Entry */
+typedef struct elf32_sym {
+ Elf32_Word st_name; /* name - index into string table */
+ Elf32_Addr st_value; /* symbol value */
+ Elf32_Word st_size; /* symbol size */
+ unsigned char st_info; /* type and binding */
+ unsigned char st_other; /* 0 - no defined meaning */
+ Elf32_Half st_shndx; /* section header index */
+} Elf32_Sym;
+
+typedef struct {
+ Elf64_Half st_name; /* Symbol name index in str table */
+ Elf_Byte st_info; /* type / binding attrs */
+ Elf_Byte st_other; /* unused */
+ Elf64_Quarter st_shndx; /* section index of symbol */
+ Elf64_Xword st_value; /* value of symbol */
+ Elf64_Xword st_size; /* size of symbol */
+} Elf64_Sym;
+
+/* Symbol table index */
+#define STN_UNDEF 0 /* undefined */
+
+/* Extract symbol info - st_info */
+#define ELF32_ST_BIND(x) ((x) >> 4)
+#define ELF32_ST_TYPE(x) (((unsigned int) x) & 0xf)
+#define ELF32_ST_INFO(b,t) (((b) << 4) + ((t) & 0xf))
+
+#define ELF64_ST_BIND(x) ((x) >> 4)
+#define ELF64_ST_TYPE(x) (((unsigned int) x) & 0xf)
+#define ELF64_ST_INFO(b,t) (((b) << 4) + ((t) & 0xf))
+
+/* Symbol Binding - ELF32_ST_BIND - st_info */
+#define STB_LOCAL 0 /* Local symbol */
+#define STB_GLOBAL 1 /* Global symbol */
+#define STB_WEAK 2 /* like global - lower precedence */
+#define STB_NUM 3 /* number of symbol bindings */
+#define STB_LOPROC 13 /* reserved range for processor */
+#define STB_HIPROC 15 /* specific symbol bindings */
+
+/* Symbol type - ELF32_ST_TYPE - st_info */
+#define STT_NOTYPE 0 /* not specified */
+#define STT_OBJECT 1 /* data object */
+#define STT_FUNC 2 /* function */
+#define STT_SECTION 3 /* section */
+#define STT_FILE 4 /* file */
+#define STT_NUM 5 /* number of symbol types */
+#define STT_LOPROC 13 /* reserved range for processor */
+#define STT_HIPROC 15 /* specific symbol types */
+
+/* Relocation entry with implicit addend */
+typedef struct {
+ Elf32_Addr r_offset; /* offset of relocation */
+ Elf32_Word r_info; /* symbol table index and type */
+} Elf32_Rel;
+
+/* Relocation entry with explicit addend */
+typedef struct {
+ Elf32_Addr r_offset; /* offset of relocation */
+ Elf32_Word r_info; /* symbol table index and type */
+ Elf32_Sword r_addend;
+} Elf32_Rela;
+
+/* Extract relocation info - r_info */
+#define ELF32_R_SYM(i) ((i) >> 8)
+#define ELF32_R_TYPE(i) ((unsigned char) (i))
+#define ELF32_R_INFO(s,t) (((s) << 8) + (unsigned char)(t))
+
+typedef struct {
+ Elf64_Xword r_offset; /* where to do it */
+ Elf64_Xword r_info; /* index & type of relocation */
+} Elf64_Rel;
+
+typedef struct {
+ Elf64_Xword r_offset; /* where to do it */
+ Elf64_Xword r_info; /* index & type of relocation */
+ Elf64_Sxword r_addend; /* adjustment value */
+} Elf64_Rela;
+
+#define ELF64_R_SYM(info) ((info) >> 32)
+#define ELF64_R_TYPE(info) ((info) & 0xFFFFFFFF)
+#define ELF64_R_INFO(s,t) (((s) << 32) + (u_int32_t)(t))
+
+/* Program Header */
+typedef struct {
+ Elf32_Word p_type; /* segment type */
+ Elf32_Off p_offset; /* segment offset */
+ Elf32_Addr p_vaddr; /* virtual address of segment */
+ Elf32_Addr p_paddr; /* physical address - ignored? */
+ Elf32_Word p_filesz; /* number of bytes in file for seg. */
+ Elf32_Word p_memsz; /* number of bytes in mem. for seg. */
+ Elf32_Word p_flags; /* flags */
+ Elf32_Word p_align; /* memory alignment */
+} Elf32_Phdr;
+
+typedef struct {
+ Elf64_Half p_type; /* entry type */
+ Elf64_Half p_flags; /* flags */
+ Elf64_Off p_offset; /* offset */
+ Elf64_Addr p_vaddr; /* virtual address */
+ Elf64_Addr p_paddr; /* physical address */
+ Elf64_Xword p_filesz; /* file size */
+ Elf64_Xword p_memsz; /* memory size */
+ Elf64_Xword p_align; /* memory & file alignment */
+} Elf64_Phdr;
+
+/* Segment types - p_type */
+#define PT_NULL 0 /* unused */
+#define PT_LOAD 1 /* loadable segment */
+#define PT_DYNAMIC 2 /* dynamic linking section */
+#define PT_INTERP 3 /* the RTLD */
+#define PT_NOTE 4 /* auxiliary information */
+#define PT_SHLIB 5 /* reserved - purpose undefined */
+#define PT_PHDR 6 /* program header */
+#define PT_NUM 7 /* Number of segment types */
+#define PT_LOPROC 0x70000000 /* reserved range for processor */
+#define PT_HIPROC 0x7fffffff /* specific segment types */
+
+/* Segment flags - p_flags */
+#define PF_X 0x1 /* Executable */
+#define PF_W 0x2 /* Writable */
+#define PF_R 0x4 /* Readable */
+#define PF_MASKPROC 0xf0000000 /* reserved bits for processor */
+ /* specific segment flags */
+
+/* Dynamic structure */
+typedef struct {
+ Elf32_Sword d_tag; /* controls meaning of d_val */
+ union {
+ Elf32_Word d_val; /* Multiple meanings - see d_tag */
+ Elf32_Addr d_ptr; /* program virtual address */
+ } d_un;
+} Elf32_Dyn;
+
+typedef struct {
+ Elf64_Xword d_tag; /* controls meaning of d_val */
+ union {
+ Elf64_Addr d_ptr;
+ Elf64_Xword d_val;
+ } d_un;
+} Elf64_Dyn;
+
+/* Dynamic Array Tags - d_tag */
+#define DT_NULL 0 /* marks end of _DYNAMIC array */
+#define DT_NEEDED 1 /* string table offset of needed lib */
+#define DT_PLTRELSZ 2 /* size of relocation entries in PLT */
+#define DT_PLTGOT 3 /* address PLT/GOT */
+#define DT_HASH 4 /* address of symbol hash table */
+#define DT_STRTAB 5 /* address of string table */
+#define DT_SYMTAB 6 /* address of symbol table */
+#define DT_RELA 7 /* address of relocation table */
+#define DT_RELASZ 8 /* size of relocation table */
+#define DT_RELAENT 9 /* size of relocation entry */
+#define DT_STRSZ 10 /* size of string table */
+#define DT_SYMENT 11 /* size of symbol table entry */
+#define DT_INIT 12 /* address of initialization func. */
+#define DT_FINI 13 /* address of termination function */
+#define DT_SONAME 14 /* string table offset of shared obj */
+#define DT_RPATH 15 /* string table offset of library
+ search path */
+#define DT_SYMBOLIC 16 /* start sym search in shared obj. */
+#define DT_REL 17 /* address of rel. tbl. w addends */
+#define DT_RELSZ 18 /* size of DT_REL relocation table */
+#define DT_RELENT 19 /* size of DT_REL relocation entry */
+#define DT_PLTREL 20 /* PLT referenced relocation entry */
+#define DT_DEBUG 21 /* debugger */
+#define DT_TEXTREL 22 /* Allow rel. mod. to unwritable seg */
+#define DT_JMPREL 23 /* add. of PLT's relocation entries */
+#define DT_BIND_NOW 24 /* Bind now regardless of env setting */
+#define DT_NUM 25 /* Number used. */
+#define DT_LOPROC 0x70000000 /* reserved range for processor */
+#define DT_HIPROC 0x7fffffff /* specific dynamic array tags */
+
+/* Standard ELF hashing function */
+unsigned int elf_hash(const unsigned char *name);
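The prototype above is the standard ELF symbol hash from the System V gABI. As a reference, the published algorithm (not necessarily the implementation shipped in this tree) looks like:

```c
#include <assert.h>

/* Standard ELF hash, as specified in the System V gABI. */
unsigned int
elf_hash(const unsigned char *name)
{
    unsigned int h = 0, g;

    while (*name) {
        h = (h << 4) + *name++;   /* shift in the next character */
        g = h & 0xf0000000;       /* grab the top nibble ...     */
        if (g)
            h ^= g >> 24;         /* ... fold it back in ...     */
        h &= ~g;                  /* ... and clear it            */
    }
    return h;
}
```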
+
+/*
+ * Note Definitions
+ */
+typedef struct {
+ Elf32_Word namesz;
+ Elf32_Word descsz;
+ Elf32_Word type;
+} Elf32_Note;
+
+typedef struct {
+ Elf64_Half namesz;
+ Elf64_Half descsz;
+ Elf64_Half type;
+} Elf64_Note;
+
+
+#if defined(ELFSIZE)
+#define CONCAT(x,y) __CONCAT(x,y)
+#define ELFNAME(x) CONCAT(elf,CONCAT(ELFSIZE,CONCAT(_,x)))
+#define ELFNAME2(x,y) CONCAT(x,CONCAT(_elf,CONCAT(ELFSIZE,CONCAT(_,y))))
+#define ELFNAMEEND(x) CONCAT(x,CONCAT(_elf,ELFSIZE))
+#define ELFDEFNNAME(x) CONCAT(ELF,CONCAT(ELFSIZE,CONCAT(_,x)))
+#endif
+
+#if defined(ELFSIZE) && (ELFSIZE == 32)
+#define Elf_Ehdr Elf32_Ehdr
+#define Elf_Phdr Elf32_Phdr
+#define Elf_Shdr Elf32_Shdr
+#define Elf_Sym Elf32_Sym
+#define Elf_Rel Elf32_Rel
+#define Elf_RelA Elf32_Rela
+#define Elf_Dyn Elf32_Dyn
+#define Elf_Word Elf32_Word
+#define Elf_Sword Elf32_Sword
+#define Elf_Addr Elf32_Addr
+#define Elf_Off Elf32_Off
+#define Elf_Nhdr Elf32_Nhdr
+#define Elf_Note Elf32_Note
+
+#define ELF_R_SYM ELF32_R_SYM
+#define ELF_R_TYPE ELF32_R_TYPE
+#define ELF_R_INFO ELF32_R_INFO
+#define ELFCLASS ELFCLASS32
+
+#define ELF_ST_BIND ELF32_ST_BIND
+#define ELF_ST_TYPE ELF32_ST_TYPE
+#define ELF_ST_INFO ELF32_ST_INFO
+
+#define AuxInfo Aux32Info
+#elif defined(ELFSIZE) && (ELFSIZE == 64)
+#define Elf_Ehdr Elf64_Ehdr
+#define Elf_Phdr Elf64_Phdr
+#define Elf_Shdr Elf64_Shdr
+#define Elf_Sym Elf64_Sym
+#define Elf_Rel Elf64_Rel
+#define Elf_RelA Elf64_Rela
+#define Elf_Dyn Elf64_Dyn
+#define Elf_Word Elf64_Word
+#define Elf_Sword Elf64_Sword
+#define Elf_Addr Elf64_Addr
+#define Elf_Off Elf64_Off
+#define Elf_Nhdr Elf64_Nhdr
+#define Elf_Note Elf64_Note
+
+#define ELF_R_SYM ELF64_R_SYM
+#define ELF_R_TYPE ELF64_R_TYPE
+#define ELF_R_INFO ELF64_R_INFO
+#define ELFCLASS ELFCLASS64
+
+#define ELF_ST_BIND ELF64_ST_BIND
+#define ELF_ST_TYPE ELF64_ST_TYPE
+#define ELF_ST_INFO ELF64_ST_INFO
+
+#define AuxInfo Aux64Info
+#endif
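The ELFNAME family above relies on the usual two-level token-pasting trick: the extra CONCAT layer forces ELFSIZE to be macro-expanded before `##` glues the tokens together. A standalone illustration (with `__CONCAT` supplied locally, since it normally comes from a system header such as `<sys/cdefs.h>`):

```c
#include <assert.h>

#define __CONCAT(x,y) x ## y        /* normally from <sys/cdefs.h> */
#define CONCAT(x,y)   __CONCAT(x,y) /* extra level expands x and y first */

#define ELFSIZE 32
#define ELFNAME(x) CONCAT(elf,CONCAT(ELFSIZE,CONCAT(_,x)))

/* ELFNAME(load) pastes to the identifier elf32_load. */
static int elf32_load(void) { return 32; }

static int call_loader(void) { return ELFNAME(load)(); }
```

Without the indirection, `__CONCAT(ELFSIZE, _load)` would paste the literal token `ELFSIZE_load` instead of `32_load`.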
+
+#endif /* __XEN_PUBLIC_ELFSTRUCTS_H__ */
diff --git a/xen/public/event_channel.h b/xen/public/event_channel.h
new file mode 100644
index 0000000..d35cce5
--- /dev/null
+++ b/xen/public/event_channel.h
@@ -0,0 +1,264 @@
+/******************************************************************************
+ * event_channel.h
+ *
+ * Event channels between domains.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2003-2004, K A Fraser.
+ */
+
+#ifndef __XEN_PUBLIC_EVENT_CHANNEL_H__
+#define __XEN_PUBLIC_EVENT_CHANNEL_H__
+
+/*
+ * Prototype for this hypercall is:
+ * int event_channel_op(int cmd, void *args)
+ * @cmd == EVTCHNOP_??? (event-channel operation).
+ * @args == Operation-specific extra arguments (NULL if none).
+ */
+
+typedef uint32_t evtchn_port_t;
+DEFINE_XEN_GUEST_HANDLE(evtchn_port_t);
+
+/*
+ * EVTCHNOP_alloc_unbound: Allocate a port in domain <dom> and mark as
+ * accepting interdomain bindings from domain <remote_dom>. A fresh port
+ * is allocated in <dom> and returned as <port>.
+ * NOTES:
+ * 1. If the caller is unprivileged then <dom> must be DOMID_SELF.
+ * 2. <remote_dom> may be DOMID_SELF, allowing loopback connections.
+ */
+#define EVTCHNOP_alloc_unbound 6
+struct evtchn_alloc_unbound {
+ /* IN parameters */
+ domid_t dom, remote_dom;
+ /* OUT parameters */
+ evtchn_port_t port;
+};
+typedef struct evtchn_alloc_unbound evtchn_alloc_unbound_t;
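As a sketch, a guest allocating an unbound port for a loopback connection fills the IN fields and reads <port> back after the hypercall. The DOMID_SELF value and the hypercall wrapper below are assumptions: they live in xen.h and the arch hypercall glue, not in this header, and the wrapper here is a stand-in that fakes a fresh port.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;            /* assumed, from xen.h */
typedef uint32_t evtchn_port_t;
#define DOMID_SELF ((domid_t)0x7FF0) /* assumed value, from xen.h */
#define EVTCHNOP_alloc_unbound 6

struct evtchn_alloc_unbound {
    domid_t dom, remote_dom;         /* IN */
    evtchn_port_t port;              /* OUT */
};

/* Hypothetical stand-in for the real event_channel_op hypercall. */
static int
event_channel_op(int cmd, void *args)
{
    if (cmd == EVTCHNOP_alloc_unbound)
        ((struct evtchn_alloc_unbound *)args)->port = 1; /* fake fresh port */
    return 0;
}

static evtchn_port_t
alloc_loopback_port(void)
{
    struct evtchn_alloc_unbound op = {
        .dom        = DOMID_SELF,    /* unprivileged callers must use SELF */
        .remote_dom = DOMID_SELF,    /* loopback, per note 2 above */
    };
    event_channel_op(EVTCHNOP_alloc_unbound, &op);
    return op.port;                  /* OUT parameter filled by Xen */
}
```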
+
+/*
+ * EVTCHNOP_bind_interdomain: Construct an interdomain event channel between
+ * the calling domain and <remote_dom>. <remote_dom,remote_port> must identify
+ * a port that is unbound and marked as accepting bindings from the calling
+ * domain. A fresh port is allocated in the calling domain and returned as
+ * <local_port>.
+ * NOTES:
+ * 1. <remote_dom> may be DOMID_SELF, allowing loopback connections.
+ */
+#define EVTCHNOP_bind_interdomain 0
+struct evtchn_bind_interdomain {
+ /* IN parameters. */
+ domid_t remote_dom;
+ evtchn_port_t remote_port;
+ /* OUT parameters. */
+ evtchn_port_t local_port;
+};
+typedef struct evtchn_bind_interdomain evtchn_bind_interdomain_t;
+
+/*
+ * EVTCHNOP_bind_virq: Bind a local event channel to VIRQ <irq> on specified
+ * vcpu.
+ * NOTES:
+ * 1. Virtual IRQs are classified as per-vcpu or global. See the VIRQ list
+ * in xen.h for the classification of each VIRQ.
+ * 2. Global VIRQs must be allocated on VCPU0 but can subsequently be
+ * re-bound via EVTCHNOP_bind_vcpu.
+ * 3. Per-vcpu VIRQs may be bound to at most one event channel per vcpu.
+ * The allocated event channel is bound to the specified vcpu and the
+ * binding cannot be changed.
+ */
+#define EVTCHNOP_bind_virq 1
+struct evtchn_bind_virq {
+ /* IN parameters. */
+ uint32_t virq;
+ uint32_t vcpu;
+ /* OUT parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_bind_virq evtchn_bind_virq_t;
+
+/*
+ * EVTCHNOP_bind_pirq: Bind a local event channel to PIRQ <irq>.
+ * NOTES:
+ * 1. A physical IRQ may be bound to at most one event channel per domain.
+ * 2. Only a sufficiently-privileged domain may bind to a physical IRQ.
+ */
+#define EVTCHNOP_bind_pirq 2
+struct evtchn_bind_pirq {
+ /* IN parameters. */
+ uint32_t pirq;
+#define BIND_PIRQ__WILL_SHARE 1
+ uint32_t flags; /* BIND_PIRQ__* */
+ /* OUT parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_bind_pirq evtchn_bind_pirq_t;
+
+/*
+ * EVTCHNOP_bind_ipi: Bind a local event channel to receive events.
+ * NOTES:
+ * 1. The allocated event channel is bound to the specified vcpu. The binding
+ * may not be changed.
+ */
+#define EVTCHNOP_bind_ipi 7
+struct evtchn_bind_ipi {
+ uint32_t vcpu;
+ /* OUT parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_bind_ipi evtchn_bind_ipi_t;
+
+/*
+ * EVTCHNOP_close: Close a local event channel <port>. If the channel is
+ * interdomain then the remote end is placed in the unbound state
+ * (EVTCHNSTAT_unbound), awaiting a new connection.
+ */
+#define EVTCHNOP_close 3
+struct evtchn_close {
+ /* IN parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_close evtchn_close_t;
+
+/*
+ * EVTCHNOP_send: Send an event to the remote end of the channel whose local
+ * endpoint is <port>.
+ */
+#define EVTCHNOP_send 4
+struct evtchn_send {
+ /* IN parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_send evtchn_send_t;
+
+/*
+ * EVTCHNOP_status: Get the current status of the communication channel which
+ * has an endpoint at <dom, port>.
+ * NOTES:
+ * 1. <dom> may be specified as DOMID_SELF.
+ * 2. Only a sufficiently-privileged domain may obtain the status of an event
+ * channel for which <dom> is not DOMID_SELF.
+ */
+#define EVTCHNOP_status 5
+struct evtchn_status {
+ /* IN parameters */
+ domid_t dom;
+ evtchn_port_t port;
+ /* OUT parameters */
+#define EVTCHNSTAT_closed 0 /* Channel is not in use. */
+#define EVTCHNSTAT_unbound 1 /* Channel is waiting interdom connection.*/
+#define EVTCHNSTAT_interdomain 2 /* Channel is connected to remote domain. */
+#define EVTCHNSTAT_pirq 3 /* Channel is bound to a phys IRQ line. */
+#define EVTCHNSTAT_virq 4 /* Channel is bound to a virtual IRQ line */
+#define EVTCHNSTAT_ipi 5 /* Channel is bound to a virtual IPI line */
+ uint32_t status;
+ uint32_t vcpu; /* VCPU to which this channel is bound. */
+ union {
+ struct {
+ domid_t dom;
+ } unbound; /* EVTCHNSTAT_unbound */
+ struct {
+ domid_t dom;
+ evtchn_port_t port;
+ } interdomain; /* EVTCHNSTAT_interdomain */
+ uint32_t pirq; /* EVTCHNSTAT_pirq */
+ uint32_t virq; /* EVTCHNSTAT_virq */
+ } u;
+};
+typedef struct evtchn_status evtchn_status_t;
+
+/*
+ * EVTCHNOP_bind_vcpu: Specify which vcpu a channel should notify when an
+ * event is pending.
+ * NOTES:
+ * 1. IPI-bound channels always notify the vcpu specified at bind time.
+ * This binding cannot be changed.
+ * 2. Per-VCPU VIRQ channels always notify the vcpu specified at bind time.
+ * This binding cannot be changed.
+ * 3. All other channels notify vcpu0 by default. This default is set when
+ * the channel is allocated (a port that is freed and subsequently reused
+ * has its binding reset to vcpu0).
+ */
+#define EVTCHNOP_bind_vcpu 8
+struct evtchn_bind_vcpu {
+ /* IN parameters. */
+ evtchn_port_t port;
+ uint32_t vcpu;
+};
+typedef struct evtchn_bind_vcpu evtchn_bind_vcpu_t;
+
+/*
+ * EVTCHNOP_unmask: Unmask the specified local event-channel port and deliver
+ * a notification to the appropriate VCPU if an event is pending.
+ */
+#define EVTCHNOP_unmask 9
+struct evtchn_unmask {
+ /* IN parameters. */
+ evtchn_port_t port;
+};
+typedef struct evtchn_unmask evtchn_unmask_t;
+
+/*
+ * EVTCHNOP_reset: Close all event channels associated with specified domain.
+ * NOTES:
+ * 1. <dom> may be specified as DOMID_SELF.
+ * 2. Only a sufficiently-privileged domain may specify other than DOMID_SELF.
+ */
+#define EVTCHNOP_reset 10
+struct evtchn_reset {
+ /* IN parameters. */
+ domid_t dom;
+};
+typedef struct evtchn_reset evtchn_reset_t;
+
+/*
+ * Argument to event_channel_op_compat() hypercall. Superseded by new
+ * event_channel_op() hypercall since 0x00030202.
+ */
+struct evtchn_op {
+ uint32_t cmd; /* EVTCHNOP_* */
+ union {
+ struct evtchn_alloc_unbound alloc_unbound;
+ struct evtchn_bind_interdomain bind_interdomain;
+ struct evtchn_bind_virq bind_virq;
+ struct evtchn_bind_pirq bind_pirq;
+ struct evtchn_bind_ipi bind_ipi;
+ struct evtchn_close close;
+ struct evtchn_send send;
+ struct evtchn_status status;
+ struct evtchn_bind_vcpu bind_vcpu;
+ struct evtchn_unmask unmask;
+ } u;
+};
+typedef struct evtchn_op evtchn_op_t;
+DEFINE_XEN_GUEST_HANDLE(evtchn_op_t);
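For the pre-0x00030202 compat interface, the cmd field selects which union member is live. A sketch of a send through a hypothetical compat wrapper (the union is trimmed to the one member exercised here):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t evtchn_port_t;
#define EVTCHNOP_send 4

struct evtchn_send { evtchn_port_t port; };

/* Trimmed for illustration; the real union carries all EVTCHNOP args. */
struct evtchn_op {
    uint32_t cmd;                 /* EVTCHNOP_* selects the union member */
    union {
        struct evtchn_send send;
    } u;
};

/* Hypothetical stand-in for the event_channel_op_compat hypercall. */
static int
event_channel_op_compat(struct evtchn_op *op)
{
    return (op->cmd == EVTCHNOP_send) ? 0 : -1;
}

static int
notify(evtchn_port_t port)
{
    struct evtchn_op op = { .cmd = EVTCHNOP_send, .u.send.port = port };
    return event_channel_op_compat(&op);
}
```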
+
+#endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/features.h b/xen/public/features.h
new file mode 100644
index 0000000..879131c
--- /dev/null
+++ b/xen/public/features.h
@@ -0,0 +1,83 @@
+/******************************************************************************
+ * features.h
+ *
+ * Feature flags, reported by XENVER_get_features.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2006, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_FEATURES_H__
+#define __XEN_PUBLIC_FEATURES_H__
+
+/*
+ * If set, the guest does not need to write-protect its pagetables, and can
+ * update them via direct writes.
+ */
+#define XENFEAT_writable_page_tables 0
+
+/*
+ * If set, the guest does not need to write-protect its segment descriptor
+ * tables, and can update them via direct writes.
+ */
+#define XENFEAT_writable_descriptor_tables 1
+
+/*
+ * If set, translation between the guest's 'pseudo-physical' address space
+ * and the host's machine address space are handled by the hypervisor. In this
+ * mode the guest does not need to perform phys-to/from-machine translations
+ * when performing page table operations.
+ */
+#define XENFEAT_auto_translated_physmap 2
+
+/* If set, the guest is running in supervisor mode (e.g., x86 ring 0). */
+#define XENFEAT_supervisor_mode_kernel 3
+
+/*
+ * If set, the guest does not need to allocate x86 PAE page directories
+ * below 4GB. This flag is usually implied by auto_translated_physmap.
+ */
+#define XENFEAT_pae_pgdir_above_4gb 4
+
+/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
+#define XENFEAT_mmu_pt_update_preserve_ad 5
+
+/* x86: Does this Xen host support the MMU_{CLEAR,COPY}_PAGE hypercall? */
+#define XENFEAT_highmem_assist 6
+
+/*
+ * If set, GNTTABOP_map_grant_ref honors flags to be placed into guest kernel
+ * available pte bits.
+ */
+#define XENFEAT_gnttab_map_avail_bits 7
+
+#define XENFEAT_NR_SUBMAPS 1
+
+#endif /* __XEN_PUBLIC_FEATURES_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/grant_table.h b/xen/public/grant_table.h
new file mode 100644
index 0000000..ad116e7
--- /dev/null
+++ b/xen/public/grant_table.h
@@ -0,0 +1,438 @@
+/******************************************************************************
+ * grant_table.h
+ *
+ * Interface for granting foreign access to page frames, and receiving
+ * page-ownership transfers.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004, K A Fraser
+ */
+
+#ifndef __XEN_PUBLIC_GRANT_TABLE_H__
+#define __XEN_PUBLIC_GRANT_TABLE_H__
+
+
+/***********************************
+ * GRANT TABLE REPRESENTATION
+ */
+
+/* Some rough guidelines on accessing and updating grant-table entries
+ * in a concurrency-safe manner. For more information, Linux contains a
+ * reference implementation for guest OSes (arch/xen/kernel/grant_table.c).
+ *
+ * NB. WMB is a no-op on current-generation x86 processors. However, a
+ * compiler barrier will still be required.
+ *
+ * Introducing a valid entry into the grant table:
+ * 1. Write ent->domid.
+ * 2. Write ent->frame:
+ * GTF_permit_access: Frame to which access is permitted.
+ * GTF_accept_transfer: Pseudo-phys frame slot being filled by new
+ * frame, or zero if none.
+ * 3. Write memory barrier (WMB).
+ * 4. Write ent->flags, inc. valid type.
+ *
+ * Invalidating an unused GTF_permit_access entry:
+ * 1. flags = ent->flags.
+ * 2. Observe that !(flags & (GTF_reading|GTF_writing)).
+ * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
+ * NB. No need for WMB as reuse of entry is control-dependent on success of
+ * step 3, and all architectures guarantee ordering of ctrl-dep writes.
+ *
+ * Invalidating an in-use GTF_permit_access entry:
+ * This cannot be done directly. Request assistance from the domain controller
+ * which can set a timeout on the use of a grant entry and take necessary
+ * action. (NB. This is not yet implemented!).
+ *
+ * Invalidating an unused GTF_accept_transfer entry:
+ * 1. flags = ent->flags.
+ * 2. Observe that !(flags & GTF_transfer_committed). [*]
+ * 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
+ * NB. No need for WMB as reuse of entry is control-dependent on success of
+ * step 3, and all architectures guarantee ordering of ctrl-dep writes.
+ * [*] If GTF_transfer_committed is set then the grant entry is 'committed'.
+ * The guest must /not/ modify the grant entry until the address of the
+ * transferred frame is written. It is safe for the guest to spin waiting
+ * for this to occur (detect by observing GTF_transfer_completed in
+ * ent->flags).
+ *
+ * Invalidating a committed GTF_accept_transfer entry:
+ * 1. Wait for (ent->flags & GTF_transfer_completed).
+ *
+ * Changing a GTF_permit_access from writable to read-only:
+ * Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
+ *
+ * Changing a GTF_permit_access from read-only to writable:
+ * Use SMP-safe bit-setting instruction.
+ */
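The four-step "introducing a valid entry" sequence above can be sketched as follows. The compiler-barrier macro stands in for WMB, per the note that WMB is a no-op on current-generation x86; the helper name is illustrative, not part of this interface.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;
#define GTF_permit_access (1U<<0)
#define GTF_readonly      (1U<<2)

struct grant_entry {
    uint16_t flags;
    domid_t  domid;
    uint32_t frame;
};

/* WMB is a no-op on current x86; a compiler barrier is still required. */
#define wmb() __asm__ __volatile__("" ::: "memory")

/* Steps 1-4: grant <dom> (optionally read-only) access to <frame>. */
static void
gnttab_grant_access(volatile struct grant_entry *ent,
                    domid_t dom, uint32_t frame, int readonly)
{
    ent->domid = dom;                     /* 1. write ent->domid      */
    ent->frame = frame;                   /* 2. write ent->frame      */
    wmb();                                /* 3. write memory barrier  */
    ent->flags = GTF_permit_access        /* 4. write flags last,     */
                 | (readonly ? GTF_readonly : 0); /* inc. valid type  */
}
```

Writing flags last ensures the remote end never observes a valid type with a stale domid or frame.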
+
+/*
+ * A grant table comprises a packed array of grant entries in one or more
+ * page frames shared between Xen and a guest.
+ * [XEN]: This field is written by Xen and read by the sharing guest.
+ * [GST]: This field is written by the guest and read by Xen.
+ */
+struct grant_entry {
+ /* GTF_xxx: various type and flag information. [XEN,GST] */
+ uint16_t flags;
+ /* The domain being granted foreign privileges. [GST] */
+ domid_t domid;
+ /*
+ * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
+ * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
+ */
+ uint32_t frame;
+};
+typedef struct grant_entry grant_entry_t;
+
+/*
+ * Type of grant entry.
+ * GTF_invalid: This grant entry grants no privileges.
+ * GTF_permit_access: Allow @domid to map/access @frame.
+ * GTF_accept_transfer: Allow @domid to transfer ownership of one page frame
+ * to this guest. Xen writes the page number to @frame.
+ */
+#define GTF_invalid (0U<<0)
+#define GTF_permit_access (1U<<0)
+#define GTF_accept_transfer (2U<<0)
+#define GTF_type_mask (3U<<0)
+
+/*
+ * Subflags for GTF_permit_access.
+ * GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST]
+ * GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
+ * GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
+ * GTF_PAT, GTF_PWT, GTF_PCD: (x86) cache attribute flags for the grant [GST]
+ */
+#define _GTF_readonly (2)
+#define GTF_readonly (1U<<_GTF_readonly)
+#define _GTF_reading (3)
+#define GTF_reading (1U<<_GTF_reading)
+#define _GTF_writing (4)
+#define GTF_writing (1U<<_GTF_writing)
+#define _GTF_PWT (5)
+#define GTF_PWT (1U<<_GTF_PWT)
+#define _GTF_PCD (6)
+#define GTF_PCD (1U<<_GTF_PCD)
+#define _GTF_PAT (7)
+#define GTF_PAT (1U<<_GTF_PAT)
+
+/*
+ * Subflags for GTF_accept_transfer:
+ * GTF_transfer_committed: Xen sets this flag to indicate that it is committed
+ * to transferring ownership of a page frame. When a guest sees this flag
+ * it must /not/ modify the grant entry until GTF_transfer_completed is
+ * set by Xen.
+ * GTF_transfer_completed: It is safe for the guest to spin-wait on this flag
+ * after reading GTF_transfer_committed. Xen will always write the frame
+ * address, followed by ORing this flag, in a timely manner.
+ */
+#define _GTF_transfer_committed (2)
+#define GTF_transfer_committed (1U<<_GTF_transfer_committed)
+#define _GTF_transfer_completed (3)
+#define GTF_transfer_completed (1U<<_GTF_transfer_completed)
+
+
+/***********************************
+ * GRANT TABLE QUERIES AND USES
+ */
+
+/*
+ * Reference to a grant entry in a specified domain's grant table.
+ */
+typedef uint32_t grant_ref_t;
+
+/*
+ * Handle to track a mapping created via a grant reference.
+ */
+typedef uint32_t grant_handle_t;
+
+/*
+ * GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access
+ * by devices and/or host CPUs. If successful, <handle> is a tracking number
+ * that must be presented later to destroy the mapping(s). On error, <handle>
+ * is a negative status code.
+ * NOTES:
+ * 1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address
+ * via which I/O devices may access the granted frame.
+ * 2. If GNTMAP_host_map is specified then a mapping will be added at
+ * either a host virtual address in the current address space, or at
+ * a PTE at the specified machine address. The type of mapping to
+ * perform is selected through the GNTMAP_contains_pte flag, and the
+ * address is specified in <host_addr>.
+ * 3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a
+ * host mapping is destroyed by other means then it is *NOT* guaranteed
+ * to be accounted to the correct grant reference!
+ */
+#define GNTTABOP_map_grant_ref 0
+struct gnttab_map_grant_ref {
+ /* IN parameters. */
+ uint64_t host_addr;
+ uint32_t flags; /* GNTMAP_* */
+ grant_ref_t ref;
+ domid_t dom;
+ /* OUT parameters. */
+ int16_t status; /* GNTST_* */
+ grant_handle_t handle;
+ uint64_t dev_bus_addr;
+};
+typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t);
+
+/*
+ * GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings
+ * tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that
+ * field is ignored. If non-zero, they must refer to a device/host mapping
+ * that is tracked by <handle>.
+ * NOTES:
+ * 1. The call may fail in an undefined manner if either mapping is not
+ * tracked by <handle>.
+ * 2. After executing a batch of unmaps, it is guaranteed that no stale
+ * mappings will remain in the device or host TLBs.
+ */
+#define GNTTABOP_unmap_grant_ref 1
+struct gnttab_unmap_grant_ref {
+ /* IN parameters. */
+ uint64_t host_addr;
+ uint64_t dev_bus_addr;
+ grant_handle_t handle;
+ /* OUT parameters. */
+ int16_t status; /* GNTST_* */
+};
+typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t);
+
+/*
+ * GNTTABOP_setup_table: Set up a grant table for <dom> comprising at least
+ * <nr_frames> pages. The frame addresses are written to the <frame_list>.
+ * Only <nr_frames> addresses are written, even if the table is larger.
+ * NOTES:
+ * 1. <dom> may be specified as DOMID_SELF.
+ * 2. Only a sufficiently-privileged domain may specify <dom> != DOMID_SELF.
+ * 3. Xen may not support more than a single grant-table page per domain.
+ */
+#define GNTTABOP_setup_table 2
+struct gnttab_setup_table {
+ /* IN parameters. */
+ domid_t dom;
+ uint32_t nr_frames;
+ /* OUT parameters. */
+ int16_t status; /* GNTST_* */
+ XEN_GUEST_HANDLE(ulong) frame_list;
+};
+typedef struct gnttab_setup_table gnttab_setup_table_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_setup_table_t);
+
+/*
+ * GNTTABOP_dump_table: Dump the contents of the grant table to the
+ * xen console. Debugging use only.
+ */
+#define GNTTABOP_dump_table 3
+struct gnttab_dump_table {
+ /* IN parameters. */
+ domid_t dom;
+ /* OUT parameters. */
+ int16_t status; /* GNTST_* */
+};
+typedef struct gnttab_dump_table gnttab_dump_table_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_dump_table_t);
+
+/*
+ * GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The
+ * foreign domain has previously registered its interest in the transfer via
+ * <domid, ref>.
+ *
+ * Note that, even if the transfer fails, the specified page no longer belongs
+ * to the calling domain *unless* the error is GNTST_bad_page.
+ */
+#define GNTTABOP_transfer 4
+struct gnttab_transfer {
+ /* IN parameters. */
+ xen_pfn_t mfn;
+ domid_t domid;
+ grant_ref_t ref;
+ /* OUT parameters. */
+ int16_t status;
+};
+typedef struct gnttab_transfer gnttab_transfer_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t);
+
+
+/*
+ * GNTTABOP_copy: Hypervisor-based copy.
+ * Source and destination can each be either an MFN or, for foreign domains,
+ * a grant reference. The foreign domain has to grant read/write access
+ * in its grant table.
+ *
+ * The flags specify what type source and destinations are (either MFN
+ * or grant reference).
+ *
+ * Note that this can also be used to copy data between two domains
+ * via a third party if the source and destination domains have previously
+ * granted appropriate access to their pages to the third party.
+ *
+ * source_offset specifies an offset in the source frame, dest_offset
+ * the offset in the target frame and len specifies the number of
+ * bytes to be copied.
+ */
+
+#define _GNTCOPY_source_gref (0)
+#define GNTCOPY_source_gref (1<<_GNTCOPY_source_gref)
+#define _GNTCOPY_dest_gref (1)
+#define GNTCOPY_dest_gref (1<<_GNTCOPY_dest_gref)
+
+#define GNTTABOP_copy 5
+typedef struct gnttab_copy {
+ /* IN parameters. */
+ struct {
+ union {
+ grant_ref_t ref;
+ xen_pfn_t gmfn;
+ } u;
+ domid_t domid;
+ uint16_t offset;
+ } source, dest;
+ uint16_t len;
+ uint16_t flags; /* GNTCOPY_* */
+ /* OUT parameters. */
+ int16_t status;
+} gnttab_copy_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t);
+
+/*
+ * GNTTABOP_query_size: Query the current and maximum sizes of the shared
+ * grant table.
+ * NOTES:
+ * 1. <dom> may be specified as DOMID_SELF.
+ * 2. Only a sufficiently-privileged domain may specify <dom> != DOMID_SELF.
+ */
+#define GNTTABOP_query_size 6
+struct gnttab_query_size {
+ /* IN parameters. */
+ domid_t dom;
+ /* OUT parameters. */
+ uint32_t nr_frames;
+ uint32_t max_nr_frames;
+ int16_t status; /* GNTST_* */
+};
+typedef struct gnttab_query_size gnttab_query_size_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_query_size_t);
+
+/*
+ * GNTTABOP_unmap_and_replace: Destroy one or more grant-reference mappings
+ * tracked by <handle> but atomically replace the page table entry with one
+ * pointing to the machine address under <new_addr>. <new_addr> will be
+ * redirected to the null entry.
+ * NOTES:
+ * 1. The call may fail in an undefined manner if either mapping is not
+ * tracked by <handle>.
+ * 2. After executing a batch of unmaps, it is guaranteed that no stale
+ * mappings will remain in the device or host TLBs.
+ */
+#define GNTTABOP_unmap_and_replace 7
+struct gnttab_unmap_and_replace {
+ /* IN parameters. */
+ uint64_t host_addr;
+ uint64_t new_addr;
+ grant_handle_t handle;
+ /* OUT parameters. */
+ int16_t status; /* GNTST_* */
+};
+typedef struct gnttab_unmap_and_replace gnttab_unmap_and_replace_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t);
+
+
+/*
+ * Bitfield values for gnttab_map_grant_ref.flags.
+ */
+ /* Map the grant entry for access by I/O devices. */
+#define _GNTMAP_device_map (0)
+#define GNTMAP_device_map (1<<_GNTMAP_device_map)
+ /* Map the grant entry for access by host CPUs. */
+#define _GNTMAP_host_map (1)
+#define GNTMAP_host_map (1<<_GNTMAP_host_map)
+ /* Accesses to the granted frame will be restricted to read-only access. */
+#define _GNTMAP_readonly (2)
+#define GNTMAP_readonly (1<<_GNTMAP_readonly)
+ /*
+ * GNTMAP_host_map subflag:
+ * 0 => The host mapping is usable only by the guest OS.
+ * 1 => The host mapping is usable by guest OS + current application.
+ */
+#define _GNTMAP_application_map (3)
+#define GNTMAP_application_map (1<<_GNTMAP_application_map)
+
+ /*
+ * GNTMAP_contains_pte subflag:
+ * 0 => This map request contains a host virtual address.
+ * 1 => This map request contains the machine address of the PTE to update.
+ */
+#define _GNTMAP_contains_pte (4)
+#define GNTMAP_contains_pte (1<<_GNTMAP_contains_pte)
+
+/*
+ * Bits to be placed in guest kernel available PTE bits (architecture
+ * dependent; only supported when XENFEAT_gnttab_map_avail_bits is set).
+ */
+#define _GNTMAP_guest_avail0 (16)
+#define GNTMAP_guest_avail_mask ((uint32_t)~0 << _GNTMAP_guest_avail0)
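As an example of composing these flags, a read-only host-CPU mapping at a host virtual address (i.e. contains_pte left clear) would be requested as:

```c
#include <assert.h>
#include <stdint.h>

#define GNTMAP_device_map   (1<<0)
#define GNTMAP_host_map     (1<<1)
#define GNTMAP_readonly     (1<<2)
#define GNTMAP_contains_pte (1<<4)

/* Flags for mapping a granted frame read-only at a host virtual address:
 * host_map without contains_pte means host_addr is a virtual address. */
static uint32_t
ro_host_map_flags(void)
{
    return GNTMAP_host_map | GNTMAP_readonly;
}
```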
+
+/*
+ * Values for error status returns. All errors are -ve.
+ */
+#define GNTST_okay (0) /* Normal return. */
+#define GNTST_general_error (-1) /* General undefined error. */
+#define GNTST_bad_domain (-2) /* Unrecognised domain id. */
+#define GNTST_bad_gntref (-3) /* Unrecognised or inappropriate gntref. */
+#define GNTST_bad_handle (-4) /* Unrecognised or inappropriate handle. */
+#define GNTST_bad_virt_addr (-5) /* Inappropriate virtual address to map. */
+#define GNTST_bad_dev_addr (-6) /* Inappropriate device address to unmap.*/
+#define GNTST_no_device_space (-7) /* Out of space in I/O MMU. */
+#define GNTST_permission_denied (-8) /* Not enough privilege for operation. */
+#define GNTST_bad_page (-9) /* Specified page was invalid for op. */
+#define GNTST_bad_copy_arg (-10) /* copy arguments cross page boundary. */
+#define GNTST_address_too_big (-11) /* transfer page address too large. */
+
+#define GNTTABOP_error_msgs { \
+ "okay", \
+ "undefined error", \
+ "unrecognised domain id", \
+ "invalid grant reference", \
+ "invalid mapping handle", \
+ "invalid virtual address", \
+ "invalid device address", \
+ "no spare translation slot in the I/O MMU", \
+ "permission denied", \
+ "bad page", \
+ "copy arguments cross page boundary", \
+ "page address size too large" \
+}
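Since the GNTST_* codes are zero or negative and the message table is ordered by the negated status, a small lookup helper (the helper name is illustrative, not part of this interface) can be written as:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GNTST_okay (0)

/* Same table as GNTTABOP_error_msgs above, indexed by -status. */
#define GNTTABOP_error_msgs { \
    "okay", "undefined error", "unrecognised domain id", \
    "invalid grant reference", "invalid mapping handle", \
    "invalid virtual address", "invalid device address", \
    "no spare translation slot in the I/O MMU", "permission denied", \
    "bad page", "copy arguments cross page boundary", \
    "page address size too large" \
}

static const char *const gnttab_errs[] = GNTTABOP_error_msgs;

/* Map a GNTST_* status (0 or negative) to its message. */
static const char *
gnttab_strerror(int16_t status)
{
    int idx = -status;
    if (status > GNTST_okay
        || idx >= (int)(sizeof gnttab_errs / sizeof gnttab_errs[0]))
        return "unknown grant status";
    return gnttab_errs[idx];
}
```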
+
+#endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/io/blkif.h b/xen/public/io/blkif.h
new file mode 100644
index 0000000..2380066
--- /dev/null
+++ b/xen/public/io/blkif.h
@@ -0,0 +1,141 @@
+/******************************************************************************
+ * blkif.h
+ *
+ * Unified block-device I/O interface for Xen guest OSes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2003-2004, Keir Fraser
+ */
+
+#ifndef __XEN_PUBLIC_IO_BLKIF_H__
+#define __XEN_PUBLIC_IO_BLKIF_H__
+
+#include "ring.h"
+#include "../grant_table.h"
+
+/*
+ * Front->back notifications: When enqueuing a new request, sending a
+ * notification can be made conditional on req_event (i.e., the generic
+ * hold-off mechanism provided by the ring macros). Backends must set
+ * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
+ *
+ * Back->front notifications: When enqueuing a new response, sending a
+ * notification can be made conditional on rsp_event (i.e., the generic
+ * hold-off mechanism provided by the ring macros). Frontends must set
+ * rsp_event appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()).
+ */
+
+#ifndef blkif_vdev_t
+#define blkif_vdev_t uint16_t
+#endif
+#define blkif_sector_t uint64_t
+
+/*
+ * REQUEST CODES.
+ */
+#define BLKIF_OP_READ 0
+#define BLKIF_OP_WRITE 1
+/*
+ * Recognised only if "feature-barrier" is present in backend xenbus info.
+ * The "feature-barrier" node contains a boolean indicating whether barrier
+ * requests are likely to succeed or fail. Either way, a barrier request
+ * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
+ * the underlying block-device hardware. The boolean simply indicates whether
+ * or not it is worthwhile for the frontend to attempt barrier requests.
+ * If a backend does not recognise BLKIF_OP_WRITE_BARRIER, it should *not*
+ * create the "feature-barrier" node!
+ */
+#define BLKIF_OP_WRITE_BARRIER 2
+/*
+ * Recognised if "feature-flush-cache" is present in backend xenbus
+ * info. A flush will ask the underlying storage hardware to flush its
+ * non-volatile caches as appropriate. The "feature-flush-cache" node
+ * contains a boolean indicating whether flush requests are likely to
+ * succeed or fail. Either way, a flush request may fail at any time
+ * with BLKIF_RSP_EOPNOTSUPP if it is unsupported by the underlying
+ * block-device hardware. The boolean simply indicates whether or not it
+ * is worthwhile for the frontend to attempt flushes. If a backend does
+ * not recognise BLKIF_OP_WRITE_FLUSH_CACHE, it should *not* create the
+ * "feature-flush-cache" node!
+ */
+#define BLKIF_OP_FLUSH_DISKCACHE 3
+
+/*
+ * Maximum scatter/gather segments per request.
+ * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
+ * NB. This could be 12 if the ring indexes weren't stored in the same page.
+ */
+#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
+
+struct blkif_request_segment {
+ grant_ref_t gref; /* reference to I/O buffer frame */
+ /* @first_sect: first sector in frame to transfer (inclusive). */
+ /* @last_sect: last sector in frame to transfer (inclusive). */
+ uint8_t first_sect, last_sect;
+};
+
+struct blkif_request {
+ uint8_t operation; /* BLKIF_OP_??? */
+ uint8_t nr_segments; /* number of segments */
+ blkif_vdev_t handle; /* only for read/write requests */
+ uint64_t id; /* private guest value, echoed in resp */
+ blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
+ struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+typedef struct blkif_request blkif_request_t;
+
+struct blkif_response {
+ uint64_t id; /* copied from request */
+ uint8_t operation; /* copied from request */
+ int16_t status; /* BLKIF_RSP_??? */
+};
+typedef struct blkif_response blkif_response_t;
+
+/*
+ * STATUS RETURN CODES.
+ */
+ /* Operation not supported (only happens on barrier writes). */
+#define BLKIF_RSP_EOPNOTSUPP -2
+ /* Operation failed for some unspecified reason (-EIO). */
+#define BLKIF_RSP_ERROR -1
+ /* Operation completed successfully. */
+#define BLKIF_RSP_OKAY 0
+
+/*
+ * Generate blkif ring structures and types.
+ */
+
+DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response);
+
+#define VDISK_CDROM 0x1
+#define VDISK_REMOVABLE 0x2
+#define VDISK_READONLY 0x4
+
+#endif /* __XEN_PUBLIC_IO_BLKIF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
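The blkif request layout can be exercised with a small standalone sketch. The types are re-declared locally so it compiles outside the tree, and `build_read` is an illustrative helper (not part of the interface) that packs one granted 4 KiB frame per 8 sectors, assuming 512-byte sectors and respecting the inclusive first_sect/last_sect convention:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the blkif.h / grant_table.h types. */
typedef uint32_t grant_ref_t;
typedef uint16_t blkif_vdev_t;
typedef uint64_t blkif_sector_t;

#define BLKIF_OP_READ 0
#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11

struct blkif_request_segment {
    grant_ref_t gref;                 /* reference to I/O buffer frame */
    uint8_t first_sect, last_sect;    /* both inclusive */
};

struct blkif_request {
    uint8_t operation;
    uint8_t nr_segments;
    blkif_vdev_t handle;
    uint64_t id;
    blkif_sector_t sector_number;
    struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};

/* Build a read request for nr_sects 512-byte sectors starting at
 * `start`, consuming one granted frame per 8 sectors.  Returns -1 if
 * the transfer does not fit in one request. */
static int build_read(struct blkif_request *req, uint64_t id,
                      blkif_sector_t start, unsigned nr_sects,
                      const grant_ref_t *grefs)
{
    unsigned seg = 0, done = 0;

    memset(req, 0, sizeof *req);
    req->operation = BLKIF_OP_READ;
    req->id = id;
    req->sector_number = start;
    while (done < nr_sects) {
        unsigned chunk = nr_sects - done > 8 ? 8 : nr_sects - done;
        if (seg == BLKIF_MAX_SEGMENTS_PER_REQUEST)
            return -1;
        req->seg[seg].gref = grefs[seg];
        req->seg[seg].first_sect = 0;
        req->seg[seg].last_sect = chunk - 1;  /* inclusive */
        done += chunk;
        seg++;
    }
    req->nr_segments = seg;
    return 0;
}
```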
diff --git a/xen/public/io/console.h b/xen/public/io/console.h
new file mode 100644
index 0000000..4b8c01a
--- /dev/null
+++ b/xen/public/io/console.h
@@ -0,0 +1,51 @@
+/******************************************************************************
+ * console.h
+ *
+ * Console I/O interface for Xen guest OSes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Keir Fraser
+ */
+
+#ifndef __XEN_PUBLIC_IO_CONSOLE_H__
+#define __XEN_PUBLIC_IO_CONSOLE_H__
+
+typedef uint32_t XENCONS_RING_IDX;
+
+#define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring)-1))
+
+struct xencons_interface {
+ char in[1024];
+ char out[2048];
+ XENCONS_RING_IDX in_cons, in_prod;
+ XENCONS_RING_IDX out_cons, out_prod;
+};
+
+#endif /* __XEN_PUBLIC_IO_CONSOLE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
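The console ring relies on free-running producer/consumer indices masked by the power-of-two buffer sizes. A sketch of the frontend's write path, with the struct duplicated locally; a real frontend would also issue memory barriers and an event-channel notification after bumping out_prod:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Duplicated from console.h for a self-contained example. */
typedef uint32_t XENCONS_RING_IDX;
#define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring)-1))

struct xencons_interface {
    char in[1024];
    char out[2048];
    XENCONS_RING_IDX in_cons, in_prod;
    XENCONS_RING_IDX out_cons, out_prod;
};

/* Copy as much of buf as fits into the out ring; returns bytes queued.
 * Indices never wrap explicitly: prod - cons gives the fill level, and
 * MASK_XENCONS_IDX maps an index to a slot (sizes are powers of two). */
static size_t cons_write(struct xencons_interface *intf,
                         const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len &&
           intf->out_prod - intf->out_cons < sizeof(intf->out)) {
        intf->out[MASK_XENCONS_IDX(intf->out_prod, intf->out)] = buf[sent++];
        intf->out_prod++;
    }
    return sent;
}
```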
diff --git a/xen/public/io/fbif.h b/xen/public/io/fbif.h
new file mode 100644
index 0000000..95377a0
--- /dev/null
+++ b/xen/public/io/fbif.h
@@ -0,0 +1,176 @@
+/*
+ * fbif.h -- Xen virtual frame buffer device
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 Anthony Liguori <aliguori@us.ibm.com>
+ * Copyright (C) 2006 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ */
+
+#ifndef __XEN_PUBLIC_IO_FBIF_H__
+#define __XEN_PUBLIC_IO_FBIF_H__
+
+/* Out events (frontend -> backend) */
+
+/*
+ * Out events may be sent only when requested by backend, and receipt
+ * of an unknown out event is an error.
+ */
+
+/* Event type 1 currently not used */
+/*
+ * Framebuffer update notification event
+ * Capable frontend sets feature-update in xenstore.
+ * Backend requests it by setting request-update in xenstore.
+ */
+#define XENFB_TYPE_UPDATE 2
+
+struct xenfb_update
+{
+ uint8_t type; /* XENFB_TYPE_UPDATE */
+ int32_t x; /* source x */
+ int32_t y; /* source y */
+ int32_t width; /* rect width */
+ int32_t height; /* rect height */
+};
+
+/*
+ * Framebuffer resize notification event
+ * Capable backend sets feature-resize in xenstore.
+ */
+#define XENFB_TYPE_RESIZE 3
+
+struct xenfb_resize
+{
+ uint8_t type; /* XENFB_TYPE_RESIZE */
+ int32_t width; /* width in pixels */
+ int32_t height; /* height in pixels */
+ int32_t stride; /* stride in bytes */
+ int32_t depth; /* depth in bits */
+ int32_t offset; /* offset of the framebuffer in bytes */
+};
+
+#define XENFB_OUT_EVENT_SIZE 40
+
+union xenfb_out_event
+{
+ uint8_t type;
+ struct xenfb_update update;
+ struct xenfb_resize resize;
+ char pad[XENFB_OUT_EVENT_SIZE];
+};
+
+/* In events (backend -> frontend) */
+
+/*
+ * Frontends should ignore unknown in events.
+ */
+
+/*
+ * Framebuffer refresh period advice
+ * Backend sends it to advise the frontend of its preferred refresh
+ * period. Frontends that keep the framebuffer constantly up-to-date
+ * just ignore it. Frontends that use the advice should immediately
+ * refresh the framebuffer (and send an update notification event if
+ * those have been requested), then use the advised frequency to guide
+ * their periodic refreshes.
+ */
+#define XENFB_TYPE_REFRESH_PERIOD 1
+#define XENFB_NO_REFRESH 0
+
+struct xenfb_refresh_period
+{
+ uint8_t type; /* XENFB_TYPE_REFRESH_PERIOD */
+ uint32_t period; /* period of refresh, in ms,
+ * XENFB_NO_REFRESH if no refresh is needed */
+};
+
+#define XENFB_IN_EVENT_SIZE 40
+
+union xenfb_in_event
+{
+ uint8_t type;
+ struct xenfb_refresh_period refresh_period;
+ char pad[XENFB_IN_EVENT_SIZE];
+};
+
+/* shared page */
+
+#define XENFB_IN_RING_SIZE 1024
+#define XENFB_IN_RING_LEN (XENFB_IN_RING_SIZE / XENFB_IN_EVENT_SIZE)
+#define XENFB_IN_RING_OFFS 1024
+#define XENFB_IN_RING(page) \
+ ((union xenfb_in_event *)((char *)(page) + XENFB_IN_RING_OFFS))
+#define XENFB_IN_RING_REF(page, idx) \
+ (XENFB_IN_RING((page))[(idx) % XENFB_IN_RING_LEN])
+
+#define XENFB_OUT_RING_SIZE 2048
+#define XENFB_OUT_RING_LEN (XENFB_OUT_RING_SIZE / XENFB_OUT_EVENT_SIZE)
+#define XENFB_OUT_RING_OFFS (XENFB_IN_RING_OFFS + XENFB_IN_RING_SIZE)
+#define XENFB_OUT_RING(page) \
+ ((union xenfb_out_event *)((char *)(page) + XENFB_OUT_RING_OFFS))
+#define XENFB_OUT_RING_REF(page, idx) \
+ (XENFB_OUT_RING((page))[(idx) % XENFB_OUT_RING_LEN])
+
+struct xenfb_page
+{
+ uint32_t in_cons, in_prod;
+ uint32_t out_cons, out_prod;
+
+ int32_t width; /* the width of the framebuffer (in pixels) */
+ int32_t height; /* the height of the framebuffer (in pixels) */
+ uint32_t line_length; /* the length of a row of pixels (in bytes) */
+ uint32_t mem_length; /* the length of the framebuffer (in bytes) */
+ uint8_t depth; /* the depth of a pixel (in bits) */
+
+ /*
+ * Framebuffer page directory
+ *
+ * Each directory page holds PAGE_SIZE / sizeof(*pd)
+ * framebuffer pages, and can thus map up to PAGE_SIZE *
+ * PAGE_SIZE / sizeof(*pd) bytes. With PAGE_SIZE == 4096 and
+ * sizeof(unsigned long) == 4/8, that's 4 MiB per directory on 32 bit
+ * and 2 MiB on 64 bit. 256 directories give enough room for a 512 MiB
+ * framebuffer with a max resolution of 12,800x10,240. Should
+ * be enough for a while with room leftover for expansion.
+ */
+ unsigned long pd[256];
+};
+
+/*
+ * Wart: xenkbd needs to know default resolution. Put it here until a
+ * better solution is found, but don't leak it to the backend.
+ */
+#ifdef __KERNEL__
+#define XENFB_WIDTH 800
+#define XENFB_HEIGHT 600
+#define XENFB_DEPTH 32
+#endif
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
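The fbif ring-layout arithmetic is easy to sanity-check: both event rings fit in one 4 KiB shared page after the header, and indices wrap modulo the ring length. A standalone sketch with the macros duplicated locally; `in_slot_offset` is an illustrative helper:

```c
#include <assert.h>
#include <stdint.h>

/* Duplicated from fbif.h for a self-contained example. */
#define XENFB_IN_EVENT_SIZE  40
#define XENFB_OUT_EVENT_SIZE 40
#define XENFB_IN_RING_SIZE   1024
#define XENFB_IN_RING_LEN    (XENFB_IN_RING_SIZE / XENFB_IN_EVENT_SIZE)
#define XENFB_IN_RING_OFFS   1024
#define XENFB_OUT_RING_SIZE  2048
#define XENFB_OUT_RING_LEN   (XENFB_OUT_RING_SIZE / XENFB_OUT_EVENT_SIZE)
#define XENFB_OUT_RING_OFFS  (XENFB_IN_RING_OFFS + XENFB_IN_RING_SIZE)

/* Byte offset within the shared page of in-event slot `idx`; mirrors
 * the pointer arithmetic of XENFB_IN_RING_REF(). */
static unsigned in_slot_offset(uint32_t idx)
{
    return XENFB_IN_RING_OFFS +
           (idx % XENFB_IN_RING_LEN) * XENFB_IN_EVENT_SIZE;
}
```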
diff --git a/xen/public/io/fsif.h b/xen/public/io/fsif.h
new file mode 100644
index 0000000..04ef928
--- /dev/null
+++ b/xen/public/io/fsif.h
@@ -0,0 +1,191 @@
+/******************************************************************************
+ * fsif.h
+ *
+ * Interface to FS level split device drivers.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2007, Grzegorz Milos, <gm281@cam.ac.uk>.
+ */
+
+#ifndef __XEN_PUBLIC_IO_FSIF_H__
+#define __XEN_PUBLIC_IO_FSIF_H__
+
+#include "ring.h"
+#include "../grant_table.h"
+
+#define REQ_FILE_OPEN 1
+#define REQ_FILE_CLOSE 2
+#define REQ_FILE_READ 3
+#define REQ_FILE_WRITE 4
+#define REQ_STAT 5
+#define REQ_FILE_TRUNCATE 6
+#define REQ_REMOVE 7
+#define REQ_RENAME 8
+#define REQ_CREATE 9
+#define REQ_DIR_LIST 10
+#define REQ_CHMOD 11
+#define REQ_FS_SPACE 12
+#define REQ_FILE_SYNC 13
+
+struct fsif_open_request {
+ grant_ref_t gref;
+};
+
+struct fsif_close_request {
+ uint32_t fd;
+};
+
+struct fsif_read_request {
+ uint32_t fd;
+ int32_t pad;
+ uint64_t len;
+ uint64_t offset;
+ grant_ref_t grefs[1]; /* Variable length */
+};
+
+struct fsif_write_request {
+ uint32_t fd;
+ int32_t pad;
+ uint64_t len;
+ uint64_t offset;
+ grant_ref_t grefs[1]; /* Variable length */
+};
+
+struct fsif_stat_request {
+ uint32_t fd;
+};
+
+/* This structure is a copy of some fields from the stat structure,
+ * returned via the ring. */
+struct fsif_stat_response {
+ int32_t stat_mode;
+ uint32_t stat_uid;
+ uint32_t stat_gid;
+ int32_t stat_ret;
+ int64_t stat_size;
+ int64_t stat_atime;
+ int64_t stat_mtime;
+ int64_t stat_ctime;
+};
+
+struct fsif_truncate_request {
+ uint32_t fd;
+ int32_t pad;
+ int64_t length;
+};
+
+struct fsif_remove_request {
+ grant_ref_t gref;
+};
+
+struct fsif_rename_request {
+ uint16_t old_name_offset;
+ uint16_t new_name_offset;
+ grant_ref_t gref;
+};
+
+struct fsif_create_request {
+ int8_t directory;
+ int8_t pad;
+ int16_t pad2;
+ int32_t mode;
+ grant_ref_t gref;
+};
+
+struct fsif_list_request {
+ uint32_t offset;
+ grant_ref_t gref;
+};
+
+#define NR_FILES_SHIFT 0
+#define NR_FILES_SIZE 16 /* 16 bits for the number of files mask */
+#define NR_FILES_MASK (((1ULL << NR_FILES_SIZE) - 1) << NR_FILES_SHIFT)
+#define ERROR_SIZE 32 /* 32 bits for the error mask */
+#define ERROR_SHIFT (NR_FILES_SIZE + NR_FILES_SHIFT)
+#define ERROR_MASK (((1ULL << ERROR_SIZE) - 1) << ERROR_SHIFT)
+#define HAS_MORE_SHIFT (ERROR_SHIFT + ERROR_SIZE)
+#define HAS_MORE_FLAG (1ULL << HAS_MORE_SHIFT)
+
+struct fsif_chmod_request {
+ uint32_t fd;
+ int32_t mode;
+};
+
+struct fsif_space_request {
+ grant_ref_t gref;
+};
+
+struct fsif_sync_request {
+ uint32_t fd;
+};
+
+
+/* FS operation request */
+struct fsif_request {
+ uint8_t type; /* Type of the request */
+ uint8_t pad;
+ uint16_t id; /* Request ID, copied to the response */
+ uint32_t pad2;
+ union {
+ struct fsif_open_request fopen;
+ struct fsif_close_request fclose;
+ struct fsif_read_request fread;
+ struct fsif_write_request fwrite;
+ struct fsif_stat_request fstat;
+ struct fsif_truncate_request ftruncate;
+ struct fsif_remove_request fremove;
+ struct fsif_rename_request frename;
+ struct fsif_create_request fcreate;
+ struct fsif_list_request flist;
+ struct fsif_chmod_request fchmod;
+ struct fsif_space_request fspace;
+ struct fsif_sync_request fsync;
+ } u;
+};
+typedef struct fsif_request fsif_request_t;
+
+/* FS operation response */
+struct fsif_response {
+ uint16_t id;
+ uint16_t pad1;
+ uint32_t pad2;
+ union {
+ uint64_t ret_val;
+ struct fsif_stat_response fstat;
+ };
+};
+
+typedef struct fsif_response fsif_response_t;
+
+#define FSIF_RING_ENTRY_SIZE 64
+
+#define FSIF_NR_READ_GNTS ((FSIF_RING_ENTRY_SIZE - sizeof(struct fsif_read_request)) / \
+ sizeof(grant_ref_t) + 1)
+#define FSIF_NR_WRITE_GNTS ((FSIF_RING_ENTRY_SIZE - sizeof(struct fsif_write_request)) / \
+ sizeof(grant_ref_t) + 1)
+
+DEFINE_RING_TYPES(fsif, struct fsif_request, struct fsif_response);
+
+#define STATE_INITIALISED "init"
+#define STATE_READY "ready"
+
+
+
+#endif
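The NR_FILES/ERROR/HAS_MORE masks above pack a REQ_DIR_LIST result into the 64-bit ret_val: the low 16 bits hold the entry count, the next 32 bits an error code, and bit 48 a continuation flag. A standalone sketch of the packing, with the macros duplicated locally and the helper names purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Duplicated from fsif.h for a self-contained example. */
#define NR_FILES_SHIFT  0
#define NR_FILES_SIZE   16
#define NR_FILES_MASK   (((1ULL << NR_FILES_SIZE) - 1) << NR_FILES_SHIFT)
#define ERROR_SIZE      32
#define ERROR_SHIFT     (NR_FILES_SIZE + NR_FILES_SHIFT)
#define ERROR_MASK      (((1ULL << ERROR_SIZE) - 1) << ERROR_SHIFT)
#define HAS_MORE_SHIFT  (ERROR_SHIFT + ERROR_SIZE)
#define HAS_MORE_FLAG   (1ULL << HAS_MORE_SHIFT)

/* Pack a directory-listing result into a ret_val word. */
static uint64_t dirlist_pack(uint16_t nr_files, uint32_t error, int has_more)
{
    return ((uint64_t)nr_files << NR_FILES_SHIFT)
         | ((uint64_t)error << ERROR_SHIFT)
         | (has_more ? HAS_MORE_FLAG : 0);
}

static uint16_t dirlist_nr_files(uint64_t ret)
{
    return (ret & NR_FILES_MASK) >> NR_FILES_SHIFT;
}

static uint32_t dirlist_error(uint64_t ret)
{
    return (ret & ERROR_MASK) >> ERROR_SHIFT;
}

static int dirlist_has_more(uint64_t ret)
{
    return (ret & HAS_MORE_FLAG) != 0;
}
```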
diff --git a/xen/public/io/kbdif.h b/xen/public/io/kbdif.h
new file mode 100644
index 0000000..e1d66a5
--- /dev/null
+++ b/xen/public/io/kbdif.h
@@ -0,0 +1,132 @@
+/*
+ * kbdif.h -- Xen virtual keyboard/mouse
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 Anthony Liguori <aliguori@us.ibm.com>
+ * Copyright (C) 2006 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ */
+
+#ifndef __XEN_PUBLIC_IO_KBDIF_H__
+#define __XEN_PUBLIC_IO_KBDIF_H__
+
+/* In events (backend -> frontend) */
+
+/*
+ * Frontends should ignore unknown in events.
+ */
+
+/* Pointer movement event */
+#define XENKBD_TYPE_MOTION 1
+/* Event type 2 currently not used */
+/* Key event (includes pointer buttons) */
+#define XENKBD_TYPE_KEY 3
+/*
+ * Pointer position event
+ * Capable backend sets feature-abs-pointer in xenstore.
+ * Frontend requests it instead of XENKBD_TYPE_MOTION by setting
+ * request-abs-update in xenstore.
+ */
+#define XENKBD_TYPE_POS 4
+
+struct xenkbd_motion
+{
+ uint8_t type; /* XENKBD_TYPE_MOTION */
+ int32_t rel_x; /* relative X motion */
+ int32_t rel_y; /* relative Y motion */
+ int32_t rel_z; /* relative Z motion (wheel) */
+};
+
+struct xenkbd_key
+{
+ uint8_t type; /* XENKBD_TYPE_KEY */
+ uint8_t pressed; /* 1 if pressed; 0 otherwise */
+ uint32_t keycode; /* KEY_* from linux/input.h */
+};
+
+struct xenkbd_position
+{
+ uint8_t type; /* XENKBD_TYPE_POS */
+ int32_t abs_x; /* absolute X position (in FB pixels) */
+ int32_t abs_y; /* absolute Y position (in FB pixels) */
+ int32_t rel_z; /* relative Z motion (wheel) */
+};
+
+#define XENKBD_IN_EVENT_SIZE 40
+
+union xenkbd_in_event
+{
+ uint8_t type;
+ struct xenkbd_motion motion;
+ struct xenkbd_key key;
+ struct xenkbd_position pos;
+ char pad[XENKBD_IN_EVENT_SIZE];
+};
+
+/* Out events (frontend -> backend) */
+
+/*
+ * Out events may be sent only when requested by backend, and receipt
+ * of an unknown out event is an error.
+ * No out events currently defined.
+ */
+
+#define XENKBD_OUT_EVENT_SIZE 40
+
+union xenkbd_out_event
+{
+ uint8_t type;
+ char pad[XENKBD_OUT_EVENT_SIZE];
+};
+
+/* shared page */
+
+#define XENKBD_IN_RING_SIZE 2048
+#define XENKBD_IN_RING_LEN (XENKBD_IN_RING_SIZE / XENKBD_IN_EVENT_SIZE)
+#define XENKBD_IN_RING_OFFS 1024
+#define XENKBD_IN_RING(page) \
+ ((union xenkbd_in_event *)((char *)(page) + XENKBD_IN_RING_OFFS))
+#define XENKBD_IN_RING_REF(page, idx) \
+ (XENKBD_IN_RING((page))[(idx) % XENKBD_IN_RING_LEN])
+
+#define XENKBD_OUT_RING_SIZE 1024
+#define XENKBD_OUT_RING_LEN (XENKBD_OUT_RING_SIZE / XENKBD_OUT_EVENT_SIZE)
+#define XENKBD_OUT_RING_OFFS (XENKBD_IN_RING_OFFS + XENKBD_IN_RING_SIZE)
+#define XENKBD_OUT_RING(page) \
+ ((union xenkbd_out_event *)((char *)(page) + XENKBD_OUT_RING_OFFS))
+#define XENKBD_OUT_RING_REF(page, idx) \
+ (XENKBD_OUT_RING((page))[(idx) % XENKBD_OUT_RING_LEN])
+
+struct xenkbd_page
+{
+ uint32_t in_cons, in_prod;
+ uint32_t out_cons, out_prod;
+};
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
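A frontend consuming the kbdif in-event ring switches on the type byte, ignoring unknown types as the comments require. A standalone sketch with the types duplicated locally; `handle_event` is an illustrative helper that returns the keycode of a key-press event and -1 for anything else:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Duplicated from kbdif.h for a self-contained example. */
#define XENKBD_TYPE_MOTION 1
#define XENKBD_TYPE_KEY    3
#define XENKBD_TYPE_POS    4
#define XENKBD_IN_EVENT_SIZE 40

struct xenkbd_key {
    uint8_t type;       /* XENKBD_TYPE_KEY */
    uint8_t pressed;    /* 1 if pressed; 0 otherwise */
    uint32_t keycode;   /* KEY_* from linux/input.h */
};

union xenkbd_in_event {
    uint8_t type;
    struct xenkbd_key key;
    char pad[XENKBD_IN_EVENT_SIZE];
};

/* Return the keycode of a key-press event, -1 otherwise; unknown
 * types are silently ignored, as the spec requires of frontends. */
static long handle_event(const union xenkbd_in_event *ev)
{
    switch (ev->type) {
    case XENKBD_TYPE_KEY:
        return ev->key.pressed ? (long)ev->key.keycode : -1;
    default:
        return -1;
    }
}
```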
diff --git a/xen/public/io/netif.h b/xen/public/io/netif.h
new file mode 100644
index 0000000..fbb5c27
--- /dev/null
+++ b/xen/public/io/netif.h
@@ -0,0 +1,205 @@
+/******************************************************************************
+ * netif.h
+ *
+ * Unified network-device I/O interface for Xen guest OSes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2003-2004, Keir Fraser
+ */
+
+#ifndef __XEN_PUBLIC_IO_NETIF_H__
+#define __XEN_PUBLIC_IO_NETIF_H__
+
+#include "ring.h"
+#include "../grant_table.h"
+
+/*
+ * Notifications after enqueuing any type of message should be conditional on
+ * the appropriate req_event or rsp_event field in the shared ring.
+ * If the client sends notification for rx requests then it should specify
+ * feature 'feature-rx-notify' via xenbus. Otherwise the backend will assume
+ * that it cannot safely queue packets (as it may not be kicked to send them).
+ */
+
+/*
+ * This is the 'wire' format for packets:
+ * Request 1: netif_tx_request -- NETTXF_* (any flags)
+ * [Request 2: netif_tx_extra] (only if request 1 has NETTXF_extra_info)
+ * [Request 3: netif_tx_extra] (only if request 2 has XEN_NETIF_EXTRA_MORE)
+ * Request 4: netif_tx_request -- NETTXF_more_data
+ * Request 5: netif_tx_request -- NETTXF_more_data
+ * ...
+ * Request N: netif_tx_request -- 0
+ */
+
+/* Protocol checksum field is blank in the packet (hardware offload)? */
+#define _NETTXF_csum_blank (0)
+#define NETTXF_csum_blank (1U<<_NETTXF_csum_blank)
+
+/* Packet data has been validated against protocol checksum. */
+#define _NETTXF_data_validated (1)
+#define NETTXF_data_validated (1U<<_NETTXF_data_validated)
+
+/* Packet continues in the next request descriptor. */
+#define _NETTXF_more_data (2)
+#define NETTXF_more_data (1U<<_NETTXF_more_data)
+
+/* Packet to be followed by extra descriptor(s). */
+#define _NETTXF_extra_info (3)
+#define NETTXF_extra_info (1U<<_NETTXF_extra_info)
+
+struct netif_tx_request {
+ grant_ref_t gref; /* Reference to buffer page */
+ uint16_t offset; /* Offset within buffer page */
+ uint16_t flags; /* NETTXF_* */
+ uint16_t id; /* Echoed in response message. */
+ uint16_t size; /* Packet size in bytes. */
+};
+typedef struct netif_tx_request netif_tx_request_t;
+
+/* Types of netif_extra_info descriptors. */
+#define XEN_NETIF_EXTRA_TYPE_NONE (0) /* Never used - invalid */
+#define XEN_NETIF_EXTRA_TYPE_GSO (1) /* u.gso */
+#define XEN_NETIF_EXTRA_TYPE_MCAST_ADD (2) /* u.mcast */
+#define XEN_NETIF_EXTRA_TYPE_MCAST_DEL (3) /* u.mcast */
+#define XEN_NETIF_EXTRA_TYPE_MAX (4)
+
+/* netif_extra_info flags. */
+#define _XEN_NETIF_EXTRA_FLAG_MORE (0)
+#define XEN_NETIF_EXTRA_FLAG_MORE (1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
+
+/* GSO types - only TCPv4 currently supported. */
+#define XEN_NETIF_GSO_TYPE_TCPV4 (1)
+
+/*
+ * This structure needs to fit within both netif_tx_request and
+ * netif_rx_response for compatibility.
+ */
+struct netif_extra_info {
+ uint8_t type; /* XEN_NETIF_EXTRA_TYPE_* */
+ uint8_t flags; /* XEN_NETIF_EXTRA_FLAG_* */
+
+ union {
+ /*
+ * XEN_NETIF_EXTRA_TYPE_GSO:
+ */
+ struct {
+ /*
+ * Maximum payload size of each segment. For example, for TCP this
+ * is just the path MSS.
+ */
+ uint16_t size;
+
+ /*
+ * GSO type. This determines the protocol of the packet and any
+ * extra features required to segment the packet properly.
+ */
+ uint8_t type; /* XEN_NETIF_GSO_TYPE_* */
+
+ /* Future expansion. */
+ uint8_t pad;
+
+ /*
+ * GSO features. This specifies any extra GSO features required
+ * to process this packet, such as ECN support for TCPv4.
+ */
+ uint16_t features; /* XEN_NETIF_GSO_FEAT_* */
+ } gso;
+
+ /*
+ * XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}:
+ * Backend advertises availability via 'feature-multicast-control'
+ * xenbus node containing value '1'.
+ * Frontend requests this feature by advertising
+ * 'request-multicast-control' xenbus node containing value '1'.
+ * If multicast control is requested then multicast flooding is
+ * disabled and the frontend must explicitly register its interest
+ * in multicast groups using dummy transmit requests containing
+ * MCAST_{ADD,DEL} extra-info fragments.
+ */
+ struct {
+ uint8_t addr[6]; /* Address to add/remove. */
+ } mcast;
+
+ uint16_t pad[3];
+ } u;
+};
+typedef struct netif_extra_info netif_extra_info_t;
+
+struct netif_tx_response {
+ uint16_t id;
+ int16_t status; /* NETIF_RSP_* */
+};
+typedef struct netif_tx_response netif_tx_response_t;
+
+struct netif_rx_request {
+ uint16_t id; /* Echoed in response message. */
+ grant_ref_t gref; /* Reference to incoming granted frame */
+};
+typedef struct netif_rx_request netif_rx_request_t;
+
+/* Packet data has been validated against protocol checksum. */
+#define _NETRXF_data_validated (0)
+#define NETRXF_data_validated (1U<<_NETRXF_data_validated)
+
+/* Protocol checksum field is blank in the packet (hardware offload)? */
+#define _NETRXF_csum_blank (1)
+#define NETRXF_csum_blank (1U<<_NETRXF_csum_blank)
+
+/* Packet continues in the next request descriptor. */
+#define _NETRXF_more_data (2)
+#define NETRXF_more_data (1U<<_NETRXF_more_data)
+
+/* Packet to be followed by extra descriptor(s). */
+#define _NETRXF_extra_info (3)
+#define NETRXF_extra_info (1U<<_NETRXF_extra_info)
+
+struct netif_rx_response {
+ uint16_t id;
+ uint16_t offset; /* Offset in page of start of received packet */
+ uint16_t flags; /* NETRXF_* */
+ int16_t status; /* -ve: NETIF_RSP_* ; +ve: Rx'ed pkt size. */
+};
+typedef struct netif_rx_response netif_rx_response_t;
+
+/*
+ * Generate netif ring structures and types.
+ */
+
+DEFINE_RING_TYPES(netif_tx, struct netif_tx_request, struct netif_tx_response);
+DEFINE_RING_TYPES(netif_rx, struct netif_rx_request, struct netif_rx_response);
+
+#define NETIF_RSP_DROPPED -2
+#define NETIF_RSP_ERROR -1
+#define NETIF_RSP_OKAY 0
+/* No response: used for auxiliary requests (e.g., netif_tx_extra). */
+#define NETIF_RSP_NULL 1
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
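The NETTXF_* bits combine per-slot: a multi-slot packet sets more_data on every slot but the last, checksum offload sets both csum_blank and data_validated, and extra_info marks the following descriptor. A standalone sketch of computing the first slot's flags, with the macros duplicated locally and the helper name illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Duplicated from netif.h for a self-contained example. */
#define _NETTXF_csum_blank     (0)
#define NETTXF_csum_blank      (1U<<_NETTXF_csum_blank)
#define _NETTXF_data_validated (1)
#define NETTXF_data_validated  (1U<<_NETTXF_data_validated)
#define _NETTXF_more_data      (2)
#define NETTXF_more_data       (1U<<_NETTXF_more_data)
#define _NETTXF_extra_info     (3)
#define NETTXF_extra_info      (1U<<_NETTXF_extra_info)

/* Flags for the first slot of a packet spanning nr_slots tx requests,
 * optionally asking the backend to fill in checksums and optionally
 * followed by a netif_extra_info descriptor. */
static uint16_t first_slot_flags(unsigned nr_slots, int csum_offload,
                                 int has_extra)
{
    uint16_t f = 0;

    if (csum_offload)
        f |= NETTXF_csum_blank | NETTXF_data_validated;
    if (nr_slots > 1)
        f |= NETTXF_more_data;
    if (has_extra)
        f |= NETTXF_extra_info;
    return f;
}
```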
diff --git a/xen/public/io/pciif.h b/xen/public/io/pciif.h
new file mode 100644
index 0000000..0a0ffcc
--- /dev/null
+++ b/xen/public/io/pciif.h
@@ -0,0 +1,101 @@
+/*
+ * PCI Backend/Frontend Common Data Structures & Macros
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Ryan Wilson <hap9@epoch.ncsc.mil>
+ */
+#ifndef __XEN_PCI_COMMON_H__
+#define __XEN_PCI_COMMON_H__
+
+/* Be sure to bump this number if you change this file */
+#define XEN_PCI_MAGIC "7"
+
+/* xen_pci_sharedinfo flags */
+#define _XEN_PCIF_active (0)
+#define XEN_PCIF_active (1<<_XEN_PCIF_active)
+
+/* xen_pci_op commands */
+#define XEN_PCI_OP_conf_read (0)
+#define XEN_PCI_OP_conf_write (1)
+#define XEN_PCI_OP_enable_msi (2)
+#define XEN_PCI_OP_disable_msi (3)
+#define XEN_PCI_OP_enable_msix (4)
+#define XEN_PCI_OP_disable_msix (5)
+
+/* xen_pci_op error numbers */
+#define XEN_PCI_ERR_success (0)
+#define XEN_PCI_ERR_dev_not_found (-1)
+#define XEN_PCI_ERR_invalid_offset (-2)
+#define XEN_PCI_ERR_access_denied (-3)
+#define XEN_PCI_ERR_not_implemented (-4)
+/* XEN_PCI_ERR_op_failed - backend failed to complete the operation */
+#define XEN_PCI_ERR_op_failed (-5)
+
+/*
+ * It should be (PAGE_SIZE - sizeof(struct xen_pci_op)) / sizeof(struct xen_msix_entry).
+ * Should not exceed 128.
+ */
+#define SH_INFO_MAX_VEC 128
+
+struct xen_msix_entry {
+ uint16_t vector;
+ uint16_t entry;
+};
+struct xen_pci_op {
+ /* IN: what action to perform: XEN_PCI_OP_* */
+ uint32_t cmd;
+
+ /* OUT: will contain an error number (if any) from errno.h */
+ int32_t err;
+
+ /* IN: which device to touch */
+ uint32_t domain; /* PCI Domain/Segment */
+ uint32_t bus;
+ uint32_t devfn;
+
+ /* IN: which configuration registers to touch */
+ int32_t offset;
+ int32_t size;
+
+ /* IN/OUT: Contains the result after a READ or the value to WRITE */
+ uint32_t value;
+ /* IN: Contains extra info for this operation */
+ uint32_t info;
+ /* IN: params for MSI-X */
+ struct xen_msix_entry msix_entries[SH_INFO_MAX_VEC];
+};
+
+struct xen_pci_sharedinfo {
+ /* flags - XEN_PCIF_* */
+ uint32_t flags;
+ struct xen_pci_op op;
+};
+
+#endif /* __XEN_PCI_COMMON_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/io/protocols.h b/xen/public/io/protocols.h
new file mode 100644
index 0000000..77bd1bd
--- /dev/null
+++ b/xen/public/io/protocols.h
@@ -0,0 +1,40 @@
+/******************************************************************************
+ * protocols.h
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PROTOCOLS_H__
+#define __XEN_PROTOCOLS_H__
+
+#define XEN_IO_PROTO_ABI_X86_32 "x86_32-abi"
+#define XEN_IO_PROTO_ABI_X86_64 "x86_64-abi"
+#define XEN_IO_PROTO_ABI_IA64 "ia64-abi"
+
+#if defined(__i386__)
+# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_X86_32
+#elif defined(__x86_64__)
+# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_X86_64
+#elif defined(__ia64__)
+# define XEN_IO_PROTO_ABI_NATIVE XEN_IO_PROTO_ABI_IA64
+#else
+# error arch fixup needed here
+#endif
+
+#endif
diff --git a/xen/public/io/ring.h b/xen/public/io/ring.h
new file mode 100644
index 0000000..6ce1d0d
--- /dev/null
+++ b/xen/public/io/ring.h
@@ -0,0 +1,307 @@
+/******************************************************************************
+ * ring.h
+ *
+ * Shared producer-consumer ring macros.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Tim Deegan and Andrew Warfield November 2004.
+ */
+
+#ifndef __XEN_PUBLIC_IO_RING_H__
+#define __XEN_PUBLIC_IO_RING_H__
+
+#include "../xen-compat.h"
+
+#if __XEN_INTERFACE_VERSION__ < 0x00030208
+#define xen_mb() mb()
+#define xen_rmb() rmb()
+#define xen_wmb() wmb()
+#endif
+
+typedef unsigned int RING_IDX;
+
+/* Round a 32-bit unsigned constant down to the nearest power of two. */
+#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1))
+#define __RD4(_x) (((_x) & 0x0000000c) ? __RD2((_x)>>2)<<2 : __RD2(_x))
+#define __RD8(_x) (((_x) & 0x000000f0) ? __RD4((_x)>>4)<<4 : __RD4(_x))
+#define __RD16(_x) (((_x) & 0x0000ff00) ? __RD8((_x)>>8)<<8 : __RD8(_x))
+#define __RD32(_x) (((_x) & 0xffff0000) ? __RD16((_x)>>16)<<16 : __RD16(_x))
+
+/*
+ * Calculate size of a shared ring, given the total available space for the
+ * ring and indexes (_sz), and the name tag of the request/response structure.
+ * A ring contains as many entries as will fit, rounded down to the nearest
+ * power of two (so we can mask with (size-1) to loop around).
+ */
+#define __RING_SIZE(_s, _sz) \
+ (__RD32(((_sz) - (long)(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
+
+/*
+ * Macros to make the correct C datatypes for a new kind of ring.
+ *
+ * To make a new ring datatype, you need to have two message structures,
+ * let's say request_t, and response_t already defined.
+ *
+ * In a header where you want the ring datatype declared, you then do:
+ *
+ * DEFINE_RING_TYPES(mytag, request_t, response_t);
+ *
+ * These expand out to give you a set of types, as you can see below.
+ * The most important of these are:
+ *
+ * mytag_sring_t - The shared ring.
+ * mytag_front_ring_t - The 'front' half of the ring.
+ * mytag_back_ring_t - The 'back' half of the ring.
+ *
+ * To initialize a ring in your code you need to know the location and size
+ * of the shared memory area (PAGE_SIZE, for instance). To initialise
+ * the front half:
+ *
+ * mytag_front_ring_t front_ring;
+ * SHARED_RING_INIT((mytag_sring_t *)shared_page);
+ * FRONT_RING_INIT(&front_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
+ *
+ * Initializing the back follows similarly (note that only the front
+ * initializes the shared ring):
+ *
+ * mytag_back_ring_t back_ring;
+ * BACK_RING_INIT(&back_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
+ */
+
+#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \
+ \
+/* Shared ring entry */ \
+union __name##_sring_entry { \
+ __req_t req; \
+ __rsp_t rsp; \
+}; \
+ \
+/* Shared ring page */ \
+struct __name##_sring { \
+ RING_IDX req_prod, req_event; \
+ RING_IDX rsp_prod, rsp_event; \
+ uint8_t pad[48]; \
+ union __name##_sring_entry ring[1]; /* variable-length */ \
+}; \
+ \
+/* "Front" end's private variables */ \
+struct __name##_front_ring { \
+ RING_IDX req_prod_pvt; \
+ RING_IDX rsp_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
+/* "Back" end's private variables */ \
+struct __name##_back_ring { \
+ RING_IDX rsp_prod_pvt; \
+ RING_IDX req_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
+/* Syntactic sugar */ \
+typedef struct __name##_sring __name##_sring_t; \
+typedef struct __name##_front_ring __name##_front_ring_t; \
+typedef struct __name##_back_ring __name##_back_ring_t
+
+/*
+ * Macros for manipulating rings.
+ *
+ * FRONT_RING_whatever works on the "front end" of a ring: here
+ * requests are pushed on to the ring and responses taken off it.
+ *
+ * BACK_RING_whatever works on the "back end" of a ring: here
+ * requests are taken off the ring and responses put on.
+ *
+ * N.B. these macros do NO INTERLOCKS OR FLOW CONTROL.
+ * This is OK in 1-for-1 request-response situations where the
+ * requestor (front end) never has more than RING_SIZE()-1
+ * outstanding requests.
+ */
+
+/* Initialising empty rings */
+#define SHARED_RING_INIT(_s) do { \
+ (_s)->req_prod = (_s)->rsp_prod = 0; \
+ (_s)->req_event = (_s)->rsp_event = 1; \
+ (void)memset((_s)->pad, 0, sizeof((_s)->pad)); \
+} while(0)
+
+#define FRONT_RING_INIT(_r, _s, __size) do { \
+ (_r)->req_prod_pvt = 0; \
+ (_r)->rsp_cons = 0; \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
+} while (0)
+
+#define BACK_RING_INIT(_r, _s, __size) do { \
+ (_r)->rsp_prod_pvt = 0; \
+ (_r)->req_cons = 0; \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
+} while (0)
+
+/* Initialize to existing shared indexes -- for recovery */
+#define FRONT_RING_ATTACH(_r, _s, __size) do { \
+ (_r)->sring = (_s); \
+ (_r)->req_prod_pvt = (_s)->req_prod; \
+ (_r)->rsp_cons = (_s)->rsp_prod; \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+} while (0)
+
+#define BACK_RING_ATTACH(_r, _s, __size) do { \
+ (_r)->sring = (_s); \
+ (_r)->rsp_prod_pvt = (_s)->rsp_prod; \
+ (_r)->req_cons = (_s)->req_prod; \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+} while (0)
+
+/* How big is this ring? */
+#define RING_SIZE(_r) \
+ ((_r)->nr_ents)
+
+/* Number of free requests (for use on front side only). */
+#define RING_FREE_REQUESTS(_r) \
+ (RING_SIZE(_r) - ((_r)->req_prod_pvt - (_r)->rsp_cons))
+
+/* Test if there is an empty slot available on the front ring.
+ * (This is only meaningful from the front.)
+ */
+#define RING_FULL(_r) \
+ (RING_FREE_REQUESTS(_r) == 0)
+
+/* Test if there are outstanding messages to be processed on a ring. */
+#define RING_HAS_UNCONSUMED_RESPONSES(_r) \
+ ((_r)->sring->rsp_prod - (_r)->rsp_cons)
+
+#ifdef __GNUC__
+#define RING_HAS_UNCONSUMED_REQUESTS(_r) ({ \
+ unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \
+ unsigned int rsp = RING_SIZE(_r) - \
+ ((_r)->req_cons - (_r)->rsp_prod_pvt); \
+ req < rsp ? req : rsp; \
+})
+#else
+/* Same as above, but without the nice GCC ({ ... }) syntax. */
+#define RING_HAS_UNCONSUMED_REQUESTS(_r) \
+ ((((_r)->sring->req_prod - (_r)->req_cons) < \
+ (RING_SIZE(_r) - ((_r)->req_cons - (_r)->rsp_prod_pvt))) ? \
+ ((_r)->sring->req_prod - (_r)->req_cons) : \
+ (RING_SIZE(_r) - ((_r)->req_cons - (_r)->rsp_prod_pvt)))
+#endif
+
+/* Direct access to individual ring elements, by index. */
+#define RING_GET_REQUEST(_r, _idx) \
+ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req))
+
+#define RING_GET_RESPONSE(_r, _idx) \
+ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp))
+
+/* Loop termination condition: Would the specified index overflow the ring? */
+#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \
+ (((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r))
+
+#define RING_PUSH_REQUESTS(_r) do { \
+ xen_wmb(); /* back sees requests /before/ updated producer index */ \
+ (_r)->sring->req_prod = (_r)->req_prod_pvt; \
+} while (0)
+
+#define RING_PUSH_RESPONSES(_r) do { \
+ xen_wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
+} while (0)
+
+/*
+ * Notification hold-off (req_event and rsp_event):
+ *
+ * When queueing requests or responses on a shared ring, it may not always be
+ * necessary to notify the remote end. For example, if requests are in flight
+ * in a backend, the front may be able to queue further requests without
+ * notifying the back (if the back checks for new requests when it queues
+ * responses).
+ *
+ * When enqueuing requests or responses:
+ *
+ * Use RING_PUSH_{REQUESTS,RESPONSES}_AND_CHECK_NOTIFY(). The second argument
+ * is a boolean return value. True indicates that the receiver requires an
+ * asynchronous notification.
+ *
+ * After dequeuing requests or responses (before sleeping the connection):
+ *
+ * Use RING_FINAL_CHECK_FOR_REQUESTS() or RING_FINAL_CHECK_FOR_RESPONSES().
+ * The second argument is a boolean return value. True indicates that there
+ * are pending messages on the ring (i.e., the connection should not be put
+ * to sleep).
+ *
+ * These macros will set the req_event/rsp_event field to trigger a
+ * notification on the very next message that is enqueued. If you want to
+ * create batches of work (i.e., only receive a notification after several
+ * messages have been enqueued) then you will need to create a customised
+ * version of the FINAL_CHECK macro in your own code, which sets the event
+ * field appropriately.
+ */
+
+#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->req_prod; \
+ RING_IDX __new = (_r)->req_prod_pvt; \
+ xen_wmb(); /* back sees requests /before/ updated producer index */ \
+ (_r)->sring->req_prod = __new; \
+ xen_mb(); /* back sees new requests /before/ we check req_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
+ (RING_IDX)(__new - __old)); \
+} while (0)
+
+#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->rsp_prod; \
+ RING_IDX __new = (_r)->rsp_prod_pvt; \
+ xen_wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = __new; \
+ xen_mb(); /* front sees new resps /before/ we check rsp_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
+ (RING_IDX)(__new - __old)); \
+} while (0)
+
+#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->req_event = (_r)->req_cons + 1; \
+ xen_mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+} while (0)
+
+#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
+ xen_mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+} while (0)
+
+#endif /* __XEN_PUBLIC_IO_RING_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/io/tpmif.h b/xen/public/io/tpmif.h
new file mode 100644
index 0000000..02ccdab
--- /dev/null
+++ b/xen/public/io/tpmif.h
@@ -0,0 +1,77 @@
+/******************************************************************************
+ * tpmif.h
+ *
+ * TPM I/O interface for Xen guest OSes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, IBM Corporation
+ *
+ * Author: Stefan Berger, stefanb@us.ibm.com
+ * Grant table support: Mahadevan Gomathisankaran
+ *
+ * This code has been derived from tools/libxc/xen/io/netif.h
+ *
+ * Copyright (c) 2003-2004, Keir Fraser
+ */
+
+#ifndef __XEN_PUBLIC_IO_TPMIF_H__
+#define __XEN_PUBLIC_IO_TPMIF_H__
+
+#include "../grant_table.h"
+
+struct tpmif_tx_request {
+ unsigned long addr; /* Machine address of packet. */
+ grant_ref_t ref; /* grant table access reference */
+ uint16_t unused;
+ uint16_t size; /* Packet size in bytes. */
+};
+typedef struct tpmif_tx_request tpmif_tx_request_t;
+
+/*
+ * The TPMIF_TX_RING_SIZE defines the number of pages the
+ * front-end and backend can exchange (= size of array).
+ */
+typedef uint32_t TPMIF_RING_IDX;
+
+#define TPMIF_TX_RING_SIZE 1
+
+/* This structure must fit in a memory page. */
+
+struct tpmif_ring {
+ struct tpmif_tx_request req;
+};
+typedef struct tpmif_ring tpmif_ring_t;
+
+struct tpmif_tx_interface {
+ struct tpmif_ring ring[TPMIF_TX_RING_SIZE];
+};
+typedef struct tpmif_tx_interface tpmif_tx_interface_t;
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/io/xenbus.h b/xen/public/io/xenbus.h
new file mode 100644
index 0000000..4a053df
--- /dev/null
+++ b/xen/public/io/xenbus.h
@@ -0,0 +1,80 @@
+/*****************************************************************************
+ * xenbus.h
+ *
+ * Xenbus protocol details.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 XenSource Ltd.
+ */
+
+#ifndef _XEN_PUBLIC_IO_XENBUS_H
+#define _XEN_PUBLIC_IO_XENBUS_H
+
+/*
+ * The state of either end of the Xenbus, i.e. the current communication
+ * status of initialisation across the bus. States here imply nothing about
+ * the state of the connection between the driver and the kernel's device
+ * layers.
+ */
+enum xenbus_state {
+ XenbusStateUnknown = 0,
+
+ XenbusStateInitialising = 1,
+
+ /*
+ * InitWait: Finished early initialisation but waiting for information
+ * from the peer or hotplug scripts.
+ */
+ XenbusStateInitWait = 2,
+
+ /*
+ * Initialised: Waiting for a connection from the peer.
+ */
+ XenbusStateInitialised = 3,
+
+ XenbusStateConnected = 4,
+
+ /*
+ * Closing: The device is being closed due to an error or an unplug event.
+ */
+ XenbusStateClosing = 5,
+
+ XenbusStateClosed = 6,
+
+ /*
+ * Reconfiguring: The device is being reconfigured.
+ */
+ XenbusStateReconfiguring = 7,
+
+ XenbusStateReconfigured = 8
+};
+typedef enum xenbus_state XenbusState;
+
+#endif /* _XEN_PUBLIC_IO_XENBUS_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/io/xs_wire.h b/xen/public/io/xs_wire.h
new file mode 100644
index 0000000..dd2d966
--- /dev/null
+++ b/xen/public/io/xs_wire.h
@@ -0,0 +1,132 @@
+/*
+ * Details of the "wire" protocol between Xen Store Daemon and client
+ * library or guest kernel.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 Rusty Russell IBM Corporation
+ */
+
+#ifndef _XS_WIRE_H
+#define _XS_WIRE_H
+
+enum xsd_sockmsg_type
+{
+ XS_DEBUG,
+ XS_DIRECTORY,
+ XS_READ,
+ XS_GET_PERMS,
+ XS_WATCH,
+ XS_UNWATCH,
+ XS_TRANSACTION_START,
+ XS_TRANSACTION_END,
+ XS_INTRODUCE,
+ XS_RELEASE,
+ XS_GET_DOMAIN_PATH,
+ XS_WRITE,
+ XS_MKDIR,
+ XS_RM,
+ XS_SET_PERMS,
+ XS_WATCH_EVENT,
+ XS_ERROR,
+ XS_IS_DOMAIN_INTRODUCED,
+ XS_RESUME,
+ XS_SET_TARGET
+};
+
+#define XS_WRITE_NONE "NONE"
+#define XS_WRITE_CREATE "CREATE"
+#define XS_WRITE_CREATE_EXCL "CREATE|EXCL"
+
+#ifdef linux_specific
+/* We hand errors as strings, for portability. */
+struct xsd_errors
+{
+ int errnum;
+ const char *errstring;
+};
+#define XSD_ERROR(x) { x, #x }
+/* LINTED: static unused */
+static struct xsd_errors xsd_errors[]
+#if defined(__GNUC__)
+__attribute__((unused))
+#endif
+ = {
+ XSD_ERROR(EINVAL),
+ XSD_ERROR(EACCES),
+ XSD_ERROR(EEXIST),
+ XSD_ERROR(EISDIR),
+ XSD_ERROR(ENOENT),
+ XSD_ERROR(ENOMEM),
+ XSD_ERROR(ENOSPC),
+ XSD_ERROR(EIO),
+ XSD_ERROR(ENOTEMPTY),
+ XSD_ERROR(ENOSYS),
+ XSD_ERROR(EROFS),
+ XSD_ERROR(EBUSY),
+ XSD_ERROR(EAGAIN),
+ XSD_ERROR(EISCONN)
+};
+#endif
+
+struct xsd_sockmsg
+{
+ uint32_t type; /* XS_??? */
+ uint32_t req_id;/* Request identifier, echoed in daemon's response. */
+ uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
+ uint32_t len; /* Length of data following this. */
+
+ /* Generally followed by nul-terminated string(s). */
+};
+
+enum xs_watch_type
+{
+ XS_WATCH_PATH = 0,
+ XS_WATCH_TOKEN
+};
+
+/* Inter-domain shared memory communications. */
+#define XENSTORE_RING_SIZE 1024
+typedef uint32_t XENSTORE_RING_IDX;
+#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
+struct xenstore_domain_interface {
+ char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
+ char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
+ XENSTORE_RING_IDX req_cons, req_prod;
+ XENSTORE_RING_IDX rsp_cons, rsp_prod;
+};
+
+/* Violating this is very bad. See docs/misc/xenstore.txt. */
+#define XENSTORE_PAYLOAD_MAX 4096
+
+/* Violating these just gets you an error back */
+#define XENSTORE_ABS_PATH_MAX 3072
+#define XENSTORE_REL_PATH_MAX 2048
+
+#endif /* _XS_WIRE_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/kexec.h b/xen/public/kexec.h
new file mode 100644
index 0000000..fc19f2f
--- /dev/null
+++ b/xen/public/kexec.h
@@ -0,0 +1,189 @@
+/******************************************************************************
+ * kexec.h - Public portion
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Xen port written by:
+ * - Simon 'Horms' Horman <horms@verge.net.au>
+ * - Magnus Damm <magnus@valinux.co.jp>
+ */
+
+#ifndef _XEN_PUBLIC_KEXEC_H
+#define _XEN_PUBLIC_KEXEC_H
+
+
+/* This file describes the Kexec / Kdump hypercall interface for Xen.
+ *
+ * Kexec under vanilla Linux allows a user to reboot the physical machine
+ * into a new user-specified kernel. The Xen port extends this idea
+ * to allow rebooting of the machine from dom0. When kexec for dom0
+ * is used to reboot, both the hypervisor and the domains get replaced
+ * with some other kernel. It is possible to kexec between vanilla
+ * Linux and Xen and back again. Xen to Xen works well too.
+ *
+ * The hypercall interface for kexec can be divided into three main
+ * types of hypercall operations:
+ *
+ * 1) Range information:
+ * This is used by the dom0 kernel to ask the hypervisor about various
+ * address information. This information is needed to allow kexec-tools
+ * to fill in the ELF headers for /proc/vmcore properly.
+ *
+ * 2) Load and unload of images:
+ * There are no big surprises here, the kexec binary from kexec-tools
+ * runs in userspace in dom0. The tool loads/unloads data into the
+ * dom0 kernel such as new kernel, initramfs and hypervisor. When
+ * loaded the dom0 kernel performs a load hypercall operation, and
+ * before releasing all page references the dom0 kernel calls unload.
+ *
+ * 3) Kexec operation:
+ * This is used to start a previously loaded kernel.
+ */
+
+#include "xen.h"
+
+#if defined(__i386__) || defined(__x86_64__)
+#define KEXEC_XEN_NO_PAGES 17
+#endif
+
+/*
+ * Prototype for this hypercall is:
+ * int kexec_op(int cmd, void *args)
+ * @cmd == KEXEC_CMD_...
+ * KEXEC operation to perform
+ * @args == Operation-specific extra arguments (NULL if none).
+ */
+
+/*
+ * Kexec supports two types of operation:
+ * - kexec into a regular kernel, very similar to a standard reboot
+ * - KEXEC_TYPE_DEFAULT is used to specify this type
+ * - kexec into a special "crash kernel", aka kexec-on-panic
+ * - KEXEC_TYPE_CRASH is used to specify this type
+ * - parts of our system may be broken at kexec-on-panic time
+ * - the code should be kept as simple and self-contained as possible
+ */
+
+#define KEXEC_TYPE_DEFAULT 0
+#define KEXEC_TYPE_CRASH 1
+
+
+/* The kexec implementation for Xen allows the user to load two
+ * types of kernels, KEXEC_TYPE_DEFAULT and KEXEC_TYPE_CRASH.
+ * All data needed for a kexec reboot is kept in one xen_kexec_image_t
+ * per "instance". The data mainly consists of machine address lists to pages
+ * together with destination addresses. The data in xen_kexec_image_t
+ * is passed to the "code page" which is one page of code that performs
+ * the final relocations before jumping to the new kernel.
+ */
+
+typedef struct xen_kexec_image {
+#if defined(__i386__) || defined(__x86_64__)
+ unsigned long page_list[KEXEC_XEN_NO_PAGES];
+#endif
+#if defined(__ia64__)
+ unsigned long reboot_code_buffer;
+#endif
+ unsigned long indirection_page;
+ unsigned long start_address;
+} xen_kexec_image_t;
+
+/*
+ * Perform kexec having previously loaded a kexec or kdump kernel
+ * as appropriate.
+ * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
+ */
+#define KEXEC_CMD_kexec 0
+typedef struct xen_kexec_exec {
+ int type;
+} xen_kexec_exec_t;
+
+/*
+ * Load/Unload kernel image for kexec or kdump.
+ * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
+ * image == relocation information for kexec (ignored for unload) [in]
+ */
+#define KEXEC_CMD_kexec_load 1
+#define KEXEC_CMD_kexec_unload 2
+typedef struct xen_kexec_load {
+ int type;
+ xen_kexec_image_t image;
+} xen_kexec_load_t;
+
+#define KEXEC_RANGE_MA_CRASH 0 /* machine address and size of crash area */
+#define KEXEC_RANGE_MA_XEN 1 /* machine address and size of Xen itself */
+#define KEXEC_RANGE_MA_CPU 2 /* machine address and size of a CPU note */
+#define KEXEC_RANGE_MA_XENHEAP 3 /* machine address and size of xenheap
+ * Note that although this is adjacent
+ * to Xen it exists in a separate EFI
+ * region on ia64, and thus needs to be
+ * inserted into iomem_machine separately */
+#define KEXEC_RANGE_MA_BOOT_PARAM 4 /* machine address and size of
+ * the ia64_boot_param */
+#define KEXEC_RANGE_MA_EFI_MEMMAP 5 /* machine address and size
+                                     * of the EFI Memory Map */
+#define KEXEC_RANGE_MA_VMCOREINFO 6 /* machine address and size of vmcoreinfo */
+
+/*
+ * Find the address and size of certain memory areas
+ * range == KEXEC_RANGE_... [in]
+ * nr == physical CPU number (starting from 0) if KEXEC_RANGE_MA_CPU [in]
+ * size == number of bytes reserved in window [out]
+ * start == address of the first byte in the window [out]
+ */
+#define KEXEC_CMD_kexec_get_range 3
+typedef struct xen_kexec_range {
+ int range;
+ int nr;
+ unsigned long size;
+ unsigned long start;
+} xen_kexec_range_t;
+
+/* vmcoreinfo stuff */
+#define VMCOREINFO_BYTES (4096)
+#define VMCOREINFO_NOTE_NAME "VMCOREINFO_XEN"
+void arch_crash_save_vmcoreinfo(void);
+void vmcoreinfo_append_str(const char *fmt, ...)
+ __attribute__ ((format (printf, 1, 2)));
+#define VMCOREINFO_PAGESIZE(value) \
+ vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
+#define VMCOREINFO_SYMBOL(name) \
+ vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
+#define VMCOREINFO_SYMBOL_ALIAS(alias, name) \
+ vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #alias, (unsigned long)&name)
+#define VMCOREINFO_STRUCT_SIZE(name) \
+ vmcoreinfo_append_str("SIZE(%s)=%zu\n", #name, sizeof(struct name))
+#define VMCOREINFO_OFFSET(name, field) \
+ vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
+ (unsigned long)offsetof(struct name, field))
+#define VMCOREINFO_OFFSET_ALIAS(name, field, alias) \
+ vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #alias, \
+ (unsigned long)offsetof(struct name, field))
+
+#endif /* _XEN_PUBLIC_KEXEC_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/libelf.h b/xen/public/libelf.h
new file mode 100644
index 0000000..d238330
--- /dev/null
+++ b/xen/public/libelf.h
@@ -0,0 +1,265 @@
+/******************************************************************************
+ * libelf.h
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XC_LIBELF__
+#define __XC_LIBELF__ 1
+
+#if defined(__i386__) || defined(__x86_64__) || defined(__ia64__)
+#define XEN_ELF_LITTLE_ENDIAN
+#else
+#error define architectural endianness
+#endif
+
+#undef ELFSIZE
+#include "elfnote.h"
+#include "elfstructs.h"
+#include "features.h"
+
+/* ------------------------------------------------------------------------ */
+
+typedef union {
+ Elf32_Ehdr e32;
+ Elf64_Ehdr e64;
+} elf_ehdr;
+
+typedef union {
+ Elf32_Phdr e32;
+ Elf64_Phdr e64;
+} elf_phdr;
+
+typedef union {
+ Elf32_Shdr e32;
+ Elf64_Shdr e64;
+} elf_shdr;
+
+typedef union {
+ Elf32_Sym e32;
+ Elf64_Sym e64;
+} elf_sym;
+
+typedef union {
+ Elf32_Rel e32;
+ Elf64_Rel e64;
+} elf_rel;
+
+typedef union {
+ Elf32_Rela e32;
+ Elf64_Rela e64;
+} elf_rela;
+
+typedef union {
+ Elf32_Note e32;
+ Elf64_Note e64;
+} elf_note;
+
+struct elf_binary {
+ /* elf binary */
+ const char *image;
+ size_t size;
+ char class;
+ char data;
+
+ const elf_ehdr *ehdr;
+ const char *sec_strtab;
+ const elf_shdr *sym_tab;
+ const char *sym_strtab;
+
+ /* loaded to */
+ char *dest;
+ uint64_t pstart;
+ uint64_t pend;
+ uint64_t reloc_offset;
+
+ uint64_t bsd_symtab_pstart;
+ uint64_t bsd_symtab_pend;
+
+#ifndef __XEN__
+ /* misc */
+ FILE *log;
+#endif
+ int verbose;
+};
+
+/* ------------------------------------------------------------------------ */
+/* accessing elf header fields */
+
+#ifdef XEN_ELF_BIG_ENDIAN
+# define NATIVE_ELFDATA ELFDATA2MSB
+#else
+# define NATIVE_ELFDATA ELFDATA2LSB
+#endif
+
+#define elf_32bit(elf) (ELFCLASS32 == (elf)->class)
+#define elf_64bit(elf) (ELFCLASS64 == (elf)->class)
+#define elf_msb(elf) (ELFDATA2MSB == (elf)->data)
+#define elf_lsb(elf) (ELFDATA2LSB == (elf)->data)
+#define elf_swap(elf) (NATIVE_ELFDATA != (elf)->data)
+
+#define elf_uval(elf, str, elem) \
+ ((ELFCLASS64 == (elf)->class) \
+ ? elf_access_unsigned((elf), (str), \
+ offsetof(typeof(*(str)),e64.elem), \
+ sizeof((str)->e64.elem)) \
+ : elf_access_unsigned((elf), (str), \
+ offsetof(typeof(*(str)),e32.elem), \
+ sizeof((str)->e32.elem)))
+
+#define elf_sval(elf, str, elem) \
+ ((ELFCLASS64 == (elf)->class) \
+ ? elf_access_signed((elf), (str), \
+ offsetof(typeof(*(str)),e64.elem), \
+ sizeof((str)->e64.elem)) \
+ : elf_access_signed((elf), (str), \
+ offsetof(typeof(*(str)),e32.elem), \
+ sizeof((str)->e32.elem)))
+
+#define elf_size(elf, str) \
+ ((ELFCLASS64 == (elf)->class) \
+ ? sizeof((str)->e64) : sizeof((str)->e32))
+
+uint64_t elf_access_unsigned(struct elf_binary *elf, const void *ptr,
+ uint64_t offset, size_t size);
+int64_t elf_access_signed(struct elf_binary *elf, const void *ptr,
+ uint64_t offset, size_t size);
+
+uint64_t elf_round_up(struct elf_binary *elf, uint64_t addr);
+
+/* ------------------------------------------------------------------------ */
+/* xc_libelf_tools.c */
+
+int elf_shdr_count(struct elf_binary *elf);
+int elf_phdr_count(struct elf_binary *elf);
+
+const elf_shdr *elf_shdr_by_name(struct elf_binary *elf, const char *name);
+const elf_shdr *elf_shdr_by_index(struct elf_binary *elf, int index);
+const elf_phdr *elf_phdr_by_index(struct elf_binary *elf, int index);
+
+const char *elf_section_name(struct elf_binary *elf, const elf_shdr * shdr);
+const void *elf_section_start(struct elf_binary *elf, const elf_shdr * shdr);
+const void *elf_section_end(struct elf_binary *elf, const elf_shdr * shdr);
+
+const void *elf_segment_start(struct elf_binary *elf, const elf_phdr * phdr);
+const void *elf_segment_end(struct elf_binary *elf, const elf_phdr * phdr);
+
+const elf_sym *elf_sym_by_name(struct elf_binary *elf, const char *symbol);
+const elf_sym *elf_sym_by_index(struct elf_binary *elf, int index);
+
+const char *elf_note_name(struct elf_binary *elf, const elf_note * note);
+const void *elf_note_desc(struct elf_binary *elf, const elf_note * note);
+uint64_t elf_note_numeric(struct elf_binary *elf, const elf_note * note);
+const elf_note *elf_note_next(struct elf_binary *elf, const elf_note * note);
+
+int elf_is_elfbinary(const void *image);
+int elf_phdr_is_loadable(struct elf_binary *elf, const elf_phdr * phdr);
+
+/* ------------------------------------------------------------------------ */
+/* xc_libelf_loader.c */
+
+int elf_init(struct elf_binary *elf, const char *image, size_t size);
+#ifdef __XEN__
+void elf_set_verbose(struct elf_binary *elf);
+#else
+void elf_set_logfile(struct elf_binary *elf, FILE * log, int verbose);
+#endif
+
+void elf_parse_binary(struct elf_binary *elf);
+void elf_load_binary(struct elf_binary *elf);
+
+void *elf_get_ptr(struct elf_binary *elf, unsigned long addr);
+uint64_t elf_lookup_addr(struct elf_binary *elf, const char *symbol);
+
+void elf_parse_bsdsyms(struct elf_binary *elf, uint64_t pstart); /* private */
+
+/* ------------------------------------------------------------------------ */
+/* xc_libelf_relocate.c */
+
+int elf_reloc(struct elf_binary *elf);
+
+/* ------------------------------------------------------------------------ */
+/* xc_libelf_dominfo.c */
+
+#define UNSET_ADDR ((uint64_t)-1)
+
+enum xen_elfnote_type {
+ XEN_ENT_NONE = 0,
+ XEN_ENT_LONG = 1,
+ XEN_ENT_STR = 2
+};
+
+struct xen_elfnote {
+ enum xen_elfnote_type type;
+ const char *name;
+ union {
+ const char *str;
+ uint64_t num;
+ } data;
+};
+
+struct elf_dom_parms {
+ /* raw */
+ const char *guest_info;
+ const void *elf_note_start;
+ const void *elf_note_end;
+ struct xen_elfnote elf_notes[XEN_ELFNOTE_MAX + 1];
+
+ /* parsed */
+ char guest_os[16];
+ char guest_ver[16];
+ char xen_ver[16];
+ char loader[16];
+ int pae;
+ int bsd_symtab;
+ uint64_t virt_base;
+ uint64_t virt_entry;
+ uint64_t virt_hypercall;
+ uint64_t virt_hv_start_low;
+ uint64_t elf_paddr_offset;
+ uint32_t f_supported[XENFEAT_NR_SUBMAPS];
+ uint32_t f_required[XENFEAT_NR_SUBMAPS];
+
+ /* calculated */
+ uint64_t virt_offset;
+ uint64_t virt_kstart;
+ uint64_t virt_kend;
+};
+
+static inline void elf_xen_feature_set(int nr, uint32_t * addr)
+{
+ addr[nr >> 5] |= 1 << (nr & 31);
+}
+static inline int elf_xen_feature_get(int nr, uint32_t * addr)
+{
+ return !!(addr[nr >> 5] & (1 << (nr & 31)));
+}
+
+int elf_xen_parse_features(const char *features,
+ uint32_t *supported,
+ uint32_t *required);
+int elf_xen_parse_note(struct elf_binary *elf,
+ struct elf_dom_parms *parms,
+ const elf_note *note);
+int elf_xen_parse_guest_info(struct elf_binary *elf,
+ struct elf_dom_parms *parms);
+int elf_xen_parse(struct elf_binary *elf,
+ struct elf_dom_parms *parms);
+
+#endif /* __XC_LIBELF__ */
diff --git a/xen/public/memory.h b/xen/public/memory.h
new file mode 100644
index 0000000..d7b9fff
--- /dev/null
+++ b/xen/public/memory.h
@@ -0,0 +1,312 @@
+/******************************************************************************
+ * memory.h
+ *
+ * Memory reservation and information.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_MEMORY_H__
+#define __XEN_PUBLIC_MEMORY_H__
+
+/*
+ * Increase or decrease the specified domain's memory reservation. Returns the
+ * number of extents successfully allocated or freed.
+ * arg == addr of struct xen_memory_reservation.
+ */
+#define XENMEM_increase_reservation 0
+#define XENMEM_decrease_reservation 1
+#define XENMEM_populate_physmap 6
+
+#if __XEN_INTERFACE_VERSION__ >= 0x00030209
+/*
+ * Maximum # bits addressable by the user of the allocated region (e.g., I/O
+ * devices often have a 32-bit limitation even in 64-bit systems). If zero
+ * then the user has no addressing restriction. This field is not used by
+ * XENMEM_decrease_reservation.
+ */
+#define XENMEMF_address_bits(x) (x)
+#define XENMEMF_get_address_bits(x) ((x) & 0xffu)
+/* NUMA node to allocate from. */
+#define XENMEMF_node(x) (((x) + 1) << 8)
+#define XENMEMF_get_node(x) ((((x) >> 8) - 1) & 0xffu)
+#endif
+
+struct xen_memory_reservation {
+
+ /*
+ * XENMEM_increase_reservation:
+ * OUT: MFN (*not* GMFN) bases of extents that were allocated
+ * XENMEM_decrease_reservation:
+ * IN: GMFN bases of extents to free
+ * XENMEM_populate_physmap:
+ * IN: GPFN bases of extents to populate with memory
+ * OUT: GMFN bases of extents that were allocated
+ * (NB. This command also updates the mach_to_phys translation table)
+ */
+ XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
+
+ /* Number of extents, and size/alignment of each (2^extent_order pages). */
+ xen_ulong_t nr_extents;
+ unsigned int extent_order;
+
+#if __XEN_INTERFACE_VERSION__ >= 0x00030209
+ /* XENMEMF flags. */
+ unsigned int mem_flags;
+#else
+ unsigned int address_bits;
+#endif
+
+ /*
+ * Domain whose reservation is being changed.
+ * Unprivileged domains can specify only DOMID_SELF.
+ */
+ domid_t domid;
+};
+typedef struct xen_memory_reservation xen_memory_reservation_t;
+DEFINE_XEN_GUEST_HANDLE(xen_memory_reservation_t);
+
+/*
+ * An atomic exchange of memory pages. If return code is zero then
+ * @out.extent_list provides GMFNs of the newly-allocated memory.
+ * Returns zero on complete success, otherwise a negative error code.
+ * On complete success, @nr_exchanged == @in.nr_extents.
+ * On partial success @nr_exchanged indicates how much work was done.
+ */
+#define XENMEM_exchange 11
+struct xen_memory_exchange {
+ /*
+ * [IN] Details of memory extents to be exchanged (GMFN bases).
+ * Note that @in.address_bits is ignored and unused.
+ */
+ struct xen_memory_reservation in;
+
+ /*
+ * [IN/OUT] Details of new memory extents.
+ * We require that:
+ * 1. @in.domid == @out.domid
+ * 2. @in.nr_extents << @in.extent_order ==
+ * @out.nr_extents << @out.extent_order
+ * 3. @in.extent_start and @out.extent_start lists must not overlap
+ * 4. @out.extent_start lists GPFN bases to be populated
+ * 5. @out.extent_start is overwritten with allocated GMFN bases
+ */
+ struct xen_memory_reservation out;
+
+ /*
+ * [OUT] Number of input extents that were successfully exchanged:
+ * 1. The first @nr_exchanged input extents were successfully
+ * deallocated.
+ * 2. The corresponding first entries in the output extent list correctly
+ * indicate the GMFNs that were successfully exchanged.
+ * 3. All other input and output extents are untouched.
+ * 4. If not all input extents are exchanged then the return code of this
+ * command will be non-zero.
+ * 5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
+ */
+ xen_ulong_t nr_exchanged;
+};
+typedef struct xen_memory_exchange xen_memory_exchange_t;
+DEFINE_XEN_GUEST_HANDLE(xen_memory_exchange_t);
+
+/*
+ * Returns the maximum machine frame number of mapped RAM in this system.
+ * This command always succeeds (it never returns an error code).
+ * arg == NULL.
+ */
+#define XENMEM_maximum_ram_page 2
+
+/*
+ * Returns the current or maximum memory reservation, in pages, of the
+ * specified domain (may be DOMID_SELF). Returns -ve errcode on failure.
+ * arg == addr of domid_t.
+ */
+#define XENMEM_current_reservation 3
+#define XENMEM_maximum_reservation 4
+
+/*
+ * Returns the maximum GPFN in use by the guest, or -ve errcode on failure.
+ */
+#define XENMEM_maximum_gpfn 14
+
+/*
+ * Returns a list of MFN bases of 2MB extents comprising the machine_to_phys
+ * mapping table. Architectures which do not have a m2p table do not implement
+ * this command.
+ * arg == addr of xen_machphys_mfn_list_t.
+ */
+#define XENMEM_machphys_mfn_list 5
+struct xen_machphys_mfn_list {
+ /*
+ * Size of the 'extent_start' array. Fewer entries will be filled if the
+ * machphys table is smaller than max_extents * 2MB.
+ */
+ unsigned int max_extents;
+
+ /*
+ * Pointer to buffer to fill with list of extent starts. If there are
+ * any large discontiguities in the machine address space, 2MB gaps in
+ * the machphys table will be represented by an MFN base of zero.
+ */
+ XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
+
+ /*
+ * Number of extents written to the above array. This will be smaller
+ * than 'max_extents' if the machphys table is smaller than max_e * 2MB.
+ */
+ unsigned int nr_extents;
+};
+typedef struct xen_machphys_mfn_list xen_machphys_mfn_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_machphys_mfn_list_t);
+
+/*
+ * Returns the location in virtual address space of the machine_to_phys
+ * mapping table. Architectures which do not have a m2p table, or which do not
+ * map it by default into guest address space, do not implement this command.
+ * arg == addr of xen_machphys_mapping_t.
+ */
+#define XENMEM_machphys_mapping 12
+struct xen_machphys_mapping {
+ xen_ulong_t v_start, v_end; /* Start and end virtual addresses. */
+ xen_ulong_t max_mfn; /* Maximum MFN that can be looked up. */
+};
+typedef struct xen_machphys_mapping xen_machphys_mapping_t;
+DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
+
+/*
+ * Sets the GPFN at which a particular page appears in the specified guest's
+ * pseudophysical address space.
+ * arg == addr of xen_add_to_physmap_t.
+ */
+#define XENMEM_add_to_physmap 7
+struct xen_add_to_physmap {
+ /* Which domain to change the mapping for. */
+ domid_t domid;
+
+ /* Source mapping space. */
+#define XENMAPSPACE_shared_info 0 /* shared info page */
+#define XENMAPSPACE_grant_table 1 /* grant table page */
+#define XENMAPSPACE_mfn 2 /* usual MFN */
+ unsigned int space;
+
+ /* Index into source mapping space. */
+ xen_ulong_t idx;
+
+ /* GPFN where the source mapping page should appear. */
+ xen_pfn_t gpfn;
+};
+typedef struct xen_add_to_physmap xen_add_to_physmap_t;
+DEFINE_XEN_GUEST_HANDLE(xen_add_to_physmap_t);
+
+/*
+ * Unmaps the page appearing at a particular GPFN from the specified guest's
+ * pseudophysical address space.
+ * arg == addr of xen_remove_from_physmap_t.
+ */
+#define XENMEM_remove_from_physmap 15
+struct xen_remove_from_physmap {
+ /* Which domain to change the mapping for. */
+ domid_t domid;
+
+ /* GPFN of the current mapping of the page. */
+ xen_pfn_t gpfn;
+};
+typedef struct xen_remove_from_physmap xen_remove_from_physmap_t;
+DEFINE_XEN_GUEST_HANDLE(xen_remove_from_physmap_t);
+
+/*
+ * Translates a list of domain-specific GPFNs into MFNs. Returns a -ve error
+ * code on failure. This call only works for auto-translated guests.
+ */
+#define XENMEM_translate_gpfn_list 8
+struct xen_translate_gpfn_list {
+ /* Which domain to translate for? */
+ domid_t domid;
+
+ /* Length of list. */
+ xen_ulong_t nr_gpfns;
+
+ /* List of GPFNs to translate. */
+ XEN_GUEST_HANDLE(xen_pfn_t) gpfn_list;
+
+ /*
+ * Output list to contain MFN translations. May be the same as the input
+ * list (in which case each input GPFN is overwritten with the output MFN).
+ */
+ XEN_GUEST_HANDLE(xen_pfn_t) mfn_list;
+};
+typedef struct xen_translate_gpfn_list xen_translate_gpfn_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_translate_gpfn_list_t);
+
+/*
+ * Returns the pseudo-physical memory map as it was when the domain
+ * was started (specified by XENMEM_set_memory_map).
+ * arg == addr of xen_memory_map_t.
+ */
+#define XENMEM_memory_map 9
+struct xen_memory_map {
+ /*
+ * On call the number of entries which can be stored in buffer. On
+ * return the number of entries which have been stored in
+ * buffer.
+ */
+ unsigned int nr_entries;
+
+ /*
+ * Entries in the buffer are in the same format as returned by the
+ * BIOS INT 0x15 EAX=0xE820 call.
+ */
+ XEN_GUEST_HANDLE(void) buffer;
+};
+typedef struct xen_memory_map xen_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_memory_map_t);
+
+/*
+ * Returns the real physical memory map. Passes the same structure as
+ * XENMEM_memory_map.
+ * arg == addr of xen_memory_map_t.
+ */
+#define XENMEM_machine_memory_map 10
+
+/*
+ * Set the pseudo-physical memory map of a domain, as returned by
+ * XENMEM_memory_map.
+ * arg == addr of xen_foreign_memory_map_t.
+ */
+#define XENMEM_set_memory_map 13
+struct xen_foreign_memory_map {
+ domid_t domid;
+ struct xen_memory_map map;
+};
+typedef struct xen_foreign_memory_map xen_foreign_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_foreign_memory_map_t);
+
+#endif /* __XEN_PUBLIC_MEMORY_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/nmi.h b/xen/public/nmi.h
new file mode 100644
index 0000000..b2b8401
--- /dev/null
+++ b/xen/public/nmi.h
@@ -0,0 +1,78 @@
+/******************************************************************************
+ * nmi.h
+ *
+ * NMI callback registration and reason codes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_NMI_H__
+#define __XEN_PUBLIC_NMI_H__
+
+/*
+ * NMI reason codes:
+ * Currently these are x86-specific, stored in arch_shared_info.nmi_reason.
+ */
+ /* I/O-check error reported via ISA port 0x61, bit 6. */
+#define _XEN_NMIREASON_io_error 0
+#define XEN_NMIREASON_io_error (1UL << _XEN_NMIREASON_io_error)
+ /* Parity error reported via ISA port 0x61, bit 7. */
+#define _XEN_NMIREASON_parity_error 1
+#define XEN_NMIREASON_parity_error (1UL << _XEN_NMIREASON_parity_error)
+ /* Unknown hardware-generated NMI. */
+#define _XEN_NMIREASON_unknown 2
+#define XEN_NMIREASON_unknown (1UL << _XEN_NMIREASON_unknown)
+
+/*
+ * long nmi_op(unsigned int cmd, void *arg)
+ * NB. All ops return zero on success, else a negative error code.
+ */
+
+/*
+ * Register NMI callback for this (calling) VCPU. Currently this only makes
+ * sense for domain 0, vcpu 0. All other callers receive EINVAL.
+ * arg == pointer to xennmi_callback structure.
+ */
+#define XENNMI_register_callback 0
+struct xennmi_callback {
+ unsigned long handler_address;
+ unsigned long pad;
+};
+typedef struct xennmi_callback xennmi_callback_t;
+DEFINE_XEN_GUEST_HANDLE(xennmi_callback_t);
+
+/*
+ * Deregister NMI callback for this (calling) VCPU.
+ * arg == NULL.
+ */
+#define XENNMI_unregister_callback 1
+
+#endif /* __XEN_PUBLIC_NMI_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/physdev.h b/xen/public/physdev.h
new file mode 100644
index 0000000..8057277
--- /dev/null
+++ b/xen/public/physdev.h
@@ -0,0 +1,219 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_PHYSDEV_H__
+#define __XEN_PUBLIC_PHYSDEV_H__
+
+/*
+ * Prototype for this hypercall is:
+ * int physdev_op(int cmd, void *args)
+ * @cmd == PHYSDEVOP_??? (physdev operation).
+ * @args == Operation-specific extra arguments (NULL if none).
+ */
+
+/*
+ * Notify end-of-interrupt (EOI) for the specified IRQ.
+ * @arg == pointer to physdev_eoi structure.
+ */
+#define PHYSDEVOP_eoi 12
+struct physdev_eoi {
+ /* IN */
+ uint32_t irq;
+};
+typedef struct physdev_eoi physdev_eoi_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_eoi_t);
+
+/*
+ * Query the status of an IRQ line.
+ * @arg == pointer to physdev_irq_status_query structure.
+ */
+#define PHYSDEVOP_irq_status_query 5
+struct physdev_irq_status_query {
+ /* IN */
+ uint32_t irq;
+ /* OUT */
+ uint32_t flags; /* XENIRQSTAT_* */
+};
+typedef struct physdev_irq_status_query physdev_irq_status_query_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_irq_status_query_t);
+
+/* Need to call PHYSDEVOP_eoi when the IRQ has been serviced? */
+#define _XENIRQSTAT_needs_eoi (0)
+#define XENIRQSTAT_needs_eoi (1U<<_XENIRQSTAT_needs_eoi)
+
+/* IRQ shared by multiple guests? */
+#define _XENIRQSTAT_shared (1)
+#define XENIRQSTAT_shared (1U<<_XENIRQSTAT_shared)
+
+/*
+ * Set the current VCPU's I/O privilege level.
+ * @arg == pointer to physdev_set_iopl structure.
+ */
+#define PHYSDEVOP_set_iopl 6
+struct physdev_set_iopl {
+ /* IN */
+ uint32_t iopl;
+};
+typedef struct physdev_set_iopl physdev_set_iopl_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_set_iopl_t);
+
+/*
+ * Set the current VCPU's I/O-port permissions bitmap.
+ * @arg == pointer to physdev_set_iobitmap structure.
+ */
+#define PHYSDEVOP_set_iobitmap 7
+struct physdev_set_iobitmap {
+ /* IN */
+#if __XEN_INTERFACE_VERSION__ >= 0x00030205
+ XEN_GUEST_HANDLE(uint8) bitmap;
+#else
+ uint8_t *bitmap;
+#endif
+ uint32_t nr_ports;
+};
+typedef struct physdev_set_iobitmap physdev_set_iobitmap_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_set_iobitmap_t);
+
+/*
+ * Read or write an IO-APIC register.
+ * @arg == pointer to physdev_apic structure.
+ */
+#define PHYSDEVOP_apic_read 8
+#define PHYSDEVOP_apic_write 9
+struct physdev_apic {
+ /* IN */
+ unsigned long apic_physbase;
+ uint32_t reg;
+ /* IN or OUT */
+ uint32_t value;
+};
+typedef struct physdev_apic physdev_apic_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_apic_t);
+
+/*
+ * Allocate or free a physical upcall vector for the specified IRQ line.
+ * @arg == pointer to physdev_irq structure.
+ */
+#define PHYSDEVOP_alloc_irq_vector 10
+#define PHYSDEVOP_free_irq_vector 11
+struct physdev_irq {
+ /* IN */
+ uint32_t irq;
+ /* IN or OUT */
+ uint32_t vector;
+};
+typedef struct physdev_irq physdev_irq_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_irq_t);
+
+#define MAP_PIRQ_TYPE_MSI 0x0
+#define MAP_PIRQ_TYPE_GSI 0x1
+#define MAP_PIRQ_TYPE_UNKNOWN 0x2
+
+#define PHYSDEVOP_map_pirq 13
+struct physdev_map_pirq {
+ domid_t domid;
+ /* IN */
+ int type;
+ /* IN */
+ int index;
+ /* IN or OUT */
+ int pirq;
+ /* IN */
+ int bus;
+ /* IN */
+ int devfn;
+ /* IN */
+ int entry_nr;
+ /* IN */
+ uint64_t table_base;
+};
+typedef struct physdev_map_pirq physdev_map_pirq_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_map_pirq_t);
+
+#define PHYSDEVOP_unmap_pirq 14
+struct physdev_unmap_pirq {
+ domid_t domid;
+ /* IN */
+ int pirq;
+};
+
+typedef struct physdev_unmap_pirq physdev_unmap_pirq_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_unmap_pirq_t);
+
+#define PHYSDEVOP_manage_pci_add 15
+#define PHYSDEVOP_manage_pci_remove 16
+struct physdev_manage_pci {
+ /* IN */
+ uint8_t bus;
+ uint8_t devfn;
+};
+
+typedef struct physdev_manage_pci physdev_manage_pci_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_manage_pci_t);
+
+/*
+ * Argument to physdev_op_compat() hypercall. Superseded by the new physdev_op()
+ * hypercall since 0x00030202.
+ */
+struct physdev_op {
+ uint32_t cmd;
+ union {
+ struct physdev_irq_status_query irq_status_query;
+ struct physdev_set_iopl set_iopl;
+ struct physdev_set_iobitmap set_iobitmap;
+ struct physdev_apic apic_op;
+ struct physdev_irq irq_op;
+ } u;
+};
+typedef struct physdev_op physdev_op_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_op_t);
+
+/*
+ * Notify that some PIRQ-bound event channels have been unmasked.
+ * ** This command is obsolete since interface version 0x00030202 and is **
+ * ** unsupported by newer versions of Xen. **
+ */
+#define PHYSDEVOP_IRQ_UNMASK_NOTIFY 4
+
+/*
+ * These all-capitals physdev operation names are superseded by the new names
+ * (defined above) since interface version 0x00030202.
+ */
+#define PHYSDEVOP_IRQ_STATUS_QUERY PHYSDEVOP_irq_status_query
+#define PHYSDEVOP_SET_IOPL PHYSDEVOP_set_iopl
+#define PHYSDEVOP_SET_IOBITMAP PHYSDEVOP_set_iobitmap
+#define PHYSDEVOP_APIC_READ PHYSDEVOP_apic_read
+#define PHYSDEVOP_APIC_WRITE PHYSDEVOP_apic_write
+#define PHYSDEVOP_ASSIGN_VECTOR PHYSDEVOP_alloc_irq_vector
+#define PHYSDEVOP_FREE_VECTOR PHYSDEVOP_free_irq_vector
+#define PHYSDEVOP_IRQ_NEEDS_UNMASK_NOTIFY XENIRQSTAT_needs_eoi
+#define PHYSDEVOP_IRQ_SHARED XENIRQSTAT_shared
+
+#endif /* __XEN_PUBLIC_PHYSDEV_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/platform.h b/xen/public/platform.h
new file mode 100644
index 0000000..eee047b
--- /dev/null
+++ b/xen/public/platform.h
@@ -0,0 +1,346 @@
+/******************************************************************************
+ * platform.h
+ *
+ * Hardware platform operations. Intended for use by domain-0 kernel.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2002-2006, K Fraser
+ */
+
+#ifndef __XEN_PUBLIC_PLATFORM_H__
+#define __XEN_PUBLIC_PLATFORM_H__
+
+#include "xen.h"
+
+#define XENPF_INTERFACE_VERSION 0x03000001
+
+/*
+ * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC,
+ * 1 January, 1970 if the current system time was <system_time>.
+ */
+#define XENPF_settime 17
+struct xenpf_settime {
+ /* IN variables. */
+ uint32_t secs;
+ uint32_t nsecs;
+ uint64_t system_time;
+};
+typedef struct xenpf_settime xenpf_settime_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_settime_t);
+
+/*
+ * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
+ * On x86, @type is an architecture-defined MTRR memory type.
+ * On success, returns the MTRR that was used (@reg) and a handle that can
+ * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting.
+ * (x86-specific).
+ */
+#define XENPF_add_memtype 31
+struct xenpf_add_memtype {
+ /* IN variables. */
+ xen_pfn_t mfn;
+ uint64_t nr_mfns;
+ uint32_t type;
+ /* OUT variables. */
+ uint32_t handle;
+ uint32_t reg;
+};
+typedef struct xenpf_add_memtype xenpf_add_memtype_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_add_memtype_t);
+
+/*
+ * Tear down an existing memory-range type. If @handle is remembered then it
+ * should be passed in to accurately tear down the correct setting (in case
+ * of overlapping memory regions with differing types). If it is not known
+ * then @handle should be set to zero. In all cases @reg must be set.
+ * (x86-specific).
+ */
+#define XENPF_del_memtype 32
+struct xenpf_del_memtype {
+ /* IN variables. */
+ uint32_t handle;
+ uint32_t reg;
+};
+typedef struct xenpf_del_memtype xenpf_del_memtype_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_del_memtype_t);
+
+/* Read current type of an MTRR (x86-specific). */
+#define XENPF_read_memtype 33
+struct xenpf_read_memtype {
+ /* IN variables. */
+ uint32_t reg;
+ /* OUT variables. */
+ xen_pfn_t mfn;
+ uint64_t nr_mfns;
+ uint32_t type;
+};
+typedef struct xenpf_read_memtype xenpf_read_memtype_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_read_memtype_t);
+
+#define XENPF_microcode_update 35
+struct xenpf_microcode_update {
+ /* IN variables. */
+ XEN_GUEST_HANDLE(const_void) data;/* Pointer to microcode data */
+ uint32_t length; /* Length of microcode data. */
+};
+typedef struct xenpf_microcode_update xenpf_microcode_update_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_microcode_update_t);
+
+#define XENPF_platform_quirk 39
+#define QUIRK_NOIRQBALANCING 1 /* Do not restrict IO-APIC RTE targets */
+#define QUIRK_IOAPIC_BAD_REGSEL 2 /* IO-APIC REGSEL forgets its value */
+#define QUIRK_IOAPIC_GOOD_REGSEL 3 /* IO-APIC REGSEL behaves properly */
+struct xenpf_platform_quirk {
+ /* IN variables. */
+ uint32_t quirk_id;
+};
+typedef struct xenpf_platform_quirk xenpf_platform_quirk_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_platform_quirk_t);
+
+#define XENPF_firmware_info 50
+#define XEN_FW_DISK_INFO 1 /* from int 13 AH=08/41/48 */
+#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
+#define XEN_FW_VBEDDC_INFO 3 /* from int 10 AX=4f15 */
+struct xenpf_firmware_info {
+ /* IN variables. */
+ uint32_t type;
+ uint32_t index;
+ /* OUT variables. */
+ union {
+ struct {
+ /* Int13, Fn48: Check Extensions Present. */
+ uint8_t device; /* %dl: bios device number */
+ uint8_t version; /* %ah: major version */
+ uint16_t interface_support; /* %cx: support bitmap */
+ /* Int13, Fn08: Legacy Get Device Parameters. */
+ uint16_t legacy_max_cylinder; /* %cl[7:6]:%ch: max cyl # */
+ uint8_t legacy_max_head; /* %dh: max head # */
+ uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector # */
+ /* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */
+ /* NB. First uint16_t of buffer must be set to buffer size. */
+ XEN_GUEST_HANDLE(void) edd_params;
+ } disk_info; /* XEN_FW_DISK_INFO */
+ struct {
+ uint8_t device; /* bios device number */
+ uint32_t mbr_signature; /* offset 0x1b8 in mbr */
+ } disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */
+ struct {
+ /* Int10, AX=4F15: Get EDID info. */
+ uint8_t capabilities;
+ uint8_t edid_transfer_time;
+ /* must refer to 128-byte buffer */
+ XEN_GUEST_HANDLE(uint8) edid;
+ } vbeddc_info; /* XEN_FW_VBEDDC_INFO */
+ } u;
+};
+typedef struct xenpf_firmware_info xenpf_firmware_info_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_firmware_info_t);
+
+#define XENPF_enter_acpi_sleep 51
+struct xenpf_enter_acpi_sleep {
+ /* IN variables */
+ uint16_t pm1a_cnt_val; /* PM1a control value. */
+ uint16_t pm1b_cnt_val; /* PM1b control value. */
+ uint32_t sleep_state; /* Which state to enter (Sn). */
+ uint32_t flags; /* Must be zero. */
+};
+typedef struct xenpf_enter_acpi_sleep xenpf_enter_acpi_sleep_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_enter_acpi_sleep_t);
+
+#define XENPF_change_freq 52
+struct xenpf_change_freq {
+ /* IN variables */
+ uint32_t flags; /* Must be zero. */
+ uint32_t cpu; /* Physical cpu. */
+ uint64_t freq; /* New frequency (Hz). */
+};
+typedef struct xenpf_change_freq xenpf_change_freq_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_change_freq_t);
+
+/*
+ * Get idle times (nanoseconds since boot) for physical CPUs specified in the
+ * @cpumap_bitmap with range [0..@cpumap_nr_cpus-1]. The @idletime array is
+ * indexed by CPU number; only entries with the corresponding @cpumap_bitmap
+ * bit set are written to. On return, @cpumap_bitmap is modified so that any
+ * non-existent CPUs are cleared. Such CPUs have their @idletime array entry
+ * cleared.
+ */
+#define XENPF_getidletime 53
+struct xenpf_getidletime {
+ /* IN/OUT variables */
+ /* IN: CPUs to interrogate; OUT: subset of IN which are present */
+ XEN_GUEST_HANDLE(uint8) cpumap_bitmap;
+ /* IN variables */
+ /* Size of cpumap bitmap. */
+ uint32_t cpumap_nr_cpus;
+ /* Must be indexable for every cpu in cpumap_bitmap. */
+ XEN_GUEST_HANDLE(uint64) idletime;
+ /* OUT variables */
+ /* System time when the idletime snapshots were taken. */
+ uint64_t now;
+};
+typedef struct xenpf_getidletime xenpf_getidletime_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
+
+#define XENPF_set_processor_pminfo 54
+
+/* ability bits */
+#define XEN_PROCESSOR_PM_CX 1
+#define XEN_PROCESSOR_PM_PX 2
+#define XEN_PROCESSOR_PM_TX 4
+
+/* cmd type */
+#define XEN_PM_CX 0
+#define XEN_PM_PX 1
+#define XEN_PM_TX 2
+
+/* Px sub info type */
+#define XEN_PX_PCT 1
+#define XEN_PX_PSS 2
+#define XEN_PX_PPC 4
+#define XEN_PX_PSD 8
+
+struct xen_power_register {
+ uint32_t space_id;
+ uint32_t bit_width;
+ uint32_t bit_offset;
+ uint32_t access_size;
+ uint64_t address;
+};
+
+struct xen_processor_csd {
+ uint32_t domain; /* domain number of one dependent group */
+ uint32_t coord_type; /* coordination type */
+ uint32_t num; /* number of processors in same domain */
+};
+typedef struct xen_processor_csd xen_processor_csd_t;
+DEFINE_XEN_GUEST_HANDLE(xen_processor_csd_t);
+
+struct xen_processor_cx {
+ struct xen_power_register reg; /* GAS for Cx trigger register */
+ uint8_t type; /* cstate value, c0: 0, c1: 1, ... */
+ uint32_t latency; /* worst latency (ms) to enter/exit this cstate */
+ uint32_t power; /* average power consumption (mW) */
+ uint32_t dpcnt; /* number of dependency entries */
+ XEN_GUEST_HANDLE(xen_processor_csd_t) dp; /* NULL if no dependency */
+};
+typedef struct xen_processor_cx xen_processor_cx_t;
+DEFINE_XEN_GUEST_HANDLE(xen_processor_cx_t);
+
+struct xen_processor_flags {
+ uint32_t bm_control:1;
+ uint32_t bm_check:1;
+ uint32_t has_cst:1;
+ uint32_t power_setup_done:1;
+ uint32_t bm_rld_set:1;
+};
+
+struct xen_processor_power {
+ uint32_t count; /* number of C state entries in array below */
+ struct xen_processor_flags flags; /* global flags of this processor */
+ XEN_GUEST_HANDLE(xen_processor_cx_t) states; /* supported c states */
+};
+
+struct xen_pct_register {
+ uint8_t descriptor;
+ uint16_t length;
+ uint8_t space_id;
+ uint8_t bit_width;
+ uint8_t bit_offset;
+ uint8_t reserved;
+ uint64_t address;
+};
+
+struct xen_processor_px {
+ uint64_t core_frequency; /* megahertz */
+ uint64_t power; /* milliWatts */
+ uint64_t transition_latency; /* microseconds */
+ uint64_t bus_master_latency; /* microseconds */
+ uint64_t control; /* control value */
+ uint64_t status; /* success indicator */
+};
+typedef struct xen_processor_px xen_processor_px_t;
+DEFINE_XEN_GUEST_HANDLE(xen_processor_px_t);
+
+struct xen_psd_package {
+ uint64_t num_entries;
+ uint64_t revision;
+ uint64_t domain;
+ uint64_t coord_type;
+ uint64_t num_processors;
+};
+
+struct xen_processor_performance {
+ uint32_t flags; /* flag for Px sub info type */
+ uint32_t platform_limit; /* Platform limitation on freq usage */
+ struct xen_pct_register control_register;
+ struct xen_pct_register status_register;
+ uint32_t state_count; /* total available performance states */
+ XEN_GUEST_HANDLE(xen_processor_px_t) states;
+ struct xen_psd_package domain_info;
+ uint32_t shared_type; /* coordination type of this processor */
+};
+typedef struct xen_processor_performance xen_processor_performance_t;
+DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
+
+struct xenpf_set_processor_pminfo {
+ /* IN variables */
+ uint32_t id; /* ACPI CPU ID */
+ uint32_t type; /* {XEN_PM_CX, XEN_PM_PX} */
+ union {
+ struct xen_processor_power power;/* Cx: _CST/_CSD */
+ struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */
+ };
+};
+typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_set_processor_pminfo_t);
+
+struct xen_platform_op {
+ uint32_t cmd;
+ uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
+ union {
+ struct xenpf_settime settime;
+ struct xenpf_add_memtype add_memtype;
+ struct xenpf_del_memtype del_memtype;
+ struct xenpf_read_memtype read_memtype;
+ struct xenpf_microcode_update microcode;
+ struct xenpf_platform_quirk platform_quirk;
+ struct xenpf_firmware_info firmware_info;
+ struct xenpf_enter_acpi_sleep enter_acpi_sleep;
+ struct xenpf_change_freq change_freq;
+ struct xenpf_getidletime getidletime;
+ struct xenpf_set_processor_pminfo set_pminfo;
+ uint8_t pad[128];
+ } u;
+};
+typedef struct xen_platform_op xen_platform_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_platform_op_t);
+
+#endif /* __XEN_PUBLIC_PLATFORM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/sched.h b/xen/public/sched.h
new file mode 100644
index 0000000..2227a95
--- /dev/null
+++ b/xen/public/sched.h
@@ -0,0 +1,121 @@
+/******************************************************************************
+ * sched.h
+ *
+ * Scheduler state interactions
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_SCHED_H__
+#define __XEN_PUBLIC_SCHED_H__
+
+#include "event_channel.h"
+
+/*
+ * The prototype for this hypercall is:
+ * long sched_op(int cmd, void *arg)
+ * @cmd == SCHEDOP_??? (scheduler operation).
+ * @arg == Operation-specific extra argument(s), as described below.
+ *
+ * Versions of Xen prior to 3.0.2 provided only the following legacy version
+ * of this hypercall, supporting only the commands yield, block and shutdown:
+ * long sched_op(int cmd, unsigned long arg)
+ * @cmd == SCHEDOP_??? (scheduler operation).
+ * @arg == 0 (SCHEDOP_yield and SCHEDOP_block)
+ * == SHUTDOWN_* code (SCHEDOP_shutdown)
+ * This legacy version is available to new guests as sched_op_compat().
+ */
+
+/*
+ * Voluntarily yield the CPU.
+ * @arg == NULL.
+ */
+#define SCHEDOP_yield 0
+
+/*
+ * Block execution of this VCPU until an event is received for processing.
+ * If called with event upcalls masked, this operation will atomically
+ * reenable event delivery and check for pending events before blocking the
+ * VCPU. This avoids a "wakeup waiting" race.
+ * @arg == NULL.
+ */
+#define SCHEDOP_block 1
+
+/*
+ * Halt execution of this domain (all VCPUs) and notify the system controller.
+ * @arg == pointer to sched_shutdown structure.
+ */
+#define SCHEDOP_shutdown 2
+struct sched_shutdown {
+ unsigned int reason; /* SHUTDOWN_* */
+};
+typedef struct sched_shutdown sched_shutdown_t;
+DEFINE_XEN_GUEST_HANDLE(sched_shutdown_t);
+
+/*
+ * Poll a set of event-channel ports. Return when one or more are pending. An
+ * optional timeout may be specified.
+ * @arg == pointer to sched_poll structure.
+ */
+#define SCHEDOP_poll 3
+struct sched_poll {
+ XEN_GUEST_HANDLE(evtchn_port_t) ports;
+ unsigned int nr_ports;
+ uint64_t timeout;
+};
+typedef struct sched_poll sched_poll_t;
+DEFINE_XEN_GUEST_HANDLE(sched_poll_t);
+
+/*
+ * Declare a shutdown for another domain. The main use of this function is
+ * in interpreting shutdown requests and reasons for fully-virtualized
+ * domains. A para-virtualized domain may use SCHEDOP_shutdown directly.
+ * @arg == pointer to sched_remote_shutdown structure.
+ */
+#define SCHEDOP_remote_shutdown 4
+struct sched_remote_shutdown {
+ domid_t domain_id; /* Remote domain ID */
+ unsigned int reason; /* SHUTDOWN_xxx reason */
+};
+typedef struct sched_remote_shutdown sched_remote_shutdown_t;
+DEFINE_XEN_GUEST_HANDLE(sched_remote_shutdown_t);
+
+/*
+ * Reason codes for SCHEDOP_shutdown. These may be interpreted by control
+ * software to determine the appropriate action. For the most part, Xen does
+ * not care about the shutdown code.
+ */
+#define SHUTDOWN_poweroff 0 /* Domain exited normally. Clean up and kill. */
+#define SHUTDOWN_reboot 1 /* Clean up, kill, and then restart. */
+#define SHUTDOWN_suspend 2 /* Clean up, save suspend info, kill. */
+#define SHUTDOWN_crash 3 /* Tell controller we've crashed. */
+
+#endif /* __XEN_PUBLIC_SCHED_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/sysctl.h b/xen/public/sysctl.h
new file mode 100644
index 0000000..6b10954
--- /dev/null
+++ b/xen/public/sysctl.h
@@ -0,0 +1,308 @@
+/******************************************************************************
+ * sysctl.h
+ *
+ * System management operations. For use by node control stack.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2002-2006, K Fraser
+ */
+
+#ifndef __XEN_PUBLIC_SYSCTL_H__
+#define __XEN_PUBLIC_SYSCTL_H__
+
+#if !defined(__XEN__) && !defined(__XEN_TOOLS__)
+#error "sysctl operations are intended for use by node control tools only"
+#endif
+
+#include "xen.h"
+#include "domctl.h"
+
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000006
+
+/*
+ * Read console content from Xen buffer ring.
+ */
+#define XEN_SYSCTL_readconsole 1
+struct xen_sysctl_readconsole {
+ /* IN: Non-zero -> clear after reading. */
+ uint8_t clear;
+ /* IN: Non-zero -> start index specified by @index field. */
+ uint8_t incremental;
+ uint8_t pad0, pad1;
+ /*
+ * IN: Start index for consuming from ring buffer (if @incremental);
+ * OUT: End index after consuming from ring buffer.
+ */
+ uint32_t index;
+ /* IN: Virtual address to write console data. */
+ XEN_GUEST_HANDLE_64(char) buffer;
+ /* IN: Size of buffer; OUT: Bytes written to buffer. */
+ uint32_t count;
+};
+typedef struct xen_sysctl_readconsole xen_sysctl_readconsole_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_readconsole_t);
+
+/* Get trace buffers machine base address */
+#define XEN_SYSCTL_tbuf_op 2
+struct xen_sysctl_tbuf_op {
+ /* IN variables */
+#define XEN_SYSCTL_TBUFOP_get_info 0
+#define XEN_SYSCTL_TBUFOP_set_cpu_mask 1
+#define XEN_SYSCTL_TBUFOP_set_evt_mask 2
+#define XEN_SYSCTL_TBUFOP_set_size 3
+#define XEN_SYSCTL_TBUFOP_enable 4
+#define XEN_SYSCTL_TBUFOP_disable 5
+ uint32_t cmd;
+ /* IN/OUT variables */
+ struct xenctl_cpumap cpu_mask;
+ uint32_t evt_mask;
+ /* OUT variables */
+ uint64_aligned_t buffer_mfn;
+ uint32_t size;
+};
+typedef struct xen_sysctl_tbuf_op xen_sysctl_tbuf_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tbuf_op_t);
+
+/*
+ * Get physical information about the host machine
+ */
+#define XEN_SYSCTL_physinfo 3
+ /* (x86) The platform supports HVM guests. */
+#define _XEN_SYSCTL_PHYSCAP_hvm 0
+#define XEN_SYSCTL_PHYSCAP_hvm (1u<<_XEN_SYSCTL_PHYSCAP_hvm)
+ /* (x86) The platform supports HVM-guest direct access to I/O devices. */
+#define _XEN_SYSCTL_PHYSCAP_hvm_directio 1
+#define XEN_SYSCTL_PHYSCAP_hvm_directio (1u<<_XEN_SYSCTL_PHYSCAP_hvm_directio)
+struct xen_sysctl_physinfo {
+ uint32_t threads_per_core;
+ uint32_t cores_per_socket;
+ uint32_t nr_cpus;
+ uint32_t nr_nodes;
+ uint32_t cpu_khz;
+ uint64_aligned_t total_pages;
+ uint64_aligned_t free_pages;
+ uint64_aligned_t scrub_pages;
+ uint32_t hw_cap[8];
+
+ /*
+ * IN: maximum addressable entry in the caller-provided cpu_to_node array.
+ * OUT: largest cpu identifier in the system.
+ * If OUT is greater than IN then the cpu_to_node array is truncated!
+ */
+ uint32_t max_cpu_id;
+ /*
+ * If not NULL, this array is filled with node identifier for each cpu.
+ * If a cpu has no node information (e.g., cpu not present) then the
+ * sentinel value ~0u is written.
+ * The size of this array is specified by the caller in @max_cpu_id.
+ * If the actual @max_cpu_id is smaller than the array then the trailing
+ * elements of the array will not be written by the sysctl.
+ */
+ XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
+
+ /* XEN_SYSCTL_PHYSCAP_??? */
+ uint32_t capabilities;
+};
+typedef struct xen_sysctl_physinfo xen_sysctl_physinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_physinfo_t);
+
+/*
+ * Get the ID of the current scheduler.
+ */
+#define XEN_SYSCTL_sched_id 4
+struct xen_sysctl_sched_id {
+ /* OUT variable */
+ uint32_t sched_id;
+};
+typedef struct xen_sysctl_sched_id xen_sysctl_sched_id_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_sched_id_t);
+
+/* Interface for controlling Xen software performance counters. */
+#define XEN_SYSCTL_perfc_op 5
+/* Sub-operations: */
+#define XEN_SYSCTL_PERFCOP_reset 1 /* Reset all counters to zero. */
+#define XEN_SYSCTL_PERFCOP_query 2 /* Get perfctr information. */
+struct xen_sysctl_perfc_desc {
+ char name[80]; /* name of perf counter */
+ uint32_t nr_vals; /* number of values for this counter */
+};
+typedef struct xen_sysctl_perfc_desc xen_sysctl_perfc_desc_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_desc_t);
+typedef uint32_t xen_sysctl_perfc_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_val_t);
+
+struct xen_sysctl_perfc_op {
+ /* IN variables. */
+ uint32_t cmd; /* XEN_SYSCTL_PERFCOP_??? */
+ /* OUT variables. */
+ uint32_t nr_counters; /* number of counter descriptions */
+ uint32_t nr_vals; /* number of values */
+ /* counter information (or NULL) */
+ XEN_GUEST_HANDLE_64(xen_sysctl_perfc_desc_t) desc;
+ /* counter values (or NULL) */
+ XEN_GUEST_HANDLE_64(xen_sysctl_perfc_val_t) val;
+};
+typedef struct xen_sysctl_perfc_op xen_sysctl_perfc_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_perfc_op_t);
+
+#define XEN_SYSCTL_getdomaininfolist 6
+struct xen_sysctl_getdomaininfolist {
+ /* IN variables. */
+ domid_t first_domain;
+ uint32_t max_domains;
+ XEN_GUEST_HANDLE_64(xen_domctl_getdomaininfo_t) buffer;
+ /* OUT variables. */
+ uint32_t num_domains;
+};
+typedef struct xen_sysctl_getdomaininfolist xen_sysctl_getdomaininfolist_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getdomaininfolist_t);
+
+/* Inject debug keys into Xen. */
+#define XEN_SYSCTL_debug_keys 7
+struct xen_sysctl_debug_keys {
+ /* IN variables. */
+ XEN_GUEST_HANDLE_64(char) keys;
+ uint32_t nr_keys;
+};
+typedef struct xen_sysctl_debug_keys xen_sysctl_debug_keys_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_debug_keys_t);
+
+/* Get physical CPU information. */
+#define XEN_SYSCTL_getcpuinfo 8
+struct xen_sysctl_cpuinfo {
+ uint64_aligned_t idletime;
+};
+typedef struct xen_sysctl_cpuinfo xen_sysctl_cpuinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpuinfo_t);
+struct xen_sysctl_getcpuinfo {
+ /* IN variables. */
+ uint32_t max_cpus;
+ XEN_GUEST_HANDLE_64(xen_sysctl_cpuinfo_t) info;
+ /* OUT variables. */
+ uint32_t nr_cpus;
+};
+typedef struct xen_sysctl_getcpuinfo xen_sysctl_getcpuinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getcpuinfo_t);
+
+#define XEN_SYSCTL_availheap 9
+struct xen_sysctl_availheap {
+ /* IN variables. */
+ uint32_t min_bitwidth; /* Smallest address width (zero if don't care). */
+ uint32_t max_bitwidth; /* Largest address width (zero if don't care). */
+ int32_t node; /* NUMA node of interest (-1 for all nodes). */
+ /* OUT variables. */
+ uint64_aligned_t avail_bytes;/* Bytes available in the specified region. */
+};
+typedef struct xen_sysctl_availheap xen_sysctl_availheap_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_availheap_t);
+
+#define XEN_SYSCTL_get_pmstat 10
+struct pm_px_val {
+ uint64_aligned_t freq; /* Px core frequency */
+ uint64_aligned_t residency; /* Px residency time */
+ uint64_aligned_t count; /* Px transition count */
+};
+typedef struct pm_px_val pm_px_val_t;
+DEFINE_XEN_GUEST_HANDLE(pm_px_val_t);
+
+struct pm_px_stat {
+ uint8_t total; /* total Px states */
+ uint8_t usable; /* usable Px states */
+ uint8_t last; /* last Px state */
+ uint8_t cur; /* current Px state */
+ XEN_GUEST_HANDLE_64(uint64) trans_pt; /* Px transition table */
+ XEN_GUEST_HANDLE_64(pm_px_val_t) pt;
+};
+typedef struct pm_px_stat pm_px_stat_t;
+DEFINE_XEN_GUEST_HANDLE(pm_px_stat_t);
+
+struct pm_cx_stat {
+ uint32_t nr; /* entry nr in triggers & residencies, including C0 */
+ uint32_t last; /* last Cx state */
+ uint64_aligned_t idle_time; /* idle time from boot */
+ XEN_GUEST_HANDLE_64(uint64) triggers; /* Cx trigger counts */
+ XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
+};
+
+struct xen_sysctl_get_pmstat {
+#define PMSTAT_CATEGORY_MASK 0xf0
+#define PMSTAT_PX 0x10
+#define PMSTAT_CX 0x20
+#define PMSTAT_get_max_px (PMSTAT_PX | 0x1)
+#define PMSTAT_get_pxstat (PMSTAT_PX | 0x2)
+#define PMSTAT_reset_pxstat (PMSTAT_PX | 0x3)
+#define PMSTAT_get_max_cx (PMSTAT_CX | 0x1)
+#define PMSTAT_get_cxstat (PMSTAT_CX | 0x2)
+#define PMSTAT_reset_cxstat (PMSTAT_CX | 0x3)
+ uint32_t type;
+ uint32_t cpuid;
+ union {
+ struct pm_px_stat getpx;
+ struct pm_cx_stat getcx;
+ /* other struct for tx, etc */
+ } u;
+};
+typedef struct xen_sysctl_get_pmstat xen_sysctl_get_pmstat_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_get_pmstat_t);
+
+#define XEN_SYSCTL_cpu_hotplug 11
+struct xen_sysctl_cpu_hotplug {
+ /* IN variables */
+ uint32_t cpu; /* Physical cpu. */
+#define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0
+#define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1
+ uint32_t op; /* hotplug opcode */
+};
+typedef struct xen_sysctl_cpu_hotplug xen_sysctl_cpu_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpu_hotplug_t);
+
+
+struct xen_sysctl {
+ uint32_t cmd;
+ uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
+ union {
+ struct xen_sysctl_readconsole readconsole;
+ struct xen_sysctl_tbuf_op tbuf_op;
+ struct xen_sysctl_physinfo physinfo;
+ struct xen_sysctl_sched_id sched_id;
+ struct xen_sysctl_perfc_op perfc_op;
+ struct xen_sysctl_getdomaininfolist getdomaininfolist;
+ struct xen_sysctl_debug_keys debug_keys;
+ struct xen_sysctl_getcpuinfo getcpuinfo;
+ struct xen_sysctl_availheap availheap;
+ struct xen_sysctl_get_pmstat get_pmstat;
+ struct xen_sysctl_cpu_hotplug cpu_hotplug;
+ uint8_t pad[128];
+ } u;
+};
+typedef struct xen_sysctl xen_sysctl_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_t);
+
+#endif /* __XEN_PUBLIC_SYSCTL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/trace.h b/xen/public/trace.h
new file mode 100644
index 0000000..0fc864d
--- /dev/null
+++ b/xen/public/trace.h
@@ -0,0 +1,206 @@
+/******************************************************************************
+ * include/public/trace.h
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Mark Williamson, (C) 2004 Intel Research Cambridge
+ * Copyright (C) 2005 Bin Ren
+ */
+
+#ifndef __XEN_PUBLIC_TRACE_H__
+#define __XEN_PUBLIC_TRACE_H__
+
+#define TRACE_EXTRA_MAX 7
+#define TRACE_EXTRA_SHIFT 28
+
+/* Trace classes */
+#define TRC_CLS_SHIFT 16
+#define TRC_GEN 0x0001f000 /* General trace */
+#define TRC_SCHED 0x0002f000 /* Xen Scheduler trace */
+#define TRC_DOM0OP 0x0004f000 /* Xen DOM0 operation trace */
+#define TRC_HVM 0x0008f000 /* Xen HVM trace */
+#define TRC_MEM 0x0010f000 /* Xen memory trace */
+#define TRC_PV 0x0020f000 /* Xen PV traces */
+#define TRC_SHADOW 0x0040f000 /* Xen shadow tracing */
+#define TRC_PM 0x0080f000 /* Xen power management trace */
+#define TRC_ALL 0x0ffff000
+#define TRC_HD_TO_EVENT(x) ((x)&0x0fffffff)
+#define TRC_HD_CYCLE_FLAG (1UL<<31)
+#define TRC_HD_INCLUDES_CYCLE_COUNT(x) ( !!( (x) & TRC_HD_CYCLE_FLAG ) )
+#define TRC_HD_EXTRA(x) (((x)>>TRACE_EXTRA_SHIFT)&TRACE_EXTRA_MAX)
+
+/* Trace subclasses */
+#define TRC_SUBCLS_SHIFT 12
+
+/* trace subclasses for SVM */
+#define TRC_HVM_ENTRYEXIT 0x00081000 /* VMENTRY and #VMEXIT */
+#define TRC_HVM_HANDLER 0x00082000 /* various HVM handlers */
+
+#define TRC_SCHED_MIN 0x00021000 /* Just runstate changes */
+#define TRC_SCHED_VERBOSE 0x00028000 /* More inclusive scheduling */
+
+/* Trace events per class */
+#define TRC_LOST_RECORDS (TRC_GEN + 1)
+#define TRC_TRACE_WRAP_BUFFER (TRC_GEN + 2)
+#define TRC_TRACE_CPU_CHANGE (TRC_GEN + 3)
+
+#define TRC_SCHED_RUNSTATE_CHANGE (TRC_SCHED_MIN + 1)
+#define TRC_SCHED_DOM_ADD (TRC_SCHED_VERBOSE + 1)
+#define TRC_SCHED_DOM_REM (TRC_SCHED_VERBOSE + 2)
+#define TRC_SCHED_SLEEP (TRC_SCHED_VERBOSE + 3)
+#define TRC_SCHED_WAKE (TRC_SCHED_VERBOSE + 4)
+#define TRC_SCHED_YIELD (TRC_SCHED_VERBOSE + 5)
+#define TRC_SCHED_BLOCK (TRC_SCHED_VERBOSE + 6)
+#define TRC_SCHED_SHUTDOWN (TRC_SCHED_VERBOSE + 7)
+#define TRC_SCHED_CTL (TRC_SCHED_VERBOSE + 8)
+#define TRC_SCHED_ADJDOM (TRC_SCHED_VERBOSE + 9)
+#define TRC_SCHED_SWITCH (TRC_SCHED_VERBOSE + 10)
+#define TRC_SCHED_S_TIMER_FN (TRC_SCHED_VERBOSE + 11)
+#define TRC_SCHED_T_TIMER_FN (TRC_SCHED_VERBOSE + 12)
+#define TRC_SCHED_DOM_TIMER_FN (TRC_SCHED_VERBOSE + 13)
+#define TRC_SCHED_SWITCH_INFPREV (TRC_SCHED_VERBOSE + 14)
+#define TRC_SCHED_SWITCH_INFNEXT (TRC_SCHED_VERBOSE + 15)
+
+#define TRC_MEM_PAGE_GRANT_MAP (TRC_MEM + 1)
+#define TRC_MEM_PAGE_GRANT_UNMAP (TRC_MEM + 2)
+#define TRC_MEM_PAGE_GRANT_TRANSFER (TRC_MEM + 3)
+
+#define TRC_PV_HYPERCALL (TRC_PV + 1)
+#define TRC_PV_TRAP (TRC_PV + 3)
+#define TRC_PV_PAGE_FAULT (TRC_PV + 4)
+#define TRC_PV_FORCED_INVALID_OP (TRC_PV + 5)
+#define TRC_PV_EMULATE_PRIVOP (TRC_PV + 6)
+#define TRC_PV_EMULATE_4GB (TRC_PV + 7)
+#define TRC_PV_MATH_STATE_RESTORE (TRC_PV + 8)
+#define TRC_PV_PAGING_FIXUP (TRC_PV + 9)
+#define TRC_PV_GDT_LDT_MAPPING_FAULT (TRC_PV + 10)
+#define TRC_PV_PTWR_EMULATION (TRC_PV + 11)
+#define TRC_PV_PTWR_EMULATION_PAE (TRC_PV + 12)
+ /* Indicates that addresses in trace record are 64 bits */
+#define TRC_64_FLAG (0x100)
+
+#define TRC_SHADOW_NOT_SHADOW (TRC_SHADOW + 1)
+#define TRC_SHADOW_FAST_PROPAGATE (TRC_SHADOW + 2)
+#define TRC_SHADOW_FAST_MMIO (TRC_SHADOW + 3)
+#define TRC_SHADOW_FALSE_FAST_PATH (TRC_SHADOW + 4)
+#define TRC_SHADOW_MMIO (TRC_SHADOW + 5)
+#define TRC_SHADOW_FIXUP (TRC_SHADOW + 6)
+#define TRC_SHADOW_DOMF_DYING (TRC_SHADOW + 7)
+#define TRC_SHADOW_EMULATE (TRC_SHADOW + 8)
+#define TRC_SHADOW_EMULATE_UNSHADOW_USER (TRC_SHADOW + 9)
+#define TRC_SHADOW_EMULATE_UNSHADOW_EVTINJ (TRC_SHADOW + 10)
+#define TRC_SHADOW_EMULATE_UNSHADOW_UNHANDLED (TRC_SHADOW + 11)
+#define TRC_SHADOW_WRMAP_BF (TRC_SHADOW + 12)
+#define TRC_SHADOW_PREALLOC_UNPIN (TRC_SHADOW + 13)
+#define TRC_SHADOW_RESYNC_FULL (TRC_SHADOW + 14)
+#define TRC_SHADOW_RESYNC_ONLY (TRC_SHADOW + 15)
+
+/* trace events per subclass */
+#define TRC_HVM_VMENTRY (TRC_HVM_ENTRYEXIT + 0x01)
+#define TRC_HVM_VMEXIT (TRC_HVM_ENTRYEXIT + 0x02)
+#define TRC_HVM_VMEXIT64 (TRC_HVM_ENTRYEXIT + TRC_64_FLAG + 0x02)
+#define TRC_HVM_PF_XEN (TRC_HVM_HANDLER + 0x01)
+#define TRC_HVM_PF_XEN64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x01)
+#define TRC_HVM_PF_INJECT (TRC_HVM_HANDLER + 0x02)
+#define TRC_HVM_PF_INJECT64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x02)
+#define TRC_HVM_INJ_EXC (TRC_HVM_HANDLER + 0x03)
+#define TRC_HVM_INJ_VIRQ (TRC_HVM_HANDLER + 0x04)
+#define TRC_HVM_REINJ_VIRQ (TRC_HVM_HANDLER + 0x05)
+#define TRC_HVM_IO_READ (TRC_HVM_HANDLER + 0x06)
+#define TRC_HVM_IO_WRITE (TRC_HVM_HANDLER + 0x07)
+#define TRC_HVM_CR_READ (TRC_HVM_HANDLER + 0x08)
+#define TRC_HVM_CR_READ64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x08)
+#define TRC_HVM_CR_WRITE (TRC_HVM_HANDLER + 0x09)
+#define TRC_HVM_CR_WRITE64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x09)
+#define TRC_HVM_DR_READ (TRC_HVM_HANDLER + 0x0A)
+#define TRC_HVM_DR_WRITE (TRC_HVM_HANDLER + 0x0B)
+#define TRC_HVM_MSR_READ (TRC_HVM_HANDLER + 0x0C)
+#define TRC_HVM_MSR_WRITE (TRC_HVM_HANDLER + 0x0D)
+#define TRC_HVM_CPUID (TRC_HVM_HANDLER + 0x0E)
+#define TRC_HVM_INTR (TRC_HVM_HANDLER + 0x0F)
+#define TRC_HVM_NMI (TRC_HVM_HANDLER + 0x10)
+#define TRC_HVM_SMI (TRC_HVM_HANDLER + 0x11)
+#define TRC_HVM_VMMCALL (TRC_HVM_HANDLER + 0x12)
+#define TRC_HVM_HLT (TRC_HVM_HANDLER + 0x13)
+#define TRC_HVM_INVLPG (TRC_HVM_HANDLER + 0x14)
+#define TRC_HVM_INVLPG64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x14)
+#define TRC_HVM_MCE (TRC_HVM_HANDLER + 0x15)
+#define TRC_HVM_IO_ASSIST (TRC_HVM_HANDLER + 0x16)
+#define TRC_HVM_IO_ASSIST64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x16)
+#define TRC_HVM_MMIO_ASSIST (TRC_HVM_HANDLER + 0x17)
+#define TRC_HVM_MMIO_ASSIST64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x17)
+#define TRC_HVM_CLTS (TRC_HVM_HANDLER + 0x18)
+#define TRC_HVM_LMSW (TRC_HVM_HANDLER + 0x19)
+#define TRC_HVM_LMSW64 (TRC_HVM_HANDLER + TRC_64_FLAG + 0x19)
+
+/* trace subclasses for power management */
+#define TRC_PM_FREQ 0x00801000 /* xen cpu freq events */
+#define TRC_PM_IDLE 0x00802000 /* xen cpu idle events */
+
+/* trace events per class */
+#define TRC_PM_FREQ_CHANGE (TRC_PM_FREQ + 0x01)
+#define TRC_PM_IDLE_ENTRY (TRC_PM_IDLE + 0x01)
+#define TRC_PM_IDLE_EXIT (TRC_PM_IDLE + 0x02)
+
+/* This structure represents a single trace buffer record. */
+struct t_rec {
+ uint32_t event:28;
+ uint32_t extra_u32:3; /* # entries in trailing extra_u32[] array */
+ uint32_t cycles_included:1; /* u.cycles or u.nocycles? */
+ union {
+ struct {
+ uint32_t cycles_lo, cycles_hi; /* cycle counter timestamp */
+ uint32_t extra_u32[7]; /* event data items */
+ } cycles;
+ struct {
+ uint32_t extra_u32[7]; /* event data items */
+ } nocycles;
+ } u;
+};
+
+/*
+ * This structure contains the metadata for a single trace buffer. The cons
+ * and prod fields index into an array of struct t_rec's.
+ */
+struct t_buf {
+ /* Assume the data buffer size is X. X is generally not a power of 2.
+ * CONS and PROD are incremented modulo (2*X):
+ * 0 <= cons < 2*X
+ * 0 <= prod < 2*X
+ * This is done because addition modulo X breaks at 2^32 when X is not a
+ * power of 2:
+ * (((2^32 - 1) % X) + 1) % X != (2^32) % X
+ */
+ uint32_t cons; /* Offset of next item to be consumed by control tools. */
+ uint32_t prod; /* Offset of next item to be produced by Xen. */
+ /* Records follow immediately after the meta-data header. */
+};
+
+#endif /* __XEN_PUBLIC_TRACE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/vcpu.h b/xen/public/vcpu.h
new file mode 100644
index 0000000..ab65493
--- /dev/null
+++ b/xen/public/vcpu.h
@@ -0,0 +1,213 @@
+/******************************************************************************
+ * vcpu.h
+ *
+ * VCPU initialisation, query, and hotplug.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_VCPU_H__
+#define __XEN_PUBLIC_VCPU_H__
+
+/*
+ * Prototype for this hypercall is:
+ * int vcpu_op(int cmd, int vcpuid, void *extra_args)
+ * @cmd == VCPUOP_??? (VCPU operation).
+ * @vcpuid == VCPU to operate on.
+ * @extra_args == Operation-specific extra arguments (NULL if none).
+ */
+
+/*
+ * Initialise a VCPU. Each VCPU can be initialised only once. A
+ * newly-initialised VCPU will not run until it is brought up by VCPUOP_up.
+ *
+ * @extra_arg == pointer to vcpu_guest_context structure containing initial
+ * state for the VCPU.
+ */
+#define VCPUOP_initialise 0
+
+/*
+ * Bring up a VCPU. This makes the VCPU runnable. This operation will fail
+ * if the VCPU has not been initialised (VCPUOP_initialise).
+ */
+#define VCPUOP_up 1
+
+/*
+ * Bring down a VCPU (i.e., make it non-runnable).
+ * There are a few caveats that callers should observe:
+ * 1. This operation may return, and VCPU_is_up may return false, before the
+ * VCPU stops running (i.e., the command is asynchronous). It is a good
+ * idea to ensure that the VCPU has entered a non-critical loop before
+ * bringing it down. Alternatively, this operation is guaranteed
+ * synchronous if invoked by the VCPU itself.
+ * 2. After a VCPU is initialised, there is currently no way to drop all its
+ * references to domain memory. Even a VCPU that is down still holds
+ * memory references via its pagetable base pointer and GDT. It is good
+ * practise to move a VCPU onto an 'idle' or default page table, LDT and
+ * GDT before bringing it down.
+ */
+#define VCPUOP_down 2
+
+/* Returns 1 if the given VCPU is up. */
+#define VCPUOP_is_up 3
+
+/*
+ * Return information about the state and running time of a VCPU.
+ * @extra_arg == pointer to vcpu_runstate_info structure.
+ */
+#define VCPUOP_get_runstate_info 4
+struct vcpu_runstate_info {
+ /* VCPU's current state (RUNSTATE_*). */
+ int state;
+ /* When was current state entered (system time, ns)? */
+ uint64_t state_entry_time;
+ /*
+ * Time spent in each RUNSTATE_* (ns). The sum of these times is
+ * guaranteed not to drift from system time.
+ */
+ uint64_t time[4];
+};
+typedef struct vcpu_runstate_info vcpu_runstate_info_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_t);
+
+/* VCPU is currently running on a physical CPU. */
+#define RUNSTATE_running 0
+
+/* VCPU is runnable, but not currently scheduled on any physical CPU. */
+#define RUNSTATE_runnable 1
+
+/* VCPU is blocked (a.k.a. idle). It is therefore not runnable. */
+#define RUNSTATE_blocked 2
+
+/*
+ * VCPU is not runnable, but it is not blocked.
+ * This is a 'catch all' state for things like hotplug and pauses by the
+ * system administrator (or for critical sections in the hypervisor).
+ * RUNSTATE_blocked dominates this state (it is the preferred state).
+ */
+#define RUNSTATE_offline 3
+
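As an illustration of how a guest might interpret the runstate accounting above (not part of the interface itself): the four entries of `time[]` are indexed by the RUNSTATE_* constants, so the share of time a VCPU actually spent running can be computed directly. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: percentage of accounted time spent in
 * RUNSTATE_running, given the time[4] array from vcpu_runstate_info
 * (index 0 == RUNSTATE_running, per the constants above). */
static inline unsigned runstate_running_pct(const uint64_t time[4])
{
    uint64_t total = time[0] + time[1] + time[2] + time[3];

    /* Guard against an all-zero snapshot taken before first scheduling. */
    return total ? (unsigned)((time[0] * 100) / total) : 0;
}
```

Since the sum of the four times is guaranteed not to drift from system time, this percentage is stable across snapshots taken at different moments.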
+/*
+ * Register a shared memory area from which the guest may obtain its own
+ * runstate information without needing to execute a hypercall.
+ * Notes:
+ * 1. The registered address may be virtual, physical, or a guest handle,
+ * depending on the platform. On x86 systems a virtual address or guest
+ * handle should be registered.
+ * 2. Only one shared area may be registered per VCPU. The shared area is
+ * updated by the hypervisor each time the VCPU is scheduled. Thus
+ * runstate.state will always be RUNSTATE_running and
+ * runstate.state_entry_time will indicate the system time at which the
+ * VCPU was last scheduled to run.
+ * @extra_arg == pointer to vcpu_register_runstate_memory_area structure.
+ */
+#define VCPUOP_register_runstate_memory_area 5
+struct vcpu_register_runstate_memory_area {
+ union {
+ XEN_GUEST_HANDLE(vcpu_runstate_info_t) h;
+ struct vcpu_runstate_info *v;
+ uint64_t p;
+ } addr;
+};
+typedef struct vcpu_register_runstate_memory_area vcpu_register_runstate_memory_area_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_register_runstate_memory_area_t);
+
+/*
+ * Set or stop a VCPU's periodic timer. Every VCPU has one periodic timer
+ * which can be set via these commands. Periods smaller than one millisecond
+ * may not be supported.
+ */
+#define VCPUOP_set_periodic_timer 6 /* arg == vcpu_set_periodic_timer_t */
+#define VCPUOP_stop_periodic_timer 7 /* arg == NULL */
+struct vcpu_set_periodic_timer {
+ uint64_t period_ns;
+};
+typedef struct vcpu_set_periodic_timer vcpu_set_periodic_timer_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_set_periodic_timer_t);
+
+/*
+ * Set or stop a VCPU's single-shot timer. Every VCPU has one single-shot
+ * timer which can be set via these commands.
+ */
+#define VCPUOP_set_singleshot_timer 8 /* arg == vcpu_set_singleshot_timer_t */
+#define VCPUOP_stop_singleshot_timer 9 /* arg == NULL */
+struct vcpu_set_singleshot_timer {
+ uint64_t timeout_abs_ns; /* Absolute system time value in nanoseconds. */
+ uint32_t flags; /* VCPU_SSHOTTMR_??? */
+};
+typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
+
+/* Flags to VCPUOP_set_singleshot_timer. */
+ /* Require the timeout to be in the future (return -ETIME if it's passed). */
+#define _VCPU_SSHOTTMR_future (0)
+#define VCPU_SSHOTTMR_future (1U << _VCPU_SSHOTTMR_future)
+
+/*
+ * Register a memory location in the guest address space for the
+ * vcpu_info structure. This allows the guest to place the vcpu_info
+ * structure in a convenient place, such as in a per-cpu data area.
+ * The pointer need not be page aligned, but the structure must not
+ * cross a page boundary.
+ *
+ * This may be called only once per vcpu.
+ */
+#define VCPUOP_register_vcpu_info 10 /* arg == vcpu_register_vcpu_info_t */
+struct vcpu_register_vcpu_info {
+ uint64_t mfn; /* mfn of page to place vcpu_info */
+ uint32_t offset; /* offset within page */
+ uint32_t rsvd; /* unused */
+};
+typedef struct vcpu_register_vcpu_info vcpu_register_vcpu_info_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_register_vcpu_info_t);
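The "must not cross a page boundary" constraint above can be checked before issuing the hypercall. A minimal sketch, assuming a 4 KiB page and the 64-byte x86 `vcpu_info` size noted earlier (both sizes are assumptions of this example, not guarantees of the interface):

```c
#include <assert.h>
#include <stdint.h>

#define XEN_PAGE_SIZE  4096u  /* assumed page size */
#define VCPU_INFO_SIZE 64u    /* sizeof(struct vcpu_info) on x86, per the
                               * "64 bytes (x86)" comment in this header */

/* Hypothetical validity check for vcpu_register_vcpu_info.offset:
 * the structure must lie entirely within the page named by mfn. */
static inline int vcpu_info_offset_ok(uint32_t offset)
{
    return offset + VCPU_INFO_SIZE <= XEN_PAGE_SIZE;
}
```

The pointer need not be page aligned, so any offset up to `XEN_PAGE_SIZE - VCPU_INFO_SIZE` is acceptable.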
+
+/* Send an NMI to the specified VCPU. @extra_arg == NULL. */
+#define VCPUOP_send_nmi 11
+
+/*
+ * Get the physical ID information for a pinned vcpu's underlying physical
+ * processor. The physical ID information is architecture-specific.
+ * On x86: id[31:0]=apic_id, id[63:32]=acpi_id, and all values 0xff and
+ * greater are reserved.
+ * This command returns -EINVAL if it is not a valid operation for this VCPU.
+ */
+#define VCPUOP_get_physid 12 /* arg == vcpu_get_physid_t */
+struct vcpu_get_physid {
+ uint64_t phys_id;
+};
+typedef struct vcpu_get_physid vcpu_get_physid_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_get_physid_t);
+#define xen_vcpu_physid_to_x86_apicid(physid) \
+ ((((uint32_t)(physid)) >= 0xff) ? 0xff : ((uint8_t)(physid)))
+#define xen_vcpu_physid_to_x86_acpiid(physid) \
+ ((((uint32_t)((physid)>>32)) >= 0xff) ? 0xff : ((uint8_t)((physid)>>32)))
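The packing described above (id[31:0] = apic_id, id[63:32] = acpi_id, values 0xff and greater reserved) can be exercised with function equivalents of the two macros; the function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Same decoding as xen_vcpu_physid_to_x86_apicid above. */
static inline uint8_t physid_apicid(uint64_t physid)
{
    /* IDs of 0xff and above are reserved and collapse to 0xff. */
    return ((uint32_t)physid >= 0xff) ? 0xff : (uint8_t)physid;
}

/* Same decoding as xen_vcpu_physid_to_x86_acpiid above. */
static inline uint8_t physid_acpiid(uint64_t physid)
{
    uint32_t hi = (uint32_t)(physid >> 32);
    return (hi >= 0xff) ? 0xff : (uint8_t)hi;
}
```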
+
+#endif /* __XEN_PUBLIC_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/version.h b/xen/public/version.h
new file mode 100644
index 0000000..944ca62
--- /dev/null
+++ b/xen/public/version.h
@@ -0,0 +1,91 @@
+/******************************************************************************
+ * version.h
+ *
+ * Xen version, type, and compile information.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2005, Nguyen Anh Quynh <aquynh@gmail.com>
+ * Copyright (c) 2005, Keir Fraser <keir@xensource.com>
+ */
+
+#ifndef __XEN_PUBLIC_VERSION_H__
+#define __XEN_PUBLIC_VERSION_H__
+
+/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+
+/* arg == NULL; returns major:minor (16:16). */
+#define XENVER_version 0
+
+/* arg == xen_extraversion_t. */
+#define XENVER_extraversion 1
+typedef char xen_extraversion_t[16];
+#define XEN_EXTRAVERSION_LEN (sizeof(xen_extraversion_t))
+
+/* arg == xen_compile_info_t. */
+#define XENVER_compile_info 2
+struct xen_compile_info {
+ char compiler[64];
+ char compile_by[16];
+ char compile_domain[32];
+ char compile_date[32];
+};
+typedef struct xen_compile_info xen_compile_info_t;
+
+#define XENVER_capabilities 3
+typedef char xen_capabilities_info_t[1024];
+#define XEN_CAPABILITIES_INFO_LEN (sizeof(xen_capabilities_info_t))
+
+#define XENVER_changeset 4
+typedef char xen_changeset_info_t[64];
+#define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
+
+#define XENVER_platform_parameters 5
+struct xen_platform_parameters {
+ unsigned long virt_start;
+};
+typedef struct xen_platform_parameters xen_platform_parameters_t;
+
+#define XENVER_get_features 6
+struct xen_feature_info {
+ unsigned int submap_idx; /* IN: which 32-bit submap to return */
+ uint32_t submap; /* OUT: 32-bit submap */
+};
+typedef struct xen_feature_info xen_feature_info_t;
+
+/* Declares the features reported by XENVER_get_features. */
+#include "features.h"
+
+/* arg == NULL; returns host memory page size. */
+#define XENVER_pagesize 7
+
+/* arg == xen_domain_handle_t. */
+#define XENVER_guest_handle 8
+
+#endif /* __XEN_PUBLIC_VERSION_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/xen-compat.h b/xen/public/xen-compat.h
new file mode 100644
index 0000000..329be07
--- /dev/null
+++ b/xen/public/xen-compat.h
@@ -0,0 +1,44 @@
+/******************************************************************************
+ * xen-compat.h
+ *
+ * Guest OS interface to Xen. Compatibility layer.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2006, Christian Limpach
+ */
+
+#ifndef __XEN_PUBLIC_XEN_COMPAT_H__
+#define __XEN_PUBLIC_XEN_COMPAT_H__
+
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00030209
+
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+/* Xen is built with matching headers and implements the latest interface. */
+#define __XEN_INTERFACE_VERSION__ __XEN_LATEST_INTERFACE_VERSION__
+#elif !defined(__XEN_INTERFACE_VERSION__)
+/* Guests which do not specify a version get the legacy interface. */
+#define __XEN_INTERFACE_VERSION__ 0x00000000
+#endif
+
+#if __XEN_INTERFACE_VERSION__ > __XEN_LATEST_INTERFACE_VERSION__
+#error "These header files do not support the requested interface version."
+#endif
+
+#endif /* __XEN_PUBLIC_XEN_COMPAT_H__ */
diff --git a/xen/public/xen.h b/xen/public/xen.h
new file mode 100644
index 0000000..084bb90
--- /dev/null
+++ b/xen/public/xen.h
@@ -0,0 +1,657 @@
+/******************************************************************************
+ * xen.h
+ *
+ * Guest OS interface to Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004, K A Fraser
+ */
+
+#ifndef __XEN_PUBLIC_XEN_H__
+#define __XEN_PUBLIC_XEN_H__
+
+#include <sys/types.h>
+
+#include "xen-compat.h"
+
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/xen.h"
+#elif defined(__ia64__)
+#include "arch-ia64.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#ifndef __ASSEMBLY__
+/* Guest handles for primitive C types. */
+DEFINE_XEN_GUEST_HANDLE(char);
+__DEFINE_XEN_GUEST_HANDLE(uchar, unsigned char);
+DEFINE_XEN_GUEST_HANDLE(int);
+__DEFINE_XEN_GUEST_HANDLE(uint, unsigned int);
+DEFINE_XEN_GUEST_HANDLE(long);
+__DEFINE_XEN_GUEST_HANDLE(ulong, unsigned long);
+DEFINE_XEN_GUEST_HANDLE(void);
+
+DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
+#endif
+
+/*
+ * HYPERCALLS
+ */
+
+#define __HYPERVISOR_set_trap_table 0
+#define __HYPERVISOR_mmu_update 1
+#define __HYPERVISOR_set_gdt 2
+#define __HYPERVISOR_stack_switch 3
+#define __HYPERVISOR_set_callbacks 4
+#define __HYPERVISOR_fpu_taskswitch 5
+#define __HYPERVISOR_sched_op_compat 6 /* compat since 0x00030101 */
+#define __HYPERVISOR_platform_op 7
+#define __HYPERVISOR_set_debugreg 8
+#define __HYPERVISOR_get_debugreg 9
+#define __HYPERVISOR_update_descriptor 10
+#define __HYPERVISOR_memory_op 12
+#define __HYPERVISOR_multicall 13
+#define __HYPERVISOR_update_va_mapping 14
+#define __HYPERVISOR_set_timer_op 15
+#define __HYPERVISOR_event_channel_op_compat 16 /* compat since 0x00030202 */
+#define __HYPERVISOR_xen_version 17
+#define __HYPERVISOR_console_io 18
+#define __HYPERVISOR_physdev_op_compat 19 /* compat since 0x00030202 */
+#define __HYPERVISOR_grant_table_op 20
+#define __HYPERVISOR_vm_assist 21
+#define __HYPERVISOR_update_va_mapping_otherdomain 22
+#define __HYPERVISOR_iret 23 /* x86 only */
+#define __HYPERVISOR_vcpu_op 24
+#define __HYPERVISOR_set_segment_base 25 /* x86/64 only */
+#define __HYPERVISOR_mmuext_op 26
+#define __HYPERVISOR_xsm_op 27
+#define __HYPERVISOR_nmi_op 28
+#define __HYPERVISOR_sched_op 29
+#define __HYPERVISOR_callback_op 30
+#define __HYPERVISOR_xenoprof_op 31
+#define __HYPERVISOR_event_channel_op 32
+#define __HYPERVISOR_physdev_op 33
+#define __HYPERVISOR_hvm_op 34
+#define __HYPERVISOR_sysctl 35
+#define __HYPERVISOR_domctl 36
+#define __HYPERVISOR_kexec_op 37
+
+/* Architecture-specific hypercall definitions. */
+#define __HYPERVISOR_arch_0 48
+#define __HYPERVISOR_arch_1 49
+#define __HYPERVISOR_arch_2 50
+#define __HYPERVISOR_arch_3 51
+#define __HYPERVISOR_arch_4 52
+#define __HYPERVISOR_arch_5 53
+#define __HYPERVISOR_arch_6 54
+#define __HYPERVISOR_arch_7 55
+
+/*
+ * HYPERCALL COMPATIBILITY.
+ */
+
+/* New sched_op hypercall introduced in 0x00030101. */
+#if __XEN_INTERFACE_VERSION__ < 0x00030101
+#undef __HYPERVISOR_sched_op
+#define __HYPERVISOR_sched_op __HYPERVISOR_sched_op_compat
+#endif
+
+/* New event-channel and physdev hypercalls introduced in 0x00030202. */
+#if __XEN_INTERFACE_VERSION__ < 0x00030202
+#undef __HYPERVISOR_event_channel_op
+#define __HYPERVISOR_event_channel_op __HYPERVISOR_event_channel_op_compat
+#undef __HYPERVISOR_physdev_op
+#define __HYPERVISOR_physdev_op __HYPERVISOR_physdev_op_compat
+#endif
+
+/* New platform_op hypercall introduced in 0x00030204. */
+#if __XEN_INTERFACE_VERSION__ < 0x00030204
+#define __HYPERVISOR_dom0_op __HYPERVISOR_platform_op
+#endif
+
+/*
+ * VIRTUAL INTERRUPTS
+ *
+ * Virtual interrupts that a guest OS may receive from Xen.
+ *
+ * In the side comments, 'V.' denotes a per-VCPU VIRQ while 'G.' denotes a
+ * global VIRQ. The former can be bound once per VCPU and cannot be re-bound.
+ * The latter can be allocated only once per guest: they must initially be
+ * allocated to VCPU0 but can subsequently be re-bound.
+ */
+#define VIRQ_TIMER 0 /* V. Timebase update, and/or requested timeout. */
+#define VIRQ_DEBUG 1 /* V. Request guest to dump debug info. */
+#define VIRQ_CONSOLE 2 /* G. (DOM0) Bytes received on emergency console. */
+#define VIRQ_DOM_EXC 3 /* G. (DOM0) Exceptional event for some domain. */
+#define VIRQ_TBUF 4 /* G. (DOM0) Trace buffer has records available. */
+#define VIRQ_DEBUGGER 6 /* G. (DOM0) A domain has paused for debugging. */
+#define VIRQ_XENOPROF 7 /* V. XenOprofile interrupt: new sample available */
+#define VIRQ_CON_RING 8 /* G. (DOM0) Bytes received on console */
+
+/* Architecture-specific VIRQ definitions. */
+#define VIRQ_ARCH_0 16
+#define VIRQ_ARCH_1 17
+#define VIRQ_ARCH_2 18
+#define VIRQ_ARCH_3 19
+#define VIRQ_ARCH_4 20
+#define VIRQ_ARCH_5 21
+#define VIRQ_ARCH_6 22
+#define VIRQ_ARCH_7 23
+
+#define NR_VIRQS 24
+
+/*
+ * MMU-UPDATE REQUESTS
+ *
+ * HYPERVISOR_mmu_update() accepts a list of (ptr, val) pairs.
+ * A foreigndom (FD) can be specified (or DOMID_SELF for none).
+ * Where the FD has some effect, it is described below.
+ * ptr[1:0] specifies the appropriate MMU_* command.
+ *
+ * ptr[1:0] == MMU_NORMAL_PT_UPDATE:
+ * Updates an entry in a page table. If updating an L1 table, and the new
+ * table entry is valid/present, the mapped frame must belong to the FD, if
+ * an FD has been specified. If attempting to map an I/O page then the
+ * caller assumes the privilege of the FD.
+ * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller.
+ * FD == DOMID_XEN: Map restricted areas of Xen's heap space.
+ * ptr[:2] -- Machine address of the page-table entry to modify.
+ * val -- Value to write.
+ *
+ * ptr[1:0] == MMU_MACHPHYS_UPDATE:
+ * Updates an entry in the machine->pseudo-physical mapping table.
+ * ptr[:2] -- Machine address within the frame whose mapping to modify.
+ * The frame must belong to the FD, if one is specified.
+ * val -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed
+ * with those in @val.
+ */
+#define MMU_NORMAL_PT_UPDATE 0 /* checked '*ptr = val'. ptr is MA. */
+#define MMU_MACHPHYS_UPDATE 1 /* ptr = MA of frame to modify entry for */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* atomically: *ptr = val | (*ptr&(A|D)) */
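Because ptr[1:0] carries the MMU_* command and ptr[:2] the machine address, composing an mmu_update request is a mask-and-or; the machine address must therefore be at least 4-byte aligned (PTEs always are). A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define MMU_NORMAL_PT_UPDATE      0
#define MMU_MACHPHYS_UPDATE       1
#define MMU_PT_UPDATE_PRESERVE_AD 2

/* Compose the 'ptr' field of struct mmu_update: machine address in the
 * upper bits, MMU_* command in the low two bits. */
static inline uint64_t mmu_update_ptr(uint64_t machine_addr, unsigned cmd)
{
    return (machine_addr & ~(uint64_t)3) | (cmd & 3);
}

/* Recover the command Xen would decode from ptr[1:0]. */
static inline unsigned mmu_update_cmd(uint64_t ptr)
{
    return (unsigned)(ptr & 3);
}
```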
+
+/*
+ * MMU EXTENDED OPERATIONS
+ *
+ * HYPERVISOR_mmuext_op() accepts a list of mmuext_op structures.
+ * A foreigndom (FD) can be specified (or DOMID_SELF for none).
+ * Where the FD has some effect, it is described below.
+ *
+ * cmd: MMUEXT_(UN)PIN_*_TABLE
+ * mfn: Machine frame number to be (un)pinned as a p.t. page.
+ * The frame must belong to the FD, if one is specified.
+ *
+ * cmd: MMUEXT_NEW_BASEPTR
+ * mfn: Machine frame number of new page-table base to install in MMU.
+ *
+ * cmd: MMUEXT_NEW_USER_BASEPTR [x86/64 only]
+ * mfn: Machine frame number of new page-table base to install in MMU
+ * when in user space.
+ *
+ * cmd: MMUEXT_TLB_FLUSH_LOCAL
+ * No additional arguments. Flushes local TLB.
+ *
+ * cmd: MMUEXT_INVLPG_LOCAL
+ * linear_addr: Linear address to be flushed from the local TLB.
+ *
+ * cmd: MMUEXT_TLB_FLUSH_MULTI
+ * vcpumask: Pointer to bitmap of VCPUs to be flushed.
+ *
+ * cmd: MMUEXT_INVLPG_MULTI
+ * linear_addr: Linear address to be flushed.
+ * vcpumask: Pointer to bitmap of VCPUs to be flushed.
+ *
+ * cmd: MMUEXT_TLB_FLUSH_ALL
+ * No additional arguments. Flushes all VCPUs' TLBs.
+ *
+ * cmd: MMUEXT_INVLPG_ALL
+ * linear_addr: Linear address to be flushed from all VCPUs' TLBs.
+ *
+ * cmd: MMUEXT_FLUSH_CACHE
+ * No additional arguments. Writes back and flushes cache contents.
+ *
+ * cmd: MMUEXT_SET_LDT
+ * linear_addr: Linear address of LDT base (NB. must be page-aligned).
+ * nr_ents: Number of entries in LDT.
+ *
+ * cmd: MMUEXT_CLEAR_PAGE
+ * mfn: Machine frame number to be cleared.
+ *
+ * cmd: MMUEXT_COPY_PAGE
+ * mfn: Machine frame number of the destination page.
+ * src_mfn: Machine frame number of the source page.
+ */
+#define MMUEXT_PIN_L1_TABLE 0
+#define MMUEXT_PIN_L2_TABLE 1
+#define MMUEXT_PIN_L3_TABLE 2
+#define MMUEXT_PIN_L4_TABLE 3
+#define MMUEXT_UNPIN_TABLE 4
+#define MMUEXT_NEW_BASEPTR 5
+#define MMUEXT_TLB_FLUSH_LOCAL 6
+#define MMUEXT_INVLPG_LOCAL 7
+#define MMUEXT_TLB_FLUSH_MULTI 8
+#define MMUEXT_INVLPG_MULTI 9
+#define MMUEXT_TLB_FLUSH_ALL 10
+#define MMUEXT_INVLPG_ALL 11
+#define MMUEXT_FLUSH_CACHE 12
+#define MMUEXT_SET_LDT 13
+#define MMUEXT_NEW_USER_BASEPTR 15
+#define MMUEXT_CLEAR_PAGE 16
+#define MMUEXT_COPY_PAGE 17
+
+#ifndef __ASSEMBLY__
+struct mmuext_op {
+ unsigned int cmd;
+ union {
+ /* [UN]PIN_TABLE, NEW_BASEPTR, NEW_USER_BASEPTR
+ * CLEAR_PAGE, COPY_PAGE */
+ xen_pfn_t mfn;
+ /* INVLPG_LOCAL, INVLPG_MULTI, INVLPG_ALL, SET_LDT */
+ unsigned long linear_addr;
+ } arg1;
+ union {
+ /* SET_LDT */
+ unsigned int nr_ents;
+ /* TLB_FLUSH_MULTI, INVLPG_MULTI */
+#if __XEN_INTERFACE_VERSION__ >= 0x00030205
+ XEN_GUEST_HANDLE(void) vcpumask;
+#else
+ void *vcpumask;
+#endif
+ /* COPY_PAGE */
+ xen_pfn_t src_mfn;
+ } arg2;
+};
+typedef struct mmuext_op mmuext_op_t;
+DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
+#endif
+
+/* These are passed as 'flags' to update_va_mapping. They can be ORed. */
+/* When specifying UVMF_MULTI, also OR in a pointer to a CPU bitmap. */
+/* UVMF_LOCAL is merely UVMF_MULTI with a NULL bitmap pointer. */
+#define UVMF_NONE (0UL<<0) /* No flushing at all. */
+#define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s). */
+#define UVMF_INVLPG (2UL<<0) /* Flush only one entry. */
+#define UVMF_FLUSHTYPE_MASK (3UL<<0)
+#define UVMF_MULTI (0UL<<2) /* Flush subset of TLBs. */
+#define UVMF_LOCAL (0UL<<2) /* Flush local TLB. */
+#define UVMF_ALL (1UL<<2) /* Flush all TLBs. */
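The comments above say a UVMF_MULTI flush also ORs a pointer to a CPU bitmap into the flags word; this works because the flag bits occupy the low bits, which are zero in any suitably aligned pointer. A sketch (the helper name is an assumption of this example):

```c
#include <assert.h>

#define UVMF_NONE           (0UL<<0)
#define UVMF_TLB_FLUSH      (1UL<<0)
#define UVMF_INVLPG         (2UL<<0)
#define UVMF_FLUSHTYPE_MASK (3UL<<0)
#define UVMF_MULTI          (0UL<<2)
#define UVMF_LOCAL          (0UL<<2)  /* MULTI with a NULL bitmap pointer */
#define UVMF_ALL            (1UL<<2)

/* Build an update_va_mapping flags word for a multi-VCPU flush.
 * The bitmap pointer must be aligned enough that its low bits are
 * clear, so ORing it in does not clobber the flag bits. */
static inline unsigned long uvmf_multi_flags(unsigned long flush_type,
                                             const void *cpu_bitmap)
{
    return flush_type | UVMF_MULTI | (unsigned long)cpu_bitmap;
}
```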
+
+/*
+ * Commands to HYPERVISOR_console_io().
+ */
+#define CONSOLEIO_write 0
+#define CONSOLEIO_read 1
+
+/*
+ * Commands to HYPERVISOR_vm_assist().
+ */
+#define VMASST_CMD_enable 0
+#define VMASST_CMD_disable 1
+
+/* x86/32 guests: simulate full 4GB segment limits. */
+#define VMASST_TYPE_4gb_segments 0
+
+/* x86/32 guests: trap (vector 15) whenever above vmassist is used. */
+#define VMASST_TYPE_4gb_segments_notify 1
+
+/*
+ * x86 guests: support writes to bottom-level PTEs.
+ * NB1. Page-directory entries cannot be written.
+ * NB2. Guest must continue to remove all writable mappings of PTEs.
+ */
+#define VMASST_TYPE_writable_pagetables 2
+
+/* x86/PAE guests: support PDPTs above 4GB. */
+#define VMASST_TYPE_pae_extended_cr3 3
+
+#define MAX_VMASST_TYPE 3
+
+#ifndef __ASSEMBLY__
+
+typedef uint16_t domid_t;
+
+/* Domain ids >= DOMID_FIRST_RESERVED cannot be used for ordinary domains. */
+#define DOMID_FIRST_RESERVED (0x7FF0U)
+
+/* DOMID_SELF is used in certain contexts to refer to oneself. */
+#define DOMID_SELF (0x7FF0U)
+
+/*
+ * DOMID_IO is used to restrict page-table updates to mapping I/O memory.
+ * Although no Foreign Domain need be specified to map I/O pages, DOMID_IO
+ * is useful to ensure that no mappings to the OS's own heap are accidentally
+ * installed. (e.g., in Linux this could cause havoc as reference counts
+ * aren't adjusted on the I/O-mapping code path).
+ * This only makes sense in MMUEXT_SET_FOREIGNDOM, but in that context can
+ * be specified by any calling domain.
+ */
+#define DOMID_IO (0x7FF1U)
+
+/*
+ * DOMID_XEN is used to allow privileged domains to map restricted parts of
+ * Xen's heap space (e.g., the machine_to_phys table).
+ * This only makes sense in MMUEXT_SET_FOREIGNDOM, and is only permitted if
+ * the caller is privileged.
+ */
+#define DOMID_XEN (0x7FF2U)
+
+/*
+ * Send an array of these to HYPERVISOR_mmu_update().
+ * NB. The fields are natural pointer/address size for this architecture.
+ */
+struct mmu_update {
+ uint64_t ptr; /* Machine address of PTE. */
+ uint64_t val; /* New contents of PTE. */
+};
+typedef struct mmu_update mmu_update_t;
+DEFINE_XEN_GUEST_HANDLE(mmu_update_t);
+
+/*
+ * Send an array of these to HYPERVISOR_multicall().
+ * NB. The fields are natural register size for this architecture.
+ */
+struct multicall_entry {
+ unsigned long op, result;
+ unsigned long args[6];
+};
+typedef struct multicall_entry multicall_entry_t;
+DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
+
+/*
+ * Event channel endpoints per domain:
+ * 1024 if a long is 32 bits; 4096 if a long is 64 bits.
+ */
+#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
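The formula follows from the selector scheme described later in this header: the per-VCPU selector word has one bit per pending word, and each pending word holds one bit per channel, giving (8 × word bytes)² = 64 × bytes² channels. Checking both word sizes named in the comment:

```c
#include <assert.h>

/* Event channels for a given 'unsigned long' width in bytes:
 * (8*bytes) selector bits, each covering a word of (8*bytes) channels. */
#define XEN_NR_EVTCHN(long_bytes) ((long_bytes) * (long_bytes) * 64)
```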
+
+struct vcpu_time_info {
+ /*
+ * Updates to the following values are preceded and followed by an
+ * increment of 'version'. The guest can therefore detect updates by
+ * looking for changes to 'version'. If the least-significant bit of
+ * the version number is set then an update is in progress and the guest
+ * must wait to read a consistent set of values.
+ * The correct way to interact with the version number is similar to
+ * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry.
+ */
+ uint32_t version;
+ uint32_t pad0;
+ uint64_t tsc_timestamp; /* TSC at last update of time vals. */
+ uint64_t system_time; /* Time, in nanosecs, since boot. */
+ /*
+ * Current system time:
+ * system_time +
+ * ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32)
+ * CPU frequency (Hz):
+ * ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift
+ */
+ uint32_t tsc_to_system_mul;
+ int8_t tsc_shift;
+ int8_t pad1[3];
+}; /* 32 bytes */
+typedef struct vcpu_time_info vcpu_time_info_t;
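The extrapolation formula in the comment above can be sketched as a pure function. Note two assumptions of this example: a negative `tsc_shift` means a right shift, and the 64×32-bit product is assumed to fit in 64 bits (real implementations use a wider 96/128-bit multiply). A complete reader would also wrap this in the seqlock-style `version` retry loop described in the struct comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper implementing:
 *   ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32)
 * The result is added to system_time to get the current system time. */
static inline uint64_t xen_scale_delta(uint64_t delta, uint32_t mul,
                                       int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* Sketch only: assumes delta * mul does not overflow 64 bits. */
    return (delta * mul) >> 32;
}
```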
+
+struct vcpu_info {
+ /*
+ * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
+ * a pending notification for a particular VCPU. It is then cleared
+ * by the guest OS /before/ checking for pending work, thus avoiding
+ * a set-and-check race. Note that the mask is only accessed by Xen
+ * on the CPU that is currently hosting the VCPU. This means that the
+ * pending and mask flags can be updated by the guest without special
+ * synchronisation (i.e., no need for the x86 LOCK prefix).
+ * This may seem suboptimal because if the pending flag is set by
+ * a different CPU then an IPI may be scheduled even when the mask
+ * is set. However, note:
+ * 1. The task of 'interrupt holdoff' is covered by the per-event-
+ * channel mask bits. A 'noisy' event that is continually being
+ * triggered can be masked at source at this very precise
+ * granularity.
+ * 2. The main purpose of the per-VCPU mask is therefore to restrict
+ * reentrant execution: whether for concurrency control, or to
+ * prevent unbounded stack usage. Whatever the purpose, we expect
+ * that the mask will be asserted only for short periods at a time,
+ * and so the likelihood of a 'spurious' IPI is suitably small.
+ * The mask is read before making an event upcall to the guest: a
+ * non-zero mask therefore guarantees that the VCPU will not receive
+ * an upcall activation. The mask is cleared when the VCPU requests
+ * to block: this avoids wakeup-waiting races.
+ */
+ uint8_t evtchn_upcall_pending;
+ uint8_t evtchn_upcall_mask;
+ unsigned long evtchn_pending_sel;
+ struct arch_vcpu_info arch;
+ struct vcpu_time_info time;
+}; /* 64 bytes (x86) */
+#ifndef __XEN__
+typedef struct vcpu_info vcpu_info_t;
+#endif
+
+/*
+ * Xen/kernel shared data -- pointer provided in start_info.
+ *
+ * This structure is defined to be both smaller than a page, and the
+ * only data on the shared page, but may vary in actual size even within
+ * compatible Xen versions; guests should not rely on the size
+ * of this structure remaining constant.
+ */
+struct shared_info {
+ struct vcpu_info vcpu_info[MAX_VIRT_CPUS];
+
+ /*
+ * A domain can create "event channels" on which it can send and receive
+ * asynchronous event notifications. There are three classes of event that
+ * are delivered by this mechanism:
+ * 1. Bi-directional inter- and intra-domain connections. Domains must
+ * arrange out-of-band to set up a connection (usually by allocating
+ * an unbound 'listener' port and advertising that via a storage service

+ * such as xenstore).
+ * 2. Physical interrupts. A domain with suitable hardware-access
+ * privileges can bind an event-channel port to a physical interrupt
+ * source.
+ * 3. Virtual interrupts ('events'). A domain can bind an event-channel
+ * port to a virtual interrupt source, such as the virtual-timer
+ * device or the emergency console.
+ *
+ * Event channels are addressed by a "port index". Each channel is
+ * associated with two bits of information:
+ * 1. PENDING -- notifies the domain that there is a pending notification
+ * to be processed. This bit is cleared by the guest.
+ * 2. MASK -- if this bit is clear then a 0->1 transition of PENDING
+ * will cause an asynchronous upcall to be scheduled. This bit is only
+ * updated by the guest. It is read-only within Xen. If a channel
+ * becomes pending while the channel is masked then the 'edge' is lost
+ * (i.e., when the channel is unmasked, the guest must manually handle
+ * pending notifications as no upcall will be scheduled by Xen).
+ *
+ * To expedite scanning of pending notifications, any 0->1 pending
+ * transition on an unmasked channel causes a corresponding bit in a
+ * per-vcpu selector word to be set. Each bit in the selector covers a
+ * 'C long' in the PENDING bitfield array.
+ */
+ unsigned long evtchn_pending[sizeof(unsigned long) * 8];
+ unsigned long evtchn_mask[sizeof(unsigned long) * 8];
+
+ /*
+ * Wallclock time: updated only by control software. Guests should base
+ * their gettimeofday() syscall on this wallclock-base value.
+ */
+ uint32_t wc_version; /* Version counter: see vcpu_time_info_t. */
+ uint32_t wc_sec; /* Secs 00:00:00 UTC, Jan 1, 1970. */
+ uint32_t wc_nsec; /* Nsecs 00:00:00 UTC, Jan 1, 1970. */
+
+ struct arch_shared_info arch;
+
+};
+#ifndef __XEN__
+typedef struct shared_info shared_info_t;
+#endif
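The selector scheme described in the struct comment can be illustrated with a scan for the first deliverable event: consult only the pending words flagged in the per-VCPU selector, and ignore channels whose MASK bit is set. The function below is an illustrative sketch, not the interface's own scanning code.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Find the first pending, unmasked event-channel port, scanning only
 * the words named by the per-VCPU selector (evtchn_pending_sel).
 * Returns the port index, or -1 if nothing is deliverable. */
static inline int evtchn_first_pending(unsigned long sel,
                                       const unsigned long *pending,
                                       const unsigned long *mask)
{
    for (unsigned i = 0; i < BITS_PER_LONG; i++) {
        if (!(sel & (1UL << i)))
            continue;                        /* word has no pending bits */
        unsigned long live = pending[i] & ~mask[i];
        for (unsigned b = 0; b < BITS_PER_LONG; b++)
            if (live & (1UL << b))
                return (int)(i * BITS_PER_LONG + b);
    }
    return -1;
}
```

A masked channel that becomes pending is deliberately skipped here, matching the "lost edge" behaviour described above: the guest must rescan after unmasking.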
+
+/*
+ * Start-of-day memory layout:
+ * 1. The domain is started within a contiguous virtual-memory region.
+ * 2. The contiguous region ends on an aligned 4MB boundary.
+ * 3. This is the order of bootstrap elements in the initial virtual region:
+ * a. relocated kernel image
+ * b. initial ram disk [mod_start, mod_len]
+ * c. list of allocated page frames [mfn_list, nr_pages]
+ * d. start_info_t structure [register ESI (x86)]
+ * e. bootstrap page tables [pt_base, CR3 (x86)]
+ * f. bootstrap stack [register ESP (x86)]
+ * 4. Bootstrap elements are packed together, but each is 4kB-aligned.
+ * 5. The initial ram disk may be omitted.
+ * 6. The list of page frames forms a contiguous 'pseudo-physical' memory
+ * layout for the domain. In particular, the bootstrap virtual-memory
+ * region is a 1:1 mapping to the first section of the pseudo-physical map.
+ * 7. All bootstrap elements are mapped read-writable for the guest OS. The
+ * only exception is the bootstrap page table, which is mapped read-only.
+ * 8. There is guaranteed to be at least 512kB padding after the final
+ * bootstrap element. If necessary, the bootstrap virtual region is
+ * extended by an extra 4MB to ensure this.
+ */
+
+#define MAX_GUEST_CMDLINE 1024
+struct start_info {
+ /* THE FOLLOWING ARE FILLED IN BOTH ON INITIAL BOOT AND ON RESUME. */
+ char magic[32]; /* "xen-<version>-<platform>". */
+ unsigned long nr_pages; /* Total pages allocated to this domain. */
+ unsigned long shared_info; /* MACHINE address of shared info struct. */
+ uint32_t flags; /* SIF_xxx flags. */
+ xen_pfn_t store_mfn; /* MACHINE page number of shared page. */
+ uint32_t store_evtchn; /* Event channel for store communication. */
+ union {
+ struct {
+ xen_pfn_t mfn; /* MACHINE page number of console page. */
+ uint32_t evtchn; /* Event channel for console page. */
+ } domU;
+ struct {
+ uint32_t info_off; /* Offset of console_info struct. */
+ uint32_t info_size; /* Size of console_info struct from start.*/
+ } dom0;
+ } console;
+ /* THE FOLLOWING ARE ONLY FILLED IN ON INITIAL BOOT (NOT RESUME). */
+ unsigned long pt_base; /* VIRTUAL address of page directory. */
+ unsigned long nr_pt_frames; /* Number of bootstrap p.t. frames. */
+ unsigned long mfn_list; /* VIRTUAL address of page-frame list. */
+ unsigned long mod_start; /* VIRTUAL address of pre-loaded module. */
+ unsigned long mod_len; /* Size (bytes) of pre-loaded module. */
+ int8_t cmd_line[MAX_GUEST_CMDLINE];
+
+ /* hackish, for multiboot compatibility */
+ unsigned mods_count;
+};
+typedef struct start_info start_info_t;
+
+/* New console union for dom0 introduced in 0x00030203. */
+#if __XEN_INTERFACE_VERSION__ < 0x00030203
+#define console_mfn console.domU.mfn
+#define console_evtchn console.domU.evtchn
+#endif
+
+/* These flags are passed in the 'flags' field of start_info_t. */
+#define SIF_PRIVILEGED (1<<0) /* Is the domain privileged? */
+#define SIF_INITDOMAIN (1<<1) /* Is this the initial control domain? */
+#define SIF_MULTIBOOT_MOD (1<<2) /* Is mod_start a multiboot module? */
+#define SIF_PM_MASK (0xFF<<8) /* reserve 1 byte for xen-pm options */
+
+typedef struct dom0_vga_console_info {
+ uint8_t video_type; /* DOM0_VGA_CONSOLE_??? */
+#define XEN_VGATYPE_TEXT_MODE_3 0x03
+#define XEN_VGATYPE_VESA_LFB 0x23
+
+ union {
+ struct {
+ /* Font height, in pixels. */
+ uint16_t font_height;
+ /* Cursor location (column, row). */
+ uint16_t cursor_x, cursor_y;
+ /* Number of rows and columns (dimensions in characters). */
+ uint16_t rows, columns;
+ } text_mode_3;
+
+ struct {
+ /* Width and height, in pixels. */
+ uint16_t width, height;
+ /* Bytes per scan line. */
+ uint16_t bytes_per_line;
+ /* Bits per pixel. */
+ uint16_t bits_per_pixel;
+ /* LFB physical address, and size (in units of 64kB). */
+ uint32_t lfb_base;
+ uint32_t lfb_size;
+ /* RGB mask offsets and sizes, as defined by VBE 1.2+ */
+ uint8_t red_pos, red_size;
+ uint8_t green_pos, green_size;
+ uint8_t blue_pos, blue_size;
+ uint8_t rsvd_pos, rsvd_size;
+#if __XEN_INTERFACE_VERSION__ >= 0x00030206
+ /* VESA capabilities (offset 0xa, VESA command 0x4f00). */
+ uint32_t gbl_caps;
+ /* Mode attributes (offset 0x0, VESA command 0x4f01). */
+ uint16_t mode_attrs;
+#endif
+ } vesa_lfb;
+ } u;
+} dom0_vga_console_info_t;
+#define xen_vga_console_info dom0_vga_console_info
+#define xen_vga_console_info_t dom0_vga_console_info_t
+
+typedef uint8_t xen_domain_handle_t[16];
+
+/* Turn a plain number into a C unsigned long constant. */
+#define __mk_unsigned_long(x) x ## UL
+#define mk_unsigned_long(x) __mk_unsigned_long(x)
+
+__DEFINE_XEN_GUEST_HANDLE(uint8, uint8_t);
+__DEFINE_XEN_GUEST_HANDLE(uint16, uint16_t);
+__DEFINE_XEN_GUEST_HANDLE(uint32, uint32_t);
+__DEFINE_XEN_GUEST_HANDLE(uint64, uint64_t);
+
+#else /* __ASSEMBLY__ */
+
+/* In assembly code we cannot use C numeric constant suffixes. */
+#define mk_unsigned_long(x) x
+
+#endif /* !__ASSEMBLY__ */
+
+/* Default definitions for macros used by domctl/sysctl. */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+#ifndef uint64_aligned_t
+#define uint64_aligned_t uint64_t
+#endif
+#ifndef XEN_GUEST_HANDLE_64
+#define XEN_GUEST_HANDLE_64(name) XEN_GUEST_HANDLE(name)
+#endif
+#endif
+
+#endif /* __XEN_PUBLIC_XEN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/public/xencomm.h b/xen/public/xencomm.h
new file mode 100644
index 0000000..ac45e07
--- /dev/null
+++ b/xen/public/xencomm.h
@@ -0,0 +1,41 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) IBM Corp. 2006
+ */
+
+#ifndef _XEN_XENCOMM_H_
+#define _XEN_XENCOMM_H_
+
+/* A xencomm descriptor is a scatter/gather list containing physical
+ * addresses corresponding to a virtually contiguous memory area. The
+ * hypervisor translates these physical addresses to machine addresses to copy
+ * to and from the virtually contiguous area.
+ */
+
+#define XENCOMM_MAGIC 0x58434F4D /* 'XCOM' */
+#define XENCOMM_INVALID (~0UL)
+
+struct xencomm_desc {
+ uint32_t magic;
+ uint32_t nr_addrs; /* the number of entries in address[] */
+ uint64_t address[0];
+};
+
+#endif /* _XEN_XENCOMM_H_ */
diff --git a/xen/public/xenoprof.h b/xen/public/xenoprof.h
new file mode 100644
index 0000000..183078d
--- /dev/null
+++ b/xen/public/xenoprof.h
@@ -0,0 +1,138 @@
+/******************************************************************************
+ * xenoprof.h
+ *
+ * Interface for enabling system wide profiling based on hardware performance
+ * counters
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ * Written by Aravind Menon & Jose Renato Santos
+ */
+
+#ifndef __XEN_PUBLIC_XENOPROF_H__
+#define __XEN_PUBLIC_XENOPROF_H__
+
+#include "xen.h"
+
+/*
+ * Commands to HYPERVISOR_xenoprof_op().
+ */
+#define XENOPROF_init 0
+#define XENOPROF_reset_active_list 1
+#define XENOPROF_reset_passive_list 2
+#define XENOPROF_set_active 3
+#define XENOPROF_set_passive 4
+#define XENOPROF_reserve_counters 5
+#define XENOPROF_counter 6
+#define XENOPROF_setup_events 7
+#define XENOPROF_enable_virq 8
+#define XENOPROF_start 9
+#define XENOPROF_stop 10
+#define XENOPROF_disable_virq 11
+#define XENOPROF_release_counters 12
+#define XENOPROF_shutdown 13
+#define XENOPROF_get_buffer 14
+#define XENOPROF_set_backtrace 15
+#define XENOPROF_last_op 15
+
+#define MAX_OPROF_EVENTS 32
+#define MAX_OPROF_DOMAINS 25
+#define XENOPROF_CPU_TYPE_SIZE 64
+
+/* Xenoprof performance events (not Xen events) */
+struct event_log {
+ uint64_t eip;
+ uint8_t mode;
+ uint8_t event;
+};
+
+/* PC value that indicates a special code */
+#define XENOPROF_ESCAPE_CODE ~0UL
+/* Transient events for the xenoprof->oprofile cpu buf */
+#define XENOPROF_TRACE_BEGIN 1
+
+/* Xenoprof buffer shared between Xen and domain - 1 per VCPU */
+struct xenoprof_buf {
+ uint32_t event_head;
+ uint32_t event_tail;
+ uint32_t event_size;
+ uint32_t vcpu_id;
+ uint64_t xen_samples;
+ uint64_t kernel_samples;
+ uint64_t user_samples;
+ uint64_t lost_samples;
+ struct event_log event_log[1];
+};
+#ifndef __XEN__
+typedef struct xenoprof_buf xenoprof_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xenoprof_buf_t);
+#endif
+
+struct xenoprof_init {
+ int32_t num_events;
+ int32_t is_primary;
+ char cpu_type[XENOPROF_CPU_TYPE_SIZE];
+};
+typedef struct xenoprof_init xenoprof_init_t;
+DEFINE_XEN_GUEST_HANDLE(xenoprof_init_t);
+
+struct xenoprof_get_buffer {
+ int32_t max_samples;
+ int32_t nbuf;
+ int32_t bufsize;
+ uint64_t buf_gmaddr;
+};
+typedef struct xenoprof_get_buffer xenoprof_get_buffer_t;
+DEFINE_XEN_GUEST_HANDLE(xenoprof_get_buffer_t);
+
+struct xenoprof_counter {
+ uint32_t ind;
+ uint64_t count;
+ uint32_t enabled;
+ uint32_t event;
+ uint32_t hypervisor;
+ uint32_t kernel;
+ uint32_t user;
+ uint64_t unit_mask;
+};
+typedef struct xenoprof_counter xenoprof_counter_t;
+DEFINE_XEN_GUEST_HANDLE(xenoprof_counter_t);
+
+typedef struct xenoprof_passive {
+ uint16_t domain_id;
+ int32_t max_samples;
+ int32_t nbuf;
+ int32_t bufsize;
+ uint64_t buf_gmaddr;
+} xenoprof_passive_t;
+DEFINE_XEN_GUEST_HANDLE(xenoprof_passive_t);
+
+
+#endif /* __XEN_PUBLIC_XENOPROF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/ring.c b/xen/ring.c
new file mode 100644
index 0000000..8644059
--- /dev/null
+++ b/xen/ring.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <string.h>
+#include "ring.h"
+
+/* dest points into the ring; the copy wraps from end back to start. */
+void hyp_ring_store(void *dest, const void *src, size_t size, void *start, void *end)
+{
+ if (dest + size > end) {
+ size_t first_size = end - dest;
+ memcpy(dest, src, first_size);
+ src += first_size;
+ dest = start;
+ size -= first_size;
+ }
+ memcpy(dest, src, size);
+}
+
+/* src points into the ring; the copy wraps from end back to start. */
+void hyp_ring_fetch(void *dest, const void *src, size_t size, void *start, void *end)
+{
+ if (src + size > end) {
+ size_t first_size = end - src;
+ memcpy(dest, src, first_size);
+ dest += first_size;
+ src = start;
+ size -= first_size;
+ }
+ memcpy(dest, src, size);
+}
+
+size_t hyp_ring_next_word(char **c, void *start, void *end)
+{
+ size_t n = 0;
+
+ while (**c) {
+ n++;
+ if (++(*c) == end)
+ *c = start;
+ }
+ (*c)++;
+
+ return n;
+}
diff --git a/xen/ring.h b/xen/ring.h
new file mode 100644
index 0000000..6ed00ac
--- /dev/null
+++ b/xen/ring.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_RING_H
+#define XEN_RING_H
+
+typedef unsigned32_t hyp_ring_pos_t;
+
+#define hyp_ring_idx(ring, pos) (((unsigned)(pos)) & (sizeof(ring)-1))
+#define hyp_ring_cell(ring, pos) (ring)[hyp_ring_idx((ring), (pos))]
+#define hyp_ring_smash(ring, prod, cons) (hyp_ring_idx((ring), (prod) + 1) == \
+ hyp_ring_idx((ring), (cons)))
+#define hyp_ring_available(ring, prod, cons) hyp_ring_idx((ring), (cons)-(prod)-1)
+
+void hyp_ring_store(void *dest, const void *src, size_t size, void *start, void *end);
+void hyp_ring_fetch(void *dest, const void *src, size_t size, void *start, void *end);
+size_t hyp_ring_next_word(char **c, void *start, void *end);
+
+#endif /* XEN_RING_H */
diff --git a/xen/store.c b/xen/store.c
new file mode 100644
index 0000000..3c6baeb
--- /dev/null
+++ b/xen/store.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/mig_support.h>
+#include <machine/pmap.h>
+#include <machine/ipl.h>
+#include <stdarg.h>
+#include <string.h>
+#include <alloca.h>
+#include <xen/public/xen.h>
+#include <xen/public/io/xs_wire.h>
+#include <util/atoi.h>
+#include "store.h"
+#include "ring.h"
+#include "evt.h"
+#include "xen.h"
+
+/* TODO use events instead of just yielding */
+
+/* Hypervisor part */
+
+decl_simple_lock_data(static, lock);
+
+static struct xenstore_domain_interface *store;
+
+struct store_req {
+ const char *data;
+ unsigned len;
+};
+
+/* Send a request */
+static void store_put(hyp_store_transaction_t t, unsigned32_t type, struct store_req *req, unsigned nr_reqs) {
+ struct xsd_sockmsg head = {
+ .type = type,
+ .req_id = 0,
+ .tx_id = t,
+ };
+ unsigned totlen, len;
+ unsigned i;
+
+ totlen = 0;
+ for (i = 0; i < nr_reqs; i++)
+ totlen += req[i].len;
+ head.len = totlen;
+ totlen += sizeof(head);
+
+ if (totlen > sizeof(store->req) - 1)
+ panic("store message too big: %u bytes, max %u", totlen, (unsigned) (sizeof(store->req) - 1));
+
+ while (hyp_ring_available(store->req, store->req_prod, store->req_cons) < totlen)
+ hyp_yield();
+
+ mb();
+ hyp_ring_store(&hyp_ring_cell(store->req, store->req_prod), &head, sizeof(head), store->req, store->req + sizeof(store->req));
+ len = sizeof(head);
+ for (i=0; i<nr_reqs; i++) {
+ hyp_ring_store(&hyp_ring_cell(store->req, store->req_prod + len), req[i].data, req[i].len, store->req, store->req + sizeof(store->req));
+ len += req[i].len;
+ }
+
+ wmb();
+ store->req_prod += totlen;
+ hyp_event_channel_send(boot_info.store_evtchn);
+}
+
+static const char *errors[] = {
+ "EINVAL",
+ "EACCES",
+ "EEXIST",
+ "EISDIR",
+ "ENOENT",
+ "ENOMEM",
+ "ENOSPC",
+ "EIO",
+ "ENOTEMPTY",
+ "ENOSYS",
+ "EROFS",
+ "EBUSY",
+ "EAGAIN",
+ "EISCONN",
+ NULL,
+};
+
+/* Send a request and wait for the reply. The reply header is stored in
+ * head and a pointer to the reply data is returned (beware: it points
+ * into the ring!). Returns NULL on error. On success the lock is held
+ * and store_put_wait_end() must be called once the data has been read. */
+static struct xsd_sockmsg head;
+const char *hyp_store_error;
+
+static void *store_put_wait(hyp_store_transaction_t t, unsigned32_t type, struct store_req *req, unsigned nr_reqs) {
+ unsigned len;
+ const char **error;
+ void *data;
+
+ simple_lock(&lock);
+ store_put(t, type, req, nr_reqs);
+again:
+ while (store->rsp_prod - store->rsp_cons < sizeof(head))
+ hyp_yield();
+ rmb();
+ hyp_ring_fetch(&head, &hyp_ring_cell(store->rsp, store->rsp_cons), sizeof(head), store->rsp, store->rsp + sizeof(store->rsp));
+ len = sizeof(head) + head.len;
+ while (store->rsp_prod - store->rsp_cons < len)
+ hyp_yield();
+ rmb();
+ if (head.type == XS_WATCH_EVENT) {
+ /* Spurious watch event, drop */
+ store->rsp_cons += sizeof(head) + head.len;
+ hyp_event_channel_send(boot_info.store_evtchn);
+ goto again;
+ }
+ data = &hyp_ring_cell(store->rsp, store->rsp_cons + sizeof(head));
+ if (head.len <= 10) {
+ char c[10];
+ hyp_ring_fetch(c, data, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+ for (error = errors; *error; error++) {
+ if (head.len == strlen(*error) + 1 && !memcmp(*error, c, head.len)) {
+ hyp_store_error = *error;
+ store->rsp_cons += len;
+ hyp_event_channel_send(boot_info.store_evtchn);
+ simple_unlock(&lock);
+ return NULL;
+ }
+ }
+ }
+ return data;
+}
+
+/* Must be called after each store_put_wait. Releases lock. */
+static void store_put_wait_end(void) {
+ mb();
+ store->rsp_cons += sizeof(head) + head.len;
+ hyp_event_channel_send(boot_info.store_evtchn);
+ simple_unlock(&lock);
+}
+
+/* Start a transaction. */
+hyp_store_transaction_t hyp_store_transaction_start(void) {
+ struct store_req req = {
+ .data = "",
+ .len = 1,
+ };
+ char *rep;
+ char *s;
+ int i;
+
+ rep = store_put_wait(0, XS_TRANSACTION_START, &req, 1);
+ if (!rep)
+ panic("couldn't start transaction (%s)", hyp_store_error);
+ s = alloca(head.len);
+ hyp_ring_fetch(s, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+ mach_atoi((u_char*) s, &i);
+ if (i == MACH_ATOI_DEFAULT)
+ panic("bogus transaction id len %d '%s'", head.len, s);
+ store_put_wait_end();
+ return i;
+}
+
+/* Stop a transaction. */
+int hyp_store_transaction_stop(hyp_store_transaction_t t) {
+ struct store_req req = {
+ .data = "T",
+ .len = 2,
+ };
+ int ret = 1;
+ void *rep;
+ rep = store_put_wait(t, XS_TRANSACTION_END, &req, 1);
+ if (!rep)
+ return 0;
+ store_put_wait_end();
+ return ret;
+}
+
+/* List a directory: returns a NULL-terminated array of file names. Free
+ * with kfree. */
+char **hyp_store_ls(hyp_store_transaction_t t, int n, ...) {
+ struct store_req req[n];
+ va_list listp;
+ int i;
+ char *rep;
+ char *c;
+ char **res, **rsp;
+
+ va_start (listp, n);
+ for (i = 0; i < n; i++) {
+ req[i].data = va_arg(listp, char *);
+ req[i].len = strlen(req[i].data);
+ }
+ req[n - 1].len++;
+ va_end (listp);
+
+ rep = store_put_wait(t, XS_DIRECTORY, req, n);
+ if (!rep)
+ return NULL;
+ i = 0;
+ for ( c = rep, n = 0;
+ n < head.len;
+ n += hyp_ring_next_word(&c, store->rsp, store->rsp + sizeof(store->rsp)) + 1)
+ i++;
+ res = (void*) kalloc((i + 1) * sizeof(char*) + head.len);
+ if (!res)
+ hyp_store_error = "ENOMEM";
+ else {
+ hyp_ring_fetch(res + (i + 1), rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+ rsp = res;
+ for (c = (char*) (res + (i + 1)); i; i--, c += strlen(c) + 1)
+ *rsp++ = c;
+ *rsp = NULL;
+ }
+ store_put_wait_end();
+ return res;
+}
+
+/* Get the value of an entry, va version. */
+static void *hyp_store_read_va(hyp_store_transaction_t t, int n, va_list listp) {
+ struct store_req req[n];
+ int i;
+ void *rep;
+ char *res;
+
+ for (i = 0; i < n; i++) {
+ req[i].data = va_arg(listp, char *);
+ req[i].len = strlen(req[i].data);
+ }
+ req[n - 1].len++;
+
+ rep = store_put_wait(t, XS_READ, req, n);
+ if (!rep)
+ return NULL;
+ res = (void*) kalloc(head.len + 1);
+ if (!res)
+ hyp_store_error = "ENOMEM";
+ else {
+ hyp_ring_fetch(res, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+ res[head.len] = 0;
+ }
+ store_put_wait_end();
+ return res;
+}
+
+/* Get the value of an entry. Free with kfree. */
+void *hyp_store_read(hyp_store_transaction_t t, int n, ...) {
+ va_list listp;
+ char *res;
+
+ va_start(listp, n);
+ res = hyp_store_read_va(t, n, listp);
+ va_end(listp);
+ return res;
+}
+
+/* Get the integer value of an entry, -1 on error. */
+int hyp_store_read_int(hyp_store_transaction_t t, int n, ...) {
+ va_list listp;
+ char *res;
+ int i;
+
+ va_start(listp, n);
+ res = hyp_store_read_va(t, n, listp);
+ va_end(listp);
+ if (!res)
+ return -1;
+ mach_atoi((u_char *) res, &i);
+ if (i == MACH_ATOI_DEFAULT)
+ printf("bogus integer '%s'\n", res);
+ kfree((vm_offset_t) res, strlen(res)+1);
+ return i;
+}
+
+/* Set the value of an entry. */
+char *hyp_store_write(hyp_store_transaction_t t, const char *data, int n, ...) {
+ struct store_req req[n + 1];
+ va_list listp;
+ int i;
+ void *rep;
+ char *res;
+
+ va_start (listp, n);
+ for (i = 0; i < n; i++) {
+ req[i].data = va_arg(listp, char *);
+ req[i].len = strlen(req[i].data);
+ }
+ req[n - 1].len++;
+ req[n].data = data;
+ req[n].len = strlen (data);
+ va_end (listp);
+
+ rep = store_put_wait (t, XS_WRITE, req, n + 1);
+ if (!rep)
+ return NULL;
+ res = (void*) kalloc(head.len + 1);
+ if (!res)
+ hyp_store_error = NULL;
+ else {
+ hyp_ring_fetch(res, rep, head.len, store->rsp, store->rsp + sizeof(store->rsp));
+ res[head.len] = 0;
+ }
+ store_put_wait_end();
+ return res;
+}
+
+static void hyp_store_handler(int unit)
+{
+ thread_wakeup(&boot_info.store_evtchn);
+}
+
+/* Map store's shared page. */
+void hyp_store_init(void)
+{
+ if (store)
+ return;
+ simple_lock_init(&lock);
+ store = (void*) mfn_to_kv(boot_info.store_mfn);
+ pmap_set_page_readwrite(store);
+ /* SPL sched */
+ hyp_evt_handler(boot_info.store_evtchn, hyp_store_handler, 0, SPL7);
+}
diff --git a/xen/store.h b/xen/store.h
new file mode 100644
index 0000000..4b3ee18
--- /dev/null
+++ b/xen/store.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_STORE_H
+#define XEN_STORE_H
+#include <machine/xen.h>
+#include <xen/public/io/xenbus.h>
+
+typedef unsigned32_t hyp_store_transaction_t;
+
+#define hyp_store_state_unknown "0"
+#define hyp_store_state_initializing "1"
+#define hyp_store_state_init_wait "2"
+#define hyp_store_state_initialized "3"
+#define hyp_store_state_connected "4"
+#define hyp_store_state_closing "5"
+#define hyp_store_state_closed "6"
+
+void hyp_store_init(void);
+
+extern const char *hyp_store_error;
+
+/* Start a transaction. */
+hyp_store_transaction_t hyp_store_transaction_start(void);
+/* Stop a transaction. Returns 1 if the transaction succeeded, 0 otherwise. */
+int hyp_store_transaction_stop(hyp_store_transaction_t t);
+
+/* List a directory: returns a NULL-terminated array of file names. Free
+ * with kfree. */
+char **hyp_store_ls(hyp_store_transaction_t t, int n, ...);
+
+/* Get the value of an entry. Free with kfree. */
+void *hyp_store_read(hyp_store_transaction_t t, int n, ...);
+/* Get the integer value of an entry, -1 on error. */
+int hyp_store_read_int(hyp_store_transaction_t t, int n, ...);
+/* Set the value of an entry. */
+char *hyp_store_write(hyp_store_transaction_t t, const char *data, int n, ...);
+
+#endif /* XEN_STORE_H */
diff --git a/xen/time.c b/xen/time.c
new file mode 100644
index 0000000..4c5cc35
--- /dev/null
+++ b/xen/time.c
@@ -0,0 +1,148 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <mach/mach_types.h>
+#include <kern/mach_clock.h>
+#include <mach/xen.h>
+#include <machine/xen.h>
+#include <machine/spl.h>
+#include <machine/ipl.h>
+#include <mach/machine/eflags.h>
+#include <xen/evt.h>
+#include "time.h"
+#include "store.h"
+
+static unsigned64_t lastnsec;
+
+/* 2^64 nanoseconds ~= 500 years */
+static unsigned64_t hyp_get_stime(void) {
+ unsigned32_t version;
+ unsigned64_t cpu_clock, last_cpu_clock, delta, system_time;
+ unsigned32_t mul;
+ signed8_t shift;
+ volatile struct vcpu_time_info *time = &hyp_shared_info.vcpu_info[0].time;
+
+ do {
+ version = time->version;
+ rmb();
+ cpu_clock = hyp_cpu_clock();
+ last_cpu_clock = time->tsc_timestamp;
+ system_time = time->system_time;
+ mul = time->tsc_to_system_mul;
+ shift = time->tsc_shift;
+ rmb();
+ } while (version != time->version);
+
+ delta = cpu_clock - last_cpu_clock;
+ if (shift < 0)
+ delta >>= -shift;
+ else
+ delta <<= shift;
+ return system_time + ((delta * (unsigned64_t) mul) >> 32);
+}
+
+unsigned64_t hyp_get_time(void) {
+ unsigned32_t version;
+ unsigned32_t sec, nsec;
+
+ do {
+ version = hyp_shared_info.wc_version;
+ rmb();
+ sec = hyp_shared_info.wc_sec;
+ nsec = hyp_shared_info.wc_nsec;
+ rmb();
+ } while (version != hyp_shared_info.wc_version);
+
+ return sec*1000000000ULL + nsec + hyp_get_stime();
+}
+
+static void hypclock_intr(int unit, int old_ipl, void *ret_addr, struct i386_interrupt_state *regs) {
+ unsigned64_t nsec, delta;
+
+ if (!lastnsec)
+ return;
+
+ nsec = hyp_get_stime();
+ if (nsec < lastnsec) {
+ printf("warning: nsec 0x%08lx%08lx < lastnsec 0x%08lx%08lx\n",(unsigned long)(nsec>>32), (unsigned long)nsec, (unsigned long)(lastnsec>>32), (unsigned long)lastnsec);
+ nsec = lastnsec;
+ }
+ delta = nsec-lastnsec;
+
+ lastnsec += (delta/1000)*1000;
+ hypclock_machine_intr(old_ipl, ret_addr, regs, delta);
+ /* rearm the 10ms tick */
+ hyp_do_set_timer_op(hyp_get_stime()+10*1000*1000);
+
+#if 0
+ char *c = hyp_store_read(0, 1, "control/shutdown");
+ if (c) {
+ static int go_down = 0;
+ if (!go_down) {
+ printf("uh oh, shutdown: %s\n", c);
+ go_down = 1;
+ /* TODO: somehow send startup_reboot notification to init */
+ if (!strcmp(c, "reboot")) {
+ /* this is just a reboot */
+ }
+ }
+ }
+#endif
+}
+
+extern struct timeval time;
+extern struct timezone tz;
+
+int
+readtodc(tp)
+ u_int *tp;
+{
+ unsigned64_t t = hyp_get_time();
+ u_int n = t / 1000000000;
+
+#ifndef MACH_KERNEL
+ n += tz.tz_minuteswest * 60;
+ if (tz.tz_dsttime)
+ n -= 3600;
+#endif /* MACH_KERNEL */
+ *tp = n;
+
+ return(0);
+}
+
+int
+writetodc()
+{
+ /* Not allowed in Xen */
+ return(-1);
+}
+
+void
+clkstart()
+{
+ evtchn_port_t port = hyp_event_channel_bind_virq(VIRQ_TIMER, 0);
+ hyp_evt_handler(port, hypclock_intr, 0, SPLHI);
+
+ /* first clock tick */
+ clock_interrupt(0, 0, 0);
+ lastnsec = hyp_get_stime();
+
+ /* rearm the 10ms tick */
+ hyp_do_set_timer_op(hyp_get_stime()+10*1000*1000);
+}
diff --git a/xen/time.h b/xen/time.h
new file mode 100644
index 0000000..f875588
--- /dev/null
+++ b/xen/time.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_TIME_H
+#define XEN_TIME_H
+
+#include <mach/mach_types.h>
+unsigned64_t hyp_get_time(void);
+
+#endif /* XEN_TIME_H */
diff --git a/xen/xen.c b/xen/xen.c
new file mode 100644
index 0000000..b3acef4
--- /dev/null
+++ b/xen/xen.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2007 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <sys/types.h>
+#include <string.h>
+#include <mach/xen.h>
+#include <machine/xen.h>
+#include <machine/ipl.h>
+#include <xen/block.h>
+#include <xen/console.h>
+#include <xen/grant.h>
+#include <xen/net.h>
+#include <xen/store.h>
+#include <xen/time.h>
+#include "xen.h"
+#include "evt.h"
+
+void hyp_invalidate_pte(pt_entry_t *pte)
+{
+ if (!hyp_mmu_update_pte(kv_to_ma(pte), (*pte) & ~INTEL_PTE_VALID))
+ panic("%s:%d could not set pte %p(%p) to %p(%p)\n",__FILE__,__LINE__,pte,kv_to_ma(pte),*pte,pa_to_ma(*pte));
+ hyp_mmuext_op_void(MMUEXT_TLB_FLUSH_LOCAL);
+}
+
+void hyp_debug()
+{
+ panic("debug");
+}
+
+void hyp_init(void)
+{
+ hyp_grant_init();
+ hyp_store_init();
+ /* these depend on the above */
+ hyp_block_init();
+ hyp_net_init();
+ evtchn_port_t port = hyp_event_channel_bind_virq(VIRQ_DEBUG, 0);
+ hyp_evt_handler(port, hyp_debug, 0, SPL7);
+}
+
+void _hyp_halt(void)
+{
+ hyp_halt();
+}
+
+void _hyp_todo(unsigned long from)
+{
+ printf("TODO: at %lx\n",from);
+ hyp_halt();
+}
+
+extern int int_mask[];
+void hyp_idle(void)
+{
+ int cpu = 0;
+ hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0xff;
+ barrier();
+ /* Avoid blocking if there are pending events */
+ if (!hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending &&
+ !hyp_shared_info.evtchn_pending[cpu])
+ hyp_block();
+ while (1) {
+ hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0x00;
+ barrier();
+ if (!hyp_shared_info.vcpu_info[cpu].evtchn_upcall_pending &&
+ !hyp_shared_info.evtchn_pending[cpu])
+ /* Didn't miss any event, can return to threads. */
+ break;
+ hyp_shared_info.vcpu_info[cpu].evtchn_upcall_mask = 0xff;
+ hyp_c_callback(NULL,NULL);
+ }
+}
diff --git a/xen/xen.h b/xen/xen.h
new file mode 100644
index 0000000..87e1256
--- /dev/null
+++ b/xen/xen.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2006 Samuel Thibault <samuel.thibault@ens-lyon.org>
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef XEN_XEN_H
+#define XEN_XEN_H
+
+void hyp_init(void);
+void hyp_invalidate_pte(pt_entry_t *pte);
+void hyp_idle(void);
+void hyp_p2m_init(void);
+
+#endif /* XEN_XEN_H */