\input texinfo   @c -*- Texinfo -*-
@setfilename mach.info
@settitle The GNU Mach Reference Manual
@setchapternewpage odd

@comment Tell install-info what to do.
@dircategory Kernel
@direntry
* GNUMach: (mach).  Using and programming the GNU Mach microkernel.
@end direntry

@c Should have a glossary.
@c Unify some of our indices.
@syncodeindex pg cp
@syncodeindex vr fn
@syncodeindex tp fn

@c Get the Mach version we are documenting.
@include version.texi
@set EDITION 0.4
@set UPDATED 2001-09-01
@c @set ISBN X-XXXXXX-XX-X

@ifinfo
This file documents the GNU Mach microkernel.

This is Edition @value{EDITION}, last updated @value{UPDATED}, of
@cite{The GNU Mach Reference Manual}, for Version @value{VERSION}.

Copyright @copyright{} 2001 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Free Software Needs Free Documentation" and
"GNU Lesser General Public License", the Front-Cover texts being (a)
(see below), and with the Back-Cover Texts being (b) (see below). A
copy of the license is included in the section entitled "GNU Free
Documentation License".

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development.

This work is based on manual pages under the following copyright and
license:

@noindent
Mach Operating System@*
Copyright @copyright{} 1991,1990 Carnegie Mellon University@*
All Rights Reserved.

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
@end ifinfo

@iftex
@shorttitlepage The GNU Mach Reference Manual
@end iftex

@titlepage
@center @titlefont{The GNU Mach}
@sp 1
@center @titlefont{Reference Manual}
@sp 2
@center Marcus Brinkmann
@center with
@center Gordon Matzigkeit, Gibran Hasnaoui,
@center Robert V. Baron, Richard P. Draves, Mary R. Thompson, Joseph S. Barrera
@sp 3
@center Edition @value{EDITION}
@sp 1
@center last updated @value{UPDATED}
@sp 1
@center for version @value{VERSION}
@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2001 Free Software Foundation, Inc.
@c @sp 2
@c Published by the Free Software Foundation @*
@c 59 Temple Place -- Suite 330, @*
@c Boston, MA 02111-1307 USA @*
@c ISBN @value{ISBN} @*

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Free Software Needs Free Documentation" and
"GNU Lesser General Public License", the Front-Cover texts being (a)
(see below), and with the Back-Cover Texts being (b) (see below). A
copy of the license is included in the section entitled "GNU Free
Documentation License".

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development.

This work is based on manual pages under the following copyright and
license:

@noindent
Mach Operating System@*
Copyright @copyright{} 1991,1990 Carnegie Mellon University@*
All Rights Reserved.

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
@end titlepage

@c @titlepage
@c @finalout
@c @title The GNU Mach Reference Manual
@c @author Marcus Brinkmann
@c @author Gordon Matzigkeit
@c @author Gibran Hasnaoui
@c @author Robert V. Baron @c (rvb)
@c @author Richard P. Draves @c (rpd)
@c @author Mary R. Thompson @c (mrt)
@c @author Joseph S. Barrera @c (jsb)
@c @c The following occur rarely in the rcs commit logs of the man pages:
@c @c Dan Stodolsky, (danner)
@c @c David B. Golub, (dbg)
@c @c Terri Watson, (elf)
@c @c Lori Iannamico, (lli) [distribution coordinator]
@c @c Further authors of kernel_interfaces.ps:
@c @c David Black [OSF]
@c @c William Bolosky
@c @c Jonathan Chew
@c @c Alessandro Forin
@c @c Richard F. Rashid
@c @c Avadis Tevanian Jr.
@c @c Michael W. Young
@c @c See also
@c @c http://www.cs.cmu.edu/afs/cs/project/mach/public/www/people-former.html
@page

@ifnottex
@node Top
@top Main Menu

This is Edition @value{EDITION}, last updated @value{UPDATED}, of
@cite{The GNU Mach Reference Manual}, for Version @value{VERSION} of the
GNU Mach microkernel.
@end ifnottex

@menu
* Introduction::  How to use this manual.
* Installing::  Setting up GNU Mach on your computer.
* Bootstrap::  Running GNU Mach on your machine.
* Inter Process Communication::  Communication between processes.
* Virtual Memory Interface::  Allocating and deallocating virtual memory.
* External Memory Management::  Handling memory pages in user space.
* Threads and Tasks::  Handling of threads and tasks.
* Host Interface::  Interface to a Mach host.
* Processors and Processor Sets::  Handling processors and sets of processors.
* Device Interface::  Accessing kernel devices.
* Kernel Debugger::  How to use the built-in kernel debugger.

Appendices

* Copying::  The GNU General Public License says how you
               can copy and share the GNU Mach microkernel.
* Documentation License::  This manual is under the GNU Free
               Documentation License.

Indices

* Concept Index::  Index of concepts and programs.
* Function and Data Index::  Index of functions, variables and data types.

@detailmenu
 --- The Detailed Node Listing ---

Introduction

* Audience::  The people for whom this manual is written.
* Features::  Reasons to install and use GNU Mach.
* Overview::  Basic architecture of the Mach microkernel.
* History::  The story about Mach.

Installing

* Binary Distributions::  Obtaining ready-to-run GNU distributions.
* Compilation::  Building GNU Mach from its source code.
* Configuration::  Configuration options at compilation time.
* Cross-Compilation::  Building GNU Mach from another system.

Bootstrap

* Bootloader::  Starting the microkernel, or other OSes.
* Modules::  Starting the first task of the OS.

Inter Process Communication

* Major Concepts::  The concepts behind the Mach IPC system.
* Messaging Interface::  Composing, sending and receiving messages.
* Port Manipulation Interface::  Manipulating ports, port rights, port sets.

Messaging Interface

* Mach Message Call::  Sending and receiving messages.
* Message Format::  The format of Mach messages.
* Exchanging Port Rights::  Sending and receiving port rights.
* Memory::  Passing memory regions in messages.
* Message Send::  Sending messages.
* Message Receive::  Receiving messages.
* Atomicity::  Atomicity of port rights.

Port Manipulation Interface

* Port Creation::  How to create new ports and port sets.
* Port Destruction::  How to destroy ports and port sets.
* Port Names::  How to query and manipulate port names.
* Port Rights::  How to work with port rights.
* Ports and other Tasks::  How to move rights between tasks.
* Receive Rights::  How to work with receive rights.
* Port Sets::  How to work with port sets.
* Request Notifications::  How to request notifications for events.
@c * Inherited Ports::  How to work with the inherited system ports.

Virtual Memory Interface

* Memory Allocation::  Allocation of new virtual memory.
* Memory Deallocation::  Freeing unused virtual memory.
* Data Transfer::  Reading, writing and copying memory.
* Memory Attributes::  Tweaking memory regions.
* Mapping Memory Objects::  How to map memory objects.
* Memory Statistics::  How to get statistics about memory usage.

External Memory Management

* Memory Object Server::  The basics of external memory management.
* Memory Object Creation::  How new memory objects are created.
* Memory Object Termination::  How memory objects are terminated.
* Memory Objects and Data::  Data transfer to and from memory objects.
* Memory Object Locking::  How memory objects are locked.
* Memory Object Attributes::  Manipulating attributes of memory objects.
* Default Memory Manager::  Setting and using the default memory manager.

Threads and Tasks

* Thread Interface::  Manipulating threads.
* Task Interface::  Manipulating tasks.
* Profiling::  Profiling threads and tasks.

Thread Interface

* Thread Creation::  Creating threads.
* Thread Termination::  Terminating threads.
* Thread Information::  How to get information on threads.
* Thread Settings::  How to set thread-related information.
* Thread Execution::  How to control the thread's machine state.
* Scheduling::  Operations on thread scheduling.
* Thread Special Ports::  How to handle the thread's special ports.
* Exceptions::  Managing exceptions.

Scheduling

* Thread Priority::  Changing the priority of a thread.
* Hand-Off Scheduling::  Switch to a new thread.
* Scheduling Policy::  Setting the scheduling policy.

Task Interface

* Task Creation::  Creating tasks.
* Task Termination::  Terminating tasks.
* Task Information::  Information on tasks.
* Task Execution::  Thread scheduling in a task.
* Task Special Ports::  How to get and set the task's special ports.
* Syscall Emulation::  How to emulate system calls.

Host Interface

* Host Ports::  Ports representing a host.
* Host Information::  Query information about a host.
* Host Time::  Functions to query and manipulate the host time.
* Host Reboot::  Rebooting the system.

Processors and Processor Sets

* Processor Set Interface::  How to work with processor sets.
* Processor Interface::  How to work with individual processors.

Processor Set Interface

* Processor Set Ports::  Ports representing a processor set.
* Processor Set Access::  How the processor sets are accessed.
* Processor Set Creation::  How new processor sets are created.
* Processor Set Destruction::  How processor sets are destroyed.
* Tasks and Threads on Sets::  Assigning tasks or threads to processor sets.
* Processor Set Priority::  Specifying the priority of a processor set.
* Processor Set Policy::  Changing the processor set policies.
* Processor Set Info::  Obtaining information about a processor set.

Processor Interface

* Hosted Processors::  Getting a list of all processors on a host.
* Processor Control::  Starting, stopping, controlling processors.
* Processors and Sets::  Combining processors into processor sets.
* Processor Info::  Obtaining information on processors.

Device Interface

* Device Open::  Opening hardware devices.
* Device Close::  Closing hardware devices.
* Device Read::  Reading data from the device.
* Device Write::  Writing data to the device.
* Device Map::  Mapping devices into virtual memory.
* Device Status::  Querying and manipulating a device.
* Device Filter::  Filtering packets arriving on a device.

Kernel Debugger

* Operation::  Basic architecture of the kernel debugger.
* Commands::  Available commands in the kernel debugger.
* Variables::  Access of variables from the kernel debugger.
* Expressions::  Usage of expressions in the kernel debugger.

Documentation License

* Free Documentation License::  The GNU Free Documentation License.
* CMU License::  The CMU license applies to the original Mach
                   kernel and its documentation.

@end detailmenu
@end menu

@node Introduction
@chapter Introduction

GNU Mach is the microkernel of the GNU Project. It is the base of the
operating system, and provides its functionality to the Hurd servers,
the GNU C Library and all user applications. The microkernel itself
does not provide much functionality of the system, just enough to make
it possible for the Hurd servers and the C library to implement the
missing features you would expect from a POSIX compatible operating
system.

@menu
* Audience::  The people for whom this manual is written.
* Features::  Reasons to install and use GNU Mach.
* Overview::  Basic architecture of the Mach microkernel.
* History::  The story about Mach.
@end menu

@node Audience
@section Audience

This manual is designed to be useful to everybody who is interested in
using, administering, or programming the Mach microkernel.

If you are an end-user and you are looking for help on running the Mach
kernel, the first few chapters of this manual describe the essential
parts of installing and using the kernel in the GNU operating system.

The rest of this manual is a technical discussion of the Mach
programming interface and its implementation, and would not be helpful
until you want to learn how to extend the system or modify the kernel.

This manual is organized according to the subsystems of Mach, and each
chapter begins with descriptions of conceptual ideas that are related to
that subsystem.
If you are a programmer and want to learn more about, say,
the Mach IPC subsystem, you can skip to the IPC chapter
(@pxref{Inter Process Communication}), and read about the related
concepts and interface definitions.

@node Features
@section Features

GNU Mach is not the most advanced microkernel known to the planet, nor
is it the fastest or smallest, but it has a rich set of interfaces and
some features which make it useful as the base of the Hurd system.

@table @asis
@item it's free software
Anybody can use, modify, and redistribute it under the terms of the GNU
General Public License (@pxref{Copying}). GNU Mach is part of the GNU
system, which is a complete operating system licensed under the GPL.

@item it's built to survive
As a microkernel, GNU Mach doesn't implement a lot of the features
commonly found in an operating system, but only the bare minimum that is
required to implement a full operating system on top of it. This means
that a lot of the operating system code is maintained outside of GNU
Mach, and while this code may go through a complete redesign, the code
of the microkernel can remain comparatively stable.

@item it's scalable
Mach is particularly well suited for SMP and network cluster techniques.
Thread support is provided at the kernel level, and the kernel itself
takes advantage of that. Network transparency at the IPC level makes
resources of the system available across machine boundaries (with NORMA
IPC, currently not available in GNU Mach).

@item it exists
The Mach microkernel is real software that works Right Now. It is not a
research project or a proposal. You don't have to wait at all before you
can start using and developing it. Mach has been used in many operating
systems in the past, usually as the base for a single UNIX server. In
the GNU system, Mach is the base of a functional multi-server operating
system, the Hurd.
@end table

@node Overview
@section Overview

@c This paragraph by Gordon Matzigkeit from the Hurd manual.
An operating system kernel provides a framework for programs to share a
computer's hardware resources securely and efficiently. This requires
that the programs are separated and protected from each other. To make
running multiple programs in parallel useful, there also needs to be a
facility for programs to exchange information by communication.

The Mach microkernel provides abstractions of the underlying hardware
resources like devices and memory. It organizes the running programs
into tasks and threads (points of execution in the tasks). In addition,
Mach provides a rich interface for inter-process communication.

What Mach does not provide is a POSIX compatible programming interface.
In fact, it has no understanding of file systems, POSIX process
semantics, network protocols and much more. All this is implemented in
tasks running on top of the microkernel. In the GNU operating system,
the Hurd servers and the C library share the responsibility to implement
the POSIX interface, and the additional interfaces which are specific to
the GNU system.

@node History
@section History

XXX A few lines about the history of Mach here.

@node Installing
@chapter Installing

Before you can use the Mach microkernel in your system you'll need to
install it and all components you want to use with it, e.g. the rest of
the operating system. You also need a bootloader to load the kernel
from the storage medium and run it when the computer is started.

GNU Mach is only available for Intel i386-compatible architectures (such
as the Pentium) currently. If you have a different architecture and
want to run the GNU Mach microkernel, you will need to port the kernel
and all other software of the system to your machine's architecture.
Porting is an involved process which requires considerable programming
skills, and it is not recommended for the faint-of-heart. If you have
the talent and desire to do a port, contact @email{bug-hurd@@gnu.org} in
order to coordinate the effort.

@menu
* Binary Distributions::  Obtaining ready-to-run GNU distributions.
* Compilation::  Building GNU Mach from its source code.
* Configuration::  Configuration options at compile time.
* Cross-Compilation::  Building GNU Mach from another system.
@end menu

@node Binary Distributions
@section Binary Distributions

By far the easiest and best way to install GNU Mach and the operating
system is to obtain a GNU binary distribution. The GNU operating system
consists of GNU Mach, the Hurd, the C library and many applications.
Without the GNU operating system, you will only have a microkernel,
which is not very useful by itself, without the other programs.

Building the whole operating system takes a huge effort, and you are
well advised to not do it yourself, but to get a binary distribution of
the GNU operating system. The distribution also includes a binary of
the GNU Mach microkernel.

Information on how to obtain the GNU system can be found in the Hurd
info manual.

@node Compilation
@section Compilation

If you already have a running GNU system, and only want to recompile the
kernel, for example to select a different set of included hardware
drivers, you can easily do this. You need the GNU C compiler and MiG,
the Mach interface generator, which both come in their own packages.

Building and installing the kernel is as easy as with any other GNU
software package. The configure script is used to configure the source
and set the compile time options. The compilation is done by running:

@example
make
@end example

To install the kernel and its header files, just enter the command:

@example
make install
@end example

This will install the kernel into $(prefix)/boot/gnumach and the header
files into $(prefix)/include. You can also only install the kernel or
the header files. For this, the two targets install-kernel and
install-headers are provided.
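
As a sketch of the whole sequence, a build that enables the kernel
debugger together with the IDE and NE2000 drivers might look like the
following. The selection of options is only an illustration
(@pxref{Configuration}), and the sketch assumes that the commands are
run from the top of the GNU Mach source tree:

@example
./configure --enable-kdb --enable-ide --enable-ne2000
make
make install-kernel
@end example

Here install-kernel is used instead of install, so the header files
that are already installed are left untouched.
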
@node Configuration
@section Configuration

The following options can be passed to the configure script as command
line arguments and control what components are built into the kernel, or
where it is installed.

The default for an option is to be disabled, unless otherwise noted.

@table @code
@item --prefix @var{prefix}
Sets the prefix to PREFIX. The default prefix is the empty string,
which is the correct value for the GNU system. The prefix is prepended
to all file names at installation time.

@item --enable-kdb
Enables the in-kernel debugger. This is only useful if you actually
anticipate debugging the kernel. It is not enabled by default because
it adds considerably to the unpageable memory footprint of the kernel.
@xref{Kernel Debugger}.

@item --enable-kmsg
Enables the kernel message device kmsg.

@item --enable-lpr
Enables the parallel port devices lpr%d.

@item --enable-floppy
Enables the PC floppy disk controller devices fd%d.

@item --enable-ide
Enables the IDE controller devices hd%d, hd%ds%d.
@end table

The following options enable drivers for various SCSI controllers.
SCSI devices are named sd%d (disks) or cd%d (CD ROMs).

@table @code
@item --enable-advansys
Enables the AdvanSys SCSI controller devices sd%d, cd%d.

@item --enable-buslogic
Enables the BusLogic SCSI controller devices sd%d, cd%d.

@item --disable-flashpoint
Only meaningful in conjunction with @option{--enable-buslogic}. Omits
the FlashPoint support. This option is enabled by default if
@option{--enable-buslogic} is specified.

@item --enable-u1434f
Enables the UltraStor 14F/34F SCSI controller devices sd%d, cd%d.

@item --enable-ultrastor
Enables the UltraStor SCSI controller devices sd%d, cd%d.

@item --enable-aha152x
@itemx --enable-aha2825
Enables the Adaptec AHA-152x/2825 SCSI controller devices sd%d, cd%d.

@item --enable-aha1542
Enables the Adaptec AHA-1542 SCSI controller devices sd%d, cd%d.

@item --enable-aha1740
Enables the Adaptec AHA-1740 SCSI controller devices sd%d, cd%d.

@item --enable-aic7xxx
Enables the Adaptec AIC7xxx SCSI controller devices sd%d, cd%d.

@item --enable-futuredomain
Enables the Future Domain 16xx SCSI controller devices sd%d, cd%d.

@item --enable-in2000
Enables the Always IN 2000 SCSI controller devices sd%d, cd%d.

@item --enable-ncr5380
@itemx --enable-ncr53c400
Enables the generic NCR5380/53c400 SCSI controller devices sd%d, cd%d.

@item --enable-ncr53c406a
Enables the NCR53c406a SCSI controller devices sd%d, cd%d.

@item --enable-pas16
Enables the PAS16 SCSI controller devices sd%d, cd%d.

@item --enable-seagate
Enables the Seagate ST02 and Future Domain TMC-8xx SCSI controller
devices sd%d, cd%d.

@item --enable-t128
@itemx --enable-t128f
@itemx --enable-t228
Enables the Trantor T128/T128F/T228 SCSI controller devices sd%d, cd%d.

@item --enable-ncr53c7xx
Enables the NCR53C7,8xx SCSI controller devices sd%d, cd%d.

@item --enable-eatadma
Enables the EATA-DMA (DPT, NEC, AT&T, SNI, AST, Olivetti, Alphatronix)
SCSI controller devices sd%d, cd%d.

@item --enable-eatapio
Enables the EATA-PIO (old DPT PM2001, PM2012A) SCSI controller devices
sd%d, cd%d.

@item --enable-wd7000
Enables the WD 7000 SCSI controller devices sd%d, cd%d.

@item --enable-eata
Enables the EATA ISA/EISA/PCI (DPT and generic EATA/DMA-compliant boards)
SCSI controller devices sd%d, cd%d.

@item --enable-am53c974
@itemx --enable-am79c974
Enables the AM53/79C974 SCSI controller devices sd%d, cd%d.

@item --enable-dtc3280
@itemx --enable-dtc3180
Enables the DTC3180/3280 SCSI controller devices sd%d, cd%d.

@item --enable-ncr53c8xx
@itemx --enable-dc390w
@itemx --enable-dc390u
@itemx --enable-dc390f
Enables the NCR53C8XX SCSI controller devices sd%d, cd%d.

@item --enable-dc390t
@itemx --enable-dc390
Enables the Tekram DC-390(T) SCSI controller devices sd%d, cd%d.

@item --enable-ppa
Enables the IOMEGA Parallel Port ZIP drive device sd%d.

@item --enable-qlogicfas
Enables the Qlogic FAS SCSI controller devices sd%d, cd%d.

@item --enable-qlogicisp
Enables the Qlogic ISP SCSI controller devices sd%d, cd%d.

@item --enable-gdth
Enables the GDT SCSI Disk Array controller devices sd%d, cd%d.
@end table

The following options enable drivers for various ethernet cards. NIC
device names are usually eth%d, except for the pocket adaptors.

GNU Mach autodetects only one ethernet card. To enable any further
cards, the source code has to be edited.
@c XXX Reference to the source code.

@table @code
@item --enable-ne2000
@itemx --enable-ne1000
Enables the NE2000/NE1000 ISA network card devices eth%d.

@item --enable-3c503
@itemx --enable-el2
Enables the 3Com 503 (Etherlink II) network card devices eth%d.

@item --enable-3c509
@itemx --enable-3c579
@itemx --enable-el3
Enables the 3Com 509/579 (Etherlink III) network card devices eth%d.

@item --enable-wd80x3
Enables the WD80X3 network card devices eth%d.

@item --enable-3c501
@itemx --enable-el1
Enables the 3COM 501 network card devices eth%d.

@item --enable-ul
Enables the SMC Ultra network card devices eth%d.

@item --enable-ul32
Enables the SMC Ultra 32 network card devices eth%d.

@item --enable-hplanplus
Enables the HP PCLAN+ (27247B and 27252A) network card devices eth%d.

@item --enable-hplan
Enables the HP PCLAN (27245 and other 27xxx series) network card devices
eth%d.

@item --enable-3c59x
@itemx --enable-3c90x
@itemx --enable-vortex
Enables the 3Com 590/900 series (592/595/597/900/905) "Vortex/Boomerang"
network card devices eth%d.

@item --enable-seeq8005
Enables the Seeq8005 network card devices eth%d.

@item --enable-hp100
@itemx --enable-hpj2577
@itemx --enable-hpj2573
@itemx --enable-hp27248b
@itemx --enable-hp2585
Enables the HP 10/100VG PCLAN (ISA, EISA, PCI) network card devices
eth%d.

@item --enable-ac3200
Enables the Ansel Communications EISA 3200 network card devices eth%d.

@item --enable-e2100
Enables the Cabletron E21xx network card devices eth%d.

@item --enable-at1700
Enables the AT1700 (Fujitsu 86965) network card devices eth%d.

@item --enable-eth16i
@itemx --enable-eth32
Enables the ICL EtherTeam 16i/32 network card devices eth%d.

@item --enable-znet
@itemx --enable-znote
Enables the Zenith Z-Note network card devices eth%d.

@item --enable-eexpress
Enables the EtherExpress 16 network card devices eth%d.

@item --enable-eexpresspro
Enables the EtherExpressPro network card devices eth%d.

@item --enable-eexpresspro100
Enables the Intel EtherExpressPro PCI 10+/100B/100+ network card devices
eth%d.

@item --enable-depca
@itemx --enable-de100
@itemx --enable-de101
@itemx --enable-de200
@itemx --enable-de201
@itemx --enable-de202
@itemx --enable-de210
@itemx --enable-de422
Enables the DEPCA, DE10x, DE200, DE201, DE202, DE210, DE422 network card
devices eth%d.

@item --enable-ewrk3
@itemx --enable-de203
@itemx --enable-de204
@itemx --enable-de205
Enables the EtherWORKS 3 (DE203, DE204, DE205) network card devices
eth%d.

@item --enable-de4x5
@itemx --enable-de425
@itemx --enable-de434
@itemx --enable-435
@itemx --enable-de450
@itemx --enable-500
Enables the DE425, DE434, DE435, DE450, DE500 network card devices
eth%d.

@item --enable-apricot
Enables the Apricot XEN-II on board ethernet network card devices eth%d.

@item --enable-wavelan
Enables the AT&T WaveLAN & DEC RoamAbout DS network card devices eth%d.

@item --enable-3c507
@itemx --enable-el16
Enables the 3Com 507 network card devices eth%d.

@item --enable-3c505
@itemx --enable-elplus
Enables the 3Com 505 network card devices eth%d.

@item --enable-de600
Enables the D-Link DE-600 network card devices eth%d.

@item --enable-de620
Enables the D-Link DE-620 network card devices eth%d.

@item --enable-skg16
Enables the Schneider & Koch G16 network card devices eth%d.

@item --enable-ni52
Enables the NI5210 network card devices eth%d.

@item --enable-ni65
Enables the NI6510 network card devices eth%d.

@item --enable-atp
Enables the AT-LAN-TEC/RealTek pocket adaptor network card devices
atp%d.

@item --enable-lance
@itemx --enable-at1500
@itemx --enable-ne2100
Enables the AMD LANCE and PCnet (AT1500 and NE2100) network card devices
eth%d.

@item --enable-elcp
@itemx --enable-tulip
Enables the DECchip Tulip (dc21x4x) PCI network card devices eth%d.

@item --enable-fmv18x
Enables the FMV-181/182/183/184 network card devices eth%d.

@item --enable-3c515
Enables the 3Com 515 ISA Fast EtherLink network card devices eth%d.

@item --enable-pcnet32
Enables the AMD PCI PCnet32 (PCI bus NE2100 cards) network card devices
eth%d.

@item --enable-ne2kpci
Enables the PCI NE2000 network card devices eth%d.

@item --enable-yellowfin
Enables the Packet Engines Yellowfin Gigabit-NIC network card devices
eth%d.

@item --enable-rtl8139
@itemx --enable-rtl8129
Enables the RealTek 8129/8139 (not 8019/8029!) network card devices
eth%d.

@item --enable-epic
@itemx --enable-epic100
Enables the SMC 83c170/175 EPIC/100 (EtherPower II) network card devices
eth%d.

@item --enable-tlan
Enables the TI ThunderLAN network card devices eth%d.

@item --enable-viarhine
Enables the VIA Rhine network card devices eth%d.

@item --enable-hamachi
Enables the Packet Engines "Hamachi" GNIC-2 Gigabit Ethernet devices
eth%d.

@item --enable-intel-gige
Enables the Intel PCI Gigabit Ethernet devices eth%d.

@item --enable-myson803
Enables the Myson MTD803 Ethernet adapter series devices eth%d.

@item --enable-natsemi
Enables the National Semiconductor DP8381x series PCI Ethernet devices
eth%d.

@item --enable-ns820
Enables the National Semiconductor DP8382x series PCI Ethernet devices
eth%d.

@item --enable-starfire
Enables the Adaptec Starfire network adapter devices eth%d.

@item --enable-sundance
Enables the Sundance ST201 "Alta" PCI Ethernet devices eth%d.

@item --enable-winbond-840
Enables the Winbond W89c840 PCI Ethernet devices eth%d.
@end table

PCMCIA drivers.

@table @code
@item --enable-i82365
@end table

PCMCIA device drivers.

@table @code
@item --enable-3c574_cs
@item --enable-3c589_cs
@item --enable-axnet_cs
@item --enable-fmvj18x_cs
@item --enable-nmclan_cs
@item --enable-pcnet_cs
@item --enable-smc91c92_cs
@item --enable-xirc2ps_cs
@item --enable-orinoco_cs
@end table

@node Cross-Compilation
@section Cross-Compilation

Another way to install the kernel is to use an existing operating system
in order to compile the kernel binary. This is called
@dfn{cross-compiling}, because it is done between two different
platforms. If the pre-built kernels are not working for you, and you
can't ask someone to compile a custom kernel for your machine, this is
your last chance to get a kernel that boots on your hardware.

Luckily, the kernel has only light dependencies. You don't even need a
cross compiler if your build machine has a compiler and is the same
architecture as the system you want to run GNU Mach on.

You need a cross-mig, though.

XXX More info needed.

@node Bootstrap
@chapter Bootstrap

Bootstrapping@footnote{The term @dfn{bootstrapping} refers to a Dutch
legend about a boy who was able to fly by pulling himself up by his
bootstraps. In computers, this term refers to any process where a
simple system activates a more complicated system.} is the procedure by
which your machine loads the microkernel and transfers control to the
operating system.

@menu
* Bootloader::  Starting the microkernel, or other OSes.
* Modules::  Starting the first task of the OS.
@end menu

@node Bootloader
@section Bootloader

The @dfn{bootloader} is the first software that runs on your machine.
Many hardware architectures have a very simple startup routine which
reads a very simple bootloader from the beginning of the internal hard
disk, then transfers control to it. Other architectures have startup
routines which are able to understand more of the contents of the hard
disk, and directly start a more advanced bootloader.

@cindex GRUB
@cindex GRand Unified Bootloader
Currently, @dfn{GRUB}@footnote{The GRand Unified Bootloader, available
from @uref{http://www.uruk.org/grub/}.} is the preferred GNU bootloader.
GRUB provides advanced functionality, and is capable of loading several
different kernels (such as Mach, Linux, DOS, and the *BSD family).
@xref{Top, , Introduction, grub, GRUB Manual}.

GNU Mach conforms to the Multiboot specification which defines an
interface between the bootloader and the components that run very early
at startup. GNU Mach can be started by any bootloader which supports
the multiboot standard. After the bootloader has loaded the kernel image
to a designated address in the system memory, it jumps into the startup
code of the kernel. This code initializes the kernel and detects the
available hardware devices. Afterwards, the first system task is
started. @xref{Top, , Overview, multiboot, Multiboot Specification}.

@node Modules
@section Modules
@pindex serverboot

Because the microkernel does not provide filesystem support and other
features necessary to load the first system task from a storage medium,
the first task is loaded by the bootloader as a module to a specified
address. In the GNU system, this first program is the
@code{serverboot} executable. GNU Mach inserts the host control port
and the device master port into this task and appends the port numbers
to the command line before executing it.

The @code{serverboot} program is responsible for loading and executing
the rest of the Hurd servers. Rather than containing specific
instructions for starting the Hurd, it follows general steps given in a
user-supplied boot script.

XXX More about boot scripts.

@node Inter Process Communication
@chapter Inter Process Communication

This chapter describes the details of the Mach IPC system. First the
actual calls concerned with sending and receiving messages are
discussed, then the port system is described in detail.
@menu * Major Concepts:: The concepts behind the Mach IPC system. * Messaging Interface:: Composing, sending and receiving messages. * Port Manipulation Interface:: Manipulating ports, port rights, port sets. @end menu @node Major Concepts @section Major Concepts @cindex interprocess communication (IPC) @cindex IPC (interprocess communication) @cindex communication between tasks @cindex remote procedure calls (RPC) @cindex RPC (remote procedure calls) @cindex messages The Mach kernel provides message-oriented, capability-based interprocess communication. The interprocess communication (IPC) primitives efficiently support many different styles of interaction, including remote procedure calls (RPC), object-oriented distributed programming, streaming of data, and sending very large amounts of data. The IPC primitives operate on three abstractions: messages, ports, and port sets. User tasks access all other kernel services and abstractions via the IPC primitives. The message primitives let tasks send and receive messages. Tasks send messages to ports. Messages sent to a port are delivered reliably (messages may not be lost) and are received in the order in which they were sent. Messages contain a fixed-size header and a variable amount of typed data following the header. The header describes the destination and size of the message. The IPC implementation makes use of the VM system to efficiently transfer large amounts of data. The message body can contain the address of a region in the sender's address space which should be transferred as part of the message. When a task receives a message containing an out-of-line region of data, the data appears in an unused portion of the receiver's address space. This transmission of out-of-line data is optimized so that sender and receiver share the physical pages of data copy-on-write, and no actual data copy occurs unless the pages are written. Regions of memory up to the size of a full address space may be sent in this manner. 
Ports hold a queue of messages. Tasks operate on a port to send and receive messages by exercising capabilities for the port. Multiple tasks can hold send capabilities, or rights, for a port. Tasks can also hold send-once rights, which grant the ability to send a single message. Only one task can hold the receive capability, or receive right, for a port. Port rights can be transferred between tasks via messages. The sender of a message can specify in the message body that the message contains a port right. If a message contains a receive right for a port, then the receive right is removed from the sender of the message and the right is transferred to the receiver of the message. While the receive right is in transit, tasks holding send rights can still send messages to the port, and they are queued until a task acquires the receive right and uses it to receive the messages. Tasks can receive messages from ports and port sets. The port set abstraction allows a single thread to wait for a message from any of several ports. Tasks manipulate port sets with a capability, or port-set right, which is taken from the same space as the port capabilities. The port-set right may not be transferred in a message. A port set holds receive rights, and a receive operation on a port set blocks waiting for a message sent to any of the constituent ports. A port may not belong to more than one port set, and if a port is a member of a port set, the holder of the receive right can't receive directly from the port. Port rights are a secure, location-independent way of naming ports. The port queue is a protected data structure, only accessible via the kernel's exported message primitives. Rights are also protected by the kernel; there is no way for a malicious user task to guess a port name and send a message to a port to which it shouldn't have access. Port rights do not carry any location information. 
When a receive right for a port moves from task to task, and even between tasks on different machines, the send rights for the port remain unchanged and continue to function. @node Messaging Interface @section Messaging Interface This section describes how messages are composed, sent and received within the Mach IPC system. @menu * Mach Message Call:: Sending and receiving messages. * Message Format:: The format of Mach messages. * Exchanging Port Rights:: Sending and receiving port rights. * Memory:: Passing memory regions in messages. * Message Send:: Sending messages. * Message Receive:: Receiving messages. * Atomicity:: Atomicity of port rights. @end menu @node Mach Message Call @subsection Mach Message Call To use the @code{mach_msg} call, you can include the header files @file{mach/port.h} and @file{mach/message.h}. @deftypefun mach_msg_return_t mach_msg (@w{mach_msg_header_t *@var{msg}}, @w{mach_msg_option_t @var{option}}, @w{mach_msg_size_t @var{send_size}}, @w{mach_msg_size_t @var{rcv_size}}, @w{mach_port_t @var{rcv_name}}, @w{mach_msg_timeout_t @var{timeout}}, @w{mach_port_t @var{notify}}) The @code{mach_msg} function is used to send and receive messages. Mach messages contain typed data, which can include port rights and references to large regions of memory. @var{msg} is the address of a buffer in the caller's address space. Message buffers should be aligned on long-word boundaries. The message options @var{option} are bit values, combined with bitwise-or. One or both of @code{MACH_SEND_MSG} and @code{MACH_RCV_MSG} should be used. Other options act as modifiers. When sending a message, @var{send_size} specifies the size of the message buffer. Otherwise zero should be supplied. When receiving a message, @var{rcv_size} specifies the size of the message buffer. Otherwise zero should be supplied. When receiving a message, @var{rcv_name} specifies the port or port set. Otherwise @code{MACH_PORT_NULL} should be supplied. 
When using the @code{MACH_SEND_TIMEOUT} and @code{MACH_RCV_TIMEOUT} options, @var{timeout} specifies the time in milliseconds to wait before giving up. Otherwise @code{MACH_MSG_TIMEOUT_NONE} should be supplied. When using the @code{MACH_SEND_NOTIFY}, @code{MACH_SEND_CANCEL}, and @code{MACH_RCV_NOTIFY} options, @var{notify} specifies the port used for the notification. Otherwise @code{MACH_PORT_NULL} should be supplied. If the option argument is @code{MACH_SEND_MSG}, it sends a message. The @var{send_size} argument specifies the size of the message to send. The @code{msgh_remote_port} field of the message header specifies the destination of the message. If the option argument is @code{MACH_RCV_MSG}, it receives a message. The @var{rcv_size} argument specifies the size of the message buffer that will receive the message; messages larger than @var{rcv_size} are not received. The @var{rcv_name} argument specifies the port or port set from which to receive. If the option argument is @code{MACH_SEND_MSG|MACH_RCV_MSG}, then @code{mach_msg} does both send and receive operations. If the send operation encounters an error (any return code other than @code{MACH_MSG_SUCCESS}), then the call returns immediately without attempting the receive operation. Semantically the combined call is equivalent to separate send and receive calls, but it saves a system call and enables other internal optimizations. If the option argument specifies neither @code{MACH_SEND_MSG} nor @code{MACH_RCV_MSG}, then @code{mach_msg} does nothing. Some options, like @code{MACH_SEND_TIMEOUT} and @code{MACH_RCV_TIMEOUT}, share a supporting argument. If these options are used together, they make independent use of the supporting argument's value. @end deftypefun @deftp {Data type} mach_msg_timeout_t This is a @code{natural_t} used by the timeout mechanism. The units are milliseconds. The value to be used when there is no timeout is @code{MACH_MSG_TIMEOUT_NONE}. 
@end deftp


@node Message Format
@subsection Message Format
@cindex message format
@cindex format of a message
@cindex composing messages
@cindex message composition

A Mach message consists of a fixed-size message header, a @code{mach_msg_header_t}, followed by zero or more data items. Data items are typed. Each item has a type descriptor followed by the actual data (or the address of the data, for out-of-line memory regions).

The following data types are related to Mach ports:

@deftp {Data type} mach_port_t
The @code{mach_port_t} data type is an unsigned integer type which represents a port name in the task's port name space. In GNU Mach, this is an @code{unsigned int}.
@end deftp

@c This is defined elsewhere.
@c @deftp {Data type} mach_port_seqno_t
@c The @code{mach_port_seqno_t} data type is an unsigned integer type which
@c represents a sequence number of a message. In GNU Mach, this is an
@c @code{unsigned int}.
@c @end deftp

The following data types are related to Mach messages:

@deftp {Data type} mach_msg_bits_t
The @code{mach_msg_bits_t} data type is an @code{unsigned int} used to store various flags for a message.
@end deftp

@deftp {Data type} mach_msg_size_t
The @code{mach_msg_size_t} data type is an @code{unsigned int} used to store the size of a message.
@end deftp

@deftp {Data type} mach_msg_id_t
The @code{mach_msg_id_t} data type is an @code{integer_t} typically used to convey a function or operation id for the receiver.
@end deftp

@deftp {Data type} mach_msg_header_t
This structure is the start of every message in the Mach IPC system. It has the following members:

@table @code
@item mach_msg_bits_t msgh_bits
The @code{msgh_bits} field has the following bits defined; all other bits should be zero:

@table @code
@item MACH_MSGH_BITS_REMOTE_MASK
@itemx MACH_MSGH_BITS_LOCAL_MASK
The remote and local bits encode @code{mach_msg_type_name_t} values that specify the port rights in the @code{msgh_remote_port} and @code{msgh_local_port} fields.
The remote value must specify a send or send-once right for the destination of the message. If the local value doesn't specify a send or send-once right for the message's reply port, it must be zero and @code{msgh_local_port} must be @code{MACH_PORT_NULL}.

@item MACH_MSGH_BITS_COMPLEX
The complex bit must be specified if the message body contains port rights or out-of-line memory regions. If it is not specified, then the message body carries no port rights or memory, no matter what the type descriptors may seem to indicate.
@end table

The @code{MACH_MSGH_BITS_REMOTE} and @code{MACH_MSGH_BITS_LOCAL} macros return the appropriate @code{mach_msg_type_name_t} values, given a @code{msgh_bits} value. The @code{MACH_MSGH_BITS} macro constructs a value for @code{msgh_bits}, given two @code{mach_msg_type_name_t} values.

@item mach_msg_size_t msgh_size
The @code{msgh_size} field in the header of a received message contains the message's size. The message size, a byte quantity, includes the message header, type descriptors, and in-line data. For out-of-line memory regions, the message size includes the size of the in-line address, not the size of the actual memory region. There are no arbitrary limits on the size of a Mach message, the number of data items in a message, or the size of the data items.

@item mach_port_t msgh_remote_port
The @code{msgh_remote_port} field specifies the destination port of the message. The field must carry a legitimate send or send-once right for a port.

@item mach_port_t msgh_local_port
The @code{msgh_local_port} field specifies an auxiliary port right, which is conventionally used as a reply port by the recipient of the message. The field must carry a send right, a send-once right, @code{MACH_PORT_NULL}, or @code{MACH_PORT_DEAD}.

@item mach_port_seqno_t msgh_seqno
The @code{msgh_seqno} field provides a sequence number for the message. It is only valid in received messages; its value in sent messages is overwritten.
@c XXX The "MESSAGE RECEIVE" section discusses message sequence numbers.

@item mach_msg_id_t msgh_id
The @code{mach_msg} call doesn't use the @code{msgh_id} field, but it conventionally conveys an operation or function id.
@end table
@end deftp

@deftypefn Macro mach_msg_bits_t MACH_MSGH_BITS (@w{mach_msg_type_name_t @var{remote}}, @w{mach_msg_type_name_t @var{local}})
This macro composes two @code{mach_msg_type_name_t} values that specify the port rights in the @code{msgh_remote_port} and @code{msgh_local_port} fields of a @code{mach_msg} call into an appropriate @code{mach_msg_bits_t} value.
@end deftypefn

@deftypefn Macro mach_msg_type_name_t MACH_MSGH_BITS_REMOTE (@w{mach_msg_bits_t @var{bits}})
This macro extracts the @code{mach_msg_type_name_t} value for the remote port right in a @code{mach_msg_bits_t} value.
@end deftypefn

@deftypefn Macro mach_msg_type_name_t MACH_MSGH_BITS_LOCAL (@w{mach_msg_bits_t @var{bits}})
This macro extracts the @code{mach_msg_type_name_t} value for the local port right in a @code{mach_msg_bits_t} value.
@end deftypefn

@deftypefn Macro mach_msg_bits_t MACH_MSGH_BITS_PORTS (@w{mach_msg_bits_t @var{bits}})
This macro extracts the @code{mach_msg_bits_t} component consisting of the @code{mach_msg_type_name_t} values for the remote and local port right in a @code{mach_msg_bits_t} value.
@end deftypefn

@deftypefn Macro mach_msg_bits_t MACH_MSGH_BITS_OTHER (@w{mach_msg_bits_t @var{bits}})
This macro extracts the @code{mach_msg_bits_t} component consisting of everything except the @code{mach_msg_type_name_t} values for the remote and local port right in a @code{mach_msg_bits_t} value.
@end deftypefn

Each data item has a type descriptor, a @code{mach_msg_type_t} or a @code{mach_msg_type_long_t}. The @code{mach_msg_type_long_t} type descriptor allows larger values for some fields. The @code{msgtl_header} field in the long descriptor is only used for its inline, longform, and deallocate bits.
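The way the @code{msgh_bits} macros fit together can be sketched in a few lines of C. This is an illustration only, not the kernel's header: the stand-in definitions below exist so that the fragment compiles without @file{mach/message.h}, and the bit layout (remote value in the low byte, local value in the next byte) and all numeric values are assumptions. The helper names @code{request_bits} and @code{check_request_bits} are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in definitions so the sketch compiles without Mach headers.
   Real programs take these from <mach/message.h>; the bit layout and
   the numeric values below are assumptions made for illustration.  */
typedef uint32_t mach_msg_bits_t;
typedef uint32_t mach_msg_type_name_t;

#define MACH_MSGH_BITS_REMOTE_MASK  0x000000ffu
#define MACH_MSGH_BITS_LOCAL_MASK   0x0000ff00u
#define MACH_MSGH_BITS_COMPLEX      0x80000000u
#define MACH_MSGH_BITS_PORTS_MASK \
  (MACH_MSGH_BITS_REMOTE_MASK | MACH_MSGH_BITS_LOCAL_MASK)

#define MACH_MSGH_BITS(remote, local)  ((remote) | ((local) << 8))
#define MACH_MSGH_BITS_REMOTE(bits)    ((bits) & MACH_MSGH_BITS_REMOTE_MASK)
#define MACH_MSGH_BITS_LOCAL(bits) \
  (((bits) & MACH_MSGH_BITS_LOCAL_MASK) >> 8)
#define MACH_MSGH_BITS_PORTS(bits)     ((bits) & MACH_MSGH_BITS_PORTS_MASK)
#define MACH_MSGH_BITS_OTHER(bits)     ((bits) & ~MACH_MSGH_BITS_PORTS_MASK)

#define MACH_MSG_TYPE_COPY_SEND       19u   /* assumed value */
#define MACH_MSG_TYPE_MAKE_SEND_ONCE  21u   /* assumed value */

/* Hypothetical helper: header bits for a request that copies the
   caller's send right to the destination and makes a send-once right
   for the reply port.  The complex bit is set only when the body
   carries port rights or out-of-line memory.  */
mach_msg_bits_t
request_bits (int is_complex)
{
  mach_msg_bits_t bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND,
                                         MACH_MSG_TYPE_MAKE_SEND_ONCE);
  if (is_complex)
    bits |= MACH_MSGH_BITS_COMPLEX;
  return bits;
}

/* Usage: the extraction macros recover what was composed.  */
void
check_request_bits (void)
{
  mach_msg_bits_t bits = request_bits (1);

  assert (MACH_MSGH_BITS_REMOTE (bits) == MACH_MSG_TYPE_COPY_SEND);
  assert (MACH_MSGH_BITS_LOCAL (bits) == MACH_MSG_TYPE_MAKE_SEND_ONCE);
  assert (MACH_MSGH_BITS_OTHER (bits) == MACH_MSGH_BITS_COMPLEX);
}
```

Keeping the port bits and the remaining flags in separate masks is what lets @code{MACH_MSGH_BITS_PORTS} and @code{MACH_MSGH_BITS_OTHER} split a value cleanly into two parts.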
@deftp {Data type} mach_msg_type_name_t
This is an @code{unsigned int} and can be used to hold the @code{msgt_name} component of the @code{mach_msg_type_t} and @code{mach_msg_type_long_t} structures.
@end deftp

@deftp {Data type} mach_msg_type_size_t
This is an @code{unsigned int} and can be used to hold the @code{msgt_size} component of the @code{mach_msg_type_t} and @code{mach_msg_type_long_t} structures.
@end deftp

@deftp {Data type} mach_msg_type_number_t
This is a @code{natural_t} and can be used to hold the @code{msgt_number} component of the @code{mach_msg_type_t} and @code{mach_msg_type_long_t} structures.
@c XXX This is used for the size of arrays, too. Mmh?
@end deftp

@deftp {Data type} mach_msg_type_t
This structure has the following members:

@table @code
@item unsigned int msgt_name : 8
The @code{msgt_name} field specifies the data's type. The following types are predefined:

@table @code
@item MACH_MSG_TYPE_UNSTRUCTURED
@item MACH_MSG_TYPE_BIT
@item MACH_MSG_TYPE_BOOLEAN
@item MACH_MSG_TYPE_INTEGER_16
@item MACH_MSG_TYPE_INTEGER_32
@item MACH_MSG_TYPE_CHAR
@item MACH_MSG_TYPE_BYTE
@item MACH_MSG_TYPE_INTEGER_8
@item MACH_MSG_TYPE_REAL
@item MACH_MSG_TYPE_STRING
@item MACH_MSG_TYPE_STRING_C
@item MACH_MSG_TYPE_PORT_NAME
@end table

The following predefined types specify port rights, and receive special treatment. The next section discusses these types in detail. The type
@c XXX cross ref
@code{MACH_MSG_TYPE_PORT_NAME} describes port right names, when no rights are being transferred, but just names. For this purpose, it should be used in preference to @code{MACH_MSG_TYPE_INTEGER_32}.

@table @code
@item MACH_MSG_TYPE_MOVE_RECEIVE
@item MACH_MSG_TYPE_MOVE_SEND
@item MACH_MSG_TYPE_MOVE_SEND_ONCE
@item MACH_MSG_TYPE_COPY_SEND
@item MACH_MSG_TYPE_MAKE_SEND
@item MACH_MSG_TYPE_MAKE_SEND_ONCE
@end table

@item msgt_size : 8
The @code{msgt_size} field specifies the size of each datum, in bits. For example, the @code{msgt_size} of @code{MACH_MSG_TYPE_INTEGER_32} data is 32.
@item msgt_number : 12 The @code{msgt_number} field specifies how many data elements comprise the data item. Zero is a legitimate number. The total length specified by a type descriptor is @w{@code{(msgt_size * msgt_number)}}, rounded up to an integral number of bytes. In-line data is then padded to an integral number of long-words. This ensures that type descriptors always start on long-word boundaries. It implies that message sizes are always an integral multiple of a long-word's size. @item msgt_inline : 1 The @code{msgt_inline} bit specifies, when @code{FALSE}, that the data actually resides in an out-of-line region. The address of the memory region (a @code{vm_offset_t} or @code{vm_address_t}) follows the type descriptor in the message body. The @code{msgt_name}, @code{msgt_size}, and @code{msgt_number} fields describe the memory region, not the address. @item msgt_longform : 1 The @code{msgt_longform} bit specifies, when @code{TRUE}, that this type descriptor is a @code{mach_msg_type_long_t} instead of a @code{mach_msg_type_t}. The @code{msgt_name}, @code{msgt_size}, and @code{msgt_number} fields should be zero. Instead, @code{mach_msg} uses the following @code{msgtl_name}, @code{msgtl_size}, and @code{msgtl_number} fields. @item msgt_deallocate : 1 The @code{msgt_deallocate} bit is used with out-of-line regions. When @code{TRUE}, it specifies that the memory region should be deallocated from the sender's address space (as if with @code{vm_deallocate}) when the message is sent. @item msgt_unused : 1 The @code{msgt_unused} bit should be zero. @end table @end deftp @deftypefn Macro boolean_t MACH_MSG_TYPE_PORT_ANY (mach_msg_type_name_t type) This macro returns @code{TRUE} if the given type name specifies a port type, otherwise it returns @code{FALSE}. 
@end deftypefn

@deftypefn Macro boolean_t MACH_MSG_TYPE_PORT_ANY_SEND (mach_msg_type_name_t type)
This macro returns @code{TRUE} if the given type name specifies a port type with a send or send-once right, otherwise it returns @code{FALSE}.
@end deftypefn

@deftypefn Macro boolean_t MACH_MSG_TYPE_PORT_ANY_RIGHT (mach_msg_type_name_t type)
This macro returns @code{TRUE} if the given type name specifies a port right type which is moved, otherwise it returns @code{FALSE}.
@end deftypefn

@deftp {Data type} mach_msg_type_long_t
This structure has the following members:

@table @code
@item mach_msg_type_t msgtl_header
Same meaning as @code{msgt_header}.
@c XXX cross ref

@item unsigned short msgtl_name
Same meaning as @code{msgt_name}.

@item unsigned short msgtl_size
Same meaning as @code{msgt_size}.

@item unsigned int msgtl_number
Same meaning as @code{msgt_number}.
@end table
@end deftp


@node Exchanging Port Rights
@subsection Exchanging Port Rights
@cindex sending port rights
@cindex receiving port rights
@cindex moving port rights

Each task has its own space of port rights. Port rights are named with positive integers. Except for the reserved values @w{@code{MACH_PORT_NULL (0)}@footnote{In the Hurd system, we don't make the assumption that @code{MACH_PORT_NULL} is zero and evaluates to false, but rather compare port names to @code{MACH_PORT_NULL} explicitly.}} and @w{@code{MACH_PORT_DEAD (~0)}}, this is a full 32-bit name space. When the kernel chooses a name for a new right, it is free to pick any unused name (one which denotes no right) in the space.

There are five basic kinds of rights: receive rights, send rights, send-once rights, port-set rights, and dead names. Dead names are not capabilities. They act as place-holders to prevent a name from being otherwise used.

A port is destroyed, or dies, when its receive right is deallocated. When a port dies, send and send-once rights for the port turn into dead names.
Any messages queued at the port are destroyed, which deallocates the port rights and out-of-line memory in the messages. Tasks may hold multiple user-references for send rights and dead names. When a task receives a send right which it already holds, the kernel increments the right's user-reference count. When a task deallocates a send right, the kernel decrements its user-reference count, and the task only loses the send right when the count goes to zero. Send-once rights always have a user-reference count of one, although a port can have multiple send-once rights, because each send-once right held by a task has a different name. In contrast, when a task holds send rights or a receive right for a port, the rights share a single name. A message body can carry port rights; the @code{msgt_name} (@code{msgtl_name}) field in a type descriptor specifies the type of port right and how the port right is to be extracted from the caller. The values @code{MACH_PORT_NULL} and @code{MACH_PORT_DEAD} are always valid in place of a port right in a message body. In a sent message, the following @code{msgt_name} values denote port rights: @table @code @item MACH_MSG_TYPE_MAKE_SEND The message will carry a send right, but the caller must supply a receive right. The send right is created from the receive right, and the receive right's make-send count is incremented. @item MACH_MSG_TYPE_COPY_SEND The message will carry a send right, and the caller should supply a send right. The user reference count for the supplied send right is not changed. The caller may also supply a dead name and the receiving task will get @code{MACH_PORT_DEAD}. @item MACH_MSG_TYPE_MOVE_SEND The message will carry a send right, and the caller should supply a send right. The user reference count for the supplied send right is decremented, and the right is destroyed if the count becomes zero. Unless a receive right remains, the name becomes available for recycling. 
The caller may also supply a dead name, which loses a user reference, and the receiving task will get @code{MACH_PORT_DEAD}. @item MACH_MSG_TYPE_MAKE_SEND_ONCE The message will carry a send-once right, but the caller must supply a receive right. The send-once right is created from the receive right. @item MACH_MSG_TYPE_MOVE_SEND_ONCE The message will carry a send-once right, and the caller should supply a send-once right. The caller loses the supplied send-once right. The caller may also supply a dead name, which loses a user reference, and the receiving task will get @code{MACH_PORT_DEAD}. @item MACH_MSG_TYPE_MOVE_RECEIVE The message will carry a receive right, and the caller should supply a receive right. The caller loses the supplied receive right, but retains any send rights with the same name. @end table If a message carries a send or send-once right, and the port dies while the message is in transit, then the receiving task will get @code{MACH_PORT_DEAD} instead of a right. The following @code{msgt_name} values in a received message indicate that it carries port rights: @table @code @item MACH_MSG_TYPE_PORT_SEND This name is an alias for @code{MACH_MSG_TYPE_MOVE_SEND}. The message carried a send right. If the receiving task already has send and/or receive rights for the port, then that name for the port will be reused. Otherwise, the new right will have a new name. If the task already has send rights, it gains a user reference for the right (unless this would cause the user-reference count to overflow). Otherwise, it acquires the send right, with a user-reference count of one. @item MACH_MSG_TYPE_PORT_SEND_ONCE This name is an alias for @code{MACH_MSG_TYPE_MOVE_SEND_ONCE}. The message carried a send-once right. The right will have a new name. @item MACH_MSG_TYPE_PORT_RECEIVE This name is an alias for @code{MACH_MSG_TYPE_MOVE_RECEIVE}. The message carried a receive right. 
If the receiving task already has send rights for the port, then that name for the port will be reused. Otherwise, the right will have a new name. The make-send count of the receive right is reset to zero, but the port retains other attributes like queued messages, extant send and send-once rights, and requests for port-destroyed and no-senders notifications. @end table When the kernel chooses a new name for a port right, it can choose any name, other than @code{MACH_PORT_NULL} and @code{MACH_PORT_DEAD}, which is not currently being used for a port right or dead name. It might choose a name which at some previous time denoted a port right, but is currently unused. @node Memory @subsection Memory @cindex sending memory @cindex receiving memory A message body can contain the address of a region in the sender's address space which should be transferred as part of the message. The message carries a logical copy of the memory, but the kernel uses VM techniques to defer any actual page copies. Unless the sender or the receiver modifies the data, the physical pages remain shared. An out-of-line transfer occurs when the data's type descriptor specifies @code{msgt_inline} as @code{FALSE}. The address of the memory region (a @code{vm_offset_t} or @code{vm_address_t}) should follow the type descriptor in the message body. The type descriptor and the address contribute to the message's size (@code{send_size}, @code{msgh_size}). The out-of-line data does not contribute to the message's size. The name, size, and number fields in the type descriptor describe the type and length of the out-of-line data, not the in-line address. Out-of-line memory frequently requires long type descriptors (@code{mach_msg_type_long_t}), because the @code{msgt_number} field is too small to describe a page of 4K bytes. Out-of-line memory arrives somewhere in the receiver's address space as new memory. It has the same inheritance and protection attributes as newly @code{vm_allocate}'d memory. 
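The length rule for type descriptors and the limit that forces the long form can be checked with a small calculation. The fragment below is a sketch under stated assumptions, not code from GNU Mach: the field widths are those given for @code{mach_msg_type_t}, a long-word is assumed to be 4 bytes, and the function names are hypothetical.

```c
#include <assert.h>

/* Field widths of a short type descriptor, as given in the member
   list of mach_msg_type_t.  */
enum { MSGT_SIZE_BITS = 8, MSGT_NUMBER_BITS = 12 };

/* Assumption for illustration: a long-word is 4 bytes, as on a
   32-bit machine.  */
enum { LONG_WORD_BYTES = 4 };

/* Bytes described by a type descriptor: (size * number) bits,
   rounded up to an integral number of bytes.  */
unsigned long long
item_bytes (unsigned int size_in_bits, unsigned long long number)
{
  return (size_in_bits * number + 7) / 8;
}

/* Space that in-line data occupies in the message body: the byte
   length padded to an integral number of long-words.  */
unsigned long long
padded_inline_bytes (unsigned int size_in_bits, unsigned long long number)
{
  unsigned long long bytes = item_bytes (size_in_bits, number);

  return (bytes + LONG_WORD_BYTES - 1) / LONG_WORD_BYTES * LONG_WORD_BYTES;
}

/* Nonzero if the size or element count does not fit the short
   descriptor's bit-fields, so that a long descriptor is needed.  */
int
needs_long_form (unsigned int size_in_bits, unsigned long long number)
{
  return size_in_bits >= (1u << MSGT_SIZE_BITS)
         || number >= (1ull << MSGT_NUMBER_BITS);
}

/* Usage: a 4K page described as 4096 bytes does not fit in a short
   descriptor, while three 32-bit integers do.  */
void
check_descriptor_lengths (void)
{
  assert (needs_long_form (8, 4096));
  assert (!needs_long_form (32, 3));
  assert (item_bytes (32, 3) == 12);
  assert (padded_inline_bytes (8, 5) == 8);
}
```

With a 12-bit count the largest short-form value is 4095, which is why a page of 4096 byte-sized elements already needs @code{mach_msg_type_long_t}.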
The receiver has the responsibility of deallocating (with @code{vm_deallocate}) the memory when it is no longer needed. Security-conscious receivers should exercise caution when using out-of-line memory from untrustworthy sources, because the memory may be backed by an unreliable memory manager. Null out-of-line memory is legal. If the out-of-line region size is zero (for example, because @code{msgtl_number} is zero), then the region's specified address is ignored. A received null out-of-line memory region always has a zero address. Unaligned addresses and region sizes that are not page multiples are legal. A received message can also contain memory with unaligned addresses and funny sizes. In the general case, the first and last pages in the new memory region in the receiver do not contain only data from the sender, but are partly zero.@footnote{Sending out-of-line memory with a non-page-aligned address, or a size which is not a page multiple, works but with a caveat. The extra bytes in the first and last page of the received memory are not zeroed, so the receiver can peek at more data than the sender intended to transfer. This might be a security problem for the sender.} The received address points to the start of the data in the first page. This possibility doesn't complicate deallocation, because @code{vm_deallocate} does the right thing, rounding the start address down and the end address up to deallocate all arrived pages. Out-of-line memory has a deallocate option, controlled by the @code{msgt_deallocate} bit. If it is @code{TRUE} and the out-of-line memory region is not null, then the region is implicitly deallocated from the sender, as if by @code{vm_deallocate}. In particular, the start and end addresses are rounded so that every page overlapped by the memory region is deallocated. The use of @code{msgt_deallocate} effectively changes the memory copy into a memory movement. 
In a received message, @code{msgt_deallocate} is @code{TRUE} in type descriptors for out-of-line memory. Out-of-line memory can carry port rights. @node Message Send @subsection Message Send @cindex sending messages The send operation queues a message to a port. The message carries a copy of the caller's data. After the send, the caller can freely modify the message buffer or the out-of-line memory regions and the message contents will remain unchanged. Message delivery is reliable and sequenced. Messages are not lost, and messages sent to a port, from a single thread, are received in the order in which they were sent. If the destination port's queue is full, then several things can happen. If the message is sent to a send-once right (@code{msgh_remote_port} carries a send-once right), then the kernel ignores the queue limit and delivers the message. Otherwise the caller blocks until there is room in the queue, unless the @code{MACH_SEND_TIMEOUT} or @code{MACH_SEND_NOTIFY} options are used. If a port has several blocked senders, then any of them may queue the next message when space in the queue becomes available, with the proviso that a blocked sender will not be indefinitely starved. These options modify @code{MACH_SEND_MSG}. If @code{MACH_SEND_MSG} is not also specified, they are ignored. @table @code @item MACH_SEND_TIMEOUT The timeout argument should specify a maximum time (in milliseconds) for the call to block before giving up.@footnote{If MACH_SEND_TIMEOUT is used without MACH_SEND_INTERRUPT, then the timeout duration might not be accurate. When the call is interrupted and automatically retried, the original timeout is used. If interrupts occur frequently enough, the timeout interval might never expire.} If the message can't be queued before the timeout interval elapses, then the call returns @code{MACH_SEND_TIMED_OUT}. A zero timeout is legitimate. @item MACH_SEND_NOTIFY The notify argument should specify a receive right for a notify port. 
If the send were to block, then instead the message is queued, @code{MACH_SEND_WILL_NOTIFY} is returned, and a msg-accepted notification is requested. If @code{MACH_SEND_TIMEOUT} is also specified, then @code{MACH_SEND_NOTIFY} doesn't take effect until the timeout interval elapses.

With @code{MACH_SEND_NOTIFY}, a task can forcibly queue to a send right one message at a time. A msg-accepted notification is sent to the notify port when another message can be forcibly queued. If an attempt is made to use @code{MACH_SEND_NOTIFY} before then, the call returns a @code{MACH_SEND_NOTIFY_IN_PROGRESS} error.

The msg-accepted notification carries the name of the send right. If the send right is deallocated before the msg-accepted notification is generated, then the msg-accepted notification carries the value @code{MACH_PORT_NULL}. If the destination port is destroyed before the notification is generated, then a send-once notification is generated instead.

@item MACH_SEND_INTERRUPT
If specified, the @code{mach_msg} call will return @code{MACH_SEND_INTERRUPTED} if a software interrupt aborts the call. Otherwise, the send operation will be retried.

@item MACH_SEND_CANCEL
The notify argument should specify a receive right for a notify port. If the send operation removes the destination port right from the caller, and the removed right had a dead-name request registered for it, and notify is the notify port for the dead-name request, then the dead-name request may be silently canceled (instead of resulting in a port-deleted notification).

This option is typically used to cancel a dead-name request made with the @code{MACH_RCV_NOTIFY} option. It should only be used as an optimization.
@end table

The send operation can generate the following return codes. These return codes imply that the call did nothing:

@table @code
@item MACH_SEND_MSG_TOO_SMALL
The specified @code{send_size} was smaller than the minimum size for a message.
@item MACH_SEND_NO_BUFFER A resource shortage prevented the kernel from allocating a message buffer. @item MACH_SEND_INVALID_DATA The supplied message buffer was not readable. @item MACH_SEND_INVALID_HEADER The @code{msgh_bits} value was invalid. @item MACH_SEND_INVALID_DEST The @code{msgh_remote_port} value was invalid. @item MACH_SEND_INVALID_REPLY The @code{msgh_local_port} value was invalid. @item MACH_SEND_INVALID_NOTIFY When using @code{MACH_SEND_CANCEL}, the notify argument did not denote a valid receive right. @end table These return codes imply that some or all of the message was destroyed: @table @code @item MACH_SEND_INVALID_MEMORY The message body specified out-of-line data that was not readable. @item MACH_SEND_INVALID_RIGHT The message body specified a port right which the caller didn't possess. @item MACH_SEND_INVALID_TYPE A type descriptor was invalid. @item MACH_SEND_MSG_TOO_SMALL The last data item in the message ran over the end of the message. @end table These return codes imply that the message was returned to the caller with a pseudo-receive operation: @table @code @item MACH_SEND_TIMED_OUT The timeout interval expired. @item MACH_SEND_INTERRUPTED A software interrupt occurred. @item MACH_SEND_INVALID_NOTIFY When using @code{MACH_SEND_NOTIFY}, the notify argument did not denote a valid receive right. @item MACH_SEND_NO_NOTIFY A resource shortage prevented the kernel from setting up a msg-accepted notification. @item MACH_SEND_NOTIFY_IN_PROGRESS A msg-accepted notification was already requested, and hasn't yet been generated. @end table These return codes imply that the message was queued: @table @code @item MACH_SEND_WILL_NOTIFY The message was forcibly queued, and a msg-accepted notification was requested. @item MACH_MSG_SUCCESS The message was queued. @end table Some return codes, like @code{MACH_SEND_TIMED_OUT}, imply that the message was almost sent, but could not be queued. 
In these situations, the kernel tries to return the message contents to the caller with a pseudo-receive operation. This prevents the loss of port rights or memory which only exist in the message. For example, a receive right which was moved into the message, or out-of-line memory sent with the deallocate bit. The pseudo-receive operation is very similar to a normal receive operation. The pseudo-receive handles the port rights in the message header as if they were in the message body. They are not reversed. After the pseudo-receive, the message is ready to be resent. If the message is not resent, note that out-of-line memory regions may have moved and some port rights may have changed names. The pseudo-receive operation may encounter resource shortages. This is similar to a @code{MACH_RCV_BODY_ERROR} return code from a receive operation. When this happens, the normal send return codes are augmented with the @code{MACH_MSG_IPC_SPACE}, @code{MACH_MSG_VM_SPACE}, @code{MACH_MSG_IPC_KERNEL}, and @code{MACH_MSG_VM_KERNEL} bits to indicate the nature of the resource shortage. The queueing of a message carrying receive rights may create a circular loop of receive rights and messages, which can never be received. For example, a message carrying a receive right can be sent to that receive right. This situation is not an error, but the kernel will garbage-collect such loops, destroying the messages and ports involved. @node Message Receive @subsection Message Receive The receive operation dequeues a message from a port. The receiving task acquires the port rights and out-of-line memory regions carried in the message. The @code{rcv_name} argument specifies a port or port set from which to receive. If a port is specified, the caller must possess the receive right for the port and the port must not be a member of a port set. If no message is present, then the call blocks, subject to the @code{MACH_RCV_TIMEOUT} option. 
If a port set is specified, the call will receive a message sent to any of the member ports. It is permissible for the port set to have no member ports, and ports may be added and removed while a receive from the port set is in progress. The received message can come from any of the member ports which have messages, with the proviso that a member port with messages will not be indefinitely starved. The @code{msgh_local_port} field in the received message header specifies from which port in the port set the message came. The @code{rcv_size} argument specifies the size of the caller's message buffer. The @code{mach_msg} call will not receive a message larger than @code{rcv_size}. Messages that are too large are destroyed, unless the @code{MACH_RCV_LARGE} option is used. The destination and reply ports are reversed in a received message header. The @code{msgh_local_port} field names the destination port, from which the message was received, and the @code{msgh_remote_port} field names the reply port right. The bits in @code{msgh_bits} are also reversed. The @code{MACH_MSGH_BITS_LOCAL} bits have the value @code{MACH_MSG_TYPE_PORT_SEND} if the message was sent to a send right, and the value @code{MACH_MSG_TYPE_PORT_SEND_ONCE} if it was sent to a send-once right. The @code{MACH_MSGH_BITS_REMOTE} bits describe the reply port right. A received message can contain port rights and out-of-line memory. The @code{msgh_local_port} field does not receive a port right; the act of receiving the message destroys the send or send-once right for the destination port. The @code{msgh_remote_port} field does name a received port right, the reply port right, and the message body can carry port rights and memory if @code{MACH_MSGH_BITS_COMPLEX} is present in @code{msgh_bits}. Received port rights and memory should be consumed or deallocated in some fashion.
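The reversal of the header fields described above can be pictured with a small self-contained model. The sketch below is not kernel code: the type @code{toy_header}, the function @code{toy_reverse_header} and the two disposition constants are hypothetical names with placeholder values, introduced only for illustration. It also ignores the translation of port names into the receiver's name space, which the real call performs.

```c
#include <assert.h>

/* Toy model only: these names and values are illustrative placeholders,
   not the definitions from the Mach headers.  */
enum toy_disposition { TOY_PORT_SEND = 1, TOY_PORT_SEND_ONCE = 2 };

struct toy_header
{
  unsigned remote_port;              /* stands in for msgh_remote_port */
  unsigned local_port;               /* stands in for msgh_local_port */
  enum toy_disposition remote_bits;  /* stands in for MACH_MSGH_BITS_REMOTE */
  enum toy_disposition local_bits;   /* stands in for MACH_MSGH_BITS_LOCAL */
};

/* Return the header as the receiver sees it: the destination port moves
   from the remote slot to the local slot, the reply port moves the other
   way, and the bits describing each right move along with it.  */
struct toy_header
toy_reverse_header (struct toy_header sent)
{
  struct toy_header received;

  /* A message is always sent to a send or a send-once right.  */
  assert (sent.remote_bits == TOY_PORT_SEND
          || sent.remote_bits == TOY_PORT_SEND_ONCE);

  received.local_port = sent.remote_port;
  received.remote_port = sent.local_port;
  received.local_bits = sent.remote_bits;
  received.remote_bits = sent.local_bits;
  return received;
}
```

In this model, a message sent to a send-once right with a send right as reply port arrives with the send-once disposition in the local bits and the send disposition in the remote bits.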
In almost all cases, @code{msgh_local_port} will specify the name of a receive right, either @code{rcv_name} or if @code{rcv_name} is a port set, a member of @code{rcv_name}. If other threads are concurrently manipulating the receive right, the situation is more complicated. If the receive right is renamed during the call, then @code{msgh_local_port} specifies the right's new name. If the caller loses the receive right after the message was dequeued from it, then @code{mach_msg} will proceed instead of returning @code{MACH_RCV_PORT_DIED}. If the receive right was destroyed, then @code{msgh_local_port} specifies @code{MACH_PORT_DEAD}. If the receive right still exists, but isn't held by the caller, then @code{msgh_local_port} specifies @code{MACH_PORT_NULL}. Received messages are stamped with a sequence number, taken from the port from which the message was received. (Messages received from a port set are stamped with a sequence number from the appropriate member port.) Newly created ports start with a zero sequence number, and the sequence number is reset to zero whenever the port's receive right moves between tasks. When a message is dequeued from the port, it is stamped with the port's sequence number and the port's sequence number is then incremented. The dequeue and increment operations are atomic, so that multiple threads receiving messages from a port can use the @code{msgh_seqno} field to reconstruct the original order of the messages. These options modify @code{MACH_RCV_MSG}. If @code{MACH_RCV_MSG} is not also specified, they are ignored. @table @code @item MACH_RCV_TIMEOUT The timeout argument should specify a maximum time (in milliseconds) for the call to block before giving up.@footnote{If MACH_RCV_TIMEOUT is used without MACH_RCV_INTERRUPT, then the timeout duration might not be accurate. When the call is interrupted and automatically retried, the original timeout is used. 
If interrupts occur frequently enough, the timeout interval might never expire.} If no message arrives before the timeout interval elapses, then the call returns @code{MACH_RCV_TIMED_OUT}. A zero timeout is legitimate. @item MACH_RCV_NOTIFY The notify argument should specify a receive right for a notify port. If receiving the reply port creates a new port right in the caller, then the notify port is used to request a dead-name notification for the new port right. @item MACH_RCV_INTERRUPT If specified, the @code{mach_msg} call will return @code{MACH_RCV_INTERRUPTED} if a software interrupt aborts the call. Otherwise, the receive operation will be retried. @item MACH_RCV_LARGE If the message is larger than @code{rcv_size}, then the message remains queued instead of being destroyed. The call returns @code{MACH_RCV_TOO_LARGE} and the actual size of the message is returned in the @code{msgh_size} field of the message header. @end table The receive operation can generate the following return codes. These return codes imply that the call did not dequeue a message: @table @code @item MACH_RCV_INVALID_NAME The specified @code{rcv_name} was invalid. @item MACH_RCV_IN_SET The specified port was a member of a port set. @item MACH_RCV_TIMED_OUT The timeout interval expired. @item MACH_RCV_INTERRUPTED A software interrupt occurred. @item MACH_RCV_PORT_DIED The caller lost the rights specified by @code{rcv_name}. @item MACH_RCV_PORT_CHANGED @code{rcv_name} specified a receive right which was moved into a port set during the call. @item MACH_RCV_TOO_LARGE When using @code{MACH_RCV_LARGE}, and the message was larger than @code{rcv_size}. The message is left queued, and its actual size is returned in the @code{msgh_size} field of the message buffer. @end table These return codes imply that a message was dequeued and destroyed: @table @code @item MACH_RCV_HEADER_ERROR A resource shortage prevented the reception of the port rights in the message header. 
@item MACH_RCV_INVALID_NOTIFY When using @code{MACH_RCV_NOTIFY}, the notify argument did not denote a valid receive right. @item MACH_RCV_TOO_LARGE When not using @code{MACH_RCV_LARGE}, a message larger than @code{rcv_size} was dequeued and destroyed. @end table In these situations, when a message is dequeued and then destroyed, the reply port and all port rights and memory in the message body are destroyed. However, the caller receives the message's header, with all fields correct, including the destination port but excepting the reply port, which is @code{MACH_PORT_NULL}. These return codes imply that a message was received: @table @code @item MACH_RCV_BODY_ERROR A resource shortage prevented the reception of a port right or out-of-line memory region in the message body. The message header, including the reply port, is correct. The kernel attempts to transfer all port rights and memory regions in the body, and only destroys those that can't be transferred. @item MACH_RCV_INVALID_DATA The specified message buffer was not writable. The calling task did successfully receive the port rights and out-of-line memory regions in the message. @item MACH_MSG_SUCCESS A message was received. @end table Resource shortages can occur after a message is dequeued, while transferring port rights and out-of-line memory regions to the receiving task. The @code{mach_msg} call returns @code{MACH_RCV_HEADER_ERROR} or @code{MACH_RCV_BODY_ERROR} in this situation. These return codes always carry extra bits (bitwise-ored) that indicate the nature of the resource shortage: @table @code @item MACH_MSG_IPC_SPACE There was no room in the task's IPC name space for another port name. @item MACH_MSG_VM_SPACE There was no room in the task's VM address space for an out-of-line memory region. @item MACH_MSG_IPC_KERNEL A kernel resource shortage prevented the reception of a port right. @item MACH_MSG_VM_KERNEL A kernel resource shortage prevented the reception of an out-of-line memory region. 
@end table If a resource shortage prevents the reception of a port right, the port right is destroyed and the caller sees the name @code{MACH_PORT_NULL}. If a resource shortage prevents the reception of an out-of-line memory region, the region is destroyed and the caller receives a zero address. In addition, the @code{msgt_size} (@code{msgtl_size}) field in the data's type descriptor is changed to zero. If a resource shortage prevents the reception of out-of-line memory carrying port rights, then the port rights are always destroyed if the memory region cannot be received. A task never receives port rights or memory regions that it isn't told about. @node Atomicity @subsection Atomicity The @code{mach_msg} call handles port rights in a message header atomically. Port rights and out-of-line memory in a message body do not enjoy this atomicity guarantee. The message body may be processed front-to-back, back-to-front, first out-of-line memory then port rights, in some random order, or even atomically. For example, consider sending a message with the destination port specified as @code{MACH_MSG_TYPE_MOVE_SEND} and the reply port specified as @code{MACH_MSG_TYPE_COPY_SEND}. The same send right, with one user-reference, is supplied for both the @code{msgh_remote_port} and @code{msgh_local_port} fields. Because @code{mach_msg} processes the message header atomically, this succeeds. If @code{msgh_remote_port} were processed before @code{msgh_local_port}, then @code{mach_msg} would return @code{MACH_SEND_INVALID_REPLY} in this situation. On the other hand, suppose the destination and reply port are both specified as @code{MACH_MSG_TYPE_MOVE_SEND}, and again the same send right with one user-reference is supplied for both. Now the send operation fails, but because it processes the header atomically, @code{mach_msg} can return either @code{MACH_SEND_INVALID_DEST} or @code{MACH_SEND_INVALID_REPLY}.
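The two send-side examples above can be restated as a small self-contained model. This is a hedged sketch, not the kernel's algorithm: the names @code{toy_disp}, @code{toy_header_atomic_ok} and @code{toy_header_sequential_ok} are hypothetical, and the model assumes that a single send right is named by both header fields and that a move consumes one user reference while a copy consumes none.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only.  A single send right is named by both header fields;
   UREFS is its user-reference count before the call.  */
enum toy_disp { TOY_MOVE_SEND, TOY_COPY_SEND };

/* True if DISP moves the right out of the caller.  */
static bool
toy_is_move (enum toy_disp disp)
{
  assert (disp == TOY_MOVE_SEND || disp == TOY_COPY_SEND);
  return disp == TOY_MOVE_SEND;
}

/* Atomic processing: both fields are checked against the state that
   existed before the call.  The right must exist, and each move
   consumes one user reference.  */
bool
toy_header_atomic_ok (unsigned urefs, enum toy_disp dest, enum toy_disp reply)
{
  unsigned moves = toy_is_move (dest) + toy_is_move (reply);

  return urefs >= 1 && urefs >= moves;
}

/* Hypothetical non-atomic processing, destination first: the reply
   field sees whatever the destination field left behind.  */
bool
toy_header_sequential_ok (unsigned urefs, enum toy_disp dest,
                          enum toy_disp reply)
{
  (void) toy_is_move (reply);   /* only validates the argument */
  if (urefs == 0)
    return false;               /* destination right is missing */
  if (toy_is_move (dest))
    urefs--;
  if (urefs == 0)
    return false;               /* reply right is already gone */
  return true;
}
```

With one user reference, move-send plus copy-send passes the atomic check but fails the sequential one, while move-send plus move-send fails the atomic check as well. The model does not try to say which of the two error codes is returned in the failing case.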
For example, consider receiving a message at the same time another thread is deallocating the destination receive right. Suppose the reply port field carries a send right for the destination port. If the deallocation happens before the dequeuing, then the receiver gets @code{MACH_RCV_PORT_DIED}. If the deallocation happens after the receive, then the @code{msgh_local_port} and the @code{msgh_remote_port} fields both specify the same right, which becomes a dead name when the receive right is deallocated. If the deallocation happens between the dequeue and the receive, then the @code{msgh_local_port} and @code{msgh_remote_port} fields both specify @code{MACH_PORT_DEAD}. Because the header is processed atomically, it is not possible for just one of the two fields to hold @code{MACH_PORT_DEAD}. The @code{MACH_RCV_NOTIFY} option provides a more likely example. Suppose a message carrying a send-once right reply port is received with @code{MACH_RCV_NOTIFY} at the same time the reply port is destroyed. If the reply port is destroyed first, then @code{msgh_remote_port} specifies @code{MACH_PORT_DEAD} and the kernel does not generate a dead-name notification. If the reply port is destroyed after it is received, then @code{msgh_remote_port} specifies a dead name for which the kernel generates a dead-name notification. It is not possible to receive the reply port right and have it turn into a dead name before the dead-name notification is requested; as part of the message header the reply port is received atomically. @node Port Manipulation Interface @section Port Manipulation Interface This section describes the interface to create, destroy and manipulate ports, port rights and port sets. @cindex IPC space port @cindex port representing an IPC space @deftp {Data type} ipc_space_t This is a @code{task_t} (and as such a @code{mach_port_t}), which holds a port name associated with a port that represents an IPC space in the kernel. 
An IPC space is used by the kernel to manage the port names and rights available to a task. The IPC space doesn't get a port name of its own. Instead the port name of the task containing the IPC space is used to name the IPC space of the task (as is indicated by the fact that the type of @code{ipc_space_t} is actually @code{task_t}). The IPC spaces of tasks are the only ones accessible outside of the kernel. @end deftp @menu * Port Creation:: How to create new ports and port sets. * Port Destruction:: How to destroy ports and port sets. * Port Names:: How to query and manipulate port names. * Port Rights:: How to work with port rights. * Ports and other Tasks:: How to move rights between tasks. * Receive Rights:: How to work with receive rights. * Port Sets:: How to work with port sets. * Request Notifications:: How to request notifications for events. @c * Inherited Ports:: How to work with the inherited system ports. @end menu @node Port Creation @subsection Port Creation @deftypefun kern_return_t mach_port_allocate (@w{ipc_space_t @var{task}}, @w{mach_port_right_t @var{right}}, @w{mach_port_t *@var{name}}) The @code{mach_port_allocate} function creates a new right in the specified task. The new right's name is returned in @var{name}, which may be any name that wasn't in use. The @var{right} argument takes the following values: @table @code @item MACH_PORT_RIGHT_RECEIVE @code{mach_port_allocate} creates a port. The new port is not a member of any port set. It doesn't have any extant send or send-once rights. Its make-send count is zero, its sequence number is zero, its queue limit is @code{MACH_PORT_QLIMIT_DEFAULT}, and it has no queued messages. @var{name} denotes the receive right for the new port. @var{task} does not hold send rights for the new port, only the receive right. @code{mach_port_insert_right} and @code{mach_port_extract_right} can be used to convert the receive right into a combined send/receive right. 
@item MACH_PORT_RIGHT_PORT_SET @code{mach_port_allocate} creates a port set. The new port set has no members. @item MACH_PORT_RIGHT_DEAD_NAME @code{mach_port_allocate} creates a dead name. The new dead name has one user reference. @end table The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_VALUE} if @var{right} was invalid, @code{KERN_NO_SPACE} if there was no room in @var{task}'s IPC name space for another right and @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory. The @code{mach_port_allocate} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @deftypefun mach_port_t mach_reply_port () The @code{mach_reply_port} system call creates a reply port in the calling task. @code{mach_reply_port} creates a port, giving the calling task the receive right for the port. The call returns the name of the new receive right. This is very much like creating a receive right with the @code{mach_port_allocate} call, with two differences. First, @code{mach_reply_port} is a system call and not an RPC (which requires a reply port). Second, the port created by @code{mach_reply_port} may be optimized for use as a reply port. The function returns @code{MACH_PORT_NULL} if a resource shortage prevented the creation of the receive right. @end deftypefun @deftypefun kern_return_t mach_port_allocate_name (@w{ipc_space_t @var{task}}, @w{mach_port_right_t @var{right}}, @w{mach_port_t @var{name}}) The function @code{mach_port_allocate_name} creates a new right in the specified task, with a specified name for the new right. @var{name} must not already be in use for some right, and it can't be the reserved values @code{MACH_PORT_NULL} and @code{MACH_PORT_DEAD}. 
The @var{right} argument takes the following values: @table @code @item MACH_PORT_RIGHT_RECEIVE @code{mach_port_allocate_name} creates a port. The new port is not a member of any port set. It doesn't have any extant send or send-once rights. Its make-send count is zero, its sequence number is zero, its queue limit is @code{MACH_PORT_QLIMIT_DEFAULT}, and it has no queued messages. @var{name} denotes the receive right for the new port. @var{task} does not hold send rights for the new port, only the receive right. @code{mach_port_insert_right} and @code{mach_port_extract_right} can be used to convert the receive right into a combined send/receive right. @item MACH_PORT_RIGHT_PORT_SET @code{mach_port_allocate_name} creates a port set. The new port set has no members. @item MACH_PORT_RIGHT_DEAD_NAME @code{mach_port_allocate_name} creates a new dead name. The new dead name has one user reference. @end table The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_VALUE} if @var{right} was invalid or @var{name} was @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}, @code{KERN_NAME_EXISTS} if @var{name} was already in use for a port right and @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory. The @code{mach_port_allocate_name} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @node Port Destruction @subsection Port Destruction @deftypefun kern_return_t mach_port_deallocate (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}) The function @code{mach_port_deallocate} releases a user reference for a right in @var{task}'s IPC name space. 
It allows a task to release a user reference for a send or send-once right without failing if the port has died and the right is now actually a dead name. If @var{name} denotes a dead name, send right, or send-once right, then the right loses one user reference. If it only had one user reference, then the right is destroyed. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_NAME} if @var{name} did not denote a right and @code{KERN_INVALID_RIGHT} if @var{name} denoted an invalid right. The @code{mach_port_deallocate} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @deftypefun kern_return_t mach_port_destroy (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}) The function @code{mach_port_destroy} deallocates all rights denoted by a name. The name becomes immediately available for reuse. For most purposes, @code{mach_port_mod_refs} and @code{mach_port_deallocate} are preferable. If @var{name} denotes a port set, then all members of the port set are implicitly removed from the port set. If @var{name} denotes a receive right that is a member of a port set, the receive right is implicitly removed from the port set. If there is a port-destroyed request registered for the port, then the receive right is not actually destroyed, but instead is sent in a port-destroyed notification to the backup port. If there is no registered port-destroyed request, remaining messages queued to the port are destroyed and extant send and send-once rights turn into dead names. If those send and send-once rights have dead-name requests registered, then dead-name notifications are generated for them. 
If @var{name} denotes a send-once right, then the send-once right is used to produce a send-once notification for the port. If @var{name} denotes a send-once, send, and/or receive right, and it has a dead-name request registered, then the registered send-once right is used to produce a port-deleted notification for the name. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_NAME} if @var{name} did not denote a right. The @code{mach_port_destroy} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @node Port Names @subsection Port Names @deftypefun kern_return_t mach_port_names (@w{ipc_space_t @var{task}}, @w{mach_port_array_t *@var{names}}, @w{mach_msg_type_number_t *@var{ncount}}, @w{mach_port_type_array_t *@var{types}}, @w{mach_msg_type_number_t *@var{tcount}}) The function @code{mach_port_names} returns information about @var{task}'s port name space. For each name, it also returns what type of rights @var{task} holds. (The same information returned by @code{mach_port_type}.) @var{names} and @var{types} are arrays that are automatically allocated when the reply message is received. The user should @code{vm_deallocate} them when the data is no longer needed. @code{mach_port_names} will return in @var{names} the names of the ports, port sets, and dead names in the task's port name space, in no particular order and in @var{ncount} the number of names returned. It will return in @var{types} the type of each corresponding name, which indicates what kind of rights the task holds with that name. @var{tcount} should be the same as @var{ncount}. 
The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory. The @code{mach_port_names} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @deftypefun kern_return_t mach_port_type (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_type_t *@var{ptype}}) The function @code{mach_port_type} returns information about @var{task}'s rights for a specific name in its port name space. The returned @var{ptype} is a bitmask indicating what rights @var{task} holds for the port, port set or dead name. The bitmask is composed of the following bits: @table @code @item MACH_PORT_TYPE_SEND The name denotes a send right. @item MACH_PORT_TYPE_RECEIVE The name denotes a receive right. @item MACH_PORT_TYPE_SEND_ONCE The name denotes a send-once right. @item MACH_PORT_TYPE_PORT_SET The name denotes a port set. @item MACH_PORT_TYPE_DEAD_NAME The name is a dead name. @item MACH_PORT_TYPE_DNREQUEST A dead-name request has been registered for the right. @item MACH_PORT_TYPE_MAREQUEST A msg-accepted request for the right is pending. @item MACH_PORT_TYPE_COMPAT The port right was created in the compatibility mode. @end table The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid and @code{KERN_INVALID_NAME} if @var{name} did not denote a right. The @code{mach_port_type} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. 
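As a hedged illustration of how a returned bitmask might be tested, the following self-contained C sketch checks a @var{ptype} value for combinations of bits. The @code{TOY_TYPE_*} constants and both helper functions are hypothetical and use placeholder values; a real program would use the @code{MACH_PORT_TYPE_*} constants from the Mach headers instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only; they are not the
   values of the MACH_PORT_TYPE_* constants.  */
enum
{
  TOY_TYPE_SEND = 1 << 0,
  TOY_TYPE_RECEIVE = 1 << 1,
  TOY_TYPE_SEND_ONCE = 1 << 2,
  TOY_TYPE_PORT_SET = 1 << 3,
  TOY_TYPE_DEAD_NAME = 1 << 4,
  TOY_TYPE_DNREQUEST = 1 << 5
};

/* True if every bit of MASK is set in PTYPE.  An empty mask would be
   vacuously true, so it is rejected as a caller error.  */
bool
toy_has_all (unsigned ptype, unsigned mask)
{
  assert (mask != 0);
  return (ptype & mask) == mask;
}

/* True if the name denotes a combined send/receive right.  */
bool
toy_is_send_receive (unsigned ptype)
{
  return toy_has_all (ptype, TOY_TYPE_SEND | TOY_TYPE_RECEIVE);
}
```

Testing with a mask rather than comparing for equality keeps the check valid when additional bits, such as the dead-name request bit, are also set.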
@end deftypefun @deftypefun kern_return_t mach_port_rename (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{old_name}}, @w{mach_port_t @var{new_name}}) The function @code{mach_port_rename} changes the name by which a port, port set, or dead name is known to @var{task}. @var{old_name} is the original name and @var{new_name} the new name for the port right. @var{new_name} must not already be in use, and it can't be the distinguished values @code{MACH_PORT_NULL} and @code{MACH_PORT_DEAD}. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_NAME} if @var{old_name} did not denote a right, @code{KERN_INVALID_VALUE} if @var{new_name} was @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}, @code{KERN_NAME_EXISTS} if @var{new_name} already denoted a right and @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory. The @code{mach_port_rename} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @node Port Rights @subsection Port Rights @deftypefun kern_return_t mach_port_get_refs (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_right_t @var{right}}, @w{mach_port_urefs_t *@var{refs}}) The function @code{mach_port_get_refs} returns the number of user references a task has for a right. The @var{right} argument takes the following values: @itemize @bullet @item @code{MACH_PORT_RIGHT_SEND} @item @code{MACH_PORT_RIGHT_RECEIVE} @item @code{MACH_PORT_RIGHT_SEND_ONCE} @item @code{MACH_PORT_RIGHT_PORT_SET} @item @code{MACH_PORT_RIGHT_DEAD_NAME} @end itemize If @var{name} denotes a right, but not the type of right specified, then zero is returned. Otherwise a positive number of user references is returned.
Note that a name may simultaneously denote send and receive rights. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_VALUE} if @var{right} was invalid and @code{KERN_INVALID_NAME} if @var{name} did not denote a right. The @code{mach_port_get_refs} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @deftypefun kern_return_t mach_port_mod_refs (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_right_t @var{right}}, @w{mach_port_delta_t @var{delta}}) The function @code{mach_port_mod_refs} requests that the number of user references a task has for a right be changed. This results in the right being destroyed, if the number of user references is changed to zero. The task holding the right is @var{task}, @var{name} should denote the specified right. @var{right} denotes the type of right being modified. @var{delta} is the signed change to the number of user references. The @var{right} argument takes the following values: @itemize @bullet @item @code{MACH_PORT_RIGHT_SEND} @item @code{MACH_PORT_RIGHT_RECEIVE} @item @code{MACH_PORT_RIGHT_SEND_ONCE} @item @code{MACH_PORT_RIGHT_PORT_SET} @item @code{MACH_PORT_RIGHT_DEAD_NAME} @end itemize The number of user references for the right is changed by the amount @var{delta}, subject to the following restrictions: port sets, receive rights, and send-once rights may only have one user reference. The resulting number of user references can't be negative. If the resulting number of user references is zero, the effect is to deallocate the right. For dead names and send rights, there is an implementation-defined maximum number of user references. 
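The restrictions above can be summarized in a small self-contained model. This is a sketch under stated assumptions, not the kernel's implementation: all @code{toy_} names are hypothetical, the maximum @code{TOY_UREFS_MAX} is an arbitrary stand-in for the implementation-defined limit, and reporting an attempt to raise a single-reference right above one as an invalid value is an assumption, since the text does not say which code applies in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names come from the Mach headers.  */
enum toy_right
{
  TOY_SEND, TOY_RECEIVE, TOY_SEND_ONCE, TOY_PORT_SET, TOY_DEAD_NAME
};
enum toy_result { TOY_SUCCESS, TOY_INVALID_VALUE, TOY_UREFS_OVERFLOW };

/* Assumed maximum; the real limit is implementation-defined.  */
#define TOY_UREFS_MAX 65535u

/* Apply DELTA to *UREFS.  On success *UREFS holds the new count, where
   zero means the right was deallocated.  On failure *UREFS is left
   unchanged.  */
enum toy_result
toy_mod_refs (enum toy_right right, unsigned *urefs, int delta)
{
  long long result;

  /* The right must currently exist.  */
  assert (urefs != NULL && *urefs >= 1);

  result = (long long) *urefs + delta;
  if (result < 0)
    return TOY_INVALID_VALUE;   /* count can't become negative */

  if (right == TOY_SEND || right == TOY_DEAD_NAME)
    {
      if (result > (long long) TOY_UREFS_MAX)
        return TOY_UREFS_OVERFLOW;
    }
  else if (result > 1)
    /* Port sets, receive rights and send-once rights may only have one
       user reference (error code assumed, see the lead-in).  */
    return TOY_INVALID_VALUE;

  *urefs = (unsigned) result;
  return TOY_SUCCESS;
}
```

A caller of this model that sees a resulting count of zero should treat the right as gone, which mirrors the rule that a zero count deallocates the right.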
If the call destroys the right, then the effect is as described for @code{mach_port_destroy}, with the exception that @code{mach_port_destroy} simultaneously destroys all the rights denoted by a name, while @code{mach_port_mod_refs} can only destroy one right. The name will be available for reuse if it only denoted the one right. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_VALUE} if @var{right} was invalid or the user-reference count would become negative, @code{KERN_INVALID_NAME} if @var{name} did not denote a right, @code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not the specified right and @code{KERN_UREFS_OVERFLOW} if the user-reference count would overflow. The @code{mach_port_mod_refs} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes. @end deftypefun @node Ports and other Tasks @subsection Ports and other Tasks @deftypefun kern_return_t mach_port_insert_right (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_t @var{right}}, @w{mach_msg_type_name_t @var{right_type}}) The function @code{mach_port_insert_right} inserts into @var{task} the caller's right for a port, using a specified name for the right in the target task. The specified @var{name} can't be one of the reserved values @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}. The @var{right} can't be @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}. The argument @var{right_type} specifies a right to be inserted and how that right should be extracted from the caller. It should be a value appropriate for @code{msgt_name}; see @code{mach_msg}.
@c XXX cross ref
If @var{right_type} is @code{MACH_MSG_TYPE_MAKE_SEND}, @code{MACH_MSG_TYPE_MOVE_SEND}, or @code{MACH_MSG_TYPE_COPY_SEND}, then a send right is inserted. If the target already holds send or receive rights for the port, then @var{name} should denote those rights in the target. Otherwise, @var{name} should be unused in the target. If the target already has send rights, then those send rights gain an additional user reference. Otherwise, the target gains a send right, with a user reference count of one. If @var{right_type} is @code{MACH_MSG_TYPE_MAKE_SEND_ONCE} or @code{MACH_MSG_TYPE_MOVE_SEND_ONCE}, then a send-once right is inserted. @var{name} should be unused in the target. The target gains a send-once right. If @var{right_type} is @code{MACH_MSG_TYPE_MOVE_RECEIVE}, then a receive right is inserted. If the target already holds send rights for the port, then @var{name} should denote those rights in the target. Otherwise, @var{name} should be unused in the target. The receive right is moved into the target task. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_TASK} if @var{task} was invalid, @code{KERN_INVALID_VALUE} if @var{right} was not a port right or @var{name} was @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}, @code{KERN_NAME_EXISTS} if @var{name} already denoted a right, @code{KERN_INVALID_CAPABILITY} if @var{right} was @code{MACH_PORT_NULL} or @code{MACH_PORT_DEAD}, @code{KERN_RIGHT_EXISTS} if @var{task} already had rights for the port, with a different name, @code{KERN_UREFS_OVERFLOW} if the user-reference count would overflow and @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory. The @code{mach_port_insert_right} call is actually an RPC to @var{task}, normally a send right for a task port, but potentially any send right. In addition to the normal diagnostic return codes from the call's server (normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@deftypefun kern_return_t mach_port_extract_right (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_msg_type_name_t @var{desired_type}}, @w{mach_port_t *@var{right}}, @w{mach_msg_type_name_t *@var{acquired_type}})
The function @code{mach_port_extract_right} extracts a port right from
the target @var{task} and returns it to the caller as if the task sent
the right voluntarily, using @var{desired_type} as the value of
@var{msgt_name}.  @xref{Mach Message Call}.

The returned value of @var{acquired_type} will be
@code{MACH_MSG_TYPE_PORT_SEND} if a send right is extracted,
@code{MACH_MSG_TYPE_PORT_RECEIVE} if a receive right is extracted, and
@code{MACH_MSG_TYPE_PORT_SEND_ONCE} if a send-once right is extracted.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right,
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but an invalid
one, and @code{KERN_INVALID_VALUE} if @var{desired_type} was invalid.

The @code{mach_port_extract_right} call is actually an RPC to
@var{task}, normally a send right for a task port, but potentially any
send right.  In addition to the normal diagnostic return codes from the
call's server (normally the kernel), the call may return @code{mach_msg}
return codes.
@end deftypefun

@node Receive Rights
@subsection Receive Rights

@deftp {Data type} mach_port_seqno_t
The @code{mach_port_seqno_t} data type is an @code{unsigned int} which
contains the sequence number of a port.
@end deftp

@deftp {Data type} mach_port_mscount_t
The @code{mach_port_mscount_t} data type is an @code{unsigned int} which
contains the make-send count for a port.
@end deftp

@deftp {Data type} mach_port_msgcount_t
The @code{mach_port_msgcount_t} data type is an @code{unsigned int}
which contains a number of messages.
@end deftp

@deftp {Data type} mach_port_rights_t
The @code{mach_port_rights_t} data type is an @code{unsigned int} which
contains a number of rights for a port.
@end deftp

@deftp {Data type} mach_port_status_t
This structure contains some status information about a port, which can
be queried with @code{mach_port_get_receive_status}.  It has the
following members:

@table @code
@item mach_port_t mps_pset
The containing port set.

@item mach_port_seqno_t mps_seqno
The sequence number.

@item mach_port_mscount_t mps_mscount
The make-send count.

@item mach_port_msgcount_t mps_qlimit
The maximum number of messages in the queue.

@item mach_port_msgcount_t mps_msgcount
The current number of messages in the queue.

@item mach_port_rights_t mps_sorights
The number of send-once rights that exist.

@item boolean_t mps_srights
@code{TRUE} if send rights exist.

@item boolean_t mps_pdrequest
@code{TRUE} if port-deleted notification is requested.

@item boolean_t mps_nsrequest
@code{TRUE} if no-senders notification is requested.
@end table
@end deftp

@deftypefun kern_return_t mach_port_get_receive_status (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_status_t *@var{status}})
The function @code{mach_port_get_receive_status} returns the current
status of the specified receive right.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right and
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not a
receive right.

The @code{mach_port_get_receive_status} call is actually an RPC to
@var{task}, normally a send right for a task port, but potentially any
send right.  In addition to the normal diagnostic return codes from the
call's server (normally the kernel), the call may return @code{mach_msg}
return codes.
@end deftypefun

@deftypefun kern_return_t mach_port_set_mscount (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_mscount_t @var{mscount}})
The function @code{mach_port_set_mscount} changes the make-send count of
@var{task}'s receive right named @var{name} to @var{mscount}.  All
values for @var{mscount} are valid.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right and
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not a
receive right.

The @code{mach_port_set_mscount} call is actually an RPC to @var{task},
normally a send right for a task port, but potentially any send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@deftypefun kern_return_t mach_port_set_qlimit (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_msgcount_t @var{qlimit}})
The function @code{mach_port_set_qlimit} changes the queue limit of
@var{task}'s receive right named @var{name} to @var{qlimit}.  Valid
values for @var{qlimit} are between zero and
@code{MACH_PORT_QLIMIT_MAX}, inclusive.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right,
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not a
receive right and @code{KERN_INVALID_VALUE} if @var{qlimit} was invalid.

The @code{mach_port_set_qlimit} call is actually an RPC to @var{task},
normally a send right for a task port, but potentially any send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@deftypefun kern_return_t mach_port_set_seqno (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_seqno_t @var{seqno}})
The function @code{mach_port_set_seqno} changes the sequence number of
@var{task}'s receive right named @var{name} to @var{seqno}.  All
sequence number values are valid.  The next message received from the
port will be stamped with the specified sequence number.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right and
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not a
receive right.

The @code{mach_port_set_seqno} call is actually an RPC to @var{task},
normally a send right for a task port, but potentially any send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@node Port Sets
@subsection Port Sets

@deftypefun kern_return_t mach_port_get_set_status (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_port_array_t *@var{members}}, @w{mach_msg_type_number_t *@var{count}})
The function @code{mach_port_get_set_status} returns the members of a
port set.  @var{members} is an array that is automatically allocated
when the reply message is received.  The user should
@code{vm_deallocate} it when the data is no longer needed.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right,
@code{KERN_INVALID_RIGHT} if @var{name} denoted a right, but not a
port set and @code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of
memory.

The @code{mach_port_get_set_status} call is actually an RPC to
@var{task}, normally a send right for a task port, but potentially any
send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@deftypefun kern_return_t mach_port_move_member (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{member}}, @w{mach_port_t @var{after}})
The function @code{mach_port_move_member} moves the receive right
@var{member} into the port set @var{after}.  If the receive right is
already a member of another port set, it is removed from that set first
(the whole operation is atomic).  If the port set is
@code{MACH_PORT_NULL}, then the receive right is not put into a port
set, but removed from its current port set.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_NAME} if @var{member} or @var{after} did not denote a
right, @code{KERN_INVALID_RIGHT} if @var{member} denoted a right, but
not a receive right or @var{after} denoted a right, but not a port set,
and @code{KERN_NOT_IN_SET} if @var{after} was @code{MACH_PORT_NULL}, but
@var{member} wasn't currently in a port set.

The @code{mach_port_move_member} call is actually an RPC to @var{task},
normally a send right for a task port, but potentially any send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@node Request Notifications
@subsection Request Notifications

@deftypefun kern_return_t mach_port_request_notification (@w{ipc_space_t @var{task}}, @w{mach_port_t @var{name}}, @w{mach_msg_id_t @var{variant}}, @w{mach_port_mscount_t @var{sync}}, @w{mach_port_t @var{notify}}, @w{mach_msg_type_name_t @var{notify_type}}, @w{mach_port_t *@var{previous}})
The function @code{mach_port_request_notification} registers a request
for a notification and supplies the send-once right @var{notify} to
which the notification will be sent.
The @var{notify_type} denotes the IPC type for the send-once right,
which can be @code{MACH_MSG_TYPE_MAKE_SEND_ONCE} or
@code{MACH_MSG_TYPE_MOVE_SEND_ONCE}.  The registration is an atomic
swap, returning the previously registered send-once right (or
@code{MACH_PORT_NULL} for none) in @var{previous}.  A previous
notification request may be cancelled by providing
@code{MACH_PORT_NULL} for @var{notify}.

The @var{variant} argument takes the following values:

@table @code
@item MACH_NOTIFY_PORT_DESTROYED
@var{sync} must be zero.  The @var{name} must specify a receive right,
and the call requests a port-destroyed notification for the receive
right.  If the receive right would otherwise have been destroyed, say by
@code{mach_port_destroy}, then instead the receive right will be sent in
a port-destroyed notification to the registered send-once right.

@item MACH_NOTIFY_DEAD_NAME
The call requests a dead-name notification.  @var{name} specifies send,
receive, or send-once rights for a port.  If the port is destroyed (and
the right remains, becoming a dead name), then a dead-name notification
which carries the name of the right will be sent to the registered
send-once right.  If @var{notify} is not null and @var{sync} is
non-zero, @var{name} may specify a dead name, and a dead-name
notification is immediately generated.

Whenever a dead-name notification is generated, the user reference count
of the dead name is incremented.  For example, suppose a send right with
two user refs has a registered dead-name request.  If the port is
destroyed, the send right turns into a dead name with three user refs
(instead of two), and a dead-name notification is generated.

If the name is made available for reuse, perhaps because of
@code{mach_port_destroy} or @code{mach_port_mod_refs}, or the name
denotes a send-once right which has a message sent to it, then the
registered send-once right is used to generate a port-deleted
notification.

@item MACH_NOTIFY_NO_SENDERS
The call requests a no-senders notification.
@var{name} must specify a receive right.  If @var{notify} is not null,
and the receive right's make-send count is greater than or equal to
@var{sync}, and it has no extant send rights, then an immediate
no-senders notification is generated.  Otherwise the notification is
generated when the receive right next loses its last extant send right.
In either case, any previously registered send-once right is returned.

The no-senders notification carries the value the port's make-send count
had when it was generated.  The make-send count is incremented whenever
@code{MACH_MSG_TYPE_MAKE_SEND} is used to create a new send right from
the receive right.  The make-send count is reset to zero when the
receive right is carried in a message.
@end table

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_TASK} if @var{task} was invalid,
@code{KERN_INVALID_VALUE} if @var{variant} was invalid,
@code{KERN_INVALID_NAME} if @var{name} did not denote a right,
@code{KERN_INVALID_RIGHT} if @var{name} denoted an invalid right and
@code{KERN_INVALID_CAPABILITY} if @var{notify} was invalid.

When using @code{MACH_NOTIFY_PORT_DESTROYED}, the function returns
@code{KERN_INVALID_VALUE} if @var{sync} wasn't zero.

When using @code{MACH_NOTIFY_DEAD_NAME}, the function returns
@code{KERN_RESOURCE_SHORTAGE} if the kernel ran out of memory,
@code{KERN_INVALID_ARGUMENT} if @var{name} denotes a dead name, but
@var{sync} is zero or @var{notify} is @code{MACH_PORT_NULL}, and
@code{KERN_UREFS_OVERFLOW} if @var{name} denotes a dead name, but
generating an immediate dead-name notification would overflow the name's
user-reference count.

The @code{mach_port_request_notification} call is actually an RPC to
@var{task}, normally a send right for a task port, but potentially any
send right.  In addition to the normal diagnostic return codes from the
call's server (normally the kernel), the call may return @code{mach_msg}
return codes.
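
Two of the rules above lend themselves to a short worked example: the
condition under which a no-senders request fires immediately, and the
extra user reference a dead name receives when a dead-name notification
is generated.  The following is only a model written for this
illustration; @code{struct port_state}, @code{no_senders_fires_now} and
@code{dead_name_urefs} are hypothetical names, not part of the Mach
interface.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that matters for no-senders. */
struct port_state {
    unsigned mscount;   /* make-send count of the receive right */
    unsigned srights;   /* number of extant send rights */
};

/* Nonzero if registering a no-senders request with this `sync` value
   would generate the notification immediately. */
int
no_senders_fires_now(const struct port_state *port, unsigned sync,
                     int notify_is_null)
{
    assert(port != NULL);
    return !notify_is_null
        && port->mscount >= sync
        && port->srights == 0;
}

/* User references held by the dead name once the port has been
   destroyed; a registered dead-name request adds one. */
unsigned
dead_name_urefs(unsigned urefs, int dead_name_requested)
{
    return dead_name_requested ? urefs + 1 : urefs;
}
@end verbatim

With a make-send count of 3 and no extant send rights, a request with a
@var{sync} of 3 fires at once while a @var{sync} of 4 waits; a send
right with two user references and a registered dead-name request ends
up as a dead name with three.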
@end deftypefun

@c The inherited ports concept is not used in the Hurd,
@c and so the _SLOT macros are not defined in GNU Mach.

@c @node Inherited Ports
@c @subsection Inherited Ports

@c @deftypefun kern_return_t mach_ports_register (@w{task_t @var{target_task}}, @w{port_array_t @var{init_port_set}}, @w{int @var{init_port_array_count}})
@c @deftypefunx kern_return_t mach_ports_lookup (@w{task_t @var{target_task}}, @w{port_array_t *@var{init_port_set}}, @w{int *@var{init_port_array_count}})
@c @code{mach_ports_register} manipulates the inherited ports array,
@c @code{mach_ports_lookup} is used to acquire specific parent ports.
@c @var{target_task} is the task to be affected.  @var{init_port_set} is an
@c array of system ports to be registered, or returned.  Although the array
@c size is given as variable, the kernel will only accept a limited number
@c of ports.  @var{init_port_array_count} is the number of ports returned
@c in @var{init_port_set}.

@c @code{mach_ports_register} registers an array of well-known system ports
@c with the kernel on behalf of a specific task.  Currently the ports to be
@c registered are: the port to the Network Name Server, the port to the
@c Environment Manager, and a port to the Service server.  These port
@c values must be placed in specific slots in the init_port_set.  The slot
@c numbers are given by the global constants defined in @file{mach_init.h}:
@c @code{NAME_SERVER_SLOT}, @code{ENVIRONMENT_SLOT}, and
@c @code{SERVICE_SLOT}.  These ports may later be retrieved with
@c @code{mach_ports_lookup}.

@c When a new task is created (see @code{task_create}), the child task will
@c be given access to these ports.  Only port send rights may be
@c registered.  Furthermore, the number of ports which may be registered is
@c fixed and given by the global constant @code{MACH_PORT_SLOTS_USED}.
@c Attempts to register too many ports will fail.

@c It is intended that this mechanism be used only for task initialization,
@c and then only by runtime support modules.  A parent task has three
@c choices in passing these system ports to a child task.  Most commonly it
@c can do nothing and its child will inherit access to the same
@c @var{init_port_set} that the parent has; or a parent task may register a
@c set of ports it wishes to have passed to all of its children by calling
@c @code{mach_ports_register} using its task port; or it may make necessary
@c modifications to the set of ports it wishes its child to see, and then
@c register those ports using the child's task port prior to starting the
@c child's thread(s).  The @code{mach_ports_lookup} call which is done by
@c @code{mach_init} in the child task will acquire these initial ports for
@c the child.

@c Tasks other than the Network Name Server and the Environment Manager
@c should not need access to the Service port.  The Network Name Server port
@c is the same for all tasks on a given machine.  The Environment port is
@c the only port likely to have different values for different tasks.

@c Since the number of ports which may be registered is limited, ports
@c other than those used by the runtime system to initialize a task should
@c be passed to children either through an initial message, or through the
@c Network Name Server for public ports, or the Environment Manager for
@c private ports.

@c The function returns @code{KERN_SUCCESS} if the memory was allocated,
@c and @code{KERN_INVALID_ARGUMENT} if an attempt was made to register more
@c ports than the current kernel implementation allows.
@c @end deftypefun


@node Virtual Memory Interface
@chapter Virtual Memory Interface

@cindex virtual memory map port
@cindex port representing a virtual memory map
@deftp {Data type} vm_task_t
This is a @code{task_t} (and as such a @code{mach_port_t}), which holds
a port name associated with a port that represents a virtual memory map
in the kernel.
A virtual memory map is used by the kernel to manage the address space
of a task.  The virtual memory map doesn't get a port name of its own.
Instead the port name of the task provided with the virtual memory is
used to name the virtual memory map of the task (as is indicated by the
fact that the type of @code{vm_task_t} is actually @code{task_t}).

The virtual memory maps of tasks are the only ones accessible outside of
the kernel.
@end deftp

@menu
* Memory Allocation::             Allocation of new virtual memory.
* Memory Deallocation::           Freeing unused virtual memory.
* Data Transfer::                 Reading, writing and copying memory.
* Memory Attributes::             Tweaking memory regions.
* Mapping Memory Objects::        How to map memory objects.
* Memory Statistics::             How to get statistics about memory usage.
@end menu


@node Memory Allocation
@section Memory Allocation

@deftypefun kern_return_t vm_allocate (@w{vm_task_t @var{target_task}}, @w{vm_address_t *@var{address}}, @w{vm_size_t @var{size}}, @w{boolean_t @var{anywhere}})
The function @code{vm_allocate} allocates a region of virtual memory,
placing it in the specified @var{target_task}'s address space.

The starting address is @var{address}.  If the @var{anywhere} option is
false, an attempt is made to allocate virtual memory starting at this
virtual address.  If this address is not at the beginning of a virtual
page, it will be rounded down to one.  If there is not enough space at
this address, no memory will be allocated.  If the @var{anywhere} option
is true, the input value of this address will be ignored, and the space
will be allocated wherever it is available.  In either case, the address
at which memory was actually allocated will be returned in
@var{address}.

@var{size} is the number of bytes to allocate (rounded by the system in
a machine dependent way to an integral number of virtual pages).
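
The rounding described above is plain integer arithmetic.  The sketch
below assumes that the page size is a power of two; the helper names
@code{page_trunc} and @code{page_round} are made up for this
illustration, and real code should take the page size from
@code{vm_page_size} rather than hard-coding a value.

@verbatim
#include <assert.h>
#include <stdint.h>

/* Round `addr` down to the start of the page that contains it.
   `page_size` must be a nonzero power of two. */
uintptr_t
page_trunc(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Round `size` up to a whole number of pages.  The addition can wrap
   for sizes within one page of the largest representable value. */
uintptr_t
page_round(uintptr_t size, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
@end verbatim

With a hypothetical page size of 4096 bytes, a requested address of
@code{0x1234} is rounded down to @code{0x1000}, and a request for a
single byte occupies one whole page.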

If @var{anywhere} is true and there is sufficient space, the kernel
finds and allocates any region of the specified size, and returns the
address of the resulting region, rounded to a virtual page boundary, in
@var{address}.

The physical memory is not actually allocated until the new virtual
memory is referenced.  By default, the kernel rounds all addresses down
to the nearest page boundary and all memory sizes up to the nearest page
size.  The global variable @code{vm_page_size} contains the page size.
@code{mach_task_self} returns the value of the current task port which
should be used as the @var{target_task} argument in order to allocate
memory in the caller's address space.  For languages other than C, these
values can be obtained by the calls @code{vm_statistics} and
@code{mach_task_self}.  Initially, the pages of allocated memory will be
protected to allow all forms of access, and will be inherited in child
tasks as a copy.  Subsequent calls to @code{vm_protect} and
@code{vm_inherit} may be used to change these properties.  The allocated
region is always zero-filled.

The function returns @code{KERN_SUCCESS} if the memory was successfully
allocated, @code{KERN_INVALID_ADDRESS} if an invalid address was
specified and @code{KERN_NO_SPACE} if there was not enough space left to
satisfy the request.
@end deftypefun


@node Memory Deallocation
@section Memory Deallocation

@deftypefun kern_return_t vm_deallocate (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}})
@code{vm_deallocate} relinquishes access to a region of
@var{target_task}'s address space, causing further access to that memory
to fail.  This address range will be available for reallocation.
@var{address} is the starting address, which will be rounded down to a
page boundary.  @var{size} is the number of bytes to deallocate, which
will be rounded up to give a page boundary.
Note that because of the rounding to virtual page boundaries, more than
@var{size} bytes may be deallocated.  Use @code{vm_page_size} or
@code{vm_statistics} to find out the current virtual page size.

This call may be used to deallocate memory that was passed to a task in
a message (via out-of-line data).  In that case, the rounding should
cause no trouble, since the region of memory was allocated as a set of
pages.

The @code{vm_deallocate} call affects only the task specified by the
@var{target_task}.  Other tasks which may have access to this memory may
continue to reference it.

The function returns @code{KERN_SUCCESS} if the memory was successfully
deallocated and @code{KERN_INVALID_ADDRESS} if an invalid or
non-allocated address was specified.
@end deftypefun


@node Data Transfer
@section Data Transfer

@deftypefun kern_return_t vm_read (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, @w{vm_offset_t *@var{data}}, @w{mach_msg_type_number_t *@var{data_count}})
The function @code{vm_read} allows one task's virtual memory to be read
by another task.  The @var{target_task} is the task whose memory is to
be read.  @var{address} is the first address to be read and must be on a
page boundary.  @var{size} is the number of bytes of data to be read and
must be an integral number of pages.  @var{data} is the array of data
copied from the given task, and @var{data_count} is the size of the data
array in bytes (will be an integral number of pages).

Note that the data array is returned in a newly allocated region; the
task reading the data should @code{vm_deallocate} this region when it is
done with the data.
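
Because @var{address} and @var{size} must be page-aligned, a caller that
wants an arbitrary byte range first has to widen it to whole pages and
remember where the wanted bytes start inside the returned data.  The
following sketch shows one way to do that arithmetic; it does not call
@code{vm_read} itself, @code{struct read_request} and
@code{make_read_request} are hypothetical names, and the page size is
assumed to be a power of two.

@verbatim
#include <assert.h>
#include <stdint.h>

/* A page-aligned request covering an arbitrary byte range. */
struct read_request {
    uintptr_t address;   /* page-aligned start to ask for */
    uintptr_t size;      /* whole number of pages, in bytes */
    uintptr_t offset;    /* where the wanted bytes begin in the data */
};

/* Widen the range [want, want + length) to page boundaries. */
struct read_request
make_read_request(uintptr_t want, uintptr_t length, uintptr_t page_size)
{
    struct read_request r;
    uintptr_t mask = page_size - 1;
    uintptr_t end;

    assert(page_size != 0 && (page_size & mask) == 0);

    end = (want + length + mask) & ~mask;   /* round the end up */
    r.address = want & ~mask;               /* round the start down */
    r.size = end - r.address;
    r.offset = want - r.address;
    return r;
}
@end verbatim

For example, with a hypothetical page size of @code{0x1000}, sixteen
bytes starting at @code{0x1ff8} straddle a page boundary, so the request
becomes two pages starting at @code{0x1000}, and the wanted bytes begin
at offset @code{0xff8} of the returned data.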

The function returns @code{KERN_SUCCESS} if the memory was successfully
read, @code{KERN_INVALID_ADDRESS} if an invalid or non-allocated address
was specified or there were not @var{size} bytes of data following the
address, @code{KERN_INVALID_ARGUMENT} if the address does not start on a
page boundary or the size is not an integral number of pages,
@code{KERN_PROTECTION_FAILURE} if the address region in the target task
is protected against reading and @code{KERN_NO_SPACE} if there was not
enough room in the caller's virtual memory to allocate space for the
data to be returned.
@end deftypefun

@deftypefun kern_return_t vm_write (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_offset_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}})
The function @code{vm_write} allows a task to write to the virtual
memory of @var{target_task}.  @var{address} is the starting address in
the task to be affected.  @var{data} is an array of bytes to be written,
and @var{data_count} the size of the @var{data} array.

The current implementation requires that @var{address}, @var{data} and
@var{data_count} all be page-aligned.  Otherwise,
@code{KERN_INVALID_ARGUMENT} is returned.

The function returns @code{KERN_SUCCESS} if the memory was successfully
written, @code{KERN_INVALID_ADDRESS} if an invalid or non-allocated
address was specified or there were not @var{data_count} bytes of
allocated memory starting at @var{address} and
@code{KERN_PROTECTION_FAILURE} if the address region in the target task
is protected against writing.
@end deftypefun

@deftypefun kern_return_t vm_copy (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{source_address}}, @w{vm_size_t @var{count}}, @w{vm_offset_t @var{dest_address}})
The function @code{vm_copy} causes the source memory range to be copied
to the destination address.  The source and destination memory ranges
may overlap.  The destination address range must already be allocated
and writable; the source range must be readable.

@code{vm_copy} is equivalent to @code{vm_read} followed by
@code{vm_write}.

The current implementation requires that @var{source_address},
@var{count} and @var{dest_address} all be page-aligned.  Otherwise,
@code{KERN_INVALID_ARGUMENT} is returned.

The function returns @code{KERN_SUCCESS} if the memory was successfully
copied, @code{KERN_INVALID_ADDRESS} if an invalid or non-allocated
address was specified or there was insufficient memory allocated at one
of the addresses and @code{KERN_PROTECTION_FAILURE} if the destination
region was not writable or the source region was not readable.
@end deftypefun


@node Memory Attributes
@section Memory Attributes

@deftypefun kern_return_t vm_region (@w{vm_task_t @var{target_task}}, @w{vm_address_t *@var{address}}, @w{vm_size_t *@var{size}}, @w{vm_prot_t *@var{protection}}, @w{vm_prot_t *@var{max_protection}}, @w{vm_inherit_t *@var{inheritance}}, @w{boolean_t *@var{shared}}, @w{memory_object_name_t *@var{object_name}}, @w{vm_offset_t *@var{offset}})
The function @code{vm_region} returns a description of the specified
region of @var{target_task}'s virtual address space.  @code{vm_region}
begins at @var{address} and looks forward through memory until it comes
to an allocated region.  If @var{address} is within a region, then that
region is used.  Various bits of information about the region are
returned.  If @var{address} was not within a region, then @var{address}
is set to the start of the first region which follows the incoming
value.  In this way an entire address space can be scanned.

The @var{size} returned is the size of the located region in bytes.
@var{protection} is the current protection of the region,
@var{max_protection} is the maximum allowable protection for this
region.  @var{inheritance} is the inheritance attribute for this region.
@var{shared} tells if the region is shared or not.
The port @var{object_name} identifies the memory object associated with
this region, and @var{offset} is the offset into the pager object that
this region begins at.
@c XXX cross ref pager_init

The function returns @code{KERN_SUCCESS} if the memory region was
successfully located and the information returned and
@code{KERN_NO_SPACE} if there is no region at or above @var{address} in
the specified task.
@end deftypefun

@deftypefun kern_return_t vm_protect (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, @w{boolean_t @var{set_maximum}}, @w{vm_prot_t @var{new_protection}})
The function @code{vm_protect} sets the virtual memory access privileges
for a range of allocated addresses in @var{target_task}'s virtual
address space.  The protection argument describes a combination of read,
write, and execute accesses that should be @emph{permitted}.

@var{address} is the starting address, which will be rounded down to a
page boundary.  @var{size} is the size in bytes of the region for which
protection is to change, and will be rounded up to give a page boundary.
If @var{set_maximum} is set, make the protection change apply to the
maximum protection associated with this address range; otherwise, the
current protection on this range is changed.  If the maximum protection
is reduced below the current protection, both will be changed to reflect
the new maximum.  @var{new_protection} is the new protection value for
this region; a set of: @code{VM_PROT_READ}, @code{VM_PROT_WRITE},
@code{VM_PROT_EXECUTE}.

The enforcement of virtual memory protection is machine-dependent.
Nominally read access requires @code{VM_PROT_READ} permission, write
access requires @code{VM_PROT_WRITE} permission, and execute access
requires @code{VM_PROT_EXECUTE} permission.  However, some combinations
of access rights may not be supported.
In particular, the kernel interface allows write access to require
@code{VM_PROT_READ} and @code{VM_PROT_WRITE} permission and execute
access to require @code{VM_PROT_READ} permission.

The function returns @code{KERN_SUCCESS} if the memory was successfully
protected, @code{KERN_INVALID_ADDRESS} if an invalid or non-allocated
address was specified and @code{KERN_PROTECTION_FAILURE} if an attempt
was made to increase the current or maximum protection beyond the
existing maximum protection value.
@end deftypefun

@deftypefun kern_return_t vm_inherit (@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, @w{vm_inherit_t @var{new_inheritance}})
The function @code{vm_inherit} specifies how a region of
@var{target_task}'s address space is to be passed to child tasks at the
time of task creation.  Inheritance is an attribute of virtual pages, so
the @var{address} to start from will be rounded down to a page boundary
and @var{size}, the size in bytes of the region for which inheritance is
to change, will be rounded up to give a page boundary.  How this memory
is to be inherited in child tasks is specified by
@var{new_inheritance}.  Inheritance is specified by using one of the
following three values:

@table @code
@item VM_INHERIT_SHARE
Child tasks will share this memory with this task.

@item VM_INHERIT_COPY
Child tasks will receive a copy of this region.

@item VM_INHERIT_NONE
This region will be absent from child tasks.
@end table

Setting the inheritance to @code{VM_INHERIT_SHARE} with
@code{vm_inherit} and forking a child task is the only way two Mach
tasks can share physical memory.  Remember that all the threads of a
given task share all the same memory.

The function returns @code{KERN_SUCCESS} if the memory inheritance was
successfully set and @code{KERN_INVALID_ADDRESS} if an invalid or
non-allocated address was specified.
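
The effect of the three inheritance values on a child task can be
pictured with an ordinary C model in which a region is just a buffer.
This is an illustration only: @code{struct region},
@code{inherit_region} and the @code{INHERIT_*} constants are
hypothetical stand-ins, and the eager @code{memcpy} is a simplification
chosen for the sketch rather than a description of how the kernel
produces the copy.

@verbatim
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the three inheritance values. */
enum inherit { INHERIT_SHARE, INHERIT_COPY, INHERIT_NONE };

struct region {
    unsigned char *data;   /* NULL if the region is absent */
    size_t size;
};

/* Fill in what the child sees for one parent region at task creation.
   Returns 0 on success and -1 if memory for a copy is unavailable. */
int
inherit_region(const struct region *parent, enum inherit how,
               struct region *child)
{
    assert(parent != NULL && child != NULL);

    child->data = NULL;
    child->size = 0;

    switch (how) {
    case INHERIT_SHARE:
        child->data = parent->data;   /* same storage as the parent */
        child->size = parent->size;
        return 0;
    case INHERIT_COPY:
        if (parent->size == 0)
            return 0;
        child->data = malloc(parent->size);
        if (child->data == NULL)
            return -1;
        memcpy(child->data, parent->data, parent->size);
        child->size = parent->size;
        return 0;
    case INHERIT_NONE:
        return 0;                     /* region absent in the child */
    }
    return -1;
}
@end verbatim

In this model, a write made by the parent after the child was created is
visible through a shared region, invisible in a copied one, and an
absent region has no data at all.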
@end deftypefun

@deftypefun kern_return_t vm_wire (@w{host_priv_t @var{host_priv}}, @w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, @w{vm_prot_t @var{access}})
The function @code{vm_wire} allows privileged applications to control
memory pageability.  @var{host_priv} is the privileged host port for the
host on which @var{target_task} resides.  @var{address} is the starting
address, which will be rounded down to a page boundary.  @var{size} is
the size in bytes of the region whose pageability is to change, and will
be rounded up to give a page boundary.  @var{access} specifies the types
of accesses that must not cause page faults.

The semantics of a successful @code{vm_wire} operation are that memory
in the specified range will not cause page faults for any accesses
included in @var{access}.  Data memory can be made non-pageable (wired)
with an @var{access} argument of @code{VM_PROT_READ | VM_PROT_WRITE}.  A
special case is that @code{VM_PROT_NONE} makes the memory pageable.

The function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_HOST} if @var{host_priv} was not the privileged host
port, @code{KERN_INVALID_TASK} if @var{target_task} was not a valid
task, @code{KERN_INVALID_VALUE} if @var{access} specified an invalid
access mode, @code{KERN_FAILURE} if some memory in the specified range
is not present or has an inappropriate protection value, and
@code{KERN_INVALID_ARGUMENT} if unwiring (@var{access} is
@code{VM_PROT_NONE}) and the memory is not already wired.

The @code{vm_wire} call is actually an RPC to @var{host_priv}, normally
a send right for a privileged host port, but potentially any send right.
In addition to the normal diagnostic return codes from the call's server
(normally the kernel), the call may return @code{mach_msg} return codes.
@end deftypefun

@deftypefun kern_return_t vm_machine_attribute (@w{vm_task_t @var{task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, @w{vm_prot_t @var{access}}, @w{vm_machine_attribute_t @var{attribute}}, @w{vm_machine_attribute_val_t @var{value}})
The function @code{vm_machine_attribute} specifies machine-specific attributes for a VM mapping, such as cachability, migrability, replicability. This is used on machines that allow the user control over the cache (this is the case for MIPS architectures) or placement of memory pages as in NUMA architectures (Non-Uniform Memory Access time) such as the IBM ACE multiprocessor.

Machine-specific attributes can be considered additions to the machine-independent ones such as protection and inheritance, but they are not guaranteed to be supported by any given machine. Moreover, implementations of Mach on new architectures might find the need for new attribute types and/or values besides the ones defined in the initial implementation.

The types currently defined are:

@table @code
@item MATTR_CACHE
Controls caching of memory pages

@item MATTR_MIGRATE
Controls migrability of memory pages

@item MATTR_REPLICATE
Controls replication of memory pages
@end table

The corresponding values, and their meaning in a call to @code{vm_machine_attribute}, are:

@table @code
@item MATTR_VAL_ON
Enables the attribute. Being enabled is the default value for any applicable attribute.

@item MATTR_VAL_OFF
Disables the attribute, making memory non-cached, or non-migratable, or non-replicatable.

@item MATTR_VAL_GET
Returns the current value of the attribute for the memory segment. If the attribute does not apply uniformly to the given range the value returned applies to the initial portion of the segment only.

@item MATTR_VAL_CACHE_FLUSH
Flush the memory pages from the Cache. The size value in this case might be meaningful even if not a multiple of the page size, depending on the implementation.

@item MATTR_VAL_ICACHE_FLUSH
Same as above, applied to the Instruction Cache alone.

@item MATTR_VAL_DCACHE_FLUSH
Same as above, applied to the Data Cache alone.
@end table

The function returns @code{KERN_SUCCESS} if the call succeeded, and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task, or @var{address} and @var{size} do not define a valid address range in @var{task}, or @var{attribute} is not a valid attribute type, or it is not implemented, or @var{value} is not a permissible value for @var{attribute}.
@end deftypefun

@node Mapping Memory Objects
@section Mapping Memory Objects

@deftypefun kern_return_t vm_map (@w{vm_task_t @var{target_task}}, @w{vm_address_t *@var{address}}, @w{vm_size_t @var{size}}, @w{vm_address_t @var{mask}}, @w{boolean_t @var{anywhere}}, @w{memory_object_t @var{memory_object}}, @w{vm_offset_t @var{offset}}, @w{boolean_t @var{copy}}, @w{vm_prot_t @var{cur_protection}}, @w{vm_prot_t @var{max_protection}}, @w{vm_inherit_t @var{inheritance}})
The function @code{vm_map} maps a region of virtual memory at the specified address, for which data is to be supplied by the given memory object, starting at the given offset within that object. In addition to the arguments used in @code{vm_allocate}, the @code{vm_map} call allows the specification of an address alignment parameter, and of the initial protection and inheritance values.
@c XXX See the descriptions of vm_allocate, vm_protect , and vm_inherit

If the memory object in question is not currently in use, the kernel will perform a @code{memory_object_init} call at this time. If the @var{copy} parameter is asserted, the specified region of the memory object will be copied to this address space; changes made to this object by other tasks will not be visible in this mapping, and changes made in this mapping will not be visible to others (or returned to the memory object).

The @code{vm_map} call returns once the mapping is established.
Completion of the call does not require any action on the part of the memory manager.

Warning: Only memory objects that are provided by bona fide memory managers should be used in the @code{vm_map} call. A memory manager must implement the memory object interface described elsewhere in this manual. If other ports are used, a thread that accesses the mapped virtual memory may become permanently hung or may receive a memory exception.

@var{target_task} is the task to be affected. The starting address is @var{address}. If the @var{anywhere} option is used, this address is ignored. The address actually allocated will be returned in @var{address}. @var{size} is the number of bytes to allocate (rounded by the system in a machine dependent way). The alignment restriction is specified by @var{mask}. Bits asserted in this mask must not be asserted in the address returned. If @var{anywhere} is set, the kernel should find and allocate any region of the specified size, and return the address of the resulting region in @var{address}.

@var{memory_object} is the port that represents the memory object: it is used by user tasks in @code{vm_map}, and by the kernel to make requests for data or other management actions. If this port is @code{MEMORY_OBJECT_NULL}, then zero-filled memory is allocated instead. Within a memory object, @var{offset} specifies an offset in bytes. This must be page aligned. If @var{copy} is set, the range of the memory object should be copied to the target task, rather than mapped read-write.

The function returns @code{KERN_SUCCESS} if the object is mapped, @code{KERN_NO_SPACE} if no unused region of the task's virtual address space that meets the address, size, and alignment criteria could be found, and @code{KERN_INVALID_ARGUMENT} if an invalid argument was provided.
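
The alignment rule for @var{mask} amounts to a simple bit test. The sketch below is illustrative only: @code{sketch_mask_for_alignment} and @code{sketch_address_satisfies_mask} are hypothetical helpers, not Mach calls, and the alignment is assumed to be a power of two.

@example
#include <stdint.h>

/* Hypothetical helper: build a mask whose asserted bits must be
   clear in any address aligned to ALIGNMENT (a power of two).  */
static uintptr_t
sketch_mask_for_alignment (uintptr_t alignment)
@{
  return alignment - 1;
@}

/* Hypothetical helper: return nonzero if no bit asserted in MASK
   is asserted in ADDRESS.  */
static int
sketch_address_satisfies_mask (uintptr_t address, uintptr_t mask)
@{
  return (address & mask) == 0;
@}
@end example

For instance, a mask of @code{0xffff} requests 64 KiB alignment: the address @code{0x30000} satisfies it, while @code{0x30800} does not.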
@end deftypefun

@node Memory Statistics
@section Memory Statistics

@deftp {Data type} vm_statistics_data_t
This structure is returned in @var{vm_stats} by the @code{vm_statistics} function and provides virtual memory statistics for the system. It has the following members:

@table @code
@item long pagesize
The page size in bytes.

@item long free_count
The number of free pages.

@item long active_count
The number of active pages.

@item long inactive_count
The number of inactive pages.

@item long wire_count
The number of pages wired down.

@item long zero_fill_count
The number of zero filled pages.

@item long reactivations
The number of reactivated pages.

@item long pageins
The number of pageins.

@item long pageouts
The number of pageouts.

@item long faults
The number of faults.

@item long cow_faults
The number of copy-on-writes.

@item long lookups
The number of object cache lookups.

@item long hits
The number of object cache hits.
@end table
@end deftp

@deftypefun kern_return_t vm_statistics (@w{vm_task_t @var{target_task}}, @w{vm_statistics_data_t *@var{vm_stats}})
The function @code{vm_statistics} returns the statistics about the kernel's use of virtual memory since the kernel was booted. @code{pagesize} can also be found as a global variable @code{vm_page_size} which is set at task initialization and remains constant for the life of the task.
@end deftypefun

@node External Memory Management
@chapter External Memory Management

@menu
* Memory Object Server:: The basics of external memory management.
* Memory Object Creation:: How new memory objects are created.
* Memory Object Termination:: How memory objects are terminated.
* Memory Objects and Data:: Data transfer to and from memory objects.
* Memory Object Locking:: How memory objects are locked.
* Memory Object Attributes:: Manipulating attributes of memory objects.
* Default Memory Manager:: Setting and using the default memory manager.
@end menu

@node Memory Object Server
@section Memory Object Server

@deftypefun boolean_t memory_object_server (@w{msg_header_t *@var{in_msg}}, @w{msg_header_t *@var{out_msg}})
@deftypefunx boolean_t memory_object_default_server (@w{msg_header_t *@var{in_msg}}, @w{msg_header_t *@var{out_msg}})
@deftypefunx boolean_t seqnos_memory_object_server (@w{msg_header_t *@var{in_msg}}, @w{msg_header_t *@var{out_msg}})
@deftypefunx boolean_t seqnos_memory_object_default_server (@w{msg_header_t *@var{in_msg}}, @w{msg_header_t *@var{out_msg}})
A memory manager is a server task that responds to specific messages from the kernel in order to handle memory management functions for the kernel.

In order to isolate the memory manager from the specifics of message formatting, the remote procedure call generator produces a procedure, @code{memory_object_server}, to handle a received message. This function does all necessary argument handling, and actually calls one of the following functions: @code{memory_object_init}, @code{memory_object_data_write}, @code{memory_object_data_return}, @code{memory_object_data_request}, @code{memory_object_data_unlock}, @code{memory_object_lock_completed}, @code{memory_object_copy}, @code{memory_object_terminate}.

The @strong{default memory manager} may get two additional requests from the kernel: @code{memory_object_create} and @code{memory_object_data_initialize}. The remote procedure call generator produces a procedure @code{memory_object_default_server} to handle those functions specific to the default memory manager.

The @code{seqnos_memory_object_server} and @code{seqnos_memory_object_default_server} differ from @code{memory_object_server} and @code{memory_object_default_server} in that they supply message sequence numbers to the server interfaces. They call the @code{seqnos_memory_object_*} functions, which complement the @code{memory_object_*} set of functions.

The return value from the @code{memory_object_server} function indicates that the message was appropriate to the memory management interface (returning @code{TRUE}), or that it could not handle this message (returning @code{FALSE}).

The @var{in_msg} argument is the message that has been received from the kernel. The @var{out_msg} is a reply message, but this is not used for this server.

The function returns @code{TRUE} to indicate that the message in question was applicable to this interface, and that the appropriate routine was called to interpret the message. It returns @code{FALSE} to indicate that the message did not apply to this interface, and that no other action was taken.
@end deftypefun

@node Memory Object Creation
@section Memory Object Creation

@deftypefun kern_return_t memory_object_init (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{memory_object_name_t @var{memory_object_name}}, @w{vm_size_t @var{memory_object_page_size}})
@deftypefunx kern_return_t seqnos_memory_object_init (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{memory_object_name_t @var{memory_object_name}}, @w{vm_size_t @var{memory_object_page_size}})
The function @code{memory_object_init} serves as a notification that the kernel has been asked to map the given memory object into a task's virtual address space. Additionally, it provides a port on which the memory manager may issue cache management requests, and a port which the kernel will use to name this data region. In the event that different kernels are asked to map the same memory object, each will perform a @code{memory_object_init} call with new request and name ports. The virtual page size that is used by the calling kernel is included for planning purposes.

When the memory manager is prepared to accept requests for data for this object, it must call @code{memory_object_ready}.
Otherwise the kernel will not process requests on this object. To reject all mappings of this object, the memory manager may use @code{memory_object_destroy}.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested. (In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.) @var{memory_object_name} is a port used by the kernel to refer to the memory object data in response to @code{vm_region} calls. @var{memory_object_page_size} is the page size to be used by this kernel. All data sizes in calls involving this kernel must be an integral multiple of the page size. Note that different kernels, indicated by a different @var{memory_control}, may have different page sizes.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_ready (@w{memory_object_control_t @var{memory_control}}, @w{boolean_t @var{may_cache_object}}, @w{memory_object_copy_strategy_t @var{copy_strategy}})
The function @code{memory_object_ready} informs the kernel that the memory manager is ready to receive data or unlock requests on behalf of the clients. The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. If @var{may_cache_object} is set, the kernel may keep data associated with this memory object, even after virtual memory references to it are gone.

@var{copy_strategy} tells how the kernel should copy regions of the associated memory object.
There are three possible copy strategies: @code{MEMORY_OBJECT_COPY_NONE} which specifies that nothing special should be done when data in the object is copied; @code{MEMORY_OBJECT_COPY_CALL} which specifies that the memory manager should be notified via a @code{memory_object_copy} call before any part of the object is copied; and @code{MEMORY_OBJECT_COPY_DELAY} which guarantees that the memory manager does not externally modify the data so that the kernel can use its normal copy-on-write algorithms. @code{MEMORY_OBJECT_COPY_DELAY} is the strategy most commonly used.

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
@end deftypefun

@node Memory Object Termination
@section Memory Object Termination

@deftypefun kern_return_t memory_object_terminate (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{memory_object_name_t @var{memory_object_name}})
@deftypefunx kern_return_t seqnos_memory_object_terminate (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{memory_object_name_t @var{memory_object_name}})
The function @code{memory_object_terminate} indicates that the kernel has completed its use of the given memory object. All rights to the memory object control and name ports are included, so that the memory manager can destroy them (using @code{mach_port_deallocate}) after doing appropriate bookkeeping. The kernel will terminate a memory object only after all address space mappings of that memory object have been deallocated, or upon explicit request by the memory manager.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested.
(In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.) @var{memory_object_name} is a port used by the kernel to refer to the memory object data in response to @code{vm_region} calls.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_destroy (@w{memory_object_control_t @var{memory_control}}, @w{kern_return_t @var{reason}})
The function @code{memory_object_destroy} tells the kernel to shut down the memory object. As a result of this call the kernel will no longer support paging activity or any @code{memory_object} calls on this object, and all rights to the memory object port, the memory control port and the memory name port will be returned to the memory manager in a @code{memory_object_terminate} call. If the memory manager is concerned that any modified cached data be returned to it before the object is terminated, it should call @code{memory_object_lock_request} with @var{should_flush} set and a lock value of @code{VM_PROT_WRITE} before making this call.

The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. @var{reason} is an error code indicating why the object must be destroyed.
@c The error code is currently ignored.

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
@end deftypefun

@node Memory Objects and Data
@section Memory Objects and Data

@deftypefun kern_return_t memory_object_data_return (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}}, @w{boolean_t @var{dirty}}, @w{boolean_t @var{kernel_copy}})
@deftypefunx kern_return_t seqnos_memory_object_data_return (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}}, @w{boolean_t @var{dirty}}, @w{boolean_t @var{kernel_copy}})
The function @code{memory_object_data_return} provides the memory manager with data that has been modified while cached in physical memory. Once the memory manager no longer needs this data (e.g., it has been written to another storage medium), it should be deallocated using @code{vm_deallocate}.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested. (In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.) @var{offset} is the offset within a memory object to which this call refers. This will be page aligned. @var{data} is the data which has been modified while cached in physical memory. @var{data_count} is the amount of data to be written, in bytes. This will be an integral number of memory object pages.

The kernel will also use this call to return precious pages. If an unmodified precious page is returned, @var{dirty} is set to @code{FALSE}, otherwise it is @code{TRUE}. If @var{kernel_copy} is @code{TRUE}, the kernel kept a copy of the page. Precious data remains precious if the kernel keeps a copy.
The indication that the kernel kept a copy is only a hint if the data is not precious; the cleaned copy may be discarded without further notifying the manager.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_data_request (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{length}}, @w{vm_prot_t @var{desired_access}})
@deftypefunx kern_return_t seqnos_memory_object_data_request (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{length}}, @w{vm_prot_t @var{desired_access}})
The function @code{memory_object_data_request} is a request for data from the specified memory object, for at least the access specified. The memory manager is expected to return at least the specified data, with as much access as it can allow, using @code{memory_object_data_supply}. If the memory manager is unable to provide the data (for example, because of a hardware error), it may use the @code{memory_object_data_error} call. The @code{memory_object_data_unavailable} call may be used to tell the kernel to supply zero-filled memory for this region.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested. (In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.) @var{offset} is the offset within a memory object to which this call refers. This will be page aligned. @var{length} is the number of bytes of data, starting at @var{offset}, to which this call refers. This will be an integral number of memory object pages.
@var{desired_access} is a protection value describing the memory access modes which must be permitted on the specified cached data. It is one or more of @code{VM_PROT_READ}, @code{VM_PROT_WRITE} or @code{VM_PROT_EXECUTE}.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_data_supply (@w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}}, @w{vm_prot_t @var{lock_value}}, @w{boolean_t @var{precious}}, @w{mach_port_t @var{reply}})
The function @code{memory_object_data_supply} supplies the kernel with data for the specified memory object. Ordinarily, memory managers should only provide data in response to @code{memory_object_data_request} calls from the kernel (but they may provide data in advance as desired). When data already held by this kernel is provided again, the new data is ignored. The kernel may not provide any data (or protection) consistency among pages with different virtual page alignments within the same object.

The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. @var{offset} is an offset within a memory object in bytes. This must be page aligned. @var{data} is the data that is being provided to the kernel. This is a pointer to the data. @var{data_count} is the amount of data to be provided. Only whole virtual pages of data can be accepted; partial pages will be discarded.

@var{lock_value} is a protection value indicating those forms of access that should @strong{not} be permitted to the specified cached data. The lock values must be one or more of the set: @code{VM_PROT_NONE}, @code{VM_PROT_READ}, @code{VM_PROT_WRITE}, @code{VM_PROT_EXECUTE} and @code{VM_PROT_ALL} as defined in @file{mach/vm_prot.h}.

If @var{precious} is @code{FALSE}, the kernel treats the data as temporary and may throw it away if it hasn't been changed. If the @var{precious} value is @code{TRUE}, the kernel treats its copy as a data repository and promises to return it to the manager; the manager may tell the kernel to throw it away instead by flushing and not cleaning the data (see @code{memory_object_lock_request}).

If @var{reply} is not @code{MACH_PORT_NULL}, the kernel will send a completion message to the provided port (see @code{memory_object_supply_completed}).

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_supply_completed (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{kern_return_t @var{result}}, @w{vm_offset_t @var{error_offset}})
@deftypefunx kern_return_t seqnos_memory_object_supply_completed (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{kern_return_t @var{result}}, @w{vm_offset_t @var{error_offset}})
The function @code{memory_object_supply_completed} indicates that a previous @code{memory_object_data_supply} has been completed. Note that this call is made on whatever port was specified in the @code{memory_object_data_supply} call; that port need not be the memory object port itself. No reply is expected after this call.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested. (In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.)
@var{offset} is the offset within a memory object to which this call refers. @var{length} is the length of the data covered by the supply request.

The @var{result} parameter indicates what happened during the supply. If it is not @code{KERN_SUCCESS}, then @var{error_offset} identifies the first offset at which a problem occurred. The pagein operation stopped at this point. Note that the only failures reported by this mechanism are @code{KERN_MEMORY_PRESENT}. All other failures (invalid argument, error on pagein of supplied data in manager's address space) cause the entire operation to fail.
@end deftypefun

@deftypefun kern_return_t memory_object_data_error (@w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{size}}, @w{kern_return_t @var{reason}})
The function @code{memory_object_data_error} indicates that the memory manager cannot return the data requested for the given region, specifying a reason for the error. This is typically used when a hardware error is encountered.

The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. @var{offset} is an offset within a memory object in bytes. This must be page aligned. @var{size} is the amount of cached data (starting at @var{offset}) to be handled. This must be an integral number of the memory object page size. @var{reason} is an error code indicating what type of error occurred.
@c The error code is currently ignored.

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_data_unavailable (@w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{size}}, @w{kern_return_t @var{reason}})
The function @code{memory_object_data_unavailable} indicates that the memory object does not have data for the given region and that the kernel should provide the data for this range. The memory manager may use this call in three different situations.

@enumerate
@item
The object was created by @code{memory_object_create} and the kernel has not yet provided data for this range (either via a @code{memory_object_data_initialize}, a @code{memory_object_data_write} or a @code{memory_object_data_return}) for the object.

@item
The object was created by a @code{memory_object_data_copy} and the kernel should copy this region from the original memory object.

@item
The object is a normal user-created memory object and the kernel should supply unlocked zero-filled pages for the range.
@end enumerate

The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. @var{offset} is an offset within a memory object, in bytes. This must be page aligned. @var{size} is the amount of cached data (starting at @var{offset}) to be handled. This must be an integral number of the memory object page size.

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_copy (@w{memory_object_t @var{old_memory_object}}, @w{memory_object_control_t @var{old_memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{memory_object_t @var{new_memory_object}})
@deftypefunx kern_return_t seqnos_memory_object_copy (@w{memory_object_t @var{old_memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{old_memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{memory_object_t @var{new_memory_object}})
The function @code{memory_object_copy} indicates that a copy has been made of the specified range of the given original memory object. This call includes only the new memory object itself; a @code{memory_object_init} call will be made on the new memory object after the currently cached pages of the original object are prepared. After the memory manager receives the init call, it must reply with the @code{memory_object_ready} call to assert the "ready" attribute. The kernel will use the new memory object, control and name ports to refer to the new copy.

This call is made when the original memory object had the caching parameter set to @code{MEMORY_OBJECT_COPY_CALL} and a user of the object has asked the kernel to copy it.

Cached pages from the original memory object at the time of the copy operation are handled as follows: Readable pages may be silently copied to the new memory object (with all access permissions). Pages not copied are locked to prevent write access.

The new memory object is @strong{temporary}, meaning that the memory manager should not change its contents or allow the memory object to be mapped in another client. The memory manager may use the @code{memory_object_data_unavailable} call to indicate that the appropriate pages of the original memory object may be used to fulfill the data request.

The argument @var{old_memory_object} is the port that represents the old memory object data.
@var{old_memory_control} is the kernel port for the old object. @var{offset} is the offset within a memory object to which this call refers. This will be page aligned. @var{length} is the number of bytes of data, starting at @var{offset}, to which this call refers. This will be an integral number of memory object pages. @var{new_memory_object} is a new memory object created by the kernel; see synopsis for further description. Note that all port rights (including receive rights) are included for the new memory object.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

The remaining interfaces in this section are obsolete.

@deftypefun kern_return_t memory_object_data_write (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}})
@deftypefunx kern_return_t seqnos_memory_object_data_write (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}})
The function @code{memory_object_data_write} provides the memory manager with data that has been modified while cached in physical memory. It is the old form of @code{memory_object_data_return}. Once the memory manager no longer needs this data (e.g., it has been written to another storage medium), it should be deallocated using @code{vm_deallocate}.

The argument @var{memory_object} is the port that represents the memory object data, as supplied to the kernel in a @code{vm_map} call. @var{memory_control} is the request port to which a response is requested. (In the event that a memory object has been supplied to more than one kernel, this argument identifies the kernel that has made the request.)
@var{offset} is the offset within a memory object to which this call refers. This will be page aligned. @var{data} is the data which has been modified while cached in physical memory. @var{data_count} is the amount of data to be written, in bytes. This will be an integral number of memory object pages.

The function should return @code{KERN_SUCCESS}, but since this routine is called by the kernel, which does not wait for a reply message, this value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_data_provided (@w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}}, @w{vm_prot_t @var{lock_value}})
The function @code{memory_object_data_provided} supplies the kernel with data for the specified memory object. It is the old form of @code{memory_object_data_supply}. Ordinarily, memory managers should only provide data in response to @code{memory_object_data_request} calls from the kernel. The @var{lock_value} specifies what type of access will not be allowed to the data range. The lock values must be one or more of the set: @code{VM_PROT_NONE}, @code{VM_PROT_READ}, @code{VM_PROT_WRITE}, @code{VM_PROT_EXECUTE} and @code{VM_PROT_ALL} as defined in @file{mach/vm_prot.h}.

The argument @var{memory_control} is the port, provided by the kernel in a @code{memory_object_init} call, to which cache management requests may be issued. @var{offset} is an offset within a memory object in bytes. This must be page aligned. @var{data} is the data that is being provided to the kernel. This is a pointer to the data. @var{data_count} is the amount of data to be provided. This must be an integral number of memory object pages. @var{lock_value} is a protection value indicating those forms of access that should @strong{not} be permitted to the specified cached data.

This routine does not receive a reply message (and consequently has no return value), so only message transmission errors apply.
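
Because @var{lock_value} names the accesses that are prohibited rather than those that are allowed, the access that remains permitted is its complement within @code{VM_PROT_ALL}. The sketch below illustrates this relationship. It is not kernel code: the @code{SKETCH_PROT_*} constants are local stand-ins that are assumed to mirror the values in @file{mach/vm_prot.h}, and @code{sketch_allowed_access} is a hypothetical helper.

@example
/* Local stand-ins, assumed to mirror mach/vm_prot.h.  */
#define SKETCH_PROT_NONE    0x00u
#define SKETCH_PROT_READ    0x01u
#define SKETCH_PROT_WRITE   0x02u
#define SKETCH_PROT_EXECUTE 0x04u
#define SKETCH_PROT_ALL \
  (SKETCH_PROT_READ | SKETCH_PROT_WRITE | SKETCH_PROT_EXECUTE)

/* Hypothetical helper: return the access modes still permitted
   once LOCK_VALUE (the prohibited modes) has been applied.  */
static unsigned
sketch_allowed_access (unsigned lock_value)
@{
  return SKETCH_PROT_ALL & ~lock_value;
@}
@end example

Under these assumptions, a lock value of @code{SKETCH_PROT_WRITE} leaves read and execute access permitted, @code{SKETCH_PROT_NONE} leaves all access permitted, and @code{SKETCH_PROT_ALL} permits none.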
@end deftypefun


@node Memory Object Locking
@section Memory Object Locking

@deftypefun kern_return_t memory_object_lock_request (@w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{size}}, @w{memory_object_return_t @var{should_clean}}, @w{boolean_t @var{should_flush}}, @w{vm_prot_t @var{lock_value}}, @w{mach_port_t @var{reply_to}})
The function @code{memory_object_lock_request} allows a memory manager
to make cache management requests. As specified in arguments to the
call, the kernel will:
@itemize
@item
clean (i.e., write back using @code{memory_object_data_return} or
@code{memory_object_data_write}) any cached data which has been modified
since the last time it was written

@item
flush (i.e., remove any uses of) that data from memory

@item
lock (i.e., prohibit the specified uses of) the cached data
@end itemize

Locks applied to cached data are not cumulative; new lock values
override previous ones. Thus, data may also be unlocked using this
primitive. The lock values must be one or more of the following values:
@code{VM_PROT_NONE}, @code{VM_PROT_READ}, @code{VM_PROT_WRITE},
@code{VM_PROT_EXECUTE} and @code{VM_PROT_ALL} as defined in
@file{mach/vm_prot.h}.

Only data which is cached at the time of this call is affected. When a
running thread requires a prohibited access to cached data, the kernel
will issue a @code{memory_object_data_unlock} call specifying the forms
of access required.

Once all of the actions requested by this call have been completed, the
kernel issues a @code{memory_object_lock_completed} call on the
specified reply port.

The argument @var{memory_control} is the port, provided by the kernel in
a @code{memory_object_init} call, to which cache management requests may
be issued. @var{offset} is an offset within a memory object, in bytes.
This must be page aligned. @var{size} is the amount of cached data
(starting at @var{offset}) to be handled.
This must be an integral multiple of the memory object page size. If
@var{should_clean} is set, modified data should be written back to the
memory manager. If @var{should_flush} is set, the specified cached data
should be invalidated, and all uses of that data should be revoked.
@var{lock_value} is a protection value indicating those forms of access
that should @strong{not} be permitted to the specified cached data.
@var{reply_to} is a port on which a @code{memory_object_lock_completed}
call should be issued, or @code{MACH_PORT_NULL} if no acknowledgement is
desired.

This routine does not receive a reply message (and consequently has no
return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_lock_completed (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}})
@deftypefunx kern_return_t seqnos_memory_object_lock_completed (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}})
The function @code{memory_object_lock_completed} indicates that a
previous @code{memory_object_lock_request} has been completed. Note
that this call is made on whatever port was specified in the
@code{memory_object_lock_request} call; that port need not be the memory
object port itself. No reply is expected after this call.

The argument @var{memory_object} is the port that represents the memory
object data, as supplied to the kernel in a @code{vm_map} call.
@var{memory_control} is the request port to which a response is
requested. (In the event that a memory object has been supplied to more
than one kernel, this argument identifies the kernel that has made the
request.) @var{offset} is the offset within a memory object to which
this call refers. @var{length} is the length of the data covered by the
lock request.

The function should return @code{KERN_SUCCESS}, but since this routine
is called by the kernel, which does not wait for a reply message, this
value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_data_unlock (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{vm_prot_t @var{desired_access}})
@deftypefunx kern_return_t seqnos_memory_object_data_unlock (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{length}}, @w{vm_prot_t @var{desired_access}})
The function @code{memory_object_data_unlock} is a request that the
memory manager permit at least the desired access to the specified data
cached by the kernel. A call to @code{memory_object_lock_request} is
expected in response.

The argument @var{memory_object} is the port that represents the memory
object data, as supplied to the kernel in a @code{vm_map} call.
@var{memory_control} is the request port to which a response is
requested. (In the event that a memory object has been supplied to more
than one kernel, this argument identifies the kernel that has made the
request.) @var{offset} is the offset within a memory object to which
this call refers. This will be page aligned. @var{length} is the
number of bytes of data, starting at @var{offset}, to which this call
refers. This will be an integral number of memory object pages.
@var{desired_access} is a protection value describing the memory access
modes which must be permitted on the specified cached data. It is one
or more of: @code{VM_PROT_READ}, @code{VM_PROT_WRITE} or
@code{VM_PROT_EXECUTE}.

The function should return @code{KERN_SUCCESS}, but since this routine
is called by the kernel, which does not wait for a reply message, this
value is ignored.
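
As a rough sketch, a memory manager that is willing to grant every kind
of access could answer the request as shown below. The sketch assumes
that @file{mach.h} declares @code{memory_object_lock_request} and the
constant @code{MEMORY_OBJECT_RETURN_NONE}. Because lock values are not
cumulative, passing @code{VM_PROT_NONE} removes all restrictions on the
range; a manager with a stricter policy would compute a different lock
value.

@example
#include <mach.h>

/* Sketch of a manager-side handler; the policy shown (unlock
   everything) is an assumption made for illustration.  */
kern_return_t
memory_object_data_unlock (memory_object_t memory_object,
                           memory_object_control_t memory_control,
                           vm_offset_t offset, vm_size_t length,
                           vm_prot_t desired_access)
@{
  /* Do not clean, do not flush, prohibit nothing, and ask for no
     memory_object_lock_completed acknowledgement.  */
  return memory_object_lock_request (memory_control, offset, length,
                                     MEMORY_OBJECT_RETURN_NONE, FALSE,
                                     VM_PROT_NONE, MACH_PORT_NULL);
@}
@end example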
@end deftypefun


@node Memory Object Attributes
@section Memory Object Attributes

@deftypefun kern_return_t memory_object_get_attributes (@w{memory_object_control_t @var{memory_control}}, @w{boolean_t *@var{object_ready}}, @w{boolean_t *@var{may_cache_object}}, @w{memory_object_copy_strategy_t *@var{copy_strategy}})
The function @code{memory_object_get_attributes} retrieves the current
attributes associated with the memory object.

The argument @var{memory_control} is the port, provided by the kernel in
a @code{memory_object_init} call, to which cache management requests may
be issued. If @var{object_ready} is set, the kernel may issue new data
and unlock requests on the associated memory object. If
@var{may_cache_object} is set, the kernel may keep data associated with
this memory object, even after virtual memory references to it are gone.
@var{copy_strategy} tells how the kernel should copy regions of the
associated memory object.

This routine does not receive a reply message (and consequently has no
return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_change_attributes (@w{memory_object_control_t @var{memory_control}}, @w{boolean_t @var{may_cache_object}}, @w{memory_object_copy_strategy_t @var{copy_strategy}}, @w{mach_port_t @var{reply_to}})
The function @code{memory_object_change_attributes} sets
performance-related attributes for the specified memory object. If the
caching attribute is asserted, the kernel is permitted (and encouraged)
to maintain cached data for this memory object even after no virtual
address space contains this data.

There are three possible copy strategies: @code{MEMORY_OBJECT_COPY_NONE}
which specifies that nothing special should be done when data in the
object is copied; @code{MEMORY_OBJECT_COPY_CALL} which specifies that
the memory manager should be notified via a @code{memory_object_copy}
call before any part of the object is copied; and
@code{MEMORY_OBJECT_COPY_DELAY} which guarantees that the memory manager
does not externally modify the data so that the kernel can use its
normal copy-on-write algorithms. @code{MEMORY_OBJECT_COPY_DELAY} is the
strategy most commonly used.

The argument @var{memory_control} is the port, provided by the kernel in
a @code{memory_object_init} call, to which cache management requests may
be issued. If @var{may_cache_object} is set, the kernel may keep data
associated with this memory object, even after virtual memory references
to it are gone. @var{copy_strategy} tells how the kernel should copy
regions of the associated memory object. @var{reply_to} is a port on
which a @code{memory_object_change_completed} call will be issued upon
completion of the attribute change, or @code{MACH_PORT_NULL} if no
acknowledgement is desired.

This routine does not receive a reply message (and consequently has no
return value), so only message transmission errors apply.
@end deftypefun

@deftypefun kern_return_t memory_object_change_completed (@w{memory_object_t @var{memory_object}}, @w{boolean_t @var{may_cache_object}}, @w{memory_object_copy_strategy_t @var{copy_strategy}})
@deftypefunx kern_return_t seqnos_memory_object_change_completed (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{boolean_t @var{may_cache_object}}, @w{memory_object_copy_strategy_t @var{copy_strategy}})
The function @code{memory_object_change_completed} indicates the
completion of an attribute change call.
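
For illustration, the request that leads to this notification might be
issued as in the following sketch. The helper name
@code{enable_caching} is hypothetical, and the sketch assumes that
@file{mach.h} declares @code{memory_object_change_attributes}. Passing
a real port instead of @code{MACH_PORT_NULL} as the last argument asks
the kernel to send @code{memory_object_change_completed} to that port.

@example
#include <mach.h>

/* Hypothetical helper: allow the kernel to keep cached pages and use
   its normal copy-on-write algorithms for this object.  */
kern_return_t
enable_caching (memory_object_control_t memory_control,
                mach_port_t reply_to)
@{
  return memory_object_change_attributes (memory_control, TRUE,
                                          MEMORY_OBJECT_COPY_DELAY,
                                          reply_to);
@}
@end example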
@c Warning: This routine does NOT contain a memory_object_control_t because
@c the memory_object_change_attributes call may cause memory object
@c termination (by uncaching the object). This would yield an invalid
@c port.
@end deftypefun

The following interface is obsoleted by @code{memory_object_ready} and
@code{memory_object_change_attributes}. If the old form
@code{memory_object_set_attributes} is used to make a memory object
ready, the kernel will write back data using the old
@code{memory_object_data_write} interface rather than
@code{memory_object_data_return}.

@deftypefun kern_return_t memory_object_set_attributes (@w{memory_object_control_t @var{memory_control}}, @w{boolean_t @var{object_ready}}, @w{boolean_t @var{may_cache_object}}, @w{memory_object_copy_strategy_t @var{copy_strategy}})
The function @code{memory_object_set_attributes} controls how the kernel
treats the memory object. The kernel will only make data or unlock
requests when the ready attribute is asserted. If the caching attribute
is asserted, the kernel is permitted (and encouraged) to maintain cached
data for this memory object even after no virtual address space contains
this data.

There are three possible copy strategies: @code{MEMORY_OBJECT_COPY_NONE}
which specifies that nothing special should be done when data in the
object is copied; @code{MEMORY_OBJECT_COPY_CALL} which specifies that
the memory manager should be notified via a @code{memory_object_copy}
call before any part of the object is copied; and
@code{MEMORY_OBJECT_COPY_DELAY} which guarantees that the memory manager
does not externally modify the data so that the kernel can use its
normal copy-on-write algorithms. @code{MEMORY_OBJECT_COPY_DELAY} is the
strategy most commonly used.

The argument @var{memory_control} is the port, provided by the kernel in
a @code{memory_object_init} call, to which cache management requests may
be issued.
If @var{object_ready} is set, the kernel may issue new data and unlock
requests on the associated memory object. If @var{may_cache_object} is
set, the kernel may keep data associated with this memory object, even
after virtual memory references to it are gone. @var{copy_strategy}
tells how the kernel should copy regions of the associated memory
object.

This routine does not receive a reply message (and consequently has no
return value), so only message transmission errors apply.
@end deftypefun


@node Default Memory Manager
@section Default Memory Manager

@deftypefun kern_return_t vm_set_default_memory_manager (@w{host_t @var{host}}, @w{mach_port_t *@var{default_manager}})
The function @code{vm_set_default_memory_manager} sets the kernel's
default memory manager. It sets the port to which newly-created
temporary memory objects are delivered by @code{memory_object_create} to
the host. The old memory manager port is returned. If
@var{default_manager} is @code{MACH_PORT_NULL} then this routine just
returns the current default manager port without changing it.

The argument @var{host} is a task port to the kernel whose default
memory manager is to be changed. @var{default_manager} is an in/out
parameter. As input, @var{default_manager} is the port that the new
memory manager is listening on for @code{memory_object_create} calls.
As output, it is the old default memory manager's port.

The function returns @code{KERN_SUCCESS} if the new memory manager is
installed, and @code{KERN_INVALID_ARGUMENT} if this task does not have
the privileges required for this call.
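
The query form described above can be sketched as follows. The helper
name @code{get_default_memory_manager} is hypothetical, how the caller
obtains a suitably privileged @var{host} port is outside the scope of
the sketch, and @file{mach.h} is assumed to declare the call.

@example
#include <mach.h>

/* Hypothetical helper: fetch the current default memory manager port
   without replacing it.  */
kern_return_t
get_default_memory_manager (host_t host, mach_port_t *manager)
@{
  /* MACH_PORT_NULL on input means "do not change the manager"; on
     output the variable holds the current manager port.  */
  *manager = MACH_PORT_NULL;
  return vm_set_default_memory_manager (host, manager);
@}
@end example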
@end deftypefun

@deftypefun kern_return_t memory_object_create (@w{memory_object_t @var{old_memory_object}}, @w{memory_object_t @var{new_memory_object}}, @w{vm_size_t @var{new_object_size}}, @w{memory_object_control_t @var{new_control}}, @w{memory_object_name_t @var{new_name}}, @w{vm_size_t @var{new_page_size}})
@deftypefunx kern_return_t seqnos_memory_object_create (@w{memory_object_t @var{old_memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_t @var{new_memory_object}}, @w{vm_size_t @var{new_object_size}}, @w{memory_object_control_t @var{new_control}}, @w{memory_object_name_t @var{new_name}}, @w{vm_size_t @var{new_page_size}})
The function @code{memory_object_create} is a request that the given
memory manager accept responsibility for the given memory object created
by the kernel. This call will only be made to the system
@strong{default memory manager}. The memory object in question
initially consists of zero-filled memory; only memory pages that are
actually written will ever be provided to the memory manager. When
processing @code{memory_object_data_request} calls, the default memory
manager must use @code{memory_object_data_unavailable} for any pages
that have not previously been written.

No reply is expected after this call. Since this call is directed to
the default memory manager, the kernel assumes that it will be ready to
handle data requests to this object and does not need the confirmation
of a @code{memory_object_set_attributes} call.

The argument @var{old_memory_object} is a memory object provided by the
default memory manager on which the kernel can make
@code{memory_object_create} calls. @var{new_memory_object} is a new
memory object created by the kernel; see synopsis for further
description. Note that all port rights (including receive rights) are
included for the new memory object. @var{new_object_size} is the
maximum size of the new object.
@var{new_control} is a port, created by the kernel, on which a memory
manager may issue cache management requests for the new object.
@var{new_name} is a port used by the kernel to refer to the new memory
object data in response to @code{vm_region} calls. @var{new_page_size}
is the page size to be used by this kernel. All data sizes in calls
involving this kernel must be an integral multiple of the page size.
Note that different kernels, indicated by a different
@code{memory_control}, may have different page sizes.

The function should return @code{KERN_SUCCESS}, but since this routine
is called by the kernel, which does not wait for a reply message, this
value is ignored.
@end deftypefun

@deftypefun kern_return_t memory_object_data_initialize (@w{memory_object_t @var{memory_object}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}})
@deftypefunx kern_return_t seqnos_memory_object_data_initialize (@w{memory_object_t @var{memory_object}}, @w{mach_port_seqno_t @var{seqno}}, @w{memory_object_control_t @var{memory_control}}, @w{vm_offset_t @var{offset}}, @w{vm_offset_t @var{data}}, @w{vm_size_t @var{data_count}})
The function @code{memory_object_data_initialize} provides the memory
manager with initial data for a kernel-created memory object. If the
memory manager already has been supplied data (by a previous
@code{memory_object_data_initialize}, @code{memory_object_data_write} or
@code{memory_object_data_return}), then this data should be ignored.
Otherwise, this call behaves exactly as does
@code{memory_object_data_return} on memory objects created by the kernel
via @code{memory_object_create} and thus will only be made to default
memory managers. This call will not be made on objects created via
@code{memory_object_copy}.

The argument @var{memory_object} is the port that represents the memory
object data, as supplied by the kernel in a @code{memory_object_create}
call.
@var{memory_control} is the request port to which a response is
requested. (In the event that a memory object has been supplied to more
than one kernel, this argument identifies the kernel that has made the
request.) @var{offset} is the offset within a memory object to which
this call refers. This will be page aligned. @var{data} is the data
which has been modified while cached in physical memory.
@var{data_count} is the amount of data to be written, in bytes. This
will be an integral number of memory object pages.

The function should return @code{KERN_SUCCESS}, but since this routine
is called by the kernel, which does not wait for a reply message, this
value is ignored.
@end deftypefun


@node Threads and Tasks
@chapter Threads and Tasks

@menu
* Thread Interface::   Manipulating threads.
* Task Interface::     Manipulating tasks.
* Profiling::          Profiling threads and tasks.
@end menu


@node Thread Interface
@section Thread Interface

@cindex thread port
@cindex port representing a thread
@deftp {Data type} thread_t
This is a @code{mach_port_t} and is used to hold the port name of a
thread port that represents the thread. Manipulations of the thread are
implemented as remote procedure calls to the thread port. A thread can
get a port to itself with the @code{mach_thread_self} system call.
@end deftp

@menu
* Thread Creation::       Creating new threads.
* Thread Termination::    Terminating existing threads.
* Thread Information::    How to get information on threads.
* Thread Settings::       How to set thread-related information.
* Thread Execution::      How to control the thread's machine state.
* Scheduling::            Operations on thread scheduling.
* Thread Special Ports::  How to handle the thread's special ports.
* Exceptions::            Managing exceptions.
@end menu


@node Thread Creation
@subsection Thread Creation

@deftypefun kern_return_t thread_create (@w{task_t @var{parent_task}}, @w{thread_t *@var{child_thread}})
The function @code{thread_create} creates a new thread within the task
specified by @var{parent_task}.
The new thread has no processor state, and has a suspend count of 1. To
get a new thread to run, first @code{thread_create} is called to get the
new thread's identifier (@var{child_thread}). Then
@code{thread_set_state} is called to set a processor state, and finally
@code{thread_resume} is called to get the thread scheduled to execute.

When the thread is created, send rights to its thread kernel port are
given to it and returned to the caller in @var{child_thread}. The new
thread's exception port is set to @code{MACH_PORT_NULL}.

The function returns @code{KERN_SUCCESS} if a new thread has been
created, @code{KERN_INVALID_ARGUMENT} if @var{parent_task} is not a
valid task and @code{KERN_RESOURCE_SHORTAGE} if some critical kernel
resource is not available.
@end deftypefun


@node Thread Termination
@subsection Thread Termination

@deftypefun kern_return_t thread_terminate (@w{thread_t @var{target_thread}})
The function @code{thread_terminate} destroys the thread specified by
@var{target_thread}.

The function returns @code{KERN_SUCCESS} if the thread has been killed
and @code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not a thread.
@end deftypefun


@node Thread Information
@subsection Thread Information

@deftypefun thread_t mach_thread_self ()
The @code{mach_thread_self} system call returns the calling thread's
thread port.

@code{mach_thread_self} has an effect equivalent to receiving a send
right for the thread port. @code{mach_thread_self} returns the name of
the send right. In particular, successive calls will increase the
calling task's user-reference count for the send right.

@c author{marcus}
As a special exception, the kernel will overrun the user reference count
of the thread name port, so that this function can not fail for that
reason. Because of this, the user should not deallocate the port right
if an overrun might have happened.
Otherwise the reference count could drop to zero and the send right be
destroyed while the user still expects to be able to use it. As the
kernel does not make use of the number of extant send rights anyway,
this is safe to do (the thread port itself is not destroyed, even when
there are no send rights anymore).

The function returns @code{MACH_PORT_NULL} if a resource shortage
prevented the reception of the send right or if the thread port is
currently null and @code{MACH_PORT_DEAD} if the thread port is currently
dead.
@end deftypefun

@deftypefun kern_return_t thread_info (@w{thread_t @var{target_thread}}, @w{int @var{flavor}}, @w{thread_info_t @var{thread_info}}, @w{mach_msg_type_number_t *@var{thread_infoCnt}})
The function @code{thread_info} returns the selected information array
for a thread, as specified by @var{flavor}.

@var{thread_info} is an array of integers that is supplied by the caller
and returned filled with specified information. @var{thread_infoCnt} is
supplied as the maximum number of integers in @var{thread_info}. On
return, it contains the actual number of integers in @var{thread_info}.
The maximum number of integers returned by any flavor is
@code{THREAD_INFO_MAX}.

The type of information returned is defined by @var{flavor}, which can
be one of the following:

@table @code
@item THREAD_BASIC_INFO
The function returns basic information about the thread, as defined by
@code{thread_basic_info_t}. This includes the user and system time, the
run state, and scheduling priority. The number of integers returned is
@code{THREAD_BASIC_INFO_COUNT}.

@item THREAD_SCHED_INFO
The function returns information about the scheduling policy for the
thread as defined by @code{thread_sched_info_t}. The number of integers
returned is @code{THREAD_SCHED_INFO_COUNT}.
@end table

The function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not a thread or
@var{flavor} is not recognized.
The function returns @code{MIG_ARRAY_TOO_LARGE} if the returned info
array is too large for @var{thread_info}. In this case,
@var{thread_info} is filled as much as possible and
@var{thread_infoCnt} is set to the number of elements that would have
been returned if there were enough room.
@end deftypefun

@deftp {Data type} {struct thread_basic_info}
This structure is returned in @var{thread_info} by the
@code{thread_info} function and provides basic information about the
thread. You can cast a variable of type @code{thread_info_t} to a
pointer of this type if you provided it as the @var{thread_info}
parameter for the @code{THREAD_BASIC_INFO} flavor of @code{thread_info}.
It has the following members:

@table @code
@item time_value_t user_time
User run time.

@item time_value_t system_time
System run time.

@item int cpu_usage
Scaled cpu usage percentage. The scale factor is @code{TH_USAGE_SCALE}.

@item int base_priority
The base scheduling priority of the thread.

@item int cur_priority
The current scheduling priority of the thread.

@item integer_t run_state
The run state of the thread. The possible values of this field are:
@table @code
@item TH_STATE_RUNNING
The thread is running normally.

@item TH_STATE_STOPPED
The thread is suspended.

@item TH_STATE_WAITING
The thread is waiting normally.

@item TH_STATE_UNINTERRUPTIBLE
The thread is in an uninterruptible wait.

@item TH_STATE_HALTED
The thread is halted at a clean point.
@end table

@item flags
Various flags. The possible values of this field are:
@table @code
@item TH_FLAGS_SWAPPED
The thread is swapped out.

@item TH_FLAGS_IDLE
The thread is an idle thread.
@end table

@item int suspend_count
The suspend count for the thread.

@item int sleep_time
The number of seconds that the thread has been sleeping.

@item time_value_t creation_time
The time stamp of creation.
@end table
@end deftp

@deftp {Data type} thread_basic_info_t
This is a pointer to a @code{struct thread_basic_info}.
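
A minimal sketch of querying the basic information with
@code{thread_info} follows. The helper name
@code{print_thread_basic_info} is hypothetical, and the sketch assumes
that @file{mach.h} provides the declarations used.

@example
#include <mach.h>
#include <stdio.h>

/* Hypothetical helper: print a few basic facts about THREAD.  */
kern_return_t
print_thread_basic_info (thread_t thread)
@{
  struct thread_basic_info info;
  /* On input, the capacity of the buffer in integers; on output, the
     number of integers actually returned.  */
  mach_msg_type_number_t count = THREAD_BASIC_INFO_COUNT;
  kern_return_t err;

  err = thread_info (thread, THREAD_BASIC_INFO,
                     (thread_info_t) &info, &count);
  if (err != KERN_SUCCESS)
    return err;

  printf ("suspend count: %d, base priority: %d\n",
          (int) info.suspend_count, (int) info.base_priority);
  return KERN_SUCCESS;
@}
@end example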
@end deftp

@deftp {Data type} {struct thread_sched_info}
This structure is returned in @var{thread_info} by the
@code{thread_info} function and provides schedule information about the
thread. You can cast a variable of type @code{thread_info_t} to a
pointer of this type if you provided it as the @var{thread_info}
parameter for the @code{THREAD_SCHED_INFO} flavor of @code{thread_info}.
It has the following members:

@table @code
@item int policy
The scheduling policy of the thread; see @ref{Scheduling Policy}.

@item integer_t data
Policy-dependent scheduling information; see @ref{Scheduling Policy}.

@item int base_priority
The base scheduling priority of the thread.

@item int max_priority
The maximum scheduling priority of the thread.

@item int cur_priority
The current scheduling priority of the thread.

@item int depressed
@code{TRUE} if the thread is depressed.

@item int depress_priority
The priority the thread was depressed from.
@end table
@end deftp

@deftp {Data type} thread_sched_info_t
This is a pointer to a @code{struct thread_sched_info}.
@end deftp


@node Thread Settings
@subsection Thread Settings

@deftypefun kern_return_t thread_wire (@w{host_priv_t @var{host_priv}}, @w{thread_t @var{thread}}, @w{boolean_t @var{wired}})
The function @code{thread_wire} controls the VM privilege level of the
thread @var{thread}. A VM-privileged thread never waits inside the
kernel for memory allocation from the kernel's free list of pages or for
allocation of a kernel stack.

Threads that are part of the default pageout path should be
VM-privileged, to prevent system deadlocks. Threads that are not part
of the default pageout path should not be VM-privileged, to prevent the
kernel's free list of pages from being exhausted.

The function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{host_priv} or @var{thread} was
invalid.

The @code{thread_wire} call is actually an RPC to @var{host_priv},
normally a send right for a privileged host port, but potentially any
send right. In addition to the normal diagnostic return codes from the
call's server (normally the kernel), the call may return @code{mach_msg}
return codes.
@c See also: vm_wire(2), vm_set_default_memory_manager(2).
@end deftypefun


@node Thread Execution
@subsection Thread Execution

@deftypefun kern_return_t thread_suspend (@w{thread_t @var{target_thread}})
Increments the thread's suspend count and prevents the thread from
executing any more user level instructions. In this context a user
level instruction is either a machine instruction executed in user mode
or a system trap instruction including page faults. Thus, if a thread
is currently executing within a system trap, the kernel code may
continue to execute until it reaches the system return code or it may
suspend within the kernel code. In either case, when the thread is
resumed the system trap will return. This could cause unpredictable
results if the user did a suspend and then altered the user state of the
thread in order to change its direction upon a resume. The call
@code{thread_abort} is provided to allow the user to abort any system
call that is in progress in a predictable way.

The suspend count may become greater than one, with the effect that it
will take more than one resume call to restart the thread.

The function returns @code{KERN_SUCCESS} if the thread has been
suspended and @code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not
a thread.
@end deftypefun

@deftypefun kern_return_t thread_resume (@w{thread_t @var{target_thread}})
Decrements the thread's suspend count. If the count becomes zero the
thread is resumed. If it is still positive, the thread is left
suspended. The suspend count may not become negative.
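
Because suspend counts nest, each @code{thread_suspend} is normally
paired with exactly one @code{thread_resume}. The following sketch
illustrates that pairing. The helper name @code{with_thread_suspended}
and its callback parameter are hypothetical, @file{mach.h} is assumed to
declare the calls, and @var{thread} must not be the calling thread,
which would otherwise stop itself before the callback runs.

@example
#include <mach.h>

/* Hypothetical helper: run INSPECT on THREAD while it is held
   suspended, then undo our own suspension.  */
kern_return_t
with_thread_suspended (thread_t thread, void (*inspect) (thread_t))
@{
  kern_return_t err = thread_suspend (thread);
  if (err != KERN_SUCCESS)
    return err;

  inspect (thread);

  /* Drops only the count added above; the thread stays suspended if
     someone else has suspended it as well.  */
  return thread_resume (thread);
@}
@end example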

The function returns @code{KERN_SUCCESS} if the thread has been resumed,
@code{KERN_FAILURE} if the suspend count is already zero and
@code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not a thread.
@end deftypefun

@deftypefun kern_return_t thread_abort (@w{thread_t @var{target_thread}})
The function @code{thread_abort} aborts the kernel primitives
@code{mach_msg}, @code{msg_send}, @code{msg_receive} and @code{msg_rpc},
as well as page faults, making the call return a code indicating that it
was interrupted. The call is interrupted whether or not the thread (or
task containing it) is currently suspended. If it is suspended, the
thread receives the interrupt when it is resumed.

A thread will retry an aborted page fault if its state is not modified
before it is resumed. @code{msg_send} returns
@code{SEND_INTERRUPTED}; @code{msg_receive} returns
@code{RCV_INTERRUPTED}; @code{msg_rpc} returns either
@code{SEND_INTERRUPTED} or @code{RCV_INTERRUPTED}, depending on which
half of the RPC was interrupted.

The main reason for this primitive is to allow one thread to cleanly
stop another thread in a manner that will allow the future execution of
the target thread to be controlled in a predictable way.
@code{thread_suspend} keeps the target thread from executing any further
instructions at the user level, including the return from a system call.
@code{thread_get_state}/@code{thread_set_state} allows the examination
or modification of the user state of a target thread. However, if a
suspended thread was executing within a system call, it also has
associated with it a kernel state. This kernel state can not be
modified by @code{thread_set_state}, with the result that when the
thread is resumed the system call may return, changing the user state
and possibly user memory.
@code{thread_abort} aborts the kernel call from the target thread's
point of view by resetting the kernel state so that the thread will
resume execution at the system call return with the return code value
set to one of the interrupted codes. The system call itself will either
be entirely completed or entirely aborted, depending on the precise
moment at which the abort was received. Thus if the thread's user state
has been changed by @code{thread_set_state}, it will not be modified by
any unexpected system call side effects.

For example, to simulate a Unix signal, the following sequence of calls
may be used:

@enumerate
@item
@code{thread_suspend}: Stops the thread.

@item
@code{thread_abort}: Interrupts any system call in progress, setting the
return value to `interrupted'. Since the thread is stopped, it will not
return to user code.

@item
@code{thread_set_state}: Alters the thread's state to simulate a
procedure call to the signal handler.

@item
@code{thread_resume}: Resumes execution at the signal handler. If the
thread's stack has been correctly set up, the thread may return to the
interrupted system call. (Of course, the code to push an extra stack
frame and change the registers is VERY machine-dependent.)
@end enumerate

Calling @code{thread_abort} on a non-suspended thread is risky, since it
is very difficult to know exactly what system trap, if any, the thread
might be executing and whether an interrupt return would cause the
thread to do something useful.

The function returns @code{KERN_SUCCESS} if the thread received an
interrupt and @code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not
a thread.
@end deftypefun

@deftypefun kern_return_t thread_get_state (@w{thread_t @var{target_thread}}, @w{int @var{flavor}}, @w{thread_state_t @var{old_state}}, @w{mach_msg_type_number_t *@var{old_stateCnt}})
The function @code{thread_get_state} returns the execution state (e.g.
the machine registers) of @var{target_thread} as specified by
@var{flavor}.
The @var{old_state} is an array of integers that is provided by the
caller and returned filled with the specified information.
@var{old_stateCnt} is set on input to the maximum number of integers in
@var{old_state} and returned equal to the actual number of integers in
@var{old_state}.

@var{target_thread} may not be @code{mach_thread_self()}.

The definition of the state structures can be found in
@file{machine/thread_status.h}.

The function returns @code{KERN_SUCCESS} if the state has been returned,
@code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not a thread or
is @code{mach_thread_self} or @var{flavor} is unrecognized for this
machine. The function returns @code{MIG_ARRAY_TOO_LARGE} if the
returned state is too large for @var{old_state}. In this case,
@var{old_state} is filled as much as possible and @var{old_stateCnt} is
set to the number of elements that would have been returned if there
were enough room.
@end deftypefun

@deftypefun kern_return_t thread_set_state (@w{thread_t @var{target_thread}}, @w{int @var{flavor}}, @w{thread_state_t @var{new_state}}, @w{mach_msg_type_number_t @var{new_state_count}})
The function @code{thread_set_state} sets the execution state (e.g. the
machine registers) of @var{target_thread} as specified by @var{flavor}.
The @var{new_state} is an array of integers. @var{new_state_count} is
the number of elements in @var{new_state}. The entire set of registers
is reset. This will do unpredictable things if @var{target_thread} is
not suspended.

@var{target_thread} may not be @code{mach_thread_self}.

The definition of the state structures can be found in
@file{machine/thread_status.h}.

The function returns @code{KERN_SUCCESS} if the state has been set and
@code{KERN_INVALID_ARGUMENT} if @var{target_thread} is not a thread or
is @code{mach_thread_self} or @var{flavor} is unrecognized for this
machine.
@end deftypefun


@node Scheduling
@subsection Scheduling

@menu
* Thread Priority::      Changing the priority of a thread.
* Hand-Off Scheduling:: Switching to a new thread.
* Scheduling Policy:: Setting the scheduling policy.
@end menu

@node Thread Priority
@subsubsection Thread Priority

Threads have three priorities associated with them by the system: a priority, a maximum priority, and a scheduled priority. The scheduled priority is used to make scheduling decisions about the thread. It is determined from the priority by the policy (for timesharing, this means adding an increment derived from CPU usage). The priority can be set under user control, but may never exceed the maximum priority. Changing the maximum priority requires presentation of the control port for the thread's processor set; since the control port for the default processor set is privileged, users cannot raise their maximum priority to unfairly compete with other users on that set. Newly created threads obtain their priority from their task and their max priority from the thread.

@deftypefun kern_return_t thread_priority (@w{thread_t @var{thread}}, @w{int @var{priority}}, @w{boolean_t @var{set_max}})
The function @code{thread_priority} changes the priority and optionally the maximum priority of @var{thread}. Priorities range from 0 to 31, where lower numbers denote higher priorities. If the new priority is higher than the priority of the current thread, preemption may occur as a result of this call. The maximum priority of the thread is also set if @var{set_max} is @code{TRUE}. This call will fail if @var{priority} is greater than the current maximum priority of the thread. As a result, this call can only lower the value of a thread's maximum priority.

The function returns @code{KERN_SUCCESS} if the operation completed successfully, @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{priority} is out of range (not in 0..31), and @code{KERN_FAILURE} if the requested operation would violate the thread's maximum priority.
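The checks described above can be condensed into a small, self-contained C model. It is only a sketch of the documented rules, not kernel code: the names @code{prio_status}, @code{prio_thread} and @code{model_thread_priority} are hypothetical, and the model assumes that a priority ``greater than the maximum priority'' means one that is numerically smaller than the maximum, since lower numbers denote higher priorities.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KERN_SUCCESS, KERN_INVALID_ARGUMENT
   and KERN_FAILURE; the real values are not modelled here.  */
enum prio_status { PRIO_SUCCESS, PRIO_INVALID_ARGUMENT, PRIO_FAILURE };

/* Hypothetical model of the two user-visible priorities of a thread.  */
struct prio_thread
{
  int priority;      /* 0..31, lower number = higher priority */
  int max_priority;  /* most favourable priority the thread may have */
};

/* Model of the documented thread_priority checks.
   Assumption: "exceeding the maximum priority" means being
   numerically smaller than max_priority.  */
enum prio_status
model_thread_priority (struct prio_thread *t, int priority, bool set_max)
{
  assert (t != 0);

  if (priority < 0 || priority > 31)
    return PRIO_INVALID_ARGUMENT;   /* out of range */
  if (priority < t->max_priority)
    return PRIO_FAILURE;            /* would violate the maximum */

  t->priority = priority;
  if (set_max)
    t->max_priority = priority;     /* can only make the limit less favourable */
  return PRIO_SUCCESS;
}
```

In this model, a thread whose maximum priority is 12 can move itself to priority 20, but once it does so with @var{set_max} set, a later request for priority 12 fails.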
@end deftypefun

@deftypefun kern_return_t thread_max_priority (@w{thread_t @var{thread}}, @w{processor_set_t @var{processor_set}}, @w{int @var{priority}})
The function @code{thread_max_priority} changes the maximum priority of the thread. Because it requires presentation of the corresponding processor set port, this call can reset the maximum priority to any legal value.

The function returns @code{KERN_SUCCESS} if the operation completed successfully, @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{processor_set} is not a control port for a processor set or @var{priority} is out of range (not in 0..31), and @code{KERN_FAILURE} if the thread is not assigned to the processor set whose control port was presented.
@end deftypefun

@node Hand-Off Scheduling
@subsubsection Hand-Off Scheduling

@deftypefun kern_return_t thread_switch (@w{thread_t @var{new_thread}}, @w{int @var{option}}, @w{int @var{time}})
The function @code{thread_switch} provides low-level access to the scheduler's context switching code. @var{new_thread} is a hint that implements hand-off scheduling. The operating system will attempt to switch directly to the new thread (bypassing the normal logic that selects the next thread to run) if possible. Since this is a hint, it may be incorrect; it is ignored if it doesn't specify a thread on the same host as the current thread or if that thread can't be switched to (i.e., not runnable or already running on another processor). In this case, the normal logic to select the next thread to run is used; the current thread may continue running if there is no other appropriate thread to run.

Options for @var{option} are defined in @file{mach/thread_switch.h} and specify the interpretation of @var{time}. The possible values for @var{option} are:

@table @code
@item SWITCH_OPTION_NONE
No options, the time argument is ignored.

@item SWITCH_OPTION_WAIT
The thread is blocked for the specified time. This can be aborted by @code{thread_abort}.

@item SWITCH_OPTION_DEPRESS
The thread's priority is depressed to the lowest possible value for the specified time. This can be aborted by @code{thread_depress_abort}. This depression is independent of operations that change the thread's priority (e.g. @code{thread_priority} will not abort the depression). The minimum time and units of time can be obtained as the @code{min_timeout} value from @code{host_info}. The depression is also aborted when the current thread is next run (either via hand-off scheduling or because the processor set has nothing better to do).
@end table

@code{thread_switch} is often called when the current thread can proceed no further for some reason; the various options and arguments allow information about this reason to be transmitted to the kernel. The @var{new_thread} argument (hand-off scheduling) is useful when the identity of the thread that must make progress before the current thread runs again is known. The @code{WAIT} option is used when the amount of time that the current thread must wait before it can do anything useful can be estimated and is fairly long. The @code{DEPRESS} option is used when the amount of time that must be waited is fairly short, especially when the identity of the thread that is being waited for is not known.

Users should beware of calling @code{thread_switch} with an invalid hint (e.g. @code{MACH_PORT_NULL}) and no option. Because the time-sharing scheduler varies the priority of threads based on usage, this may result in a waste of CPU time if the thread that must be run is of lower priority. The use of the @code{DEPRESS} option in this situation is highly recommended.

@code{thread_switch} ignores policies. Users relying on the preemption semantics of a fixed time policy should be aware that @code{thread_switch} ignores these semantics; it will run the specified @var{new_thread} independent of its priority and the priority of any other threads that could be run instead.

The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{option} is not a recognized option, and @code{KERN_FAILURE} if @code{thread_depress_abort} failed because the thread was not depressed.
@end deftypefun

@deftypefun kern_return_t thread_depress_abort (@w{thread_t @var{thread}})
The function @code{thread_depress_abort} cancels any priority depression for @var{thread} caused by a @code{swtch_pri} or @code{thread_switch} call.

The function returns @code{KERN_SUCCESS} if the call succeeded and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a valid thread.
@end deftypefun

@deftypefun boolean_t swtch ()
@c XXX Clear up wording.
The system trap @code{swtch} attempts to switch the current thread off the processor. The return value indicates if more than the current thread is running in the processor set. This is useful for lock management routines.

The call returns @code{FALSE} if the thread is justified in becoming a resource hog by continuing to spin because there's nothing else useful that the processor could do. @code{TRUE} is returned if the thread should make one more check on the lock and then be a good citizen and really suspend.
@end deftypefun

@deftypefun boolean_t swtch_pri (@w{int @var{priority}})
The system trap @code{swtch_pri} attempts to switch the current thread off the processor as @code{swtch} does, but depresses the priority of the thread to the minimum possible value while it is switched off. @var{priority} is not used currently.

The return value is as for @code{swtch}.
@end deftypefun

@node Scheduling Policy
@subsubsection Scheduling Policy

@deftypefun kern_return_t thread_policy (@w{thread_t @var{thread}}, @w{int @var{policy}}, @w{int @var{data}})
The function @code{thread_policy} changes the scheduling policy for @var{thread} to @var{policy}.

@var{data} is policy-dependent scheduling information.
There are currently two supported policies, @code{POLICY_TIMESHARE} and @code{POLICY_FIXEDPRI}, defined in @file{mach/policy.h}; this file is included by @file{mach.h}. @var{data} is meaningless for timesharing, but is the quantum to be used (in milliseconds) for the fixed priority policy. To be meaningful, this quantum must be a multiple of the basic system quantum (@code{min_quantum}), which can be obtained from @code{host_info}. The system will always round up to the next multiple of the quantum.

Processor sets may restrict the allowed policies, so this call will fail if the processor set to which @var{thread} is currently assigned does not permit @var{policy}.

The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{policy} is not a recognized policy, and @code{KERN_FAILURE} if the processor set to which @var{thread} is currently assigned does not permit @var{policy}.
@end deftypefun

@node Thread Special Ports
@subsection Thread Special Ports

@deftypefun kern_return_t thread_get_special_port (@w{thread_t @var{thread}}, @w{int @var{which_port}}, @w{mach_port_t *@var{special_port}})
The function @code{thread_get_special_port} returns send rights to one of a set of special ports for the thread specified by @var{thread}.

The possible values for @var{which_port} are @code{THREAD_KERNEL_PORT} and @code{THREAD_EXCEPTION_PORT}. A thread also has access to its task's special ports.

The function returns @code{KERN_SUCCESS} if the port was returned and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{which_port} is an invalid port selector.
@end deftypefun

@deftypefun kern_return_t thread_get_kernel_port (@w{thread_t @var{thread}}, @w{mach_port_t *@var{kernel_port}})
The function @code{thread_get_kernel_port} is equivalent to the function @code{thread_get_special_port} with the @var{which_port} argument set to @code{THREAD_KERNEL_PORT}.
@end deftypefun @deftypefun kern_return_t thread_get_exception_port (@w{thread_t @var{thread}}, @w{mach_port_t *@var{exception_port}}) The function @code{thread_get_exception_port} is equivalent to the function @code{thread_get_special_port} with the @var{which_port} argument set to @code{THREAD_EXCEPTION_PORT}. @end deftypefun @deftypefun kern_return_t thread_set_special_port (@w{thread_t @var{thread}}, @w{int @var{which_port}}, @w{mach_port_t @var{special_port}}) The function @code{thread_set_special_port} sets one of a set of special ports for the thread specified by @var{thread}. The possible values for @var{which_port} are @code{THREAD_KERNEL_PORT} and @code{THREAD_EXCEPTION_PORT}. A thread also has access to its task's special ports. The function returns @code{KERN_SUCCESS} if the port was set and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a thread or @var{which_port} is an invalid port selector. @end deftypefun @deftypefun kern_return_t thread_set_kernel_port (@w{thread_t @var{thread}}, @w{mach_port_t @var{kernel_port}}) The function @code{thread_set_kernel_port} is equivalent to the function @code{thread_set_special_port} with the @var{which_port} argument set to @code{THREAD_KERNEL_PORT}. @end deftypefun @deftypefun kern_return_t thread_set_exception_port (@w{thread_t @var{thread}}, @w{mach_port_t @var{exception_port}}) The function @code{thread_set_exception_port} is equivalent to the function @code{thread_set_special_port} with the @var{which_port} argument set to @code{THREAD_EXCEPTION_PORT}. 
@end deftypefun

@node Exceptions
@subsection Exceptions

@deftypefun kern_return_t catch_exception_raise (@w{mach_port_t @var{exception_port}}, @w{thread_t @var{thread}}, @w{task_t @var{task}}, @w{int @var{exception}}, @w{int @var{code}}, @w{int @var{subcode}})
XXX Fixme
@end deftypefun

@deftypefun kern_return_t exception_raise (@w{mach_port_t @var{exception_port}}, @w{mach_port_t @var{thread}}, @w{mach_port_t @var{task}}, @w{integer_t @var{exception}}, @w{integer_t @var{code}}, @w{integer_t @var{subcode}})
XXX Fixme
@end deftypefun

@deftypefun kern_return_t evc_wait (@w{unsigned int @var{event}})
@c XXX This is for user space drivers, the description is incomplete.
The system trap @code{evc_wait} makes the calling thread wait for the event specified by @var{event}.

The call returns @code{KERN_SUCCESS} if the event has occurred, @code{KERN_NO_SPACE} if another thread is waiting for the same event and @code{KERN_INVALID_ARGUMENT} if the event object is invalid.
@end deftypefun

@node Task Interface
@section Task Interface

@cindex task port
@cindex port representing a task
@deftp {Data type} task_t
This is a @code{mach_port_t} and is used to hold the port name of a task port that represents the task. Manipulations of the task are implemented as remote procedure calls to the task port. A task can get a port to itself with the @code{mach_task_self} system call.

The task port name is also used to identify the task's IPC space (@pxref{Port Manipulation Interface}) and the task's virtual memory map (@pxref{Virtual Memory Interface}).
@end deftp

@menu
* Task Creation:: Creating tasks.
* Task Termination:: Terminating tasks.
* Task Information:: Information on tasks.
* Task Execution:: Thread scheduling in a task.
* Task Special Ports:: How to get and set the task's special ports.
* Syscall Emulation:: How to emulate system calls.
@end menu @node Task Creation @subsection Task Creation @deftypefun kern_return_t task_create (@w{task_t @var{parent_task}}, @w{boolean_t @var{inherit_memory}}, @w{task_t *@var{child_task}}) The function @code{task_create} creates a new task from @var{parent_task}; the resulting task (@var{child_task}) acquires shared or copied parts of the parent's address space (see @code{vm_inherit}). The child task initially contains no threads. If @var{inherit_memory} is set, the child task's address space is built from the parent task according to its memory inheritance values; otherwise, the child task is given an empty address space. The child task gets the three special ports created or copied for it at task creation. The @code{TASK_KERNEL_PORT} is created and send rights for it are given to the child and returned to the caller. @c The following is only relevant if MACH_IPC_COMPAT is used. @c The @code{TASK_NOTIFY_PORT} is created and receive, ownership and send rights @c for it are given to the child. The caller has no access to it. The @code{TASK_BOOTSTRAP_PORT} and the @code{TASK_EXCEPTION_PORT} are inherited from the parent task. The new task can get send rights to these ports with the call @code{task_get_special_port}. The function returns @code{KERN_SUCCESS} if a new task has been created, @code{KERN_INVALID_ARGUMENT} if @var{parent_task} is not a valid task port and @code{KERN_RESOURCE_SHORTAGE} if some critical kernel resource is unavailable. @end deftypefun @node Task Termination @subsection Task Termination @deftypefun kern_return_t task_terminate (@w{task_t @var{target_task}}) The function @code{task_terminate} destroys the task specified by @var{target_task} and all its threads. All resources that are used only by this task are freed. Any port to which this task has receive and ownership rights is destroyed. The function returns @code{KERN_SUCCESS} if the task has been killed, @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task. 
@end deftypefun

@node Task Information
@subsection Task Information

@deftypefun task_t mach_task_self ()
The @code{mach_task_self} system call returns the calling thread's task port.

@code{mach_task_self} has an effect equivalent to receiving a send right for the task port. @code{mach_task_self} returns the name of the send right. In particular, successive calls will increase the calling task's user-reference count for the send right.

As a special exception, the kernel will overrun the user reference count of the task name port, so that this function can not fail for that reason. Because of this, the user should not deallocate the port right if an overrun might have happened. Otherwise the reference count could drop to zero and the send right be destroyed while the user still expects to be able to use it. As the kernel does not make use of the number of extant send rights anyway, this is safe to do (the task port itself is not destroyed, even when there are no send rights anymore).

The function returns @code{MACH_PORT_NULL} if a resource shortage prevented the reception of the send right, @code{MACH_PORT_NULL} if the task port is currently null, @code{MACH_PORT_DEAD} if the task port is currently dead.
@end deftypefun

@deftypefun kern_return_t task_threads (@w{task_t @var{target_task}}, @w{thread_array_t *@var{thread_list}}, @w{mach_msg_type_number_t *@var{thread_count}})
The function @code{task_threads} gets send rights to the kernel port for each thread contained in @var{target_task}. @var{thread_list} is an array that is created as a result of this call. The caller may wish to @code{vm_deallocate} this array when the data is no longer needed.

The function returns @code{KERN_SUCCESS} if the call succeeded and @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task.
@end deftypefun

@deftypefun kern_return_t task_info (@w{task_t @var{target_task}}, @w{int @var{flavor}}, @w{task_info_t @var{task_info}}, @w{mach_msg_type_number_t *@var{task_info_count}})
The function @code{task_info} returns the selected information array for a task, as specified by @var{flavor}. @var{task_info} is an array of integers that is supplied by the caller, and filled with specified information. @var{task_info_count} is supplied as the maximum number of integers in @var{task_info}. On return, it contains the actual number of integers in @var{task_info}. The maximum number of integers returned by any flavor is @code{TASK_INFO_MAX}.

The type of information returned is defined by @var{flavor}, which can be one of the following:

@table @code
@item TASK_BASIC_INFO
The function returns basic information about the task, as defined by @code{task_basic_info_t}. This includes the user and system time and memory consumption. The number of integers returned is @code{TASK_BASIC_INFO_COUNT}.

@item TASK_EVENTS_INFO
The function returns information about events for the task as defined by @code{task_events_info_t}. This includes statistics about virtual memory and IPC events like pageouts, pageins and messages sent and received. The number of integers returned is @code{TASK_EVENTS_INFO_COUNT}.

@item TASK_THREAD_TIMES_INFO
The function returns information about the total time for live threads as defined by @code{task_thread_times_info_t}. The number of integers returned is @code{TASK_THREAD_TIMES_INFO_COUNT}.
@end table

The function returns @code{KERN_SUCCESS} if the call succeeded and @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task or @var{flavor} is not recognized. The function returns @code{MIG_ARRAY_TOO_LARGE} if the returned info array is too large for @var{task_info}. In this case, @var{task_info} is filled as much as possible and @var{task_info_count} is set to the number of elements that would have been returned if there were enough room.
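The in/out behaviour of @var{task_info_count} can be illustrated with a small stand-alone C model. This is a sketch of the documented convention only, not of the kernel or MIG implementation; @code{info_status} and @code{model_info_copy} are hypothetical names, and the two status values merely stand in for @code{KERN_SUCCESS} and @code{MIG_ARRAY_TOO_LARGE}.

```c
#include <assert.h>

/* Hypothetical stand-ins for KERN_SUCCESS and MIG_ARRAY_TOO_LARGE.  */
enum info_status { INFO_SUCCESS, INFO_ARRAY_TOO_LARGE };

/* Model of the count convention: DATA holds the AVAILABLE integers
   that make up the full answer.  On entry *COUNT is the capacity of
   OUT; on return it is the number of integers in the full answer.
   OUT is filled as far as its capacity allows.  */
enum info_status
model_info_copy (const int *data, unsigned available,
                 int *out, unsigned *count)
{
  assert (count != 0);

  unsigned capacity = *count;
  unsigned n = available < capacity ? available : capacity;

  for (unsigned i = 0; i < n; i++)
    out[i] = data[i];

  *count = available;
  return available > capacity ? INFO_ARRAY_TOO_LARGE : INFO_SUCCESS;
}
```

A caller that receives the too-large status can retry with a buffer of the reported size; a buffer of @code{TASK_INFO_MAX} integers is large enough for every flavor.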
@end deftypefun @deftp {Data type} {struct task_basic_info} This structure is returned in @var{task_info} by the @code{task_info} function and provides basic information about the task. You can cast a variable of type @code{task_info_t} to a pointer of this type if you provided it as the @var{task_info} parameter for the @code{TASK_BASIC_INFO} flavor of @code{task_info}. It has the following members: @table @code @item integer_t suspend_count suspend count for task @item integer_t base_priority base scheduling priority @item vm_size_t virtual_size number of virtual pages @item vm_size_t resident_size number of resident pages @item time_value_t user_time total user run time for terminated threads @item time_value_t system_time total system run time for terminated threads @item time_value_t creation_time creation time stamp @end table @end deftp @deftp {Data type} task_basic_info_t This is a pointer to a @code{struct task_basic_info}. @end deftp @deftp {Data type} {struct task_events_info} This structure is returned in @var{task_info} by the @code{task_info} function and provides event statistics for the task. You can cast a variable of type @code{task_info_t} to a pointer of this type if you provided it as the @var{task_info} parameter for the @code{TASK_EVENTS_INFO} flavor of @code{task_info}. It has the following members: @table @code @item natural_t faults number of page faults @item natural_t zero_fills number of zero fill pages @item natural_t reactivations number of reactivated pages @item natural_t pageins number of actual pageins @item natural_t cow_faults number of copy-on-write faults @item natural_t messages_sent number of messages sent @item natural_t messages_received number of messages received @end table @end deftp @deftp {Data type} task_events_info_t This is a pointer to a @code{struct task_events_info}. 
@end deftp

@deftp {Data type} {struct task_thread_times_info}
This structure is returned in @var{task_info} by the @code{task_info} function and provides the accumulated run times of the task's live threads. You can cast a variable of type @code{task_info_t} to a pointer of this type if you provided it as the @var{task_info} parameter for the @code{TASK_THREAD_TIMES_INFO} flavor of @code{task_info}. It has the following members:

@table @code
@item time_value_t user_time
total user run time for live threads

@item time_value_t system_time
total system run time for live threads
@end table
@end deftp

@deftp {Data type} task_thread_times_info_t
This is a pointer to a @code{struct task_thread_times_info}.
@end deftp

@node Task Execution
@subsection Task Execution

@deftypefun kern_return_t task_suspend (@w{task_t @var{target_task}})
The function @code{task_suspend} increments the task's suspend count and stops all threads in the task. As long as the suspend count is positive, newly created threads will not run. This call does not return until all threads are suspended.

The count may become greater than one, with the effect that it will take more than one resume call to restart the task.

The function returns @code{KERN_SUCCESS} if the task has been suspended and @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task.
@end deftypefun

@deftypefun kern_return_t task_resume (@w{task_t @var{target_task}})
The function @code{task_resume} decrements the task's suspend count. If it becomes zero, all threads with zero suspend counts in the task are resumed. The count may not become negative.

The function returns @code{KERN_SUCCESS} if the task has been resumed, @code{KERN_FAILURE} if the suspend count is already at zero and @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task.
@end deftypefun

@c XXX Should probably be in the "Scheduling" node of the Thread Interface.
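The nesting behaviour of the suspend count manipulated by @code{task_suspend} and @code{task_resume} can be modelled in a few lines of C. This is a sketch of the documented counting rule only; it does not stop or start any threads, and the names @code{susp_status}, @code{susp_task} and the @code{model_} functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for KERN_SUCCESS and KERN_FAILURE.  */
enum susp_status { SUSP_SUCCESS, SUSP_FAILURE };

/* Hypothetical model of a task: only the suspend count is kept.  */
struct susp_task
{
  int suspend_count;
};

/* Model of task_suspend: every call adds one level of suspension.  */
enum susp_status
model_task_suspend (struct susp_task *t)
{
  assert (t != 0);
  t->suspend_count++;
  return SUSP_SUCCESS;
}

/* Model of task_resume: fails if the count is already zero, so the
   count can never become negative.  */
enum susp_status
model_task_resume (struct susp_task *t)
{
  assert (t != 0);
  if (t->suspend_count == 0)
    return SUSP_FAILURE;
  t->suspend_count--;
  return SUSP_SUCCESS;
}

/* Threads of the task may run only while the count is zero.  */
int
model_task_may_run (const struct susp_task *t)
{
  return t->suspend_count == 0;
}
```

Two suspend calls therefore need two resume calls before the task may run again, and a third resume call reports a failure.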
@deftypefun kern_return_t task_priority (@w{task_t @var{task}}, @w{int @var{priority}}, @w{boolean_t @var{change_threads}}) The priority of a task is used only for creation of new threads; a new thread's priority is set to the enclosing task's priority. @code{task_priority} changes this task priority. It also sets the priorities of all threads in the task to this new priority if @var{change_threads} is @code{TRUE}. Existing threads are not affected otherwise. If this priority change violates the maximum priority of some threads, as many threads as possible will be changed and an error code will be returned. The function returns @code{KERN_SUCCESS} if the call succeeded, @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task, or @var{priority} is not a valid priority and @code{KERN_FAILURE} if @var{change_threads} was @code{TRUE} and the attempt to change the priority of at least one existing thread failed because the new priority would have exceeded that thread's maximum priority. @end deftypefun @deftypefun kern_return_t task_ras_control (@w{task_t @var{target_task}}, @w{vm_address_t @var{start_pc}}, @w{vm_address_t @var{end_pc}}, @w{int @var{flavor}}) The function @code{task_ras_control} manipulates a task's set of restartable atomic sequences. If a sequence is installed, and any thread in the task is preempted within the range [@var{start_pc},@var{end_pc}], then the thread is resumed at @var{start_pc}. This enables applications to build atomic sequences which, when executed to completion, will have executed atomically. Restartable atomic sequences are intended to be used on systems that do not have hardware support for low-overhead atomic primitives. As a thread can be rolled-back, the code in the sequence should have no side effects other than a final store at @var{end_pc}. The kernel does not guarantee that the sequence is restartable. It assumes the application knows what it's doing. 
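The restart rule for a single installed sequence can be written down as a short C function. This is an illustrative model of the rule stated above, not the kernel's implementation; @code{ras_range} and @code{model_ras_resume_pc} are hypothetical names, and the model assumes that both @var{start_pc} and @var{end_pc} belong to the range, as the closed interval notation suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one registered sequence.  */
struct ras_range
{
  uintptr_t start_pc;
  uintptr_t end_pc;
};

/* Return the program counter at which a thread preempted at PC is
   resumed.  Assumption: the range is closed, so a thread preempted
   exactly at end_pc is rolled back as well.  */
uintptr_t
model_ras_resume_pc (const struct ras_range *ras, uintptr_t pc)
{
  assert (ras != 0 && ras->start_pc <= ras->end_pc);

  if (pc >= ras->start_pc && pc <= ras->end_pc)
    return ras->start_pc;   /* roll back to the start of the sequence */
  return pc;                /* outside the sequence: resume normally */
}
```

Because a rolled-back thread repeats everything from @var{start_pc}, any side effect performed before the final store would be repeated too, which is why the sequence should have none.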

A task may have a finite number of atomic sequences that is defined at compile time.

The flavor specifies the particular operation that should be applied to this restartable atomic sequence. Possible values for @var{flavor} are:

@table @code
@item TASK_RAS_CONTROL_PURGE_ALL
Remove all registered sequences for this task.

@item TASK_RAS_CONTROL_PURGE_ONE
Remove the named registered sequence for this task.

@item TASK_RAS_CONTROL_PURGE_ALL_AND_INSTALL_ONE
Atomically remove all registered sequences and install the named sequence.

@item TASK_RAS_CONTROL_INSTALL_ONE
Install this sequence.
@end table

The function returns @code{KERN_SUCCESS} if the operation has been performed, @code{KERN_INVALID_ADDRESS} if the @var{start_pc} or @var{end_pc} values are not a valid address for the requested operation (for example, it is invalid to purge a sequence that has not been registered), @code{KERN_RESOURCE_SHORTAGE} if an attempt was made to install more restartable atomic sequences for a task than can be supported by the kernel, @code{KERN_INVALID_VALUE} if a bad flavor was specified, @code{KERN_INVALID_ARGUMENT} if @var{target_task} is not a task and @code{KERN_FAILURE} if the call is not supported on this configuration.
@end deftypefun

@node Task Special Ports
@subsection Task Special Ports

@deftypefun kern_return_t task_get_special_port (@w{task_t @var{task}}, @w{int @var{which_port}}, @w{mach_port_t *@var{special_port}})
The function @code{task_get_special_port} returns send rights to one of a set of special ports for the task specified by @var{task}.

The special ports associated with a task are the kernel port (@code{TASK_KERNEL_PORT}), the bootstrap port (@code{TASK_BOOTSTRAP_PORT}) and the exception port (@code{TASK_EXCEPTION_PORT}). The bootstrap port is a port to which a task may send a message requesting other system service ports. This port is not used by the kernel.
The task's exception port is the port to which messages are sent by the kernel when an exception occurs and the thread causing the exception has no exception port of its own.

The following macros to call @code{task_get_special_port} for a specific port are defined in @file{mach/task_special_ports.h}: @code{task_get_exception_port} and @code{task_get_bootstrap_port}.

The function returns @code{KERN_SUCCESS} if the port was returned and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task or @var{which_port} is an invalid port selector.
@end deftypefun

@deftypefun kern_return_t task_get_kernel_port (@w{task_t @var{task}}, @w{mach_port_t *@var{kernel_port}})
The function @code{task_get_kernel_port} is equivalent to the function @code{task_get_special_port} with the @var{which_port} argument set to @code{TASK_KERNEL_PORT}.
@end deftypefun

@deftypefun kern_return_t task_get_exception_port (@w{task_t @var{task}}, @w{mach_port_t *@var{exception_port}})
The function @code{task_get_exception_port} is equivalent to the function @code{task_get_special_port} with the @var{which_port} argument set to @code{TASK_EXCEPTION_PORT}.
@end deftypefun

@deftypefun kern_return_t task_get_bootstrap_port (@w{task_t @var{task}}, @w{mach_port_t *@var{bootstrap_port}})
The function @code{task_get_bootstrap_port} is equivalent to the function @code{task_get_special_port} with the @var{which_port} argument set to @code{TASK_BOOTSTRAP_PORT}.
@end deftypefun

@deftypefun kern_return_t task_set_special_port (@w{task_t @var{task}}, @w{int @var{which_port}}, @w{mach_port_t @var{special_port}})
The function @code{task_set_special_port} sets one of a set of special ports for the task specified by @var{task}.

The special ports associated with a task are the kernel port (@code{TASK_KERNEL_PORT}), the bootstrap port (@code{TASK_BOOTSTRAP_PORT}) and the exception port (@code{TASK_EXCEPTION_PORT}). The bootstrap port is a port to which a thread may send a message requesting other system service ports.
This port is not used by the kernel. The task's exception port is the port to which messages are sent by the kernel when an exception occurs and the thread causing the exception has no exception port of its own. The function returns @code{KERN_SUCCESS} if the port was set and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task or @var{which_port} is an invalid port selector. @end deftypefun @deftypefun kern_return_t task_set_kernel_port (@w{task_t @var{task}}, @w{mach_port_t @var{kernel_port}}) The function @code{task_set_kernel_port} is equivalent to the function @code{task_set_special_port} with the @var{which_port} argument set to @code{TASK_KERNEL_PORT}. @end deftypefun @deftypefun kern_return_t task_set_exception_port (@w{task_t @var{task}}, @w{mach_port_t @var{exception_port}}) The function @code{task_set_exception_port} is equivalent to the function @code{task_set_special_port} with the @var{which_port} argument set to @code{TASK_EXCEPTION_PORT}. @end deftypefun @deftypefun kern_return_t task_set_bootstrap_port (@w{task_t @var{task}}, @w{mach_port_t @var{bootstrap_port}}) The function @code{task_set_bootstrap_port} is equivalent to the function @code{task_set_special_port} with the @var{which_port} argument set to @code{TASK_BOOTSTRAP_PORT}. @end deftypefun @node Syscall Emulation @subsection Syscall Emulation @deftypefun kern_return_t task_get_emulation_vector (@w{task_t @var{task}}, @w{int *@var{vector_start}}, @w{emulation_vector_t *@var{emulation_vector}}, @w{mach_msg_type_number_t *@var{emulation_vector_count}}) The function @code{task_get_emulation_vector} gets the user-level handler entry points for all emulated system calls. 
@c XXX Fixme
@end deftypefun

@deftypefun kern_return_t task_set_emulation_vector (@w{task_t @var{task}}, @w{int @var{vector_start}}, @w{emulation_vector_t @var{emulation_vector}}, @w{mach_msg_type_number_t @var{emulation_vector_count}})
The function @code{task_set_emulation_vector} establishes user-level handlers for the specified system calls. Non-emulated system calls are specified with an entry of @code{EML_ROUTINE_NULL}. System call emulation handlers are inherited by the children of @var{task}.
@c XXX Fixme
@end deftypefun

@deftypefun kern_return_t task_set_emulation (@w{task_t @var{task}}, @w{vm_address_t @var{routine_entry_pt}}, @w{int @var{routine_number}})
The function @code{task_set_emulation} establishes a user-level handler for the specified system call. System call emulation handlers are inherited by the children of @var{task}.
@c XXX Fixme
@end deftypefun

@c XXX Fixme datatype emulation_vector_t

@node Profiling
@section Profiling

@deftypefun kern_return_t task_enable_pc_sampling (@w{task_t @var{task}}, @w{int *@var{ticks}}, @w{sampled_pc_flavor_t @var{flavor}})
@deftypefunx kern_return_t thread_enable_pc_sampling (@w{thread_t @var{thread}}, @w{int *@var{ticks}}, @w{sampled_pc_flavor_t @var{flavor}})
The function @code{task_enable_pc_sampling} enables PC sampling for @var{task}, the function @code{thread_enable_pc_sampling} enables PC sampling for @var{thread}.

The kernel's idea of clock granularity is returned in @var{ticks} in microseconds (this value should not be trusted). The sampling flavor is specified by @var{flavor}.

The function returns @code{KERN_SUCCESS} if the operation is completed successfully and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a valid thread.
@end deftypefun @deftypefun kern_return_t task_disable_pc_sampling (@w{task_t @var{task}}, @w{int *@var{sample_count}}) @deftypefunx kern_return_t thread_disable_pc_sampling (@w{thread_t @var{thread}}, @w{int *@var{sample_count}}) The function @code{task_disable_pc_sampling} disables PC sampling for @var{task}, the function @code{thread_disable_pc_sampling} disables PC sampling for @var{thread}. The number of sample elements in the kernel for the thread is returned in @var{sample_count}. The function returns @code{KERN_SUCCESS} if the operation is completed successfully and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a valid thread. @end deftypefun @deftypefun kern_return_t task_get_sampled_pcs (@w{task_t @var{task}}, @w{sampled_pc_seqno_t *@var{seqno}}, @w{sampled_pc_array_t @var{sampled_pcs}}, @w{mach_msg_type_number_t *@var{sample_count}}) @deftypefunx kern_return_t thread_get_sampled_pcs (@w{thread_t @var{thread}}, @w{sampled_pc_seqno_t *@var{seqno}}, @w{sampled_pc_array_t @var{sampled_pcs}}, @w{int *@var{sample_count}}) The function @code{task_get_sampled_pcs} extracts the PC samples for @var{task}, the function @code{thread_get_sampled_pcs} extracts the PC samples for @var{thread}. @var{seqno} is the sequence number of the sampled PCs. This is useful for determining when a collector thread has missed a sample. The sampled PCs for the thread are returned in @var{sampled_pcs}. @var{sample_count} contains the number of sample elements returned. The function returns @code{KERN_SUCCESS} if the operation is completed successfully, @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a valid thread and @code{KERN_FAILURE} if @var{thread} is not sampled. @end deftypefun @deftp {Data type} sampled_pc_t This structure is returned in @var{sampled_pcs} by the @code{thread_get_sampled_pcs} and @code{task_get_sampled_pcs} functions and provides pc samples for threads or tasks. 
It has the following members:

@table @code
@item natural_t id
A thread-specific unique identifier.

@item vm_offset_t pc
A PC value.

@item sampled_pc_flavor_t sampletype
The type of the sample as per flavor.
@end table
@end deftp

@deftp {Data type} sampled_pc_flavor_t
This data type specifies a PC sample flavor, either as the argument
passed in @var{flavor} to the @code{task_enable_pc_sampling} and
@code{thread_enable_pc_sampling} functions, or as the member
@code{sampletype} in the @code{sampled_pc_t} data type.  The flavor is a
bitwise-or of the possible flavors defined in @file{mach/pc_sample.h}:

@table @code
@item SAMPLED_PC_PERIODIC
default
@item SAMPLED_PC_VM_ZFILL_FAULTS
zero filled fault
@item SAMPLED_PC_VM_REACTIVATION_FAULTS
reactivation fault
@item SAMPLED_PC_VM_PAGEIN_FAULTS
pagein fault
@item SAMPLED_PC_VM_COW_FAULTS
copy-on-write fault
@item SAMPLED_PC_VM_FAULTS_ANY
any fault
@item SAMPLED_PC_VM_FAULTS
the bitwise-or of @code{SAMPLED_PC_VM_ZFILL_FAULTS},
@code{SAMPLED_PC_VM_REACTIVATION_FAULTS},
@code{SAMPLED_PC_VM_PAGEIN_FAULTS} and @code{SAMPLED_PC_VM_COW_FAULTS}.
@end table
@end deftp

@c XXX sampled_pc_array_t, sampled_pc_seqno_t


@node Host Interface
@chapter Host Interface
@cindex host interface

This section describes the Mach interface to a host executing a Mach
kernel.  The interface allows you to query statistics about a host and
control its behaviour.

A host is represented by two ports: a name port @var{host}, which anyone
can use to query information about the host, and a control port
@var{host_priv}, which is used to manipulate it.  For example, you can
query the current time using the name port, but to change the time you
need to send a message to the host control port.

Everything described in this section is declared in the header file
@file{mach.h}.

@menu
* Host Ports::                    Ports representing a host.
* Host Information::              Retrieval of information about a host.
* Host Time::                     Operations on the time as seen by a host.
* Host Reboot::                   Rebooting the system.
@end menu


@node Host Ports
@section Host Ports
@cindex host ports
@cindex ports representing a host

@cindex host name port
@deftp {Data type} host_t
This is a @code{mach_port_t} and is used to hold the port name of a host
name port (or, for short, host port).  Any task can get a send right to
the name port of the host running the task using the
@code{mach_host_self} system call.  The name port can be used to query
information about the host, for example the current time.
@end deftp

@deftypefun host_t mach_host_self ()
The @code{mach_host_self} system call returns the calling thread's host
name port.  It has an effect equivalent to receiving a send right for
the host port.  @code{mach_host_self} returns the name of the send
right.  In particular, successive calls will increase the calling task's
user-reference count for the send right.

As a special exception, the kernel will overrun the user reference count
of the host name port, so that this function cannot fail for that
reason.  Because of this, the user should not deallocate the port right
if an overrun might have happened.  Otherwise the reference count could
drop to zero and the send right be destroyed while the user still
expects to be able to use it.  As the kernel does not make use of the
number of extant send rights anyway, this is safe to do (the host port
itself is never destroyed).

The function returns @code{MACH_PORT_NULL} if a resource shortage
prevented the reception of the send right.

This function is also available in @file{mach/mach_traps.h}.
@end deftypefun

@cindex host control port
@deftp {Data type} host_priv_t
This is a @code{mach_port_t} and is used to hold the port name of a
privileged host control port.  A send right to the host control port is
inserted into the first task at bootstrap (@pxref{Modules}).
This is the only way to get access to the host control port in Mach, so
the initial task has to preserve the send right carefully, moving a copy
of it to other privileged tasks if necessary and denying access to
unprivileged tasks.
@end deftp


@node Host Information
@section Host Information

@deftypefun kern_return_t host_info (@w{host_t @var{host}}, @w{int @var{flavor}}, @w{host_info_t @var{host_info}}, @w{mach_msg_type_number_t *@var{host_info_count}})
The @code{host_info} function returns various information about
@var{host}.  @var{host_info} is an array of integers that is supplied
by the caller.  It will be filled with the requested information.
@var{host_info_count} is supplied as the maximum number of integers in
@var{host_info}.  On return, it contains the actual number of integers
in @var{host_info}.  The maximum number of integers returned by any
flavor is @code{HOST_INFO_MAX}.

The type of information returned is defined by @var{flavor}, which can
be one of the following:

@table @code
@item HOST_BASIC_INFO
The function returns basic information about the host, as defined by
@code{host_basic_info_t}.  This includes the number of processors, their
type, and the amount of memory installed in the system.  The number of
integers returned is @code{HOST_BASIC_INFO_COUNT}.  For how to get more
information about the processor, see @ref{Processor Interface}.

@item HOST_PROCESSOR_SLOTS
The function returns the numbers of the slots with active processors in
them.  The number of integers returned can be up to @code{max_cpus}, as
returned by the @code{HOST_BASIC_INFO} flavor of @code{host_info}.

@item HOST_SCHED_INFO
The function returns information of interest to schedulers as defined by
@code{host_sched_info_t}.  The number of integers returned is
@code{HOST_SCHED_INFO_COUNT}.
@end table

The function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{host} is not a host or
@var{flavor} is not recognized.
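
The calling convention described above can be sketched as follows.  This
is an illustration under the assumption that @file{mach.h} declares
everything used here; it queries the @code{HOST_BASIC_INFO} flavor
through the host name port and prints a few members of the result.

@example
#include <mach.h>
#include <stdio.h>

int
main (void)
@{
  struct host_basic_info info;
  /* On entry, COUNT is the capacity of INFO in integers.  */
  mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
  kern_return_t err;

  err = host_info (mach_host_self (), HOST_BASIC_INFO,
                   (host_info_t) &info, &count);
  if (err != KERN_SUCCESS)
    @{
      fprintf (stderr, "host_info: error %d\n", (int) err);
      return 1;
    @}

  /* On return, COUNT is the number of integers actually filled in.  */
  printf ("%d of %d processors available, %lu bytes of memory\n",
          info.avail_cpus, info.max_cpus,
          (unsigned long) info.memory_size);
  return 0;
@}
@end example
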
The function returns @code{MIG_ARRAY_TOO_LARGE} if the returned info
array is too large for @var{host_info}.  In this case, @var{host_info}
is filled as much as possible and @var{host_info_count} is set to the
number of elements that would be returned if there were enough room.
@c BUGS Availability limited.  Systems without this call support a
@c host_info call with an incompatible calling sequence.
@end deftypefun

@deftp {Data type} {struct host_basic_info}
A pointer to this structure is returned in @var{host_info} by the
@code{host_info} function and provides basic information about the host.
You can cast a variable of type @code{host_info_t} to a pointer of this
type if you provided it as the @var{host_info} parameter for the
@code{HOST_BASIC_INFO} flavor of @code{host_info}.  It has the following
members:

@table @code
@item int max_cpus
The maximum number of possible processors for which the kernel is
configured.

@item int avail_cpus
The number of CPUs currently available.

@item vm_size_t memory_size
The size of physical memory in bytes.

@item cpu_type_t cpu_type
The type of the master processor.

@item cpu_subtype_t cpu_subtype
The subtype of the master processor.
@end table

The type and subtype of the individual processors are also available
through @code{processor_info}, see @ref{Processor Interface}.
@end deftp

@deftp {Data type} host_basic_info_t
This is a pointer to a @code{struct host_basic_info}.
@end deftp

@deftp {Data type} {struct host_sched_info}
A pointer to this structure is returned in @var{host_info} by the
@code{host_info} function and provides information of interest to
schedulers.  You can cast a variable of type @code{host_info_t} to a
pointer of this type if you provided it as the @var{host_info} parameter
for the @code{HOST_SCHED_INFO} flavor of @code{host_info}.  It has the
following members:

@table @code
@item int min_timeout
The minimum timeout and unit of time in milliseconds.

@item int min_quantum
The minimum quantum and unit of quantum in milliseconds.
@end table
@end deftp

@deftp {Data type} host_sched_info_t
This is a pointer to a @code{struct host_sched_info}.
@end deftp

@deftypefun kern_return_t host_kernel_version (@w{host_t @var{host}}, @w{kernel_version_t *@var{version}})
The @code{host_kernel_version} function returns, in the character string
@var{version}, the version string that was compiled into the kernel
executing on @var{host} when it was built.  This string describes the
version of the kernel.  The constant @code{KERNEL_VERSION_MAX} should be
used to dimension storage for the returned string if the
@code{kernel_version_t} declaration is not used.

If the version string compiled into the kernel is longer than
@code{KERNEL_VERSION_MAX}, the result is truncated and not necessarily
null-terminated.

If @var{host} is not a valid send right to a host port, the function
returns @code{KERN_INVALID_ARGUMENT}.  If @var{version} points to
inaccessible memory, it returns @code{KERN_INVALID_ADDRESS}, and
@code{KERN_SUCCESS} otherwise.
@end deftypefun

@deftypefun kern_return_t host_get_boot_info (@w{host_priv_t @var{host_priv}}, @w{kernel_boot_info_t @var{boot_info}})
The @code{host_get_boot_info} function returns, in the character string
@var{boot_info}, the boot-time information string supplied by the
operator to the kernel executing on @var{host_priv}.  The constant
@code{KERNEL_BOOT_INFO_MAX} should be used to dimension storage for the
returned string if the @code{kernel_boot_info_t} declaration is not
used.

If the boot-time information string supplied by the operator is longer
than @code{KERNEL_BOOT_INFO_MAX}, the result is truncated and not
necessarily null-terminated.
@end deftypefun


@node Host Time
@section Host Time

@deftp {Data type} time_value_t
This is the representation of a time in Mach.  It is a @code{struct
time_value} and consists of the following members:

@table @code
@item integer_t seconds
The number of seconds.
@item integer_t microseconds
The number of microseconds.
@end table
@end deftp

The number of microseconds should always be smaller than
@code{TIME_MICROS_MAX} (1000000).  A time with this property is
@dfn{normalized}.  Normalized time values can be manipulated with the
following macros:

@defmac time_value_add_usec (@w{time_value_t *@var{val}}, @w{integer_t @var{micros}})
Add @var{micros} microseconds to @var{val}.  If @var{val} is normalized
and @var{micros} is smaller than @code{TIME_MICROS_MAX}, @var{val} will
be normalized afterwards.
@end defmac

@defmac time_value_add (@w{time_value_t *@var{result}}, @w{time_value_t *@var{addend}})
Add the values in @var{addend} to @var{result}.  If both are normalized,
@var{result} will be normalized afterwards.
@end defmac

A variable of type @code{time_value_t} can either represent a duration
or a fixed point in time.  In the latter case, it shall be interpreted
as the number of seconds and microseconds after the epoch, 1 January
1970.

@deftypefun kern_return_t host_get_time (@w{host_t @var{host}}, @w{time_value_t *@var{current_time}})
Get the current time as seen by @var{host}.  On success, the time passed
since the epoch is returned in @var{current_time}.
@end deftypefun

@deftypefun kern_return_t host_set_time (@w{host_priv_t @var{host_priv}}, @w{time_value_t @var{new_time}})
Set the time of @var{host_priv} to @var{new_time}.
@end deftypefun

@deftypefun kern_return_t host_adjust_time (@w{host_priv_t @var{host_priv}}, @w{time_value_t @var{new_adjustment}}, @w{time_value_t *@var{old_adjustment}})
Arrange for the current time as seen by @var{host_priv} to be gradually
changed by the adjustment value @var{new_adjustment}, and return the old
adjustment value in @var{old_adjustment}.
@end deftypefun

For efficiency, the current time is available through a mapped-time
interface.

@deftp {Data type} mapped_time_value_t
This structure defines the mapped-time interface.
It has the following members:

@table @code
@item integer_t seconds
The number of seconds.

@item integer_t microseconds
The number of microseconds.

@item integer_t check_seconds
This is a copy of the seconds value, which must be checked to protect
against a race condition when reading out the two time values.
@end table
@end deftp

Here is an example of how to read out the current time using the
mapped-time interface:

@c XXX Complete the example.
@example
/* mtime is assumed to point to the mapped mapped_time_value_t.  */
integer_t secs, usecs;

/* Retry if the seconds value changed while both fields were read.  */
do
  @{
    secs = mtime->seconds;
    usecs = mtime->microseconds;
  @}
while (secs != mtime->check_seconds);
@end example


@node Host Reboot
@section Host Reboot

@deftypefun kern_return_t host_reboot (@w{host_priv_t @var{host_priv}}, @w{int @var{options}})
Reboot the host specified by @var{host_priv}.  The argument
@var{options} specifies the flags.  The available flags are defined in
@file{sys/reboot.h}:

@table @code
@item RB_HALT
Do not reboot, but halt the machine.

@item RB_DEBUGGER
Do not reboot, but enter the kernel debugger from user space.
@end table

If successful, the function might not return.
@end deftypefun


@node Processors and Processor Sets
@chapter Processors and Processor Sets

This section describes the Mach interface to processor sets and
individual processors.  The interface allows you to group processors
into sets and control the processors and processor sets.

A processor is not a central part of the interface.  It is mostly of
relevance as a part of a processor set.  Threads are always assigned to
processor sets, and all processors in a set are equally involved in
executing all threads assigned to that set.

The processor set is represented by two ports: a name port
@var{processor_set_name}, which anyone can use to query information
about the processor set, and a control port @var{processor_set}, which
is used to manipulate it.

@menu
* Processor Set Interface::       How to work with processor sets.
* Processor Interface::           How to work with individual processors.
@end menu


@node Processor Set Interface
@section Processor Set Interface

@menu
* Processor Set Ports::           Ports representing a processor set.
* Processor Set Access::          How the processor sets are accessed.
* Processor Set Creation::        How new processor sets are created.
* Processor Set Destruction::     How processor sets are destroyed.
* Tasks and Threads on Sets::     Assigning tasks, threads to processor sets.
* Processor Set Priority::        Specifying the priority of a processor set.
* Processor Set Policy::          Changing the processor set policies.
* Processor Set Info::            Obtaining information about a processor set.
@end menu


@node Processor Set Ports
@subsection Processor Set Ports
@cindex processor set ports
@cindex ports representing a processor set

@cindex processor set name port
@cindex port representing a processor set name
@deftp {Data type} processor_set_name_t
This is a @code{mach_port_t} and is used to hold the port name of a
processor set name port that names the processor set.  Any task can get
a send right to the name port of a processor set.  The processor set
name port can be used to get information about the processor set.
@end deftp

@cindex processor set port
@deftp {Data type} processor_set_t
This is a @code{mach_port_t} and is used to hold the port name of a
privileged processor set control port that represents the processor
set.  Operations on the processor set are implemented as remote
procedure calls to the processor set port.  The processor set port can
be used to manipulate the processor set.
@end deftp


@node Processor Set Access
@subsection Processor Set Access

@deftypefun kern_return_t host_processor_sets (@w{host_t @var{host}}, @w{processor_set_name_array_t *@var{processor_sets}}, @w{mach_msg_type_number_t *@var{processor_sets_count}})
The function @code{host_processor_sets} gets send rights to the name
port for each processor set currently assigned to @var{host}.
@code{host_processor_set_priv} can be used to obtain the control ports
from these if desired.
@var{processor_sets} is an array that is created as a result of this
call.  The caller may wish to @code{vm_deallocate} this array when the
data is no longer needed.  @var{processor_sets_count} is set to the
number of processor sets in @var{processor_sets}.

This function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{host} is not a host.
@end deftypefun

@deftypefun kern_return_t host_processor_set_priv (@w{host_priv_t @var{host_priv}}, @w{processor_set_name_t @var{set_name}}, @w{processor_set_t *@var{set}})
The function @code{host_processor_set_priv} allows a privileged
application to obtain the control port @var{set} for an existing
processor set from its name port @var{set_name}.  The privileged host
port @var{host_priv} is required.

This function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{host_priv} is not a valid host
control port.
@end deftypefun

@deftypefun kern_return_t processor_set_default (@w{host_t @var{host}}, @w{processor_set_name_t *@var{default_set}})
The function @code{processor_set_default} returns the default processor
set of @var{host} in @var{default_set}.  The default processor set is
used by all threads, tasks, and processors that are not explicitly
assigned to other sets.  @code{processor_set_default} returns a port
that can be used to obtain information about this set (e.g. how many
threads are assigned to it).  This port cannot be used to perform
operations on that set.

This function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_ARGUMENT} if @var{host} is not a host and
@code{KERN_INVALID_ADDRESS} if @var{default_set} points to inaccessible
memory.
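
The following sketch shows one way the name port might be used: it looks
up the default processor set and asks @code{processor_set_info} how many
threads are assigned to it.  It assumes that @file{mach.h} declares all
of the calls involved and is meant as an illustration only.

@example
#include <mach.h>
#include <stdio.h>

int
main (void)
@{
  processor_set_name_t default_set;
  struct processor_set_basic_info info;
  mach_msg_type_number_t count = PROCESSOR_SET_BASIC_INFO_COUNT;
  host_t host;
  kern_return_t err;

  /* Obtain the name port of the default processor set.  */
  err = processor_set_default (mach_host_self (), &default_set);
  if (err != KERN_SUCCESS)
    return 1;

  /* The name port is sufficient for queries, but not for control.  */
  err = processor_set_info (default_set, PROCESSOR_SET_BASIC_INFO, &host,
                            (processor_set_info_t) &info, &count);
  if (err != KERN_SUCCESS)
    return 1;

  printf ("%d threads on %d processors\n",
          info.thread_count, info.processor_count);
  return 0;
@}
@end example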
@end deftypefun


@node Processor Set Creation
@subsection Processor Set Creation

@deftypefun kern_return_t processor_set_create (@w{host_t @var{host}}, @w{processor_set_t *@var{new_set}}, @w{processor_set_name_t *@var{new_name}})
The function @code{processor_set_create} creates a new processor set on
@var{host} and returns the two ports associated with it.  The port
returned in @var{new_set} is the actual port representing the set.  It
is used to perform operations such as assigning processors, tasks, or
threads.  The port returned in @var{new_name} identifies the set, and is
used to obtain information about the set.

This function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_ARGUMENT} if @var{host} is not a host,
@code{KERN_INVALID_ADDRESS} if @var{new_set} or @var{new_name} points to
inaccessible memory and @code{KERN_FAILURE} if the operating system does
not support processor allocation.
@end deftypefun


@node Processor Set Destruction
@subsection Processor Set Destruction

@deftypefun kern_return_t processor_set_destroy (@w{processor_set_t @var{processor_set}})
The function @code{processor_set_destroy} destroys the specified
processor set.  Any assigned processors, tasks, or threads are
reassigned to the default set.  The object port for the processor set is
required (not the name port).  The default processor set cannot be
destroyed.

This function returns @code{KERN_SUCCESS} if the set was destroyed,
@code{KERN_FAILURE} if an attempt was made to destroy the default
processor set, or the operating system does not support processor
allocation, and @code{KERN_INVALID_ARGUMENT} if @var{processor_set} is
not a valid processor set control port.
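
As a sketch of the life cycle of a processor set, the following program
creates a set and destroys it again, treating @code{KERN_FAILURE} from
@code{processor_set_create} as a sign that processor allocation is not
supported.  It assumes that @file{mach.h} declares both calls.

@example
#include <mach.h>
#include <stdio.h>

int
main (void)
@{
  processor_set_t set;
  processor_set_name_t name;
  kern_return_t err;

  err = processor_set_create (mach_host_self (), &set, &name);
  if (err == KERN_FAILURE)
    @{
      /* Processor allocation is not supported by this kernel.  */
      fprintf (stderr, "processor sets are not supported\n");
      return 1;
    @}
  if (err != KERN_SUCCESS)
    return 1;

  /* Processors, tasks or threads could be assigned to SET here.  */

  /* The control port, not the name port, is needed to destroy it.  */
  err = processor_set_destroy (set);
  return err == KERN_SUCCESS ? 0 : 1;
@}
@end example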
@end deftypefun


@node Tasks and Threads on Sets
@subsection Tasks and Threads on Sets

@deftypefun kern_return_t processor_set_tasks (@w{processor_set_t @var{processor_set}}, @w{task_array_t *@var{task_list}}, @w{mach_msg_type_number_t *@var{task_count}})
The function @code{processor_set_tasks} gets send rights to the kernel
port for each task currently assigned to @var{processor_set}.

@var{task_list} is an array that is created as a result of this call.
The caller may wish to @code{vm_deallocate} this array when the data is
no longer needed.  @var{task_count} is set to the number of tasks in
@var{task_list}.

This function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{processor_set} is not a processor
set.
@end deftypefun

@deftypefun kern_return_t processor_set_threads (@w{processor_set_t @var{processor_set}}, @w{thread_array_t *@var{thread_list}}, @w{mach_msg_type_number_t *@var{thread_count}})
The function @code{processor_set_threads} gets send rights to the kernel
port for each thread currently assigned to @var{processor_set}.

@var{thread_list} is an array that is created as a result of this call.
The caller may wish to @code{vm_deallocate} this array when the data is
no longer needed.  @var{thread_count} is set to the number of threads in
@var{thread_list}.

This function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{processor_set} is not a processor
set.
@end deftypefun

@deftypefun kern_return_t task_assign (@w{task_t @var{task}}, @w{processor_set_t @var{processor_set}}, @w{boolean_t @var{assign_threads}})
The function @code{task_assign} assigns @var{task} to the set
@var{processor_set}.  This assignment is for the purposes of determining
the initial assignment of newly created threads in @var{task}.  Any
previous assignment of the task is nullified.  Existing threads within
the task are also reassigned if @var{assign_threads} is @code{TRUE}.
They are not affected if it is @code{FALSE}.

This function returns @code{KERN_SUCCESS} if the assignment has been
performed and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task,
or @var{processor_set} is not a processor set on the same host as
@var{task}.
@end deftypefun

@deftypefun kern_return_t task_assign_default (@w{task_t @var{task}}, @w{boolean_t @var{assign_threads}})
The function @code{task_assign_default} is a variant of
@code{task_assign} that assigns the task to the default processor set on
that task's host.  This variant exists because the control port for the
default processor set is privileged and not usually available to users.

This function returns @code{KERN_SUCCESS} if the assignment has been
performed and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task.
@end deftypefun

@deftypefun kern_return_t task_get_assignment (@w{task_t @var{task}}, @w{processor_set_name_t *@var{assigned_set}})
The function @code{task_get_assignment} returns the name of the
processor set to which the task is currently assigned in
@var{assigned_set}.  This port can only be used to obtain information
about the processor set.

This function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_ADDRESS} if @var{assigned_set} points to inaccessible
memory, and @code{KERN_INVALID_ARGUMENT} if @var{task} is not a task.
@end deftypefun

@deftypefun kern_return_t thread_assign (@w{thread_t @var{thread}}, @w{processor_set_t @var{processor_set}})
The function @code{thread_assign} assigns @var{thread} to the set
@var{processor_set}.  After the assignment is completed, the thread only
executes on processors assigned to the designated processor set.  If
there are no such processors, then the thread is unable to execute.  Any
previous assignment of the thread is nullified.  Unix system call
compatibility code may temporarily force threads to execute on the
master processor.

This function returns @code{KERN_SUCCESS} if the assignment has been
performed and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a
thread, or @var{processor_set} is not a processor set on the same host
as @var{thread}.
@end deftypefun

@deftypefun kern_return_t thread_assign_default (@w{thread_t @var{thread}})
The function @code{thread_assign_default} is a variant of
@code{thread_assign} that assigns the thread to the default processor
set on that thread's host.  This variant exists because the control port
for the default processor set is privileged and not usually available
to users.

This function returns @code{KERN_SUCCESS} if the assignment has been
performed and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a
thread.
@end deftypefun

@deftypefun kern_return_t thread_get_assignment (@w{thread_t @var{thread}}, @w{processor_set_name_t *@var{assigned_set}})
The function @code{thread_get_assignment} returns the name of the
processor set to which the thread is currently assigned in
@var{assigned_set}.  This port can only be used to obtain information
about the processor set.

This function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_ADDRESS} if @var{assigned_set} points to inaccessible
memory, and @code{KERN_INVALID_ARGUMENT} if @var{thread} is not a
thread.
@end deftypefun


@node Processor Set Priority
@subsection Processor Set Priority

@deftypefun kern_return_t processor_set_max_priority (@w{processor_set_t @var{processor_set}}, @w{int @var{max_priority}}, @w{boolean_t @var{change_threads}})
The function @code{processor_set_max_priority} is used to set the
maximum priority for a processor set.  The priority of a processor set
is used only for newly created threads (the thread's maximum priority is
set to the processor set's) and the assignment of threads to the set
(the thread's maximum priority is reduced if it exceeds the set's
maximum priority, and the thread's priority is similarly reduced).
@code{processor_set_max_priority} changes this priority.  It also sets
the maximum priority of all threads assigned to the processor set to
this new priority if @var{change_threads} is @code{TRUE}.  If this
maximum priority is less than the priorities of any of these threads,
their priorities will also be set to this new value.

This function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{processor_set} is not a processor
set or @var{max_priority} is not a valid priority.
@end deftypefun


@node Processor Set Policy
@subsection Processor Set Policy

@deftypefun kern_return_t processor_set_policy_enable (@w{processor_set_t @var{processor_set}}, @w{int @var{policy}})
@deftypefunx kern_return_t processor_set_policy_disable (@w{processor_set_t @var{processor_set}}, @w{int @var{policy}}, @w{boolean_t @var{change_threads}})
Processor sets may restrict the scheduling policies to be used for
threads assigned to them.  These two calls provide the mechanism for
designating permitted and forbidden policies.  The current set of
permitted policies can be obtained from @code{processor_set_info}.
Timesharing may not be forbidden by any processor set.  This is a
compromise to reduce the complexity of the assign operation; any thread
whose policy is forbidden by the target processor set has its policy
reset to timesharing.  If the @var{change_threads} argument to
@code{processor_set_policy_disable} is @code{TRUE}, threads currently
assigned to this processor set and using the newly disabled policy will
have their policy reset to timesharing.

@file{mach/policy.h} contains the allowed policies; it is included by
@file{mach.h}.  Not all policies (e.g. fixed priority) are supported by
all systems.

This function returns @code{KERN_SUCCESS} if the operation was completed
successfully and @code{KERN_INVALID_ARGUMENT} if @var{processor_set} is
not a processor set or @var{policy} is not a valid policy, or an attempt
was made to disable timesharing.
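
The two calls might be wrapped as in the following sketch.  The helper
names are hypothetical, and @code{POLICY_FIXEDPRI} is assumed to be the
name that @file{mach/policy.h} gives to the fixed priority policy; check
the header on your system before relying on it.

@example
#include <mach.h>
#include <mach/policy.h>

/* Permit fixed priority scheduling on SET.  POLICY_FIXEDPRI is an
   assumed constant name, see the text above.  */
kern_return_t
allow_fixed_priority (processor_set_t set)
@{
  return processor_set_policy_enable (set, POLICY_FIXEDPRI);
@}

/* Forbid it again.  Passing TRUE resets threads that currently use
   the policy to timesharing.  */
kern_return_t
forbid_fixed_priority (processor_set_t set)
@{
  return processor_set_policy_disable (set, POLICY_FIXEDPRI, TRUE);
@}
@end example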
@end deftypefun


@node Processor Set Info
@subsection Processor Set Info

@deftypefun kern_return_t processor_set_info (@w{processor_set_name_t @var{set_name}}, @w{int @var{flavor}}, @w{host_t *@var{host}}, @w{processor_set_info_t @var{processor_set_info}}, @w{mach_msg_type_number_t *@var{processor_set_info_count}})
The function @code{processor_set_info} returns the selected information
array for a processor set, as specified by @var{flavor}.

@var{host} is set to the host on which the processor set resides.  This
is the non-privileged host port.

@var{processor_set_info} is an array of integers that is supplied by the
caller and returned filled with specified information.
@var{processor_set_info_count} is supplied as the maximum number of
integers in @var{processor_set_info}.  On return, it contains the actual
number of integers in @var{processor_set_info}.  The maximum number of
integers returned by any flavor is @code{PROCESSOR_SET_INFO_MAX}.

The type of information returned is defined by @var{flavor}, which can
be one of the following:

@table @code
@item PROCESSOR_SET_BASIC_INFO
The function returns basic information about the processor set, as
defined by @code{processor_set_basic_info_t}.  This includes the number
of tasks and threads assigned to the processor set.  The number of
integers returned is @code{PROCESSOR_SET_BASIC_INFO_COUNT}.

@item PROCESSOR_SET_SCHED_INFO
The function returns information about the scheduling policy for the
processor set as defined by @code{processor_set_sched_info_t}.  The
number of integers returned is @code{PROCESSOR_SET_SCHED_INFO_COUNT}.
@end table

Some machines may define additional (machine-dependent) flavors.

The function returns @code{KERN_SUCCESS} if the call succeeded and
@code{KERN_INVALID_ARGUMENT} if @var{set_name} is not a processor set
or @var{flavor} is not recognized.  The function returns
@code{MIG_ARRAY_TOO_LARGE} if the returned info array is too large for
@var{processor_set_info}.
In this case, @var{processor_set_info} is filled as much as possible and
@var{processor_set_info_count} is set to the number of elements that
would have been returned if there were enough room.
@end deftypefun

@deftp {Data type} {struct processor_set_basic_info}
This structure is returned in @var{processor_set_info} by the
@code{processor_set_info} function and provides basic information about
the processor set.  You can cast a variable of type
@code{processor_set_info_t} to a pointer of this type if you provided it
as the @var{processor_set_info} parameter for the
@code{PROCESSOR_SET_BASIC_INFO} flavor of @code{processor_set_info}.  It
has the following members:

@table @code
@item int processor_count
number of processors

@item int task_count
number of tasks

@item int thread_count
number of threads

@item int load_average
scaled load average

@item int mach_factor
scaled mach factor
@end table
@end deftp

@deftp {Data type} processor_set_basic_info_t
This is a pointer to a @code{struct processor_set_basic_info}.
@end deftp

@deftp {Data type} {struct processor_set_sched_info}
This structure is returned in @var{processor_set_info} by the
@code{processor_set_info} function and provides scheduling information
about the processor set.  You can cast a variable of type
@code{processor_set_info_t} to a pointer of this type if you provided it
as the @var{processor_set_info} parameter for the
@code{PROCESSOR_SET_SCHED_INFO} flavor of @code{processor_set_info}.  It
has the following members:

@table @code
@item int policies
allowed policies

@item int max_priority
max priority for new threads
@end table
@end deftp

@deftp {Data type} processor_set_sched_info_t
This is a pointer to a @code{struct processor_set_sched_info}.
@end deftp


@node Processor Interface
@section Processor Interface

@cindex processor port
@cindex port representing a processor
@deftp {Data type} processor_t
This is a @code{mach_port_t} and is used to hold the port name of a
processor port that represents the processor.
Operations on the processor are implemented as remote procedure calls to
the processor port.
@end deftp

@menu
* Hosted Processors::             Getting a list of all processors on a host.
* Processor Control::             Starting, stopping, controlling processors.
* Processors and Sets::           Combining processors into processor sets.
* Processor Info::                Obtaining information on processors.
@end menu


@node Hosted Processors
@subsection Hosted Processors

@deftypefun kern_return_t host_processors (@w{host_priv_t @var{host_priv}}, @w{processor_array_t *@var{processor_list}}, @w{mach_msg_type_number_t *@var{processor_count}})
The function @code{host_processors} gets send rights to the processor
port for each processor existing on @var{host_priv}.  This is the
privileged port that allows its holder to control a processor.

@var{processor_list} is an array that is created as a result of this
call.  The caller may wish to @code{vm_deallocate} this array when the
data is no longer needed.  @var{processor_count} is set to the number of
processors in the @var{processor_list}.

This function returns @code{KERN_SUCCESS} if the call succeeded,
@code{KERN_INVALID_ARGUMENT} if @var{host_priv} is not a privileged host
port, and @code{KERN_INVALID_ADDRESS} if @var{processor_count} points to
inaccessible memory.
@end deftypefun


@node Processor Control
@subsection Processor Control

@deftypefun kern_return_t processor_start (@w{processor_t @var{processor}})
@deftypefunx kern_return_t processor_exit (@w{processor_t @var{processor}})
@deftypefunx kern_return_t processor_control (@w{processor_t @var{processor}}, @w{processor_info_t *@var{cmd}}, @w{mach_msg_type_number_t @var{count}})
Some multiprocessors may allow privileged software to control
processors.  The @code{processor_start}, @code{processor_exit}, and
@code{processor_control} operations implement this.  The interpretation
of the command in @var{cmd} is machine dependent.  A newly started
processor is assigned to the default processor set.
An exited processor is removed from the processor set to which it was assigned and ceases to be active. @var{count} contains the length of the command @var{cmd} as a number of ints.

Availability is limited: all of these operations are machine-dependent and may do nothing. The ability to restart an exited processor is also machine-dependent.

This function returns @code{KERN_SUCCESS} if the operation was performed, @code{KERN_FAILURE} if the operation was not performed (a likely reason is that it is not supported on this processor), @code{KERN_INVALID_ARGUMENT} if @var{processor} is not a processor, and @code{KERN_INVALID_ADDRESS} if @var{cmd} points to inaccessible memory.
@end deftypefun

@node Processors and Sets
@subsection Processors and Sets

@deftypefun kern_return_t processor_assign (@w{processor_t @var{processor}}, @w{processor_set_t @var{processor_set}}, @w{boolean_t @var{wait}})
The function @code{processor_assign} assigns @var{processor} to the set @var{processor_set}. After the assignment is completed, the processor only executes threads that are assigned to that processor set. Any previous assignment of the processor is nullified. The master processor cannot be reassigned. All processors take clock interrupts at all times.

The @var{wait} argument indicates whether the caller should wait for the assignment to be completed or should return immediately. Dedicated kernel threads are used to perform processor assignment, so setting @var{wait} to @code{FALSE} allows assignment requests to be queued and performed faster, especially if the kernel has more than one dedicated internal thread for processor assignment.

Redirection of other device interrupts away from processors assigned to other than the default processor set is machine-dependent.

Intermediaries that interpose on ports must be sure to interpose on both ports involved in this call if they interpose on either.
This function returns @code{KERN_SUCCESS} if the assignment has been performed, @code{KERN_INVALID_ARGUMENT} if @var{processor} is not a processor, or @var{processor_set} is not a processor set on the same host as @var{processor}. @end deftypefun @deftypefun kern_return_t processor_get_assignment (@w{processor_t @var{processor}}, @w{processor_set_name_t *@var{assigned_set}}) The function @code{processor_get_assignment} obtains the current assignment of a processor. The name port of the processor set is returned in @var{assigned_set}. @end deftypefun @node Processor Info @subsection Processor Info @deftypefun kern_return_t processor_info (@w{processor_t @var{processor}}, @w{int @var{flavor}}, @w{host_t *@var{host}}, @w{processor_info_t @var{processor_info}}, @w{mach_msg_type_number_t *@var{processor_info_count}}) The function @code{processor_info} returns the selected information array for a processor, as specified by @var{flavor}. @var{host} is set to the host on which the processor set resides. This is the non-privileged host port. @var{processor_info} is an array of integers that is supplied by the caller and returned filled with specified information. @var{processor_info_count} is supplied as the maximum number of integers in @var{processor_info}. On return, it contains the actual number of integers in @var{processor_info}. The maximum number of integers returned by any flavor is @code{PROCESSOR_INFO_MAX}. The type of information returned is defined by @var{flavor}, which can be one of the following: @table @code @item PROCESSOR_BASIC_INFO The function returns basic information about the processor, as defined by @code{processor_basic_info_t}. This includes the slot number of the processor. The number of integers returned is @code{PROCESSOR_BASIC_INFO_COUNT}. @end table Machines which require more configuration information beyond the slot number are expected to define additional (machine-dependent) flavors. 
The function returns @code{KERN_SUCCESS} if the call succeeded and @code{KERN_INVALID_ARGUMENT} if @var{processor} is not a processor or @var{flavor} is not recognized. The function returns @code{MIG_ARRAY_TOO_LARGE} if the returned info array is too large for @var{processor_info}. In this case, @var{processor_info} is filled as much as possible and @var{processor_info_count} is set to the number of elements that would have been returned if there were enough room.
@end deftypefun

@deftp {Data type} {struct processor_basic_info}
This structure is returned in @var{processor_info} by the @code{processor_info} function and provides basic information about the processor. You can cast a variable of type @code{processor_info_t} to a pointer of this type if you provided it as the @var{processor_info} parameter for the @code{PROCESSOR_BASIC_INFO} flavor of @code{processor_info}. It has the following members:

@table @code
@item cpu_type_t cpu_type
cpu type

@item cpu_subtype_t cpu_subtype
cpu subtype

@item boolean_t running
is processor running?

@item int slot_num
slot number

@item boolean_t is_master
is this the master processor
@end table
@end deftp

@deftp {Data type} processor_basic_info_t
This is a pointer to a @code{struct processor_basic_info}.
@end deftp

@node Device Interface
@chapter Device Interface

The GNU Mach microkernel provides a simple device interface that allows user space programs to access the underlying hardware devices. Each device has a unique name, which is a string up to 127 characters long. To open a device, the device master port has to be supplied. The device master port is only available through the bootstrap port. Anyone who has control over the device master port can use all hardware devices.
@c XXX FIXME bootstrap port, bootstrap

@cindex device port
@cindex port representing a device
@deftp {Data type} device_t
This is a @code{mach_port_t} and is used to hold the port name of a device port that represents the device.
Operations on the device are implemented as remote procedure calls to the device port. Each device provides a sequence of records. The length of a record is specific to the device. Data can be transferred ``out-of-line'' or ``in-line'' (@pxref{Memory}).
@end deftp

All constants and functions in this chapter are defined in @file{device/device.h}.

@menu
* Device Reply Server::         Handling device reply messages.
* Device Open::                 Opening hardware devices.
* Device Close::                Closing hardware devices.
* Device Read::                 Reading data from the device.
* Device Write::                Writing data to the device.
* Device Map::                  Mapping devices into virtual memory.
* Device Status::               Querying and manipulating a device.
* Device Filter::               Filtering packets arriving on a device.
@end menu

@node Device Reply Server
@section Device Reply Server

Besides the usual synchronous interface, an asynchronous interface is provided. For this, the caller has to receive and handle the reply messages separately from the function call.

@deftypefun boolean_t device_reply_server (@w{msg_header_t *@var{in_msg}}, @w{msg_header_t *@var{out_msg}})
The function @code{device_reply_server} is produced by the remote procedure call generator to handle a received message. This function does all necessary argument handling, and actually calls one of the following functions: @code{ds_device_open_reply}, @code{ds_device_read_reply}, @code{ds_device_read_reply_inband}, @code{ds_device_write_reply} and @code{ds_device_write_reply_inband}.

The @var{in_msg} argument is the message that has been received from the kernel. The @var{out_msg} is a reply message, but this is not used for this server.

The function returns @code{TRUE} to indicate that the message in question was applicable to this interface, and that the appropriate routine was called to interpret the message. It returns @code{FALSE} to indicate that the message did not apply to this interface, and that no other action was taken.
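
The @code{TRUE}/@code{FALSE} convention makes it possible to offer a received message to several server routines in turn until one accepts it. The following C sketch models only that convention, so that it compiles without the Mach headers. The names @code{demo_msg}, @code{demo_server_fn}, @code{demo_first_server}, @code{demo_second_server} and @code{dispatch}, as well as the message id ranges, are hypothetical and are not part of the Mach interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a message header; only a message id is modelled. */
struct demo_msg {
    int id;
};

/* Same shape as a generated server routine: returns true when the
   message belonged to the interface and was handled. */
typedef bool (*demo_server_fn)(const struct demo_msg *in, struct demo_msg *out);

/* Hypothetical interface that claims ids 100..199 (arbitrary range). */
static bool demo_first_server(const struct demo_msg *in, struct demo_msg *out)
{
    (void)out;                      /* no reply is built in this sketch */
    return in->id >= 100 && in->id < 200;
}

/* Hypothetical interface that claims ids 200..299 (arbitrary range). */
static bool demo_second_server(const struct demo_msg *in, struct demo_msg *out)
{
    (void)out;
    return in->id >= 200 && in->id < 300;
}

/* Offer the message to each server in turn.  Returns the index of the
   server that accepted it, or -1 if none did. */
static int dispatch(const struct demo_msg *in, struct demo_msg *out,
                    const demo_server_fn *servers, size_t count)
{
    assert(in != NULL && servers != NULL);
    for (size_t i = 0; i < count; i++) {
        if (servers[i](in, out))
            return (int)i;
    }
    return -1;
}
```

In a real receive loop, @code{device_reply_server} would presumably take the place of one of the stand-in routines, and a result of -1 would correspond to a message that no interface recognized.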
@end deftypefun

@node Device Open
@section Device Open

@deftypefun kern_return_t device_open (@w{mach_port_t @var{master_port}}, @w{dev_mode_t @var{mode}}, @w{dev_name_t @var{name}}, @w{device_t *@var{device}})
The function @code{device_open} opens the device @var{name} and returns a port to it in @var{device}. The open count for the device is incremented by one. If the open count was 0, the open handler for the device is invoked.

@var{master_port} must hold the master device port. @var{name} specifies the device to open, and is a string up to 128 characters long. @var{mode} is the open mode. It is a bitwise-or of the following constants:

@table @code
@item D_READ
Request read access for the device.

@item D_WRITE
Request write access for the device.

@item D_NODELAY
Do not delay an open.
@c XXX Is this really used at all? Maybe for tape drives? What does it mean?
@end table

The function returns @code{D_SUCCESS} if the device was successfully opened, @code{D_INVALID_OPERATION} if @var{master_port} is not the master device port, @code{D_WOULD_BLOCK} if the device is busy and @code{D_NOWAIT} was specified in @var{mode}, @code{D_ALREADY_OPEN} if the device is already open in an incompatible mode and @code{D_NO_SUCH_DEVICE} if @var{name} does not denote a known device.
@end deftypefun

@deftypefun kern_return_t device_open_request (@w{mach_port_t @var{master_port}}, @w{mach_port_t @var{reply_port}}, @w{dev_mode_t @var{mode}}, @w{dev_name_t @var{name}})
@deftypefunx kern_return_t ds_device_open_reply (@w{mach_port_t @var{reply_port}}, @w{kern_return_t @var{return_code}}, @w{device_t *@var{device}})
This is the asynchronous form of the @code{device_open} function. @code{device_open_request} performs the open request. The meaning of the parameters is as in @code{device_open}. Additionally, the caller has to supply a reply port to which the @code{ds_device_open_reply} message is sent by the kernel when the open has been performed.
The return value of the open operation is stored in @var{return_code}.

As neither function receives a reply message, only message transmission errors apply. If no error occurs, @code{KERN_SUCCESS} is returned.
@end deftypefun

@node Device Close
@section Device Close

@deftypefun kern_return_t device_close (@w{device_t @var{device}})
The function @code{device_close} decrements the open count of the device by one. If the open count drops to zero, the close handler for the device is called. The device to close is specified by its port @var{device}.

The function returns @code{D_SUCCESS} if the device was successfully closed and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port.
@end deftypefun

@node Device Read
@section Device Read

@deftypefun kern_return_t device_read (@w{device_t @var{device}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{int @var{bytes_wanted}}, @w{io_buf_ptr_t *@var{data}}, @w{mach_msg_type_number_t *@var{data_count}})
The function @code{device_read} reads @var{bytes_wanted} bytes from @var{device}, and stores them in a buffer allocated with @code{vm_allocate}, whose address is returned in @var{data}. The caller must deallocate the buffer when it is no longer needed. The number of bytes actually returned is stored in @var{data_count}.

If @var{mode} is @code{D_NOWAIT}, the operation does not block. Otherwise @var{mode} should be 0. @var{recnum} is the record number to be read; its meaning is device-specific.

The function returns @code{D_SUCCESS} if some data was successfully read, @code{D_WOULD_BLOCK} if no data is currently available and @code{D_NOWAIT} is specified, and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port.
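
A caller that uses @code{D_NOWAIT} typically has to retry when @code{D_WOULD_BLOCK} is returned. The following C sketch illustrates one possible bounded retry loop. So that it compiles on its own, it does not call the kernel: @code{demo_device}, @code{demo_device_read}, @code{poll_read} and the numeric values of @code{DEMO_D_SUCCESS} and @code{DEMO_D_WOULD_BLOCK} are hypothetical stand-ins that merely imitate the return codes described above. Unlike the real call, the stand-in copies into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in return codes; the values are arbitrary, not the real ones. */
enum { DEMO_D_SUCCESS = 0, DEMO_D_WOULD_BLOCK = 1 };

/* Stand-in device: reports "no data yet" for a number of polls,
   then hands out its payload. */
struct demo_device {
    int busy_polls;
    const char *payload;
};

/* Imitates a non-blocking read into a caller-supplied buffer. */
static int demo_device_read(struct demo_device *dev, int bytes_wanted,
                            char *data, int *data_count)
{
    if (dev->busy_polls > 0) {
        dev->busy_polls--;
        return DEMO_D_WOULD_BLOCK;
    }
    int avail = (int)strlen(dev->payload);
    int n = bytes_wanted < avail ? bytes_wanted : avail;
    if (n < 0)
        n = 0;                      /* guard against a negative request */
    memcpy(data, dev->payload, (size_t)n);
    *data_count = n;
    return DEMO_D_SUCCESS;
}

/* Poll at most max_tries times.  Returns the last code seen and
   stores the number of calls made in *tries_used. */
static int poll_read(struct demo_device *dev, int bytes_wanted, char *data,
                     int *data_count, int max_tries, int *tries_used)
{
    assert(dev != NULL && data != NULL && data_count != NULL && tries_used != NULL);
    int rc = DEMO_D_WOULD_BLOCK;
    *tries_used = 0;
    *data_count = 0;
    while (*tries_used < max_tries) {
        (*tries_used)++;
        rc = demo_device_read(dev, bytes_wanted, data, data_count);
        if (rc != DEMO_D_WOULD_BLOCK)
            break;                  /* success or a different error */
    }
    return rc;
}
```

A real client would call @code{device_read} in place of the stand-in and release the returned buffer with @code{vm_deallocate}. Bounding the number of attempts is a design choice of this sketch; waiting between attempts is left out.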
@end deftypefun @deftypefun kern_return_t device_read_inband (@w{device_t @var{device}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{int @var{bytes_wanted}}, @w{io_buf_ptr_inband_t *@var{data}}, @w{mach_msg_type_number_t *@var{data_count}}) The @code{device_read_inband} function works as the @code{device_read} function, except that the data is returned ``in-line'' in the reply IPC message (@pxref{Memory}). @end deftypefun @deftypefun kern_return_t device_read_request (@w{device_t @var{device}}, @w{mach_port_t @var{reply_port}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{int @var{bytes_wanted}}) @deftypefunx kern_return_t ds_device_read_reply (@w{mach_port_t @var{reply_port}}, @w{kern_return_t @var{return_code}}, @w{io_buf_ptr_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}}) This is the asynchronous form of the @code{device_read} function. @code{device_read_request} performs the read request. The meaning for the parameters is as in @code{device_read}. Additionally, the caller has to supply a reply port to which the @code{ds_device_read_reply} message is sent by the kernel when the read has been performed. The return value of the read operation is stored in @var{return_code}. As neither function receives a reply message, only message transmission errors apply. If no error occurs, @code{KERN_SUCCESS} is returned. 
@end deftypefun @deftypefun kern_return_t device_read_request_inband (@w{device_t @var{device}}, @w{mach_port_t @var{reply_port}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{int @var{bytes_wanted}}) @deftypefunx kern_return_t ds_device_read_reply_inband (@w{mach_port_t @var{reply_port}}, @w{kern_return_t @var{return_code}}, @w{io_buf_ptr_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}}) The @code{device_read_request_inband} and @code{ds_device_read_reply_inband} functions work as the @code{device_read_request} and @code{ds_device_read_reply} functions, except that the data is returned ``in-line'' in the reply IPC message (@pxref{Memory}). @end deftypefun @node Device Write @section Device Write @deftypefun kern_return_t device_write (@w{device_t @var{device}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{io_buf_ptr_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}}, @w{int *@var{bytes_written}}) The function @code{device_write} writes @var{data_count} bytes from the buffer @var{data} to @var{device}. The number of bytes actually written is returned in @var{bytes_written}. If @var{mode} is @code{D_NOWAIT}, the function returns without waiting for I/O completion. Otherwise @var{mode} should be 0. @var{recnum} is the record number to be written, its meaning is device specific. The function returns @code{D_SUCCESS} if some data was successfully written and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port or the device is dead or not completely open. @end deftypefun @deftypefun kern_return_t device_write_inband (@w{device_t @var{device}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{int @var{bytes_wanted}}, @w{io_buf_ptr_inband_t *@var{data}}, @w{mach_msg_type_number_t *@var{data_count}}) The @code{device_write_inband} function works as the @code{device_write} function, except that the data is sent ``in-line'' in the request IPC message (@pxref{Memory}). 
@end deftypefun @deftypefun kern_return_t device_write_request (@w{device_t @var{device}}, @w{mach_port_t @var{reply_port}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{io_buf_ptr_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}}) @deftypefunx kern_return_t ds_device_write_reply (@w{mach_port_t @var{reply_port}}, @w{kern_return_t @var{return_code}}, @w{int @var{bytes_written}}) This is the asynchronous form of the @code{device_write} function. @code{device_write_request} performs the write request. The meaning for the parameters is as in @code{device_write}. Additionally, the caller has to supply a reply port to which the @code{ds_device_write_reply} message is sent by the kernel when the write has been performed. The return value of the write operation is stored in @var{return_code}. As neither function receives a reply message, only message transmission errors apply. If no error occurs, @code{KERN_SUCCESS} is returned. @end deftypefun @deftypefun kern_return_t device_write_request_inband (@w{device_t @var{device}}, @w{mach_port_t @var{reply_port}}, @w{dev_mode_t @var{mode}}, @w{recnum_t @var{recnum}}, @w{io_buf_ptr_t @var{data}}, @w{mach_msg_type_number_t @var{data_count}}) @deftypefunx kern_return_t ds_device_write_reply_inband (@w{mach_port_t @var{reply_port}}, @w{kern_return_t @var{return_code}}, @w{int @var{bytes_written}}) The @code{device_write_request_inband} and @code{ds_device_write_reply_inband} functions work as the @code{device_write_request} and @code{ds_device_write_reply} functions, except that the data is sent ``in-line'' in the request IPC message (@pxref{Memory}). 
@end deftypefun

@node Device Map
@section Device Map

@deftypefun kern_return_t device_map (@w{device_t @var{device}}, @w{vm_prot_t @var{prot}}, @w{vm_offset_t @var{offset}}, @w{vm_size_t @var{size}}, @w{mach_port_t *@var{pager}}, @w{int @var{unmap}})
The function @code{device_map} creates a new memory manager for @var{device} and returns a port to it in @var{pager}. The memory manager is usable as a memory object in a @code{vm_map} call. The call is device dependent.

The protection for the memory object is specified by @var{prot}. The memory object starts at @var{offset} within the device and extends @var{size} bytes. @var{unmap} is currently unused.
@c XXX I suppose the caller should set it to 0.

The function returns @code{D_SUCCESS} if the call succeeded and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port or the device is dead or not completely open.
@end deftypefun

@node Device Status
@section Device Status

@deftypefun kern_return_t device_set_status (@w{device_t @var{device}}, @w{dev_flavor_t @var{flavor}}, @w{dev_status_t @var{status}}, @w{mach_msg_type_number_t @var{status_count}})
The function @code{device_set_status} sets the status of a device. The possible values for @var{flavor} and their interpretation are device specific.

The function returns @code{D_SUCCESS} if the call succeeded and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port or the device is dead or not completely open.
@end deftypefun

@deftypefun kern_return_t device_get_status (@w{device_t @var{device}}, @w{dev_flavor_t @var{flavor}}, @w{dev_status_t @var{status}}, @w{mach_msg_type_number_t *@var{status_count}})
The function @code{device_get_status} gets the status of a device. The possible values for @var{flavor} and their interpretation are device specific.
The function returns @code{D_SUCCESS} if the call succeeded and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port or the device is dead or not completely open.
@end deftypefun

@node Device Filter
@section Device Filter

@deftypefun kern_return_t device_set_filter (@w{device_t @var{device}}, @w{mach_port_t @var{receive_port}}, @w{mach_msg_type_name_t @var{receive_port_type}}, @w{int @var{priority}}, @w{filter_array_t @var{filter}}, @w{mach_msg_type_number_t @var{filter_count}})
The function @code{device_set_filter} makes it possible to filter out selected data arriving at the device and forward it to a port. @var{filter} is a list of filter commands, which are applied to incoming data to determine if the data should be sent to @var{receive_port}. The IPC type of the send right is specified by @var{receive_port_type}; it is either @code{MACH_MSG_TYPE_MAKE_SEND} or @code{MACH_MSG_TYPE_MOVE_SEND}. The @var{priority} value is used to order multiple filters.

There can be up to @code{NET_MAX_FILTER} commands in @var{filter}. The actual number of commands is passed in @var{filter_count}. For the purpose of the filter test, an internal stack is provided. After all commands have been processed, the value on the top of the stack determines if the data is forwarded or the next filter is tried.

@c XXX The following description was taken verbatim from the
@c kernel_interface.pdf document.
Each word of the command list specifies a data (push) operation (high order NETF_NBPO bits) as well as a binary operator (low order NETF_NBPA bits). The value to be pushed onto the stack is chosen as follows.

@table @code
@item NETF_PUSHLIT
Use the next short word of the filter as the value.

@item NETF_PUSHZERO
Use 0 as the value.

@item NETF_PUSHWORD+N
Use short word N of the ``data'' portion of the message as the value.

@item NETF_PUSHHDR+N
Use short word N of the ``header'' portion of the message as the value.

@item NETF_PUSHIND+N
Pops the top long word from the stack and then uses short word N of the ``data'' portion of the message as the value.

@item NETF_PUSHHDRIND+N
Pops the top long word from the stack and then uses short word N of the ``header'' portion of the message as the value.

@item NETF_PUSHSTK+N
Use long word N of the stack (where the top of stack is long word 0) as the value.

@item NETF_NOPUSH
Don't push a value.
@end table

The unsigned value so chosen is promoted to a long word before being pushed. Once a value is pushed (except for the case of @code{NETF_NOPUSH}), the top two long words of the stack are popped and a binary operator applied to them (with the old top of stack as the second operand). The result of the operator is pushed on the stack. These operators are:

@table @code
@item NETF_NOP
Don't pop off any values and do no operation.

@item NETF_EQ
Perform an equal comparison.

@item NETF_LT
Perform a less than comparison.

@item NETF_LE
Perform a less than or equal comparison.

@item NETF_GT
Perform a greater than comparison.

@item NETF_GE
Perform a greater than or equal comparison.

@item NETF_AND
Perform a bitwise boolean AND operation.

@item NETF_OR
Perform a bitwise boolean inclusive OR operation.

@item NETF_XOR
Perform a bitwise boolean exclusive OR operation.

@item NETF_NEQ
Perform a not equal comparison.

@item NETF_LSH
Perform a left shift operation.

@item NETF_RSH
Perform a right shift operation.

@item NETF_ADD
Perform an addition.

@item NETF_SUB
Perform a subtraction.

@item NETF_COR
Perform an equal comparison. If the comparison is @code{TRUE}, terminate the filter list. Otherwise, pop the result of the comparison off the stack.

@item NETF_CAND
Perform an equal comparison. If the comparison is @code{FALSE}, terminate the filter list. Otherwise, pop the result of the comparison off the stack.

@item NETF_CNOR
Perform a not equal comparison. If the comparison is @code{FALSE}, terminate the filter list.
Otherwise, pop the result of the comparison off the stack.

@item NETF_CNAND
Perform a not equal comparison. If the comparison is @code{TRUE}, terminate the filter list. Otherwise, pop the result of the comparison off the stack.
@end table

The scan of the filter list terminates when the filter list is emptied, or a @code{NETF_C...} operation terminates the list. At this time, if the final value of the top of the stack is @code{TRUE}, then the message is accepted for the filter.

The function returns @code{D_SUCCESS} if the call succeeded, @code{D_INVALID_OPERATION} if @var{receive_port} is not a valid send right, and @code{D_NO_SUCH_DEVICE} if @var{device} does not denote a device port or the device is dead or not completely open.
@end deftypefun

@node Kernel Debugger
@chapter Kernel Debugger

The GNU Mach kernel debugger @code{ddb} is a powerful built-in debugger with a gdb-like syntax. It is enabled at compile time using the @option{--enable-kdb} option. Whenever you want to enter the debugger while running the kernel, you can press the key combination @key{Ctrl-Alt-D}.

@menu
* Operation::                   Basic architecture of the kernel debugger.
* Commands::                    Available commands in the kernel debugger.
* Variables::                   Access of variables from the kernel debugger.
* Expressions::                 Usage of expressions in the kernel debugger.
@end menu

@node Operation
@section Operation

The current location is called @dfn{dot}. The dot is displayed with a hexadecimal format at a prompt. Examine and write commands update dot to the address of the last line examined or the last location modified, and set @dfn{next} to the address of the next location to be examined or changed. Other commands don't change dot, and set next to be the same as dot.

The general command syntax is:

@example
@var{command}[/@var{modifier}] @var{address} [,@var{count}]
@end example

@kbd{!!} repeats the previous command, and a blank line repeats from the address next with count 1 and no modifiers.
Specifying @var{address} sets dot to the address. Omitting @var{address} uses dot. A missing @var{count} is taken to be 1 for printing commands or infinity for stack traces.

The current @code{ddb} supports multi-thread debugging. A break point can be restricted to a specific thread, and the address space or registers of a non-current thread can be examined or modified if the machine-dependent routines support it. For example,

@example
break/t mach_msg_trap $task11.0
@end example

sets a break point at @code{mach_msg_trap} for the first thread of task 11 as listed by a @code{show all threads} command.

In the above example, @code{$task11.0} is translated to the address of the corresponding thread structure by the variable translation mechanism described later. If a default target thread is set in the variable @code{$thread}, @code{$task11.0} can be omitted. In general, if @code{t} is specified in the modifier of a command, the specified thread or the default target thread is used as the target thread instead of the current one. The @code{t} modifier of a command does not apply to the evaluation of expressions in that command line. To fetch a value indirectly from a specific thread's address space, or to access its registers within an expression, you have to set a default target thread in advance and use the @code{:t} modifier immediately after the indirect access or the register reference, as follows:

@example
set $thread $task11.0
print $eax:t *(0x100):tuh
@end example

No sign extension and the indirection size (long word, half word, byte) can be specified with @code{u}, @code{l}, @code{h} and @code{b} respectively for the indirect access.

Note: Support for non-current space/register access and user space break points depends on the machine. If it is not supported, attempting such operations may provide incorrect information or cause strange behavior. Even if supported, user space access is limited to the pages resident in main memory at that time.
If a target page is not in main memory, an error will be reported.

@code{ddb} has a feature like the @code{more} command for its output. If the output exceeds the number of lines set in the @code{$lines} variable, it displays @samp{--db_more--} and waits for a response. The valid responses are:

@table @kbd
@item @key{SPC}
one more page

@item @key{RET}
one more line

@item q
abort the current command, and return to the command input mode
@end table

@node Commands
@section Commands

@table @code
@item examine(x) [/@var{modifier}] @var{addr}[,@var{count}] [ @var{thread} ]
Display the addressed locations according to the formats in the modifier. Multiple modifier formats display multiple locations. If no format is specified, the last formats specified for this command are used. An address space other than that of the current thread can be specified with the @code{t} option in the modifier and the @var{thread} parameter. The format characters are

@table @code
@item b
look at by bytes (8 bits)

@item h
look at by half words (16 bits)

@item l
look at by long words (32 bits)

@item a
print the location being displayed

@item ,
skip one unit producing no output

@item A
print the location with a line number if possible

@item x
display in unsigned hex

@item z
display in signed hex

@item o
display in unsigned octal

@item d
display in signed decimal

@item u
display in unsigned decimal

@item r
display in current radix, signed

@item c
display low 8 bits as a character. Non-printing characters are displayed as an octal escape code (e.g. '\000').

@item s
display the null-terminated string at the location. Non-printing characters are displayed as octal escapes.

@item m
display in unsigned hex with character dump at the end of each line. The location is also displayed in hex at the beginning of each line.
@item i display as an instruction @item I display as an instruction with possible alternate formats depending on the machine: @table @code @item vax don't assume that each external label is a procedure entry mask @item i386 don't round to the next long word boundary @item mips print register contents @end table @end table @item xf Examine forward. It executes an examine command with the last specified parameters to it except that the next address displayed by it is used as the start address. @item xb Examine backward. It executes an examine command with the last specified parameters to it except that the last start address subtracted by the size displayed by it is used as the start address. @item print[/axzodurc] @var{addr1} [ @var{addr2} @dots{} ] Print @var{addr}'s according to the modifier character. Valid formats are: @code{a} @code{x} @code{z} @code{o} @code{d} @code{u} @code{r} @code{c}. If no modifier is specified, the last one specified to it is used. @var{addr} can be a string, and it is printed as it is. For example, @example print/x "eax = " $eax "\necx = " $ecx "\n" @end example will print like @example eax = xxxxxx ecx = yyyyyy @end example @item write[/bhlt] @var{addr} [ @var{thread} ] @var{expr1} [ @var{expr2} @dots{} ] Write the expressions at succeeding locations. The write unit size can be specified in the modifier with a letter b (byte), h (half word) or l(long word) respectively. If omitted, long word is assumed. Target address space can also be specified with @code{t} option in the modifier and @var{thread} parameter. Warning: since there is no delimiter between expressions, strange things may happen. It's best to enclose each expression in parentheses. @item set $@var{variable} [=] @var{expr} Set the named variable or register with the value of @var{expr}. Valid variable names are described below. @item break[/tuTU] @var{addr}[,@var{count}] [ @var{thread1} @dots{} ] Set a break point at @var{addr}. 
If count is supplied, continues (@var{count}-1) times before stopping at the break point. If the break point is set, a break point number is printed with @samp{#}. This number can be used in deleting the break point or adding conditions to it. @table @code @item t Set a break point only for a specific thread. The thread is specified by @var{thread} parameter, or default one is used if the parameter is omitted. @item u Set a break point in user space address. It may be combined with @code{t} or @code{T} option to specify the non-current target user space. Without @code{u} option, the address is considered in the kernel space, and wrong space address is rejected with an error message. This option can be used only if it is supported by machine dependent routines. @item T Set a break point only for threads in a specific task. It is like @code{t} option except that the break point is valid for all threads which belong to the same task as the specified target thread. @item U Set a break point in shared user space address. It is like @code{u} option, except that the break point is valid for all threads which share the same address space even if @code{t} option is specified. @code{t} option is used only to specify the target shared space. Without @code{t} option, @code{u} and @code{U} have the same meanings. @code{U} is useful for setting a user space break point in non-current address space with @code{t} option such as in an emulation library space. This option can be used only if it is supported by machine dependent routines. @end table Warning: if a user text is shadowed by a normal user space debugger, user space break points may not work correctly. Setting a break point at the low-level code paths may also cause strange behavior. @item delete[/tuTU] @var{addr}|#@var{number} [ @var{thread1} @dots{} ] Delete the break point. The target break point can be specified by a break point number with @code{#}, or by @var{addr} like specified in @code{break} command. 

@item cond #@var{number} [ @var{condition} @var{commands} ]
Set or delete a condition for the break point specified by the @var{number}. If the @var{condition} and @var{commands} are null, the condition is deleted. Otherwise the condition is set for it. When the break point is hit, the @var{condition} is evaluated. The @var{commands} will be executed if the condition is true and the break point count set by a break point command becomes zero. @var{commands} is a list of commands separated by semicolons. Each command in the list is executed in that order, but if a @code{continue} command is executed, the command execution stops there, and the stopped thread resumes execution. If the command execution reaches the end of the list, the debugger enters command input mode. For example,

@example
set $work0 0
break/Tu xxx_start $task7.0
cond #1 (1) set $work0 1; set $work1 0; cont
break/T vm_fault $task7.0
cond #2 ($work0) set $work1 ($work1+1); cont
break/Tu xxx_end $task7.0
cond #3 ($work0) print $work1 " faults\n"; set $work0 0
cont
@end example

will print page fault counts from @code{xxx_start} to @code{xxx_end} in @code{task7}.

@item step[/p] [,@var{count}]
Single step @var{count} times. If @code{p} option is specified, print each instruction at each step. Otherwise, only print the last instruction.

Warning: depending on machine type, it may not be possible to single-step through some low-level code paths or user space code. On machines with software-emulated single-stepping (e.g., pmax), stepping through code executed by interrupt handlers will probably do the wrong thing.

@item continue[/c]
Continue execution until a breakpoint or watchpoint. If @code{/c}, count instructions while executing. Some machines (e.g., pmax) also count loads and stores.

Warning: when counting, the debugger is really silently single-stepping. This means that single-stepping on low-level code may cause strange behavior.

@item until
Stop at the next call or return instruction.

@item next[/p]
Stop at the matching return instruction. If the @code{p} option is
specified, print the call nesting depth and the cumulative instruction
count at each call or return. Otherwise, only print when the matching
return is hit.

@item match[/p]
A synonym for @code{next}.

@item trace[/tu] [ @var{frame_addr}|@var{thread} ][,@var{count}]
Stack trace. The @code{u} option traces user space; if omitted, only
kernel space is traced. If the @code{t} option is specified, it shows
the stack trace of the specified thread or of the default target thread.
Otherwise, it shows the stack trace of the current thread from the frame
address specified by a parameter or from the current frame.
@var{count} is the number of frames to be traced. If @var{count} is
omitted, all frames are printed.

Warning: If the target thread's stack is not in main memory at that
time, the stack trace will fail. A user space stack trace is valid only
if the machine dependent code supports it.

@item search[/bhl] @var{addr} @var{value} [@var{mask}] [,@var{count}]
Search memory for a value. This command might fail in interesting ways
if it doesn't find the searched-for value. This is because @code{ddb}
doesn't always recover from touching bad memory. The optional
@var{count} argument limits the search.

@item macro @var{name} @var{commands}
Define a debugger macro as @var{name}. @var{commands} is a list of
commands to be associated with the macro. In the expressions of the
command list, a variable @code{$argxx} can be used to get a parameter
passed to the macro. When a macro is called, each argument is evaluated
as an expression, and the value is assigned to the corresponding
parameter, @code{$arg1}, @code{$arg2}, @dots{} respectively. Ten
@code{$arg} variables are reserved for each level of macro nesting, and
they can be used as local variables. Macros can be nested up to 5
levels deep.
For example,

@example
macro xinit set $work0 $arg1
macro xlist examine/m $work0,4; set $work0 *($work0)
xinit *(xxx_list)
xlist
@enddots{}
@end example

will print the contents of a list starting from @code{xxx_list} by each
@code{xlist} command.

@item dmacro @var{name}
Delete the macro named @var{name}.

@item show all threads[/ul]
Display information about all tasks and threads. This version of
@code{ddb} prints more information than the previous one. It shows UNIX
process information like @command{ps} for each task. The UNIX process
information may not be shown if it is not supported on the machine, or
if the bottom of the stack of the target task is not in main memory at
that time. It also shows task and thread identification numbers. These
numbers can be used to specify a task or a thread symbolically in
various commands. The numbers are valid only within the same debugger
session. If execution is resumed, the numbers may change. The current
thread can be distinguished from others by a @code{#} after the thread
id instead of @code{:}. Without the @code{l} option, it only shows the
thread id, the thread structure address and the status for each thread.
The status consists of 5 letters, R(run), W(wait), S(suspended),
O(swapped out) and N(interruptible), and if the corresponding status bit
is off, @code{.} is printed instead. If the @code{l} option is
specified, more detailed information is printed for each thread.

@item show task [ @var{addr} ]
Display the information of a task specified by @var{addr}. If
@var{addr} is omitted, current task information is displayed.

@item show thread [ @var{addr} ]
Display the information of a thread specified by @var{addr}. If
@var{addr} is omitted, current thread information is displayed.

@item show registers[/tu [ @var{thread} ]]
Display the register set. The target thread can be specified with the
@code{t} option and the @var{thread} parameter. If the @code{u} option
is specified, it displays user registers instead of the kernel or
currently saved ones.
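
As a sketch of the two options, assuming the placeholder thread
@code{$task7.0} (thread 0 of task 7, as numbered by
@code{show all threads}) and a machine that supports both options:

@example
show registers/u
show registers/tu $task7.0
@end example

The first command displays the user registers of the current thread; the
second displays the user registers of the specified thread.
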
Warning: The support of the @code{t} and @code{u} options depends on the
machine. If they are not supported, incorrect information will be
displayed.

@item show map @var{addr}
Prints the @code{vm_map} at @var{addr}.

@item show object @var{addr}
Prints the @code{vm_object} at @var{addr}.

@item show page @var{addr}
Prints the @code{vm_page} structure at @var{addr}.

@item show port @var{addr}
Prints the @code{ipc_port} structure at @var{addr}.

@item show ipc_port[/t [ @var{thread} ]]
Prints the addresses of all @code{ipc_port} structures the target thread
has. The target thread is the current thread or the one specified by a
parameter.

@item show macro [ @var{name} ]
Show the definitions of macros. If @var{name} is specified, only its
definition is displayed. Otherwise, the definitions of all macros are
displayed.

@item show watches
Displays all watchpoints.

@item watch[/T] @var{addr},@var{size} [ @var{task} ]
Set a watchpoint for a region. Execution stops when an attempt to
modify the region occurs. The @var{size} argument defaults to 4.
Without the @code{T} option, @var{addr} is assumed to be a kernel
address. To set a watchpoint in user space, specify @code{T} and the
@var{task} parameter to which the address belongs. If the @var{task}
parameter is omitted, the task of the default target thread or the
current task is assumed. If you specify an address in the wrong space,
the request is rejected with an error message.

Warning: Attempts to watch wired kernel memory may cause an
unrecoverable error on some systems such as i386. Watchpoints on user
addresses work best.
@end table

@node Variables
@section Variables

The debugger accesses registers and variables as $@var{name}. Register
names are as in the @code{show registers} command. Some variables are
suffixed with numbers, and may have a modifier following a colon
immediately after the variable name.
For example, register variables can have the @code{u} and @code{t}
modifiers to indicate a user register and that of the default target
thread instead of that of the current thread (e.g. @code{$eax:tu}).

Built-in variables currently supported are:

@table @code
@item task@var{xx}[.@var{yy}]
Task or thread structure address. @var{xx} and @var{yy} are the task
and thread identification numbers printed by the @code{show all threads}
command, respectively. This variable is read only.

@item thread
The default target thread. The value is used when the @code{t} option
is specified without an explicit thread structure address parameter in
command lines or expression evaluation.

@item radix
Input and output radix.

@item maxoff
Addresses are printed as @var{symbol}+@var{offset} unless the offset is
greater than maxoff.

@item maxwidth
The width of the displayed line.

@item lines
The number of lines. It is used by the @code{more} feature.

@item tabstops
Tab stop width.

@item arg@var{xx}
Parameters passed to a macro. @var{xx} can be 1 to 10.

@item work@var{xx}
Work variable. @var{xx} can be 0 to 31.
@end table

@node Expressions
@section Expressions

Almost all expression operators in C are supported except @code{~},
@code{^}, and unary @code{&}. Special rules in @code{ddb} are:

@table @code
@item @var{identifier}
name of a symbol. It is translated to the address (or value) of the
symbol. @code{.} and @code{:} can be used in the identifier. If
supported by an object format dependent routine,
[@var{file_name}:]@var{func}[:@var{line_number}],
[@var{file_name}:]@var{variable}, and
@var{file_name}[:@var{line_number}] can be accepted as a symbol. The
symbol may be prefixed with @code{@var{symbol_table_name}::} like
@code{emulator::mach_msg_trap} to specify other than kernel symbols.

@item @var{number}
radix is determined by the first two letters:
@table @code
@item 0x
hex
@item 0o
octal
@item 0t
decimal
@end table

otherwise, follow current radix.

@item .
dot

@item +
next

@item ..
address of the start of the last line examined.
Unlike dot or next, this is only changed by the @code{examine} or
@code{write} command.

@item ´
last address explicitly specified.

@item $@var{variable}
register name or variable. It is translated to its value. It may be
followed by a @code{:} and modifiers as described above.

@item @var{a}#@var{b}
a binary operator which rounds up the left hand side to the next
multiple of the right hand side.

@item *@var{expr}
indirection. It may be followed by a @code{:} and modifiers as
described above.
@end table

@include gpl.texi

@node Documentation License
@appendix Documentation License

This manual is copyrighted and licensed under the GNU Free Documentation
license.

Parts of this manual are derived from the Mach manual packages
originally provided by Carnegie Mellon University.

@menu
* Free Documentation License:: The GNU Free Documentation License.
* CMU License:: The CMU license applies to the original Mach
                kernel and its documentation.
@end menu

@lowersections
@include fdl.texi
@raisesections

@node CMU License
@appendixsec CMU License

@quotation
@display
Mach Operating System
Copyright @copyright{} 1991,1990,1989 Carnegie Mellon University
All Rights Reserved.
@end display

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright notice
and this permission notice appear in all copies of the software,
derivative works or modified versions, and any portions thereof, and
that both notices appear in supporting documentation.

@sc{carnegie mellon allows free use of this software in its ``as is''
condition.
carnegie mellon disclaims any liability of any kind for any damages
whatsoever resulting from the use of this software.}

Carnegie Mellon requests users of this software to return to

@display
Software Distribution Coordinator
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3890
@end display

@noindent
or @email{Software.Distribution@@CS.CMU.EDU} any improvements or
extensions that they make and grant Carnegie Mellon the rights to
redistribute these changes.
@end quotation

@node Concept Index
@unnumbered Concept Index

@printindex cp

@node Function and Data Index
@unnumbered Function and Data Index

@printindex fn

@summarycontents
@contents
@bye