author | Thomas Bushnell <thomas@gnu.org> | 1999-03-12 17:12:26 +0000 |
---|---|---|
committer | Thomas Bushnell <thomas@gnu.org> | 1999-03-12 17:12:26 +0000 |
commit | 5a265d3844e01b738fa9900a05bc3d747ddc297a (patch) | |
tree | c4592c30337a66b86d5bf6596b9c018952eda112 /doc/hurd.texi | |
parent | 0118670d78e521f63c78b3b7ec45fc58f8dd06d6 (diff) |
1998-06-02 Gordon Matzigkeit <gord@profitpress.com>
* Makefile: Add rules for building info, dvi and ps files.
* hurd.texi: Change the basic structure, and add a lot more
information.
Diffstat (limited to 'doc/hurd.texi')
-rw-r--r-- | doc/hurd.texi | 4845 |
1 files changed, 4239 insertions, 606 deletions
diff --git a/doc/hurd.texi b/doc/hurd.texi index 372f8a73..aa356b05 100644 --- a/doc/hurd.texi +++ b/doc/hurd.texi @@ -1,13 +1,24 @@ \input texinfo @c -*-texinfo-*- @setfilename hurd.info +@c Get the Hurd version we are documenting. +@include version.texi + +@c Unify all our little indices for now. +@defcodeindex sc +@syncodeindex sc cp +@syncodeindex fn cp +@syncodeindex vr cp +@syncodeindex tp cp +@syncodeindex pg cp + @dircategory Kernel @direntry -* Hurd: (hurd). The interfaces of the GNU Hurd. +* Hurd: (hurd). Using and programming the Hurd kernel servers. @end direntry @ifinfo -Copyright @copyright{} 1994 Free Software Foundation, Inc. +Copyright @copyright{} 1994-1998 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -30,16 +41,17 @@ Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions. @end ifinfo -@setchapternewpage odd -@settitle Hurd Interface Manual +@setchapternewpage none +@settitle Hurd Reference Manual @titlepage @finalout -@title The GNU Hurd Interface Manual -@author Michael I. Bushnell +@title The GNU Hurd Reference Manual +@author Thomas Bushnell +@author Gordon Matzigkeit @page @vskip 0pt plus 1filll -Copyright @copyright{} 1994 Free Software Foundation, Inc. +Copyright @copyright{} 1994--1998 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -54,70 +66,1472 @@ Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions. @end titlepage +@ifinfo + @node Top -@top Introduction +@top The GNU Hurd + +This file documents the GNU Hurd kernel component. This edition of the +documentation was last updated for version @value{VERSION} of the Hurd. 
+ +@menu +* Introduction:: How to use this manual. +* Installing:: Setting up Hurd software on your computer. +* Bootstrap:: Turning a computer into a Hurd machine. +* Foundations:: Basic features used throughout the Hurd. +* Input and Output:: Reading and writing I/O channels. +* Files:: Regular file and directory nodes. +* Special Files:: Files with unusual Unix-compatible semantics. +* Stores:: Generalized units of storage. +* Stored Filesystems:: Filesystems for physical media. +* Twisted Filesystems:: Providing new hierarchies for existing data. +* Distributed Filesystems:: Sharing files between separate machines. +* Networking:: Interconnecting with other machines. +* Terminal Handling:: Helping people interact with the Hurd. +* Running Programs:: Program execution and process management. +* Authentication:: Verifying user and server privileges. +* Index:: Guide to concepts, functions, and files. + +@detailmenu + --- The Detailed Node Listing --- + +Introduction + +* Audience:: The people for whom this manual is written. +* Features:: Reasons to install and use the Hurd. +* Overview:: Basic architecture of the Hurd. +* History:: How the Hurd was born. +* Copying:: The Hurd is free software. + +Installing + +* Binary Distributions:: Obtaining ready-to-run GNU distributions. +* Cross-Compiling:: Building GNU from its source code. + +Bootstrap + +* Bootloader:: Starting the microkernel, or other OSes. +* Server Bootstrap:: Waking up the Hurd. +* Shutdown:: Letting the Hurd get some rest. + +Server Bootstrap + +* Invoking serverboot:: Starting a set of interdependent servers. +* Boot Scripts:: Describing server bootstrap relationships. +* Recursive Bootstrap:: Running a Hurd under another Hurd. + +Foundations + +* Threads Library:: Every Hurd server and library is multithreaded. +* Microkernel Object Library:: The Microkernel Object Model (MOM). +* Ports Library:: Managing server port receive rights. +* Integer Hash Library:: Integer-keyed hash tables. 
+* Misc Library:: Things that soon will be in the GNU C library. +* Bug Address Library:: Where to report Hurd bugs. + +Ports Library + +* Buckets and Classes:: Basic units of port organization. +* Port Rights:: Moving port rights to and from @code{libports}. +* Port Metadata:: Managing port-releated information. +* Port References:: Guarding against leaks and lossage. +* RPC Management:: Locking and interrupting RPC operations. + +Input and Output + +* Iohelp Library:: I/O authentication and lock management. +* Pager Library:: Implementing multithreaded external pagers. +* I/O Interface:: RPC-based input/output channels. + +Iohelp Library + +* I/O Users:: User authentication management. +* Conch Management:: Deprecated shared I/O implementation. + +Pager Library + +* Pager Management:: High-level interface to external pagers. +* Pager Callbacks:: Functions that the user must define. + +I/O Interface + +* I/O Object Ports:: How ports to I/O objects work. +* Simple Operations:: Read, write, and seek. +* Open Modes:: State bits that affect pieces of operation. +* Asynchronous I/O:: How to be notified when I/O is possible. +* Information Queries:: How to implement @code{io_stat} and + @code{io_server_version}. +* Mapped Data:: Getting memory objects referring to the + data of an I/O object. + +Files + +* Translators:: Extending the Hurd filesystem hierarchy. +* Trivfs Library:: Implementing single-file translators. +* Fshelp Library:: Miscellaneous generic filesystem routines. +* File Interface:: File ports implement the file interface. +* Filesystem Interface:: Translator control interface. + +Translators + +* Invoking settrans:: Declaring how a node should be translated. +* Invoking showtrans:: Displaying how nodes are translated. +* Invoking mount:: Unix-compatible active filesystem translators. +* Invoking fsysopts:: Modifying translation parameters at runtime. + +Trivfs Library + +* Trivfs Startup:: Writing a simple trivfs-based translator. 
+* Trivfs Callbacks:: Mandatory user-defined trivfs functions. +* Trivfs Options:: Optional user-defined trivfs functions. +* Trivfs Ports:: Managing control and protid ports. + +Fshelp Library + +* Passive Translator Linkage:: Invoking passive translators. +* Active Translator Linkage:: Managing active translators. +* Fshelp Locking:: Implementing file locking. +* Fshelp Permissions:: Standard file access permission policies. +* Fshelp Misc:: Useful standalone routines. + +File Interface + +* File Overview:: Basic concepts for the file interface. +* Changing Status:: Changing the owner (etc.) of a file. +* Program Execution:: Executing files. +* File Locking:: Implementing the @code{flock} call. +* File Frobbing:: Other active calls on files. +* Opening Files:: Looking up files in directories. +* Modifying Directories:: Creating and deleting nodes. +* Notifications:: File and directory change callbacks. +* File Translators:: How to set and get translators. + +Stores -This manual describes the interfaces that make up the GNU Hurd. It is -assumed that the reader is familiar with the features of the Mach -kernel, and with using the Hurd interfaces as a user, and all of the -associated C library calls. It concentrates on requirements and advice -for the writing of Hurd servers, as well as describing the libraries -that come with the GNU Hurd. +* Store Library:: An abstract interface to storage systems. -It is assumed that the reader of the manual is perusing the referenced -MiG interface definitions and library header files for the section being -examined. +Store Library + +* Store Arguments:: Parsing store command-line arguments. +* Store Management:: Creating and manipulating stores. +* Store I/O:: Reading and writing data to stores. +* Store Classes:: Ready-to-use storage backends. +* Store RPC Encoding:: Transferring store descriptors via RPC. + +Stored Filesystems + +* Repairing Filesystems:: Recovering from minor filesystem crashes. 
+* Linux Extended 2 FS:: The popular Linux filesystem format. +* BSD Unix FS:: The BSD Unix 4.x Fast File System. +* ISO-9660 CD-ROM FS:: Standard CD-ROM format. +* Diskfs Library:: Implementing new filesystem servers. + +Diskfs Library + +* Diskfs Startup:: Initializing stored filesystems. +* Diskfs Arguments:: Parsing command-line arguments. +* Diskfs Globals:: Global behaviour modification. +* Diskfs Node Management:: Allocation, reference counting, I/O, + caching, and other disk node routines. +* Diskfs Callbacks:: Mandatory user-defined diskfs functions. +* Diskfs Options:: Optional user-defined diskfs functions. +* Diskfs Internals:: Reimplementing small pieces of diskfs. + +Distributed Filesystems + +* File Transfer Protocol:: A distributed filesystem based on FTP. +* Network File System:: Sun's NFS: a lousy, but common filesystem. + +File Transfer Protocol + +* FTP Connection Library:: Managing remote FTP server connections. + +Networking + +* Socket Interface:: Network communication I/O protocol. + +Authentication + +* Auth Interface:: Auth ports implement the auth interface. + +Auth Interface + +* Auth Protocol:: Bidirectional authentication. + +@end detailmenu +@end menu + +@end ifinfo + + +@node Introduction +@chapter Introduction + +The GNU Hurd@footnote{The name @dfn{Hurd} stands for ``Hird of +Unix-Replacing Daemons.'' The name @dfn{Hird} stands for ``Hurd of +Interfaces Representing Depth.''} is the GNU Project's replacement for +the Unix kernel. The Hurd is a collection of servers that run on the +Mach microkernel to implement file systems, network protocols, file +access control, and other features that are normally implemented by the +Unix kernel or similar kernels (such as Linux). 
@menu -* I/O interface:: The interface for reading and writing - I/O channels -* File interface:: The interface for modifying file-specific - characteristics -* Filesystem interface:: Interfaces supported to control file-servers -* Socket interface:: Interfaces used for manipulating sockets - -* Ports library:: A library to manage port rights for servers -* Iohelp library:: A library to implement some common parts - of the I/O and shared I/O interfaces -* Fshelp library:: A library to implement some common parts - of the file interface -* Pager library:: A library to implement complex - multi-threaded pagers -* Diskfs library:: A library to do almost all the work of - implementing a disk-based filesystem -* Trivfs library:: A library to do the work of handling the - file protocol for directory-less - filesystems +* Audience:: The people for whom this manual is written. +* Features:: Reasons to install and use the Hurd. +* Overview:: Basic architecture of the Hurd. +* History:: How the Hurd was born. +* Copying:: The Hurd is free software. @end menu -@node I/O interface -@chapter I/O interface - -The I/O interface is used to interact with almost all servers in the GNU -Hurd. It provides facilities for reading and writing I/O streams. The -I/O interface facilities are described in <hurd/io.defs> and -<hurd/shared.h> The latter portion of <hurd/io.defs> and all of -<hurd/shared.h> describe how to implement shared-memory I/O operations, -and are described later. The present chapter discusses RPC-based I/O -operations. + +@node Audience +@section Audience + +This manual is designed to be useful to everybody who is interested in +using, administering, or programming the Hurd. + +If you are an end-user and you are looking for help on running the Hurd, +the first few chapters of this manual describes the essential parts of +installing, starting up, and shutting down a Hurd workstation. 
If you +need help with a specific program, the best way to use this manual is to +find it in the index and go directly to the appropriate section. You +may also wish to try running @kbd{@var{program} --help}, which will +display a brief usage message for @var{program} (@pxref{Foundations}). + +The rest of this manual is a technical discussion of the Hurd servers +and their implementation, and would not be helpful until you want to +learn how to modify the Hurd. + +This manual is organized according to subsystem, and each chapter begins +with descriptions of utilities and servers that are related to that +subsystem. If you are a system administrator, and you want to learn +more about, say, the Hurd networking subsystem, you can skip to the +networking chapter (@pxref{Networking}), and browse the related +utilities and servers. + +Programmers who are interested in learning how to modify Hurd servers or +write new ones should begin by learning about a microkernel to which the +Hurd has been ported (currently only GNU Mach) and reading +@ref{Foundations}. You should then familiarize yourself with a +subsystem that interests you by reading about existing servers and the +libraries they use. At that point, you should be able to study the +source code of existing Hurd servers and understand how they use the +Hurd libraries. + +The final level of mastery is learning the RPC@footnote{Remote Procedure +Call. If you needed to ask, then you've got your work cut out for you +before you'll be ready for Hurd programming.} interfaces which the Hurd +libraries implement. The last section of each chapter describes any +Hurd interfaces used in that subsystem. Those sections assume that you +are perusing the referenced interface definitions as you read. After +you have understood a given interface, you will be in a good position to +improve the Hurd libraries, design your own interfaces, and implement +new subsystems. 
+ + +@node Features +@section Features + +The Hurd is not the most advanced operating system known to the planet +(yet), but it does have a number of enticing features: + +@table @asis +@item it's free software +Anybody can use, modify, and redistribute it under the terms of the GNU +General Public License (@pxref{Copying}). The Hurd is part of the GNU +system, which is a complete operating system licensed under the GPL. + +@item it's compatible +The Hurd provides a familiar programming and user environment. For all +intents and purposes, the Hurd is a modern Unix-like kernel. The Hurd +uses the GNU C Library, whose development closely tracks standards such +as ANSI/ISO, BSD, POSIX, Single Unix, SVID, and X/Open. + +@item it is built to survive +Unlike other popular kernel software, the Hurd has an object-oriented +structure that allows it to evolve without compromising its design. +This structure will help the Hurd undergo major redesign and +modifications without having to be entirely rewritten. + +@item it's scalable +The Hurd implementation is aggressively multithreaded so that it runs +efficiently on both single processors and symmetric multiprocessors. +The Hurd interfaces are designed to allow transparent network clusters +(@dfn{collectives}), although this feature has not yet been implemented. + +@item it's extensible +The Hurd is an attractive platform for learning how to become a kernel +hacker or for implementing new ideas in kernel technology. Every part +of the system is designed to be modified and extended. + +@item it's stable +It is possible to develop and test new Hurd kernel components without +rebooting the machine (not even accidentally). Running your own kernel +components doesn't interfere with other users, and so no special system +privileges are required. The mechanism for kernel extensions is secure +by design: it is impossible to impose your changes upon other users +unless they authorize them or you are the system administrator. 
+ +@item it exists +The Hurd is real software that works Right Now. It is not a research +project or a proposal. You don't have to wait at all before you can +start using and developing it. +@end table + + +@node Overview +@section Overview + +FIXME: overview of basic Hurd architecture, FAQish in nature + + +@node History +@section History + +Richard Stallman (RMS) started GNU in 1983, as a project to create a +complete free operating system. In the text of the GNU Manifesto, he +mentioned that there is a primitive kernel. In the first GNUsletter, +Feb. 1986, he says that GNU's kernel is TRIX, which was developed at the +Massachusetts Institute of Technology. + +By December of 1986, the Free Software Foundation (FSF) had ``started +working on the changes needed to TRIX'' [Gnusletter, Jan. 1987]. +Shortly thereafter, the FSF began ``negotiating with Professor Rashid of +Carnegie-Mellon University about working with them on the development of +the Mach kernel'' [Gnusletter, June, 1987]. The text implies that the +FSF wanted to use someone else's work, rather than have to fix TRIX. + +In [Gnusletter, Feb. 1988], RMS was talking about taking Mach and +putting the Berkeley Sprite filesystem on top of it, ``after the parts +of Berkeley Unix@dots{} have been replaced.'' + +Six months later, the FSF is saying that ``if we can't get Mach, we'll +use TRIX or Berkeley's Sprite.'' Here, they present Sprite as a +full-kernel option, rather than just a filesystem. + +In January, 1990, they say ``we aren't doing any kernel work. It does +not make sense for us to start a kernel project now, when we still hope +to use Mach'' [Gnusletter, Jan. 1990]. Nothing significant occurs until +1991, when a more detailed plan is announced: + +@display +``We are still interested in a multi-process kernel running on top of +Mach. The CMU lawyers are currently deciding if they can release Mach +with distribution conditions that will enable us to distribute it. 
If +they decide to do so, then we will probably start work. CMU has +available under the same terms as Mach a single-server partial Unix +emulator named Poe; it is rather slow and provides minimal +functionality. We would probably begin by extending Poe to provide full +functionality. Later we hope to have a modular emulator divided into +multiple processes.'' [Gnusletter, Jan. 1991]. +@end display + +RMS explains the relationship between the Hurd and Linux in +@uref{http://www.gnu.org/software/hurd/hurd-and-linux.html}, where he +mentions that the FSF started developing the Hurd in 1990. As of +[Gnusletter, Nov. 1991], the Hurd (running on Mach) is GNU's official +kernel. + + +@node Copying +@section GNU General Public License + +@include gpl.texinfo + + +@node Installing +@chapter Installing + +Before you can use the Hurd on your favorite machine, you'll need to +install all of its software components. Currently, the Hurd only runs +on Intel i386-compatible architectures (such as the Pentium), using the +GNU Mach microkernel. + +If you have unsupported hardware or a different microkernel, you will +not be able to run the Hurd until all the required software has been +@dfn{ported} to your architecture. Porting is an involved process which +requires considerable programming skills, and is not recommended for the +faint-of-heart. If you have the talent and desire to do a port, contact +@email{bug-hurd@@gnu.org} in order to coordinate the effort. + +@menu +* Binary Distributions:: Obtaining ready-to-run GNU distributions. +* Cross-Compiling:: Building GNU from its source code. +@end menu + + +@node Binary Distributions +@section Binary Distributions + +By far the easiest and best way to install the Hurd is to obtain a GNU +binary distribution. Even if you plan on recompiling the Hurd itself, +it is best to start off with an already-working GNU system so that you +can avoid having to reboot every time you want to test a program. 
+ +@ignore @c FIXME: update when binary CD-ROMS are available +You can order GNU on a CD-ROM from the Free Software Foundation. Orders +such as these help fund GNU software development. +@end ignore + +You can get GNU from a friend under the conditions allowed by the GNU +GPL (@pxref{Copying}). Please consider sending a donation to the Free +Software Foundation so that we can continue to improve GNU software. + +You can also FTP the complete GNU system from your closest GNU mirror, +or @uref{ftp://ftp.gnu.org/pub/gnu/}. The GNU binary distribution is +available in a subdirectory called @file{gnu-@var{n.m}}, where @var{n.m} +is the version of the Hurd that this GNU release corresponds to +(@value{VERSION} at the time of this writing). Again, please consider +donating to the Free Software Foundation. + +The format of the binary distribution is prone to change, so this manual +does not describe the details of how to install GNU. The @file{README} +file distributed with the binary distribution gives you complete +instructions. + +After you follow all the appropriate instructions, you will have a +working GNU/Hurd system. If you have used Linux-based GNU systems or +other Unix-like systems before, the Hurd will look quite familiar. You +should play with it for a while, referring to this manual only when you +want to learn more about the Hurd. Have fun! + +If the Hurd is your first introduction to the GNU operating system, then +you will need to learn more about GNU in order to be able to use it. +You should talk to friends who are familiar with GNU, in order to find +out about classes, online tutorials, or books which can help you learn +more about GNU. + +If you have no friends who are already using GNU, you can find some +useful starting points at the GNU web site, @uref{http://www.gnu.org/}. +You can also send e-mail to @email{help-hurd@@gnu.org}, to contact +fellow Hurd users. 
You can join this mailing list by sending a request +to @email{help-hurd-request@@gnu.org}. + + +@node Cross-Compiling +@section Cross-Compiling + +Another way to install the Hurd is to use an existing operating system +in order to compile all the required Hurd components from source code. +This is called @dfn{cross-compiling}, because it is done between two +different platforms. + +@emph{This process is not recommended unless you are porting the Hurd to +a new platform.} Cross-compiling the Hurd to a platform which already +has a binary distribution is a tremendous waste of time@dots{} it is +frequently necessary to repeat steps over and over again, and you are +not even guaranteed to get a working system. Please, obtain a GNU +binary distribution (@pxref{Binary Distributions}), and use your time to +do more useful things. If you are capable of cross-compiling, then you +are definitely skilled enough to make more useful (and creative) +modifications to the GNU system. + +To emphasize this point: downloading the entire GNU system over a 9600 +baud modem takes @emph{much less time} than cross-compilation, and +provides better results, too. + +If you are still sure that you would like to cross-compile the Hurd, you +should send e-mail to the @email{bug-hurd@@gnu.org} mailing list in +order to coordinate your efforts. People on that list will give you +advice on what to look out for, as well as helping you figure out a way +that your cross-compilation can benefit Hurd development. After that, +don your bug-resistent suit, and read the @file{INSTALL-cross} file, +which comes with the latest Hurd source code distribution. The +instructions in INSTALL-cross are usually out-of-date, but they contain +some useful hints buried amongst the errors. + + +@node Bootstrap +@chapter Bootstrap + +Bootstrapping@footnote{The term @dfn{bootstrapping} refers to a Dutch +legend about a boy who was able to fly by pulling himself up by his +bootstraps. 
In computers, this term refers to any process where a +simple system activates a more complicated system.} is the procedure by +which your machine loads the microkernel and transfers control to the +Hurd servers. + + +@menu +* Bootloader:: Starting the microkernel, or other OSes. +* Server Bootstrap:: Waking up the Hurd. +* Shutdown:: Letting the Hurd get some rest. +@end menu + +@node Bootloader +@section Bootloader + +The @dfn{bootloader} is the first software that runs on your machine. +Many hardware architectures have a very simple startup routine which +reads a very simple bootloader from the beginning of the internal hard +disk, then transfers control to it. Other architectures have startup +routines which are able to understand more of the contents of the hard +disk, and directly start a more advanced bootloader. + +@cindex GRUB +@cindex GRand Unified Bootloader +Currently, @dfn{GRUB}@footnote{The GRand Unified Bootloader, available +from @uref{http://www.uruk.org/grub/}.} is the preferred GNU bootloader. +GRUB provides advanced functionality, and is capable of loading several +different kernels (such as Linux, DOS, and the *BSD family). + +From the standpoint of the Hurd, the bootloader is just a mechanism to +get the microkernel running and transfer control to @code{serverboot}. +You will need to refer to your bootloader and microkernel documentation +for more information about the details of this process. + + +@node Server Bootstrap +@section Server Bootstrap +@pindex serverboot + +The @code{serverboot} program is responsible for loading and executing +the rest of the Hurd servers. Rather than containing specific +instructions for starting the Hurd, it follows general steps given in a +user-supplied boot script. + +To bootstrap the Hurd, the microkernel must start this program as its +first task, and to pass it appropriate arguments. 
@code{serverboot} may +also be invoked while the Hurd is already running, which allows users to +start their own complete sub-Hurds (@pxref{Recursive Bootstrap}). + +@menu +* Invoking serverboot:: Starting a set of interdependent servers. +* Boot Scripts:: Describing server bootstrap relationships. +* Recursive Bootstrap:: Running a Hurd under another Hurd. +@end menu + + +@node Invoking serverboot +@subsection Invoking @code{serverboot} + +The @code{serverboot} program has the following synopsis: + +@example +serverboot -@var{switch}... [[@var{host-port} @var{device-port}] @var{root-name}] +@end example + +@c FIXME: serverboot should accept --help and --version, for consistency +Each @var{switch} is a single character, out of the following set: + +@table @samp +@item a +Prompt the user for the @var{root-name}, even if it was already supplied +on the command line. + +@item d +Prompt the user to strike a key after the boot script has been read. + +@item q +Prompt the user for the name of the boot script. By default, use +@file{@var{root-name}:/boot/servers.boot}. +@end table + +All the @var{switches} are put into the @code{$@{boot-args@}} script +variable. + +@var{host-port} and @var{device-port} are integers which represent the +microkernel host and device ports, respectively (and are used to +initialize the @code{$@{host-port@}} and @code{$@{device-port@}} boot +script variables). If these ports are not specified, then +@code{serverboot} assumes that the Hurd is already running, and fetches +the current ports from the procserver (FIXME xref). + +@var{root-name} is the name of the microkernel device that should be +used as the Hurd bootstrap filesystem. @code{serverboot} uses this name +to locate the boot script (described above), and to initialize the +@code{$@{root-device@}} script variable. 
+ + +@node Boot Scripts +@subsection Boot Scripts +@pindex /boot/servers.boot +@pindex servers.boot + +FIXME: finish + + +@node Recursive Bootstrap +@subsection Recursive Bootstrap + +The most appealing use of the @code{serverboot} program is to start a +set of core Hurd servers while another Hurd is already running. You +will rarely need to do this, and it requires superuser privileges, but +it is interesting to note that it can be done. + +Usually, you would make changes to only one server, and simply tell your +programs to use it in order to test out your changes. This process can +be applied even to the core servers. However, some changes have +far-reaching effects, and so it is nice to be able to test those effects +without having to reboot the machine. + +Here are the steps you can follow to test out a new set of servers: + +@enumerate 1 +@item +Create a new root partition. Usually, you would do this under your old +Hurd, and initialize it with your favorite filesystem format. + +@item +Copy the core servers, C library, and any of your modified programs onto +the new partition. + +@item +Use some clever shadowfs hacks (FIXME xref) to mirror the rest of your +programs under the modified partition. Copying them will work, too, if +you don't like shadowfs. + +@item +Create a boot script on the new partition, in @file{/boot/servers.boot}. + +@item +Run @kbd{serverboot -aqd @var{root-name}}, where @var{root-name} is the +microkernel name for your new root device. +@end enumerate + +Note that it is impossible to share microkernel devices between the two +running Hurds, so don't get any funny ideas. When you're finished +testing your new Hurd, then you can run the @code{halt} or @code{reboot} +programs to return control to the parent Hurd. + +If you're satisfied with your new Hurd, you can arrange for your +bootloader to start it, and reboot your machine. 
Then, you'll be in a +safe place to overwrite your old Hurd with the new one, and reboot back +to your old configuration (with the new Hurd servers). + + +@node Shutdown +@section Shutdown +@scindex halt +@scindex reboot + +FIXME: finish + + +@node Foundations +@chapter Foundations + +Every Hurd program accepts the following optional arguments: + +@table @samp +@item --help +Display a brief usage message, then exit. This message is not a +substitute for reading program documentation, rather it provides useful +reminders about specific command-line options that a program +understands. + +@item --version +Output program version information and exit. +@end table + +The rest of this chapter provides a programmer's introduction to the +Hurd. If you are not a programmer, then this chapter will not make much +sense to you@dots{} you should consider skipping to descriptions of +specific Hurd programs (@pxref{Audience}). + +The Hurd distribution includes many libraries in order to provide a +useful set of tools for writing Hurd utilities and servers. Several of +these libraries are useful not only for the Hurd, but also for writing +microkernel-based programs in general. These fundamental libraries are +not difficult to understand, and they are a good starting point, because +the rest of the Hurd relies upon them quite heavily. + +@menu +* Threads Library:: Every Hurd server and library is multithreaded. +* Microkernel Object Library:: The Microkernel Object Model (MOM). +* Ports Library:: Managing server port receive rights. +* Integer Hash Library:: Integer-keyed hash tables. +* Misc Library:: Things that soon will be in the GNU C library. +* Bug Address Library:: Where to report Hurd bugs. 
+@end menu + +@node Threads Library +@section Threads Library +@scindex libthreads +@scindex cthreads.h + +All Hurd servers and libraries are aggressively multithreaded in order +to take full advantage of any multiprocessing capabilities provided by +the microkernel and the underlying hardware. The Hurd threads library, +@code{libthreads} contains the default Hurd thread implementation, which +is declared in @code{<cthreads.h>}. + +Currently (April 1998), the Hurd uses cthreads, which have already been +documented thoroughly by CMU. Eventually, it will be migrated to use +POSIX pthreads, which are documented in a lot of places. +@c Thomas, 26-03-1998 + +Every single library in the Hurd distribution (including the GNU C +library) is completely thread-safe, and the Hurd servers themselves are +aggressively multithreaded. + + +@node Microkernel Object Library +@section Microkernel Object Library +@scindex libmom +@scindex mom.h + +A commonly asked question is whether the Hurd has been ported to the +Open Group's version of the Mach microkernel. The answer is ``no''. + +Currently (April 1998), the Hurd is quite dependent on the GNU Mach +microkernel, which is a derivative of the University of Utah's Mach 4. +However, the Hurd developers are all-too-aware of the limitations of +Mach. + +@cindex MOM +@cindex Microkernel Object Model +@code{libmom} is the first of several steps that need to be taken in +order to make the Hurd portable to other message-passing microkernels. +@dfn{MOM} stands for @dfn{Microkernel Object Model}, and is an +abstraction of the basic services provided by common message-passing +microkernels. It will provide the necessary insulation so that Hurd +servers and the C library can avoid making microkernel-dependent kernel +calls. + +At the present, though, @code{libmom} is still evolving, and will take +some time to be fully incorporated into the Hurd. 
+ + +@node Ports Library +@section Ports Library +@scindex libports +@scindex ports.h + +Ports are communication channels that are held by the kernel. + +A port has separate send rights and receive rights, which may be +transferred from task to task via the kernel. Port rights are similar +to Unix file descriptors: they are per-task integers which are used to +identify ports when making kernel calls. Send rights are required in +order to send an RPC request down a port, and receive rights are +required to serve the RPC request. Receive rights may be aggregated +into a single @dfn{portset}, which serves as a useful organizational unit. + +In a single-threaded RPC client, managing and categorizing ports is not +a difficult process. However, in a complex multithreaded server, it is +useful to have a more abstract interface to managing portsets, as well +as maintaining server metadata. + +The Hurd ports library, @code{libports}, fills that need. The +@code{libports} functions are declared in @code{<hurd/ports.h>}. + +@menu +* Buckets and Classes:: Basic units of port organization. +* Port Rights:: Moving port rights to and from @code{libports}. +* Port Metadata:: Managing port-related information. +* Port References:: Guarding against leaks and lossage. +* RPC Management:: Locking and interrupting RPC operations. +@end menu + +@node Buckets and Classes +@subsection Buckets and Classes + +The @code{libports} @dfn{bucket} is simply a port set, with some +metadata and a lock. All of the @code{libports} functions operate on +buckets. + +@deftypefun {struct port_bucket *} ports_create_bucket (void) +Create and return a new, empty bucket. +@end deftypefun + +A port @dfn{class} is a collection of individual ports, which can be +manipulated conveniently, and have enforced deallocation routines.
+Buckets and classes are entirely orthogonal: there is no requirement +that all the ports in a class be in the same bucket, nor is there a +requirement that all the ports in a bucket be in the same class. + +@deftypefun {struct port_class *} ports_create_class (@w{void (*@var{clean_routine}) (void *@var{port})}, @w{void (*@var{dropweak_routine}) (void *@var{port})}) +Create and return a new port class. If nonzero, @var{clean_routine} +will be called for each allocated port object in this class when it is +being destroyed. If nonzero, @var{dropweak_routine} will be called to +request weak references to be dropped. (If @var{dropweak_routine} is +null, then weak references and hard references will be identical for +ports of this class.) +@end deftypefun + +Once you have created at least one bucket and class, you may create new +ports, and store them in those buckets. There are a few different +functions for port creation, depending on your application's +requirements: + +@deftypefun error_t ports_create_port (@w{struct port_class *@var{class}}, @w{struct port_bucket *@var{bucket}}, @w{size_t @var{size}}, @w{void *@var{result}}) +Create and return in @var{result} a new port in @var{class} and +@var{bucket}; @var{size} bytes will be allocated to hold the port +structure and whatever private data the user desires. +@end deftypefun + +@deftypefun error_t ports_create_port_noinstall (@w{struct port_class *@var{class}}, @w{struct port_bucket *@var{bucket}}, @w{size_t @var{size}}, @w{void *@var{result}}) +Just like @code{ports_create_port}, except don't actually put the port +into the portset underlying @var{bucket}. This is intended to be used +for cases where the port right must be given out before the port is +fully initialized; with this call you are guaranteed that no RPC service +will occur on the port until you have finished initializing it and +installed it into the portset yourself.
+@end deftypefun + +@deftypefun error_t ports_import_port (@w{struct port_class *@var{class}}, @w{struct port_bucket *@var{bucket}}, @w{mach_port_t @var{port}}, @w{size_t @var{size}}, @w{void *@var{result}}) +For an existing @emph{receive} right, create and return in @var{result} +a new port structure; @var{bucket}, @var{size}, and @var{class} args are +as for @code{ports_create_port}. +@end deftypefun + + +@node Port Rights +@subsection Port Rights + +The following functions move port receive rights to and from the port +structure: + +@deftypefun void ports_reallocate_port (@w{void *@var{port}}) +Destroy the receive right currently associated with @var{port} and +allocate a new one. +@end deftypefun + +@deftypefun void ports_reallocate_from_external (@w{void *@var{port}}, @w{mach_port_t @var{receive}}) +Destroy the receive right currently associated with @var{port} and +designate @var{receive} as the new one. +@end deftypefun + +@deftypefun void ports_destroy_right (@w{void *@var{port}}) +Destroy the receive right currently associated with @var{port}. After +this call, @code{ports_reallocate_port} and +@code{ports_reallocate_from_external} may not be used. +@end deftypefun + +@deftypefun mach_port_t ports_claim_right (@w{void *@var{port}}) +Return the receive right currently associated with @var{port}. The +effects on @var{port} are the same as in @code{ports_destroy_right}, +except that the receive right itself is not affected. Note that in +multi-threaded servers, messages might already have been dequeued for +this port before it gets removed from the portset; such messages will +get @code{EOPNOTSUPP} errors. +@end deftypefun + +@deftypefun error_t ports_transfer_right (@w{void *@var{topt}}, @w{void *@var{frompt}}) +Transfer the receive right from @var{frompt} to @var{topt}. 
+@var{frompt} ends up with a destroyed right (as if +@code{ports_destroy_right} were called) and @var{topt}'s old right is +destroyed (as if @code{ports_reallocate_from_external} were called). +@end deftypefun + +@deftypefun mach_port_t ports_get_right (@w{void *@var{port}}) +Return the name of the receive right associated with @var{port}. The +user is responsible for creating an ordinary send right from this name. +@end deftypefun + + +@node Port Metadata +@subsection Port Metadata + +It is important to point out that the @var{port} argument to each of +the @code{libports} functions is a @code{void *} and not a @code{struct +port_info *}. This is done so that you may add arbitrary +meta-information to your @code{libports}-managed ports. Simply define +your own structure whose first element is a @code{struct port_info}, and +then you can use pointers to these structures as the @var{port} argument +to any @code{libports} function. + +The following functions are useful for maintaining metadata that is +stored in your own custom ports structure: + +@deftypefun {void *} ports_lookup_port (@w{struct port_bucket *@var{bucket}}, @w{mach_port_t @var{port}}, @w{struct port_class *@var{class}}) +Look up @var{port} and return the associated port structure, allocating +a reference. If the call fails, return zero. If @var{bucket} is nonzero, +then it specifies a bucket to search; otherwise all buckets will be +searched. If @var{class} is nonzero, then the lookup will fail if +@var{port} is not in @var{class}. +@end deftypefun + +@deftypefun error_t ports_bucket_iterate (@w{struct port_bucket *@var{bucket}}, @w{error_t (*@var{fun}) (void *@var{port})}) +Call @var{fun} once for each port in @var{bucket}. +@end deftypefun + + +@node Port References +@subsection Port References + +These functions maintain references to ports so that the port +information structures may be freed if and only if they are no longer +needed.
It is your responsibility to tell @code{libports} when +references to ports change. + +@deftypefun void ports_port_ref (@w{void *@var{port}}) +Allocate a hard reference to @var{port}. +@end deftypefun + +@deftypefun void ports_port_deref (@w{void *@var{port}}) +Drop a hard reference to @var{port}. +@end deftypefun + +@deftypefun void ports_no_senders (@w{void *@var{port}}, @w{mach_port_mscount_t @var{mscount}}) +The user is responsible for listening for no senders notifications; when +one arrives, call this routine for the @var{port} the message was sent +to, providing the @var{mscount} from the notification. +@end deftypefun + +@deftypefun int ports_count_class (@w{struct port_class *@var{class}}) +Block creation of new ports in @var{class}. Return the number of ports +currently in @var{class}. +@end deftypefun + +@deftypefun int ports_count_bucket (@w{struct port_bucket *@var{bucket}}) +Block creation of new ports in @var{bucket}. Return the number of ports +currently in @var{bucket}. +@end deftypefun + +@deftypefun void ports_enable_class (@w{struct port_class *@var{class}}) +Permit suspended port creation (blocked by @code{ports_count_class}) to +continue. +@end deftypefun + +@deftypefun void ports_enable_bucket (@w{struct port_bucket *@var{bucket}}) +Permit suspended port creation (blocked by @code{ports_count_bucket}) to +continue. +@end deftypefun + +Weak references are not often used, as they are the same as hard +references for port classes where @var{dropweak_routine} is null. +@xref{Buckets and Classes}. + +@deftypefun void ports_port_ref_weak (@w{void *@var{port}}) +Allocate a weak reference to @var{port}. +@end deftypefun + +@deftypefun void ports_port_deref_weak (@w{void *@var{port}}) +Drop a weak reference to @var{port}. +@end deftypefun + + +@node RPC Management +@subsection RPC Management + +The rest of the @code{libports} functions are dedicated to controlling +RPC operations. 
These functions help you do all the locking and thread +cancellations that are required in order to build robust servers. + +@deftypefn {Typedef} {typedef int (*} ports_demuxer_type ) (@w{mach_msg_header_t *@var{inp}}, @w{mach_msg_header_t *@var{outp}}) +Type of MiG demuxer routines. +@end deftypefn + +@deftypefun error_t ports_begin_rpc (@w{void *@var{port}}, @w{mach_msg_id_t @var{msg_id}}, @w{struct rpc_info *@var{info}}) +Call this when an RPC is beginning on @var{port}. @var{info} should be +allocated by the caller and will be used to hold dynamic state. If this +RPC should be abandoned, return @code{EDIED}; otherwise return zero. +@end deftypefun + +@deftypefun void ports_end_rpc (@w{void *@var{port}}, @w{struct rpc_info *@var{info}}) +Call this when an RPC is concluding. The arguments must match the ones +passed to the paired call to @code{ports_begin_rpc}. +@end deftypefun + +@deftypefun void ports_manage_port_operations_one_thread (@w{struct port_bucket *@var{bucket}}, @w{ports_demuxer_type @var{demuxer}}, @w{int @var{timeout}}) +Begin handling operations for the ports in @var{bucket}, calling +@var{demuxer} for each incoming message. Return if @var{timeout} is +nonzero and no messages have been received for @var{timeout} +milliseconds. Use only one thread (the calling thread). +@end deftypefun + +@deftypefun void ports_manage_port_operations_multithread (@w{struct port_bucket *@var{bucket}}, @w{ports_demuxer_type @var{demuxer}}, @w{int @var{thread_timeout}}, @w{int @var{global_timeout}}, @w{void (*@var{hook}) (void)}) +Begin handling operations for the ports in @var{bucket}, calling +@var{demuxer} for each incoming message. Return if @var{global_timeout} +is nonzero and no messages have been received for @var{global_timeout} +milliseconds. Create threads as necessary to handle incoming messages +so that no port is starved because of sluggishness on another port.
If +@var{thread_timeout} is nonzero, then individual threads will die off +if they handle no incoming messages for @var{thread_timeout} +milliseconds. If non-null, @var{hook} will be called in each new thread +immediately after it is created. +@end deftypefun + +@deftypefun error_t ports_inhibit_port_rpcs (@w{void *@var{port}}) +Interrupt any pending RPC on @var{port}. Wait for all pending RPCs to +finish, and then block any new RPCs starting on that port. +@end deftypefun + +@deftypefun error_t ports_inhibit_class_rpcs (@w{struct port_class *@var{class}}) +Similar to @code{ports_inhibit_port_rpcs}, but affects all ports in +@var{class}. +@end deftypefun + +@deftypefun error_t ports_inhibit_bucket_rpcs (@w{struct port_bucket *@var{bucket}}) +Similar to @code{ports_inhibit_port_rpcs}, but affects all ports in +@var{bucket}. +@end deftypefun + +@deftypefun error_t ports_inhibit_all_rpcs (void) +Similar to @code{ports_inhibit_port_rpcs}, but affects all ports +whatsoever. +@end deftypefun + +@deftypefun void ports_resume_port_rpcs (@w{void *@var{port}}) +Reverse the effect of a previous @code{ports_inhibit_port_rpcs} for this +@var{port}, allowing blocked RPCs to continue. +@end deftypefun + +@deftypefun void ports_resume_class_rpcs (@w{struct port_class *@var{class}}) +Reverse the effect of a previous @code{ports_inhibit_class_rpcs} for +@var{class}. +@end deftypefun + +@deftypefun void ports_resume_bucket_rpcs (@w{struct port_bucket *@var{bucket}}) +Reverse the effect of a previous @code{ports_inhibit_bucket_rpcs} for +@var{bucket}. +@end deftypefun + +@deftypefun void ports_resume_all_rpcs (void) +Reverse the effect of a previous @code{ports_inhibit_all_rpcs}. +@end deftypefun + +@deftypefun void ports_interrupt_rpcs (@w{void *@var{port}}) +Cancel (with @code{thread_cancel}) any RPCs in progress on @var{port}.
+@end deftypefun + +@deftypefun int ports_self_interrupted (void) +If the current thread's RPC has been interrupted with +@code{ports_interrupt_rpcs}, return nonzero and clear the interrupted +flag. +@end deftypefun + +@deftypefun error_t ports_interrupt_rpc_on_notification (@w{void *@var{object}}, @w{struct rpc_info *@var{rpc}}, @w{mach_port_t @var{port}}, @w{mach_msg_id_t @var{what}}) +Arrange for @code{hurd_cancel} to be called on @var{rpc}'s thread if +@var{object} gets notified that any of the things in @var{what} have +happened to @var{port}. @var{rpc} should be an RPC on @var{object}. +@end deftypefun + +@deftypefun error_t ports_interrupt_self_on_notification (@w{void *@var{object}}, @w{mach_port_t @var{port}}, @w{mach_msg_id_t @var{what}}) +Arrange for @code{hurd_cancel} to be called on the current thread, which +should be an RPC on @var{object}, if @var{port} gets notified with the +condition @var{what}. +@end deftypefun + +@deftypefun error_t ports_interrupt_self_on_port_death (@w{void *@var{object}}, @w{mach_port_t @var{port}}) +Same as calling @code{ports_interrupt_self_on_notification} with +@var{what} set to @code{MACH_NOTIFY_DEAD_NAME}. +@end deftypefun + +@deftypefun void ports_interrupt_notified_rpcs (@w{void *@var{object}}, @w{mach_port_t @var{port}}, @w{mach_msg_id_t @var{what}}) +Interrupt any RPCs on @var{object} that have requested such. +@end deftypefun + +@deftypefun void ports_dead_name (@w{void *@var{object}}, @w{mach_port_t @var{port}}) +Same as calling @code{ports_interrupt_notified_rpcs} with @var{what} set +to @code{MACH_NOTIFY_DEAD_NAME}. +@end deftypefun + + +@node Integer Hash Library +@section Integer Hash Library +@scindex libihash +@scindex ihash.h + +@code{libihash} provides integer-keyed hash tables, for arbitrary +element data types. Hash tables of this kind are frequently used when +implementing sparse arrays or buffer caches.
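To make the sparse-array use case concrete, here is a minimal, self-contained sketch of an integer-keyed table in plain C. It is only an illustration of the idea, not the @code{libihash} implementation or its interface: every name beginning with @code{toy_} is hypothetical, and the fixed bucket count and chaining strategy are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy integer-keyed table using separate chaining.  All names here are
   hypothetical; this is not the libihash implementation.  */
#define TOY_BUCKETS 64

struct toy_entry
{
  int id;                    /* integer key */
  void *item;                /* caller-owned value */
  struct toy_entry *next;    /* next entry in the same chain */
};

struct toy_table
{
  struct toy_entry *chains[TOY_BUCKETS];
};

/* Map a key (possibly negative) to a chain index.  */
static size_t
toy_slot (int id)
{
  return (size_t) ((unsigned int) id % TOY_BUCKETS);
}

/* Store ITEM under ID, replacing any earlier value.  Returns 0 on
   success, or -1 if memory could not be allocated.  */
static int
toy_add (struct toy_table *t, int id, void *item)
{
  struct toy_entry *e;

  for (e = t->chains[toy_slot (id)]; e; e = e->next)
    if (e->id == id)
      {
        e->item = item;
        return 0;
      }

  e = malloc (sizeof *e);
  if (!e)
    return -1;
  e->id = id;
  e->item = item;
  e->next = t->chains[toy_slot (id)];
  t->chains[toy_slot (id)] = e;
  return 0;
}

/* Return the item stored under ID, or NULL if there is none.  */
static void *
toy_find (const struct toy_table *t, int id)
{
  const struct toy_entry *e;

  for (e = t->chains[toy_slot (id)]; e; e = e->next)
    if (e->id == id)
      return e->item;
  return NULL;
}

/* Remove the entry for ID.  Returns nonzero if an entry was removed.  */
static int
toy_remove (struct toy_table *t, int id)
{
  struct toy_entry **p;

  for (p = &t->chains[toy_slot (id)]; *p; p = &(*p)->next)
    if ((*p)->id == id)
      {
        struct toy_entry *dead = *p;
        *p = dead->next;
        free (dead);
        return 1;
      }
  return 0;
}

/* Use the table as a sparse array: only the indices actually touched
   consume memory, however far apart they are.  */
static int
toy_demo (void)
{
  struct toy_table t = { { NULL } };
  static char a[] = "block 7", b[] = "block 1000000";

  assert (toy_add (&t, 7, a) == 0);
  assert (toy_add (&t, 1000000, b) == 0);
  assert (toy_find (&t, 7) == a);
  assert (toy_find (&t, 8) == NULL);
  assert (toy_remove (&t, 7) == 1);
  assert (toy_find (&t, 7) == NULL);
  return toy_remove (&t, 1000000);
}
```

Calling @code{toy_demo} from a @code{main} function exercises the add, find, and remove paths. A real table would also grow as it fills and would offer a cleanup hook for stored items; the sketch leaves both out.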
+ +The following functions are declared in @code{<hurd/ihash.h>}: + +@deftypefun error_t ihash_create (@w{ihash_t *@var{ht}}) +Create an integer hash table and return it in @var{ht}. If a memory +allocation error occurs, @code{ENOMEM} is returned, otherwise zero. +@end deftypefun + +@deftypefun void ihash_free (@w{ihash_t @var{ht}}) +Free @var{ht} and all resources it consumes. +@end deftypefun + +@deftypefun void ihash_set_cleanup (@w{ihash_t @var{ht}}, @w{void (*@var{cleanup}) (void *@var{value}, void *@var{arg})}, @w{void *@var{arg}}) +Sets @var{ht}'s element cleanup function to @var{cleanup}, and its +second argument to @var{arg}. @var{cleanup} will be called on every +element @var{value} to be subsequently overwritten or deleted, with +@var{arg} as the second argument. +@end deftypefun + +@deftypefun error_t ihash_add (@w{ihash_t @var{ht}}, @w{int @var{id}}, @w{void *@var{item}}, @w{void ***@var{locp}}) +Add @var{item} to the hash table @var{ht} under the integer key +@var{id}. @var{locp} is the address of a pointer located in @var{item}; +if non-null, @var{locp} should point to a variable of type @code{void +**}, and will be filled with a pointer that may be used as an argument +to @code{ihash_locp_remove}. The variable pointed to by @var{locp} may +be overwritten sometime between this call and when the element is +deleted, so you cannot stash its value elsewhere and hope to use the +stashed value with @code{ihash_locp_remove}. If a memory allocation +error occurs, @code{ENOMEM} is returned, otherwise zero. +@end deftypefun + +@deftypefun {void *} ihash_find (@w{ihash_t @var{ht}}, @w{int @var{id}}) +Find and return the item in hash table @var{ht} with key @var{id}. +Returns null if the specified item doesn't exist. +@end deftypefun + +@deftypefun error_t ihash_iterate (@w{ihash_t @var{ht}}, @w{error_t (*@var{fun}) (void *@var{value})}) +Call function @var{fun} on every element of @var{ht}.
@var{fun}'s only +arg, @var{value}, is a pointer to the value stored in the hash table. If +@var{fun} ever returns nonzero, then iteration stops and +@code{ihash_iterate} returns that value, otherwise it (eventually) +returns 0. +@end deftypefun + +@deftypefun int ihash_remove (@w{ihash_t @var{ht}}, @w{int @var{id}}) +Remove the entry with a key of @var{id} from @var{ht}. If there was no +such element, then return zero, otherwise nonzero. +@end deftypefun + +@deftypefun void ihash_locp_remove (@w{ihash_t @var{ht}}, @w{void **@var{ht_locp}}) +Remove the entry at @var{locp} from the hashtable @var{ht}. @var{locp} +is as returned from an earlier call to @code{ihash_add}. This call +should be faster than @code{ihash_remove}. @var{ht} can be null, in +which case the call still succeeds, but no cleanup is done. +@end deftypefun + + +@node Misc Library +@section Misc Library +@scindex libshouldbeinlibc + +The GNU C library is constantly developing to meet the needs of the +Hurd. However, because the C library needs to be very stable, it is +irresponsible to add new functions to it without carefully specifying +their interface, and testing them thoroughly. + +The Hurd distribution includes a library called +@code{libshouldbeinlibc}, which serves as a proving ground for additions +to the GNU C library. This library is in flux, as some functions are +added to it by the Hurd developers and others are moved to the official +C library. + +These functions aren't currently documented (other than in their header +files), but complete documentation will be added to +@iftex +@emph{The GNU C Library Reference Manual} +@end iftex +@ifinfo +@ref{Top, The GNU C Library Reference Manual,, libc}, +@end ifinfo +when these functions become part of the GNU C library. 
+ + +@node Bug Address Library +@section Bug Address Library +@scindex libhurdbugaddr + +@code{libhurdbugaddr} exists only to define a single variable: + +@deftypevar {char *} argp_program_bug_address +@code{argp_program_bug_address} is the default Hurd bug-reporting e-mail +address, @email{bug-hurd@@gnu.org}. This address is displayed to the +user when any of the standard Hurd servers and utilities are invoked +using the @samp{--help} option. +@end deftypevar + + +@node Input and Output +@chapter Input and Output + +There are no specific programs or servers associated with the I/O +subsystem, since it is used to interact with almost all servers in the +GNU Hurd. It provides facilities for reading and writing I/O channels, +which are the underlying implementation of file and socket descriptors +in the GNU C library. + +@menu +* Iohelp Library:: I/O authentication and lock management. +* Pager Library:: Implementing multithreaded external pagers. +* I/O Interface:: RPC-based input/output channels. +@end menu + +@node Iohelp Library +@section Iohelp Library +@scindex libiohelp +@scindex iohelp.h + +The @code{<hurd/iohelp.h>} file declares several functions which are +useful for low-level I/O implementations. Most Hurd servers do not call +these functions directly, but they are used by several of the Hurd +filesystem and networking helper libraries. @code{libiohelp} requires +@code{libthreads}. + +@menu +* I/O Users:: User authentication management. +* Conch Management:: Deprecated shared I/O implementation. +@end menu + +@node I/O Users +@subsection I/O Users + +Most I/O servers need to implement some kind of user authentication +checking. In order to facilitate that process, @code{libiohelp} has +some functions which encapsulate a set of idvecs (FIXME: xref to C +library) in a single @code{struct iouser}. 
+ +@deftypefun {struct iouser *} iohelp_create_iouser (@w{struct idvec *@var{uids}}, @w{struct idvec *@var{gids}}) +Create a new @var{iouser} for the specified @var{uids} and @var{gids}. +@end deftypefun + +@deftypefun {struct iouser *} iohelp_dup_iouser (@w{struct iouser *@var{iouser}}) +Return a copy of @var{iouser}. +@end deftypefun + +@deftypefun void iohelp_free_iouser (@w{struct iouser *@var{iouser}}) +Release a reference to @var{iouser}. +@end deftypefun + +I/O reauthentication is a rather complex protocol involving the +authserver as a trusted third party (@pxref{Auth Protocol}). In order +to reduce the risk of flawed implementations, I/O reauthentication is +encapsulated in the @code{iohelp_reauth} function: + +@deftypefun {struct iouser *} iohelp_reauth (@w{auth_t @var{authserver}}, @w{mach_port_t @var{rend_port}}, @w{mach_port_t @var{newright}}, @w{int @var{permit_failure}}) +Conduct a reauthentication transaction, and return a new @var{iouser}. +@var{authserver} is the I/O server's auth port. The rendezvous port +provided by the user is @var{rend_port}. + +If the transaction cannot be completed, return zero, unless +@var{permit_failure} is nonzero. If @var{permit_failure} is nonzero, +then should the transaction fail, return an @var{iouser} that has no +ids. The new port to be sent to the user is @var{newright}. +@end deftypefun + + +@node Conch Management +@subsection Conch Management + +@cindex conch +@findex iohelp_initialize_conch +@findex iohelp_handle_io_get_conch +@findex iohelp_get_conch +@findex iohelp_handle_io_release_conch +@findex iohelp_verify_user_conch +@findex iohelp_fetch_shared_data +@findex iohelp_put_shared_data +The @dfn{conch} is at the heart of the shared memory I/O system. +Several Hurd libraries implement shared I/O, and so @code{libiohelp} +contains functions to facilitate conch management. 
+ +Everything about shared I/O is undocumented because it is not needed for +adequate performance, and the RPC interface is simpler (@pxref{I/O +Interface}). It is not useful for new libraries or servers to implement +shared I/O. + + +@node Pager Library +@section Pager Library +@scindex libpager +@scindex pager.h + +@cindex XP (external pager) +@cindex external pager (XP) +The @dfn{external pager} (@dfn{XP}) microkernel interface allows +applications to provide the backing store for a memory object, by +converting hardware page faults into RPC requests. External pagers are +required for memory-mapped I/O (@pxref{Mapped Data}) and stored +filesystems (@pxref{Stored Filesystems}). + +The external pager interface is quite complex, so the Hurd pager library +contains functions which aid in creating multithreaded external pagers. +@code{libpager} is declared in @code{<hurd/pager.h>}, and requires only +the threads and ports libraries. @menu -* I/O object ports:: How ports to I/O objects work -* Simple operations:: Read, write, and seek -* Open modes:: State bits that affect pieces of operation -* Asynchronous I/O:: How to get notified when I/O is possible -* Information queries:: How to implement io_stat and io_server_version -* Mapped data:: Getting memory objects referring to the - data of an I/O object +* Pager Management:: High-level interface to external pagers. +* Pager Callbacks:: Functions that the user must define. @end menu -@node I/O object ports -@section I/O object ports + +@node Pager Management +@subsection Pager Management + +The pager library defines the @code{struct pager} data type in order to +represent a multi-threaded pager. The general procedure for creating a +pager is to define the functions listed in @ref{Pager Callbacks}, +allocate a @code{libports} bucket for the ports which will access the +pager, and create at least one new @code{struct pager} with +@code{pager_create}. 
+ +@deftypefun {struct pager *} pager_create (@w{struct user_pager_info *@var{u_pager}}, @w{struct port_bucket *@var{bucket}}, @w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t @var{copy_strategy}}) +Create a new pager. The pager will have a port created for it (using +@code{libports}, in @var{bucket}) and will be immediately ready to +receive requests. @var{u_pager} will be provided to later calls to +@code{pager_find_address}. The pager will have one user reference +created. @var{may_cache} and @var{copy_strategy} are the original +values of those attributes as for @code{memory_object_ready}. Users may +create references to pagers by use of the relevant ports library +functions. On errors, return null and set @code{errno}. +@end deftypefun + +Once you are ready to turn over control to the pager library, you should +call @code{ports_manage_port_operations_multithread} on the +@var{bucket}, using @code{pager_demuxer} as the ports @var{demuxer}. +This will handle all external pager RPCs, invoking your pager callbacks +when necessary. + +@deftypefun int pager_demuxer (@w{mach_msg_header_t *@var{inp}}, @w{mach_msg_header_t *@var{outp}}) +Demultiplex incoming @code{libports} messages on pager ports. +@end deftypefun + +The following functions are the body of the pager library, and provide a +clean interface to pager functionality: + +@deftypefun void pager_sync (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx void pager_sync_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +Write data from pager @var{pager} to its backing store. Wait for all +the writes to complete if and only if @var{wait} is set. + +@code{pager_sync} writes all data; @code{pager_sync_some} only writes +data starting at @var{start}, for @var{len} bytes. 
+@end deftypefun + +@deftypefun void pager_flush (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx void pager_flush_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +Flush data from the kernel for pager @var{pager} and force any pending +delayed copies. Wait for all pages to be flushed if and only if +@var{wait} is set. + +@code{pager_flush} flushes all data; @code{pager_flush_some} only +flushes data starting at @var{start}, for @var{len} bytes. +@end deftypefun + +@deftypefun void pager_return (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx void pager_return_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +Flush data from the kernel for pager @var{pager} and force any pending +delayed copies. Wait for all pages to be flushed if and only if +@var{wait} is set. Have the kernel write back modifications. + +@code{pager_return} flushes and restores all data; +@code{pager_return_some} only flushes and restores data starting at +@var{start}, for @var{len} bytes. +@end deftypefun + +@deftypefun void pager_offer_page (@w{struct pager *@var{pager}}, @w{int @var{precious}}, @w{int @var{writelock}}, @w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}}) +Offer a page of data to the kernel. If @var{precious} is set, then this +page will be paged out at some future point, otherwise it might be +dropped by the kernel. If the page is currently in core, the kernel +might ignore this call. +@end deftypefun + +@deftypefun void pager_change_attributes (@w{struct pager *@var{pager}}, @w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t @var{copy_strategy}}, @w{int @var{wait}}) +Change the attributes of the memory object underlying pager @var{pager}. +The @var{may_cache} and @var{copy_strategy} arguments are as for +@code{memory_object_change_attributes}.
Wait for the kernel to report +completion if and only if @var{wait} is set. +@end deftypefun + +@deftypefun void pager_shutdown (@w{struct pager *@var{pager}}) +Force termination of a pager. After this returns, no more paging +requests on the pager will be honoured, and the pager will be +deallocated. The actual deallocation might occur asynchronously if +there are currently outstanding paging requests that will complete +first. +@end deftypefun + +@deftypefun error_t pager_get_error (@w{struct pager *@var{p}}, @w{vm_address_t @var{addr}}) +Return the error code of the last page error for pager @var{p} at +address @var{addr}.@footnote{Note that this function will be deleted +when the Mach pager interface is fixed to provide this information.} +@end deftypefun + +@deftypefun error_t pager_memcpy (@w{struct pager *@var{pager}}, @w{memory_object_t @var{memobj}}, @w{vm_offset_t @var{offset}}, @w{void *@var{other}}, @w{size_t *@var{size}}, @w{vm_prot_t @var{prot}}) +Try to copy @code{*@var{size}} bytes between the region @var{other} +points to and the region at @var{offset} in the pager indicated by +@var{pager} and @var{memobj}. If @var{prot} is @code{VM_PROT_READ}, +copying is from the pager to @var{other}; if @var{prot} contains +@code{VM_PROT_WRITE}, copying is from @var{other} into the pager. +@code{*@var{size}} is always filled in with the actual number of bytes +successfully copied. Returns an error code if the pager-backed memory +faults; if there is no fault, returns zero and @code{*@var{size}} will +be unchanged. +@end deftypefun + +These functions allow you to recover the internal @code{struct pager} +state, in case the @code{libpager} interface doesn't provide an +operation you need: + +@deftypefun {struct user_pager_info *} pager_get_upi (@w{struct pager *@var{p}}) +Return the @code{struct user_pager_info} associated with a pager.
+@end deftypefun + +@deftypefun mach_port_t pager_get_port (@w{struct pager *@var{pager}}) +Return the port (receive right) for requests to the pager. It is +absolutely necessary that a new send right be created from this receive +right. +@end deftypefun + + +@node Pager Callbacks +@subsection Pager Callbacks + +Like several other Hurd libraries, @code{libpager} depends on you to +implement application-specific callback functions. You @emph{must} +define the following functions: + +@deftypefun error_t pager_read_page (@w{struct user_pager_info *@var{pager}}, @w{vm_offset_t @var{page}}, @w{vm_address_t *@var{buf}}, @w{int *@var{write_lock}}) +For pager @var{pager}, read one page from offset @var{page}. Set +@code{*@var{buf}} to be the address of the page, and set +@code{*@var{write_lock}} if the page must be provided read-only. The +only permissible error returns are @code{EIO}, @code{EDQUOT}, and +@code{ENOSPC}. +@end deftypefun + +@deftypefun error_t pager_write_page (@w{struct user_pager_info *@var{pager}}, @w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}}) +For pager @var{pager}, synchronously write one page from @var{buf} to +offset @var{page}. In addition, @code{vm_deallocate} (or equivalent) +@var{buf}. The only permissible error returns are @code{EIO}, +@code{EDQUOT}, and @code{ENOSPC}. +@end deftypefun + +@deftypefun error_t pager_unlock_page (@w{struct user_pager_info *@var{pager}}, @w{vm_offset_t @var{address}}) +A page should be made writable. +@end deftypefun + +@deftypefun error_t pager_report_extent (@w{struct user_pager_info *@var{pager}}, @w{vm_address_t *@var{offset}}, @w{vm_size_t *@var{size}}) +This function should report in @code{*@var{offset}} and +@code{*@var{size}} the minimum valid address the pager will accept and +the size of the object.
+@end deftypefun + +@deftypefun void pager_clear_user_data (@w{struct user_pager_info *@var{pager}}) +This is called when a pager is being deallocated after all extant send +rights have been destroyed. +@end deftypefun + +@deftypefun void pager_dropweak (@w{struct user_pager_info *@var{p}}) +This will be called when the ports library wants to drop weak +references. The pager library creates no weak references itself, so if +the user doesn't either, then it is alright for this function to do +nothing. +@end deftypefun + + +@node I/O Interface +@section I/O Interface +@scindex io.defs + +The I/O interface facilities are described in @code{<hurd/io.defs>}. +This section discusses only RPC-based I/O operations.@footnote{The +latter portion of @code{<hurd/io.defs>} and all of +@code{<hurd/shared.h>} describe how to implement shared-memory I/O +operations. However, shared I/O has been deprecated. @xref{Conch +Management}, for more details.} + +@menu +* I/O Object Ports:: How ports to I/O objects work. +* Simple Operations:: Read, write, and seek. +* Open Modes:: State bits that affect pieces of operation. +* Asynchronous I/O:: How to be notified when I/O is possible. +* Information Queries:: How to implement @code{io_stat} and + @code{io_server_version}. +* Mapped Data:: Getting memory objects referring to the + data of an I/O object. +@end menu + +@node I/O Object Ports +@subsection I/O Object Ports The I/O server must associate each I/O port with a particular set of uids and gids, identifying the user who is responsible for operations on the port. Every port to an I/O server should also support either the -file protocol or the socket protocol; naked I/O ports are not allowed. +file protocol (@pxref{File Interface}) or the socket protocol +(@pxref{Socket Interface}); naked I/O ports are not allowed. 
In addition, the server associates with each port a default file pointer, a set of open mode bits, a pid (called the ``owner''), and some @@ -130,679 +1544,2898 @@ identification of a set of uids and gids with a particular port at the moment of the port's creation. The other characteristics of an I/O port may be shared with other users. The I/O server interface does not generally specify in what way servers may share these other -characteristics are shared (with the exception of the deprecated O_ASYNC -interface); however, the file and socket interfaces make further -requirements about what sharing is expected and prohibited from +characteristics (with the exception of the deprecated +@code{O_ASYNC} interface); however, the file and socket interfaces make +further requirements about what sharing is expected and prohibited from occurring. -In general, users get send-rights to I/O ports by some mechanism that is -external to the I/O protocol. (For example file servers give out I/O -ports in response to the dir_pathtrans and fsys_getroot calls. Socket -servers give out ports in response to the socket_create and -socket_accept calls.) However, the I/O protocol provides methods of -obtaining new ports that refer to the same underlying object as another -port. In response to all of these calls, all underlying state -(including, but not limited to, the default file pointer, open mode -bits, and underlying object) must be shared between the old and new -ports. In the following descriptions of these calls, the term -``identical'' means this kind of sharing. All these calls must return -send-rights to a newly-constructed Mach port. - -The io_duplicate call simply returns another port which is identical -to an existing port and has the same uid and gid set. - -The io_restrict_auth call returns another port, identical to the +In general, users get send rights to I/O ports by some mechanism that is +external to the I/O protocol. 
(For example fileservers give out I/O +ports in response to the @code{dir_lookup} and @code{fsys_getroot} +calls. Socket servers give out ports in response to the +@code{socket_create} and @code{socket_accept} calls.) However, the I/O +protocol provides methods of obtaining new ports that refer to the same +underlying object as another port. In response to all of these calls, +all underlying state (including, but not limited to, the default file +pointer, open mode bits, and underlying object) must be shared between +the old and new ports. In the following descriptions of these calls, +the term ``identical'' means this kind of sharing. All these calls must +return send rights to a newly-constructed Mach port. + +@findex io_duplicate +The @code{io_duplicate} call simply returns another port which is +identical to an existing port and has the same uid and gid set. + +@findex io_restrict_auth +The @code{io_restrict_auth} call returns another port, identical to the provided port, but which has a smaller associated uid and gid set. The uid and gid sets of the new port are the intersection of the set on the existing port and the lists of uids and gids provided in the call. -Users use the io_reauthenticate call when they wish to have an entirely -new set of uids or gids associated with a port. In response to the -io_reauthenticate call, the server must create a new port, and then make -the call auth_server_authenticate to the auth server. The rendezvous -port for the auth_server_authenticate call is the I/O port to which was -made the io_reauthenticate call. The server provides rend_int parameter -to the auth server as a copy from the corresponding parameter in the -io_reauthenticate call. The I/O server also gives the auth server a new -port; this must be a newly created port identical to the old port. 
The -auth server will return the set of uids and gids associated with the -user, and guarantees that the new port will go directly to the user that -possessed the associated authentication port. The server then -identifies the new port given out with the specified id's. - -@node Simple operations -@section Simple operations - -Users write to I/O ports by calling the io_write RPC. They specify an -offset parameter; if the object supports writing at arbitrary offsets, -the server should honor this parameter. If -1 is passed as the offset, -then the server should use the default file pointer. The server should -return the amount of data which was successfully written. If the -operation was interrupted after some but not all of the data was -written, then it is considered to have succeeded and the server should -return the amount written. If the port is not an I/O port at all, the -server should reply with the error EOPNOTSUPP. If the port is an I/O -port, but does not happen to support writing, then the correct error is -EBADF. - -Users read from I/O ports by calling the io_read RPC. The specify the -amount of data they wish to read and the offset. The offset has the -same meaning as for io_write above. The server should return the data -read. If the call is interrupted after same data has been read (and the -operation is not idempotent) then the server should return the amount -read, even if less than the amount requested. The server should return -as much data as possible, but never more than requested by the user. If -there is no data, but there might be later, the call should block until -data becomes available. Indicate end-of-file conditions by returning -zero bytes. If the call is interrupted after some data has been read, -but the call is idempotent, then the server may return EINTR rather than -actually filling the buffer (taking care that any modifications of the -default file pointer have been reversed). 
Preferably, however, servers -should return data if possible. +@findex io_reauthenticate +Users use the @code{io_reauthenticate} call when they wish to have an +entirely new set of uids or gids associated with a port. In response to +the @code{io_reauthenticate} call, the server must create a new port, +and then make the call @code{auth_server_authenticate} to the auth +server. The rendezvous port for the @code{auth_server_authenticate} +call is the I/O port to which the @code{io_reauthenticate} call was +made. The server provides the @var{rend_int} parameter to the auth +server as a copy from the corresponding parameter in the +@code{io_reauthenticate} call. The I/O server also gives the auth +server a new port; this must be a newly created port identical to the +old port. The auth server will return the set of uids and gids +associated with the user, and guarantees that the new port will go +directly to the user that possessed the associated authentication port. +The server then identifies the new port given out with the specified +IDs. + +@node Simple Operations +@subsection Simple Operations + +@findex io_write +Users write to I/O ports by calling the @code{io_write} RPC. They +specify an @var{offset} parameter; if the object supports writing at +arbitrary offsets, the server should honour this parameter. If @math{-1} +is passed as the offset, then the server should use the default file +pointer. The server should return the amount of data which was +successfully written. If the operation was interrupted after some but +not all of the data was written, then it is considered to have succeeded +and the server should return the amount written. If the port is not an +I/O port at all, the server should reply with the error +@code{EOPNOTSUPP}. If the port is an I/O port, but does not happen to +support writing, then the correct error is @code{EBADF}. + +@findex io_read +Users read from I/O ports by calling the @code{io_read} RPC. 
They +specify the amount of data they wish to read and the offset. The offset +has the same meaning as for @code{io_write} above. The server should +return the data that was read. If the call is interrupted after some +data has been read (and the operation is not idempotent) then the server +should return the amount read, even if less than the amount requested. +The server should return as much data as possible, but never more than +requested by the user. If there is no data, but there might be later, +the call should block until data becomes available. Indicate +end-of-file conditions by returning zero bytes. If the call is +interrupted after some data has been read, but the call is idempotent, +then the server may return @code{EINTR} rather than actually filling the +buffer (taking care that any modifications of the default file pointer +have been reversed). Preferably, however, servers should return data. There are two categories of objects: seekable and non-seekable. -Seekable objects must accept arbitrary offset parameters in the io_read -and io_write calls, and to implement the io_seek call. Nonseekable -objects must ignore the offset parameters to io_read and io_write, and -should return ESPIPE to the io_seek call. - -On seekable objects, io_seek changes the default file pointer for reads -and writes. (See the C library manual for the interpretation of the -@var{whence} and @var{offset} arguments.) It returns the new offset as -modified by io_seek. - -The io_readable interface returns the amount of data which can be +Seekable objects must accept arbitrary offset parameters in the +@code{io_read} and @code{io_write} calls, and must implement the +@code{io_seek} call. Nonseekable objects must ignore the offset +parameters to @code{io_read} and @code{io_write}, and should return +@code{ESPIPE} to the @code{io_seek} call. + +@findex io_seek +On seekable objects, @code{io_seek} changes the default file pointer for +reads and writes. 
(@xref{File Positioning, , , libc, The GNU C Library +Reference Manual}, +for the interpretation of the @var{whence} and @var{offset} arguments.) +It returns the new offset as modified by @code{io_seek}. + +@findex io_readable +The @code{io_readable} interface returns the amount of data which can be immediately read. For the special technical meaning of ``immediately'', see @ref{Asynchronous I/O}. -@node Open modes -@section Open modes +@node Open Modes +@subsection Open Modes +@findex io_set_all_openmodes +@findex io_get_openmodes +@findex io_set_some_openmodes +@findex io_clear_some_openmodes The server associates each port with a set of bits that affect its -operation. The io_set_all_openmodes call modifies these bits and the -io_get_openmodes call returns them. In addition, the -io_set_some_openmodes and io_clear_some_openmodes do an atomic -read/modify/write of the openmodes. - -The O_APPEND bit, when set, changes the behavior of io_write when it -uses the default file pointer on seekable objects. When io_write is -done on a port with the O_APPEND bit set, is must set the filepointer to -one more than the ``maximum correct value'' (described below) before doing -the write (which would then increment the file pointer as usual). The -server must atomically bind this update to the actual data write with -respect to other users of io_read, io_write, and io_seek. - -A ``correct value'' for the file pointer which, when provided to io_read, -will successfully return at least one byte of data and not end-of-file. -The ``maximum correct value'' referred to in the description of O_APPEND -is the maximum such correct value. (For ordinary files [see the -description of the file protocol for more information] this is the same -as the current file size.) - -The O_FSYNC bit, when set, causes io_write not to delay writing data to -underlying media in any fashion. - -The O_NONBLOCK bit, when set, prevents read and write from blocking. 
-They should copy such data as is immediately available. If no data is -immediately available they should return EWOULDBLOCK. - -The definition of ``immediate'' is more or less server dependent. Some -servers (disk-based file servers, most notably) regard all data as -immediatebly available. The one criterion is that something which must -happen immediately may not wait for any user-synchronizable event. - -The O_ASYNC bit is deprecated; its use is documented in the following -section. This bit must be shared between all users of the same -underlying object. +operation. The @code{io_set_all_openmodes} call modifies these bits and +the @code{io_get_openmodes} call returns them. In addition, the +@code{io_set_some_openmodes} and @code{io_clear_some_openmodes} do an +atomic read/modify/write of the openmodes. + +The @code{O_APPEND} bit, when set, changes the behaviour of +@code{io_write} when it uses the default file pointer on seekable +objects. When @code{io_write} is done on a port with the +@code{O_APPEND} bit set, it must set the file pointer to the current +file size before doing the write (which would then increment the file +pointer as usual). The @dfn{current file size} is the smallest offset +which returns end-of-file when provided to @code{io_read}. The server +must atomically bind this update to the actual data write with respect +to other users of @code{io_read}, @code{io_write}, and @code{io_seek}. + +The @code{O_FSYNC} bit, when set, guarantees that @code{io_write} will +not return until data is fully written to the underlying medium. + +The @code{O_NONBLOCK} bit, when set, prevents read and write from +blocking. They should copy such data as is immediately available. If +no data is immediately available they should return @code{EWOULDBLOCK}. + +The definition of ``immediately'' is more-or-less server-dependent. +Some servers, notably stored filesystem servers (@pxref{Stored +Filesystems}), regard all data as immediately available. 
The one +criterion is that something which must happen @dfn{immediately} may not +wait for any user-synchronizable event. + +The @code{O_ASYNC} bit is deprecated; its use is documented in the +following section. This bit must be shared between all users of the +same underlying object. + @node Asynchronous I/O -@section Asynchronous I/O +@subsection Asynchronous I/O +@findex io_async Users may wish to be notified when I/O can be done without blocking; -they use the io_async call to indicate this to the server. In the -io_async call the user provides a port on which will the server should -send sig_post messages as I/O becomes possible. The server must return -a port which will be the reference port in the sig_post messages. Each -io_async call should generate a new reference port. (See the C library -manual for information on how to send sig_post messages.) - -The server then sends one SIGIO signal to each registered async user -everytime I/O becomes possible. I/O is possible if at least one byte -can be read or written immediately. (The definition of ``immediately'' -must be the same as for the implementation of the O_NONBLOCK flag.) In -addition, everytime a user calls io_read or io_write on a non-seekable -object, or at the default file pointer on a seekable object, another -signal should be sent to each user if I/O is still possible. +they use the @code{io_async} call to indicate this to the server. In +the @code{io_async} call the user provides a port on which the +server should send @code{sig_post} messages as I/O becomes possible. +The server must return a port which will be the reference port in the +@code{sig_post} messages. Each @code{io_async} call should generate a +new reference port. (FIXME: xref the C library manual for information +on how to send @code{sig_post} messages.) + +The server then sends one @code{SIGIO} signal to each registered async +user every time I/O becomes possible. 
I/O is possible if at least one +byte can be read or written immediately. The definition of +``immediately'' must be the same as for the implementation of the +@code{O_NONBLOCK} flag (@pxref{Open Modes}). In addition, every time a +user calls io_read or io_write on a non-seekable object, or at the +default file pointer on a seekable object, another signal should be sent +to each user if I/O is still possible. Some objects may also define ``urgent'' conditions. Such servers should -send the SIGURG signal to each registered async user anytime an urgent -condition appears. After any RPC that has the possibility of clearing -the urgent condition, the server should again send the signal to all -registered users if the urgent condition is still present. - -A more fine-grained mechanism for doing async I/O is the io_select call. -The user specifies the kind of access desired, and a send-once right. -If I/O of the kind the user desires is immediately possible, then the -server should return so indicating, and destroy the send-once right. If -I/O is not immediately possible, the server should save the send-once -right, and send a select_done message as soon as I/O becomes immediately -possible. (Again, the definition of ``immediate'' must be the same for -io_select, io_async, and O_NONBLOCK.) - -For compatibility, the I/O interface provides a deprecated feature -(known as icky async I/O).. The calls io_mod_owner and io_get_owner set -the ``owner'' of the object, providing either a pid or a pgrp (if the -value isnegative). Whenever the I/O server is sending sig_post messages -to all the io_async users, if the O_ASYNC bit is set, the server should -also send a signal to the owning pid/pgrp. The ID port for this call -should be different from all the io_async id ports given to users. -Users may find out what ID port the server uses for this by calling -io_get_icky_async_id. 
- -@node Information queries -@section Information queries - -Users may call io_stat to find out information about the I/O object. -Most of the fieds of a struct stat are meaningful only for files. All -objects, however, must support the fields st_fstype, st_fsid, st_ino, -st_atime, st_atime_usec, st_mtime_user, st_ctime, st_ctime_usec, and -st_blksize. - -st_fstype, st_fsid, and st_ino must be unique for the underlying object -across the entire system. - -st_atime and st_atime_usec hold the seconds and microseconds, -respectively, of the system clock at the last time the object was -read with io_read. - -st_mtime and st_mtime_usec hold the second and microseconds, +send the @code{SIGURG} signal to each registered async user anytime an +urgent condition appears. After any RPC that has the possibility of +clearing the urgent condition, the server should again send the signal +to all registered users if the urgent condition is still present. + +@findex io_select +A more fine-grained mechanism for doing async I/O is the +@code{io_select} call. The user specifies the kind of access desired, +and a send-once right. If I/O of the kind the user desires is +immediately possible, then the server should return so indicating, and +destroy the send-once right. If I/O is not immediately possible, the +server should save the send-once right, and send a @code{select_done} +message as soon as I/O becomes immediately possible. Again, the +definition of ``immediately'' must be the same for @code{io_select}, +@code{io_async}, and @code{O_NONBLOCK} (@pxref{Open Modes}). + +@findex io_mod_owner +@findex io_get_owner +@findex io_get_icky_async_id +For compatibility with 4.2 and 4.3 BSD, the I/O interface provides a +deprecated feature (known as @dfn{icky async I/O}). The calls +@code{io_mod_owner} and @code{io_get_owner} set the ``owner'' of the +object, providing either a pid or a pgrp (if the value is negative). 
+This implies that only one process at a time can do icky I/O on a given +object. Whenever the I/O server is sending @code{sig_post} messages to +all the @code{io_async} users, if the @code{O_ASYNC} bit is set, the +server should also send a signal to the owning pid/pgrp. The ID port +for this call should be different from all the @code{io_async} ID ports +given to users. Users may find out what ID port the server uses for +this by calling @code{io_get_icky_async_id}. + +@node Information Queries +@subsection Information Queries + +@findex io_stat +Users may call @code{io_stat} to find out information about the I/O +object. Most of the fields of a @code{struct stat} are meaningful only +for files. All objects, however, must support the fields +@code{st_fstype}, @var{st_fsid}, @var{st_ino}, @var{st_atime}, +@var{st_atime_usec}, @var{st_mtime_usec}, @var{st_ctime}, +@var{st_ctime_usec}, and @var{st_blksize}. + +@var{st_fstype}, @var{st_fsid}, and @var{st_ino} must be unique for +the underlying object across the entire system. + +@var{st_atime} and @var{st_atime_usec} hold the seconds and +microseconds, respectively, of the system clock at the last time the +object was read with @code{io_read}. + +@var{st_mtime} and @var{st_mtime_usec} hold the seconds and microseconds, respectively, of the system clock at the last time the object was -written with io_write. +written with @code{io_write}. -Other appropriate operations may update the atime and the mtime as well; -both the file and socket interfaces specify such operations. +Other appropriate operations may update the @var{atime} and the +@var{mtime} as well; both the file and socket interfaces specify such +operations. -st_ctime and st_ctime_usec hold the seconds and microseconds, -respectively, of the system clock at the last time permanent meta-data -associated with the object was changed. The exact operations which -couse such an update are server-dependent, but must include the creation -of the object. 
+@var{st_ctime} and @var{st_ctime_usec} hold the seconds and +microseconds, respectively, of the system clock at the last time +permanent meta-data associated with the object was changed. The exact +operations which cause such an update are server-dependent, but must +include the creation of the object. The server is permitted to delay the actual update of these times until stat is called; before the server stores the times on permanent media (if it ever does so) it should update them if necessary. -st_blksize gives the optimal I/O size in bytes for io_read and io_write; -users should endeavor to read and write amounts which are multiples of -the optimal size, and to use offsets which are multiples of the optimal -size +@var{st_blksize} gives the optimal I/O size in bytes for @code{io_read} +and @code{io_write}; users should endeavor to read and write amounts +which are multiples of the optimal size, and to use offsets which are +multiples of the optimal size. -In addition, objects which are seekable should set st_size to the -``maximum correct value'' described above in the description of the -O_APPEND flag. +In addition, objects which are seekable should set @var{st_size} to the +current file size as in the description of the @code{O_APPEND} flag +(@pxref{Open Modes}). -The st_uid and st_gid fields are unrelated to the ``owner'' as described -above for icky async I/O. +The @var{st_uid} and @var{st_gid} fields are unrelated to the ``owner'' +as described above for icky async I/O. +@findex io_server_version Users may find out the version of the server they are talking to by -calling io_server_version; this should return strings and integers -describing the version number of the server, as well as its name. - -@node Mapped data -@section Mapped data - -Servers may optionally implement the io_map call. The ports returned by -io_map must implement the XP kernel interface and be suitable as -arguments to vm_map. 
- -Seekable objects must allow access from 0 to the ``maximum correct value'' -described for O_APPEND. Whether they provide access beyond such a point -is server dependent; in addition, the meaning of such an object for a -non-seekable object is server dependent. - -@ignore -However, servers which -implement the facilities of the next section must obey to certain -requirements about which addresses in the memory objects provided by -io_map must be valid. Simply put, any user following the rules -described in the next chapter should not get any memory faults except as -explicitly permitted by the next chapter. -@end ignore +calling @code{io_server_version}; this should return strings and +integers describing the version number of the server, as well as its +name. -@c FIXME: here is a huge section which is completely ignored -@ignore -@node Shared I/O -@chapter Shared I/O +@node Mapped Data +@subsection Mapped Data + +@findex io_map +Servers may optionally implement the @code{io_map} call. The ports +returned by @code{io_map} must implement the external pager kernel +interface (@pxref{Pager Library}) and be suitable as arguments to +@code{vm_map}. + +Seekable objects must allow access from zero up to (but not including) +the current file size as described for @code{O_APPEND} (@pxref{Open +Modes}). Whether they provide access beyond such a point is +server-dependent; in addition, the meaning of accessing a non-seekable +object is server-dependent. -@c * Shared I/O:: The interface for doing input and output -@c using shared memory -I/O servers may, optionally, provide the services described in this -chapter in addition to the generic services described in the previous -chapter. These facilities allow users to read and write I/O objects -without making RPC's to the server in most circumstances. +@node Files +@chapter Files + +A file is traditionally thought of as a quantity of disk storage. 
In +the Hurd, files are an extension of the I/O interface, but they do not +necessarily correspond to disk storage. + +Every file in the Hurd is represented by a port, which is connected to +the server that manages the file. When a client wants to operate on a +file, it makes RPC requests via a file port to its server process, which +is commonly called a @dfn{translator}. @menu -* Rules:: The rules users must obey in using shared I/O -* Examples:: Examples of the way different types of servers - could implement shared I/O +* Translators:: Extending the Hurd filesystem hierarchy. +* Trivfs Library:: Implementing single-file translators. +* Fshelp Library:: Miscellaneous generic filesystem routines. +* File Interface:: File ports implement the file interface. +* Filesystem Interface:: Translator control interface. @end menu -@node Rules -@section Rules -Any server implementing the facilities of this chapter must also support -the io_map call as described in the previous chapter. +@node Translators +@section Translators -Users of the shared I/O facilities must call io_map_cntl; this will -return a memory object, called the shared page object. One page of this -object should be mapped from offset zero into the user's address space. -At the front of this page is a struct shared_io as described in -<hurd/shared.h>. Frequent reference will be made to the members of this -structure in this chapter, without further qualification. The shared -page past the struct shared_io may be used by users as they wish. +The Hurd filesystem allows you to set translators on any file or +directory that you own. A @dfn{translator} is any Hurd server which +provides the basic filesystem interface. Translated nodes are somewhat +like a cross between Unix symbolic links and mount points. + +Whenever a program tries to access the contents of a translated node, +the filesystem server redirects the request to the appropriate +translator (starting it if necessary). 
Then, the new translator +services the client's request. The GNU C library makes this behaviour +seamless from the client's perspective, so that standard Unix programs +behave correctly under the Hurd. + +Translators run with the privileges of the translated node's +@emph{owner}, so they cannot be used to compromise the security of the +system. This also means that @emph{any} user can write their own +translators, and provide other users with arbitrary +filesystem-structured data, regardless of the data's actual source. +Other chapters in this manual describe existing translators, and how you +can modify them or write your own. + +The standard Hurd filesystem servers are constantly evolving to provide +innovative features that users want. Here are a few examples of +existing translators: + +@itemize @bullet +@item +Disk-based filesystem formats, such as @code{ext2fs}, @code{ufs}, and +@code{isofs} (@pxref{Stored Filesystems}). + +@item +Network filesystems, such as @code{nfs} and @code{ftpfs} +(@pxref{Distributed Filesystems}). + +@item +Single files with dynamic content, such as FIXME: we need a good +example. + +@item +@c FIXME: reword +Hurd servers which translate rendezvous filesystem nodes in standard +locations, so that other programs can easily find them and use +server-specific interfaces. For example, @code{pflocal} implements the +filesystem interfaces, but it also provides a special Unix-domain socket +RPC interface (FIXME xref). Programs can fetch a port to this +translator simply by calling @code{file_name_lookup} (FIXME xref) on +@file{/servers/socket/1}@footnote{The number 1 corresponds to the +@code{PF_LOCAL} C library socket domain constant.} then use Unix +socket-specific RPC's on that port, rather than adhering to the file +protocol. +@end itemize + +This section focuses on the generic programs that need to be understood +in order to use existing translators. Many other parts of this manual +describe how you can write your own translators. 
+ +@menu +* Invoking settrans:: Declaring how a node should be translated. +* Invoking showtrans:: Displaying how nodes are translated. +* Invoking mount:: Unix-compatible active filesystem translators. +* Invoking fsysopts:: Modifying translation parameters at runtime. +@end menu -Users should examine the shared_page_magic field; from it they can -discover the byte ordering used by the server. Users should not blindly -assume that the server uses the same byte ordering as they. -Only one shared user can be active on a given port at a time. If a user -calls io_map_cntl on a port which already has an active shared user, the -server should return EBUSY, at which point the user should call -io_duplicate to obtain a new port, and call io_map_cntl there. +@node Invoking settrans +@subsection Invoking @code{settrans} +@pindex settrans + +The @code{settrans} program allows you to set a translator on a file or +directory. By default, the passive translator is set (see the +@samp{--passive} option). + +The @code{settrans} program has the following synopsis: + +@example +settrans [@var{option}]@dots{} @var{node} [@var{translator} @var{arg}@dots{}] +@end example + +@noindent +where @var{translator} is the absolute filename of the new translator +program. Each @var{arg} is passed to @var{translator} when it starts. +If @var{translator} is not specified, then @code{settrans} clears the +existing translator rather than setting a new one. + +@code{settrans} accepts the following options: + +@table @samp +@item -a +@itemx --active +Set @var{node}'s active translator. @dfn{Active translators} are +started immediately and are not persistent: if the system is rebooted +then they are lost. + +@item -c +@itemx --create +Create @var{node} as a zero-length file if it doesn't already exist. + +@item -L +@itemx --dereference +If @var{node} is already translated, stack the new translator on top of +it (rather than replacing the existing translator). 
+ +@item --help +Display a brief usage message, then exit. + +@item -p +@itemx --passive +Set @var{node}'s passive translator. @dfn{Passive translators} are only +activated by the underlying filesystem when clients try to use the +@var{node}, and they shut down automatically after they are no longer +active in order to conserve system resources. + +Passive translators are stored on the underlying filesystem media, and +so they persist between system reboots. Not all filesystems support +passive translators, due to limitations in their underlying media@dots{} +consult the filesystem-specific documentation to see if they are +supported. + +If you are setting the passive translator, and @var{node} already has an +active translator, then the following options apply: + +@table @samp +@item -g +@itemx --goaway +Tell the active translator to go away. In this case, the following +additional options apply: + +@table @samp +@item -f +@itemx --force +If the active translator doesn't go away, then force it. + +@item -S +@itemx --nosync +Don't flush its contents to disk before terminating. + +@item -R +@itemx --recursive +Shut down all of the active translator's children, too. +@end table + + +@item -k +@itemx --keep-active +Leave the existing active translator running. The new translator will +not be started unless the active translator has stopped. +@end table + +@item -P +@itemx --pause +When starting an active translator, prompt and wait for a newline on +standard input before completing the startup handshake. This is useful +when debugging a translator, as it gives you time to start the debugger. + +@item -t @var{sec} +@itemx --timeout=@var{sec} +If the translator does not start up in @var{sec} seconds (the default is +60), then return an error; if @var{sec} is 0, then never timeout. + +@item --version +Output program version information and exit. + +@item -x +@itemx --exclusive +Only set the translator if there is none already. 
+@end table + + +FIXME: finish +@node Invoking showtrans +@subsection Invoking @code{showtrans} +@node Invoking mount +@subsection Invoking @code{mount} +@node Invoking fsysopts +@subsection Invoking @code{fsysopts} + + +@node Trivfs Library +@section Trivfs Library +@scindex libtrivfs +@scindex trivfs.h + +Certain translators do not need to be very complex, because they +represent a single file rather than an entire directory hierarchy. The +trivfs library, which is declared in @code{<hurd/trivfs.h>}, does most of +the work of implementing this kind of translator. This library requires +the iohelp and ports libraries. @menu -* Conch:: How access to the shared page is mediated -* Access rules:: Where in the io_map memory objects users may - peek and poke -* Status notification:: Calls users should make at certain times to - keep the server abreast of the current state - of the object -* Behavior modification:: Modifications of behavior -* Violations:: When the rules are broken +* Trivfs Startup:: Writing a simple trivfs-based translator. +* Trivfs Callbacks:: Mandatory user-defined trivfs functions. +* Trivfs Options:: Optional user-defined trivfs functions. +* Trivfs Ports:: Managing control and protid ports. @end menu -@node Conch -@subsection Conch - -Access to the shared page is mediated through a facility known as the -``conch''. The ``lock'' field of the shared page protects the -conch_status field; users and the server must acquire this lock with -spin_lock before they may modify or examine conch_status. - -If the conch_status field is USER_HAS_CONCH or USER_RELEASE_CONCH, then -the user has the conch, and may access the shared page after releasing -the spin lock. If the conch_status field is USER_COULD_HAVE_CONCH, then -the user may immediately set conch_status to USER_HAS_CONCH, and proceed -to access the shared page after releasing the spin lock. 
If the conch -status is USER_HAS_NOT_CONCH, then the user should release the spin -lock, and call io_get_conch. Upon return from io_get_conch, the user -should reacquire the spin lock and check conch_status again. - -When the user is through accessing the shared page, the user should -acquire the spin lock and examine the conch_status field. If it has -been set to USER_RELEASE_CONCH, then the user should release the spin -lock and call io_release_conch. Otherwise, the user should change -conch_status from USER_HAS_CONCH to USER_COULD_HAVE_CONCH and then -release the spin lock. - -The implementation of io_read and io_write must not modify the object -data or the default file pointer except when the server is holding the -conch; users who wish to be atomic with respect to those functions -should be similarly reticent. - -The server must guarantee that at most one user of an underlying object -has the conch at a time; the server may only have the conch if no user -does. The server may not modify conch_status or the shared page if the -status is USER_HAS_CONCH except to set it to USER_RELEASE_CONCH, thus -requesting a call to io_release_conch. - -The server is permitted to modify any characteristics of the shared page -anytime the conch_status is not USER_HAS_CONCH or USER_RELEASE_CONCH; -users may not assume that the shared page has not changed even when only -upgrading USER_COULD_HAVE_CONCH to USER_HAS_CONCH. - -@node Access rules -@subsection Access rules - -The conch fields file_size, read_size, and prenotify_size affect which -areas of the data objects may be accessed. In addition, for -non-seekable objects, the file pointers rd_file_pointer, -wr_file_pointer, and xx_file_pointer affect which areas may be accessed. - -For seekable objects, the user may read the read object from offset 0 -through the minimum of file_size and read_size. - -For seekable objects, the user may write the write object from offset 0 -through the prenotify_size. 
- -For nonseekable objects, the user may read the read object from -rd_file_pointer through the minimum of file_size and read_size. - -For nonseekable objects, the user may write the write object from -wr_file_pointer through prenotify_size. - -The server may permit access outside these regions, but need not -preserve data for any length of time if so written. If the server -wishes to deny such access, it issue faults with EIO. Servers may also -issue faults on modifications of the write object for reasons such as -EDQUOT and ENOSPC, as well as reporting hardware errors with EIO. -Servers may only fault valid addresses in the read object in the event -of hardware failure, giving EIO. - -Users should ignore the foo field if the value use_foo is clear in the -shared page; this may result in there being no maximum valid address for -a particular access. In that case, the user may access the object to -the end of its virtual address space. - -If use_file_size is set, the user may increase the file_size, but may -not decrease it, to indicate the new ``maximum correct value'' as -described for O_APPEND. Normally when users write beyond the current -file_size they should extend it at least to the end of the write. - -The xx_file_pointer for seekable objects must be the same as the default -file pointer used by io_read and io_write. - -If use_read_size is set and the user wishes to read past read_size, she -may call io_readsleep, which must return as soon as read_size is -increased. The server should set read_block_reason anytime -use_read_size is set; if read_block_reason is RBR_BUFFER_FULL, then the -server is indicating that the read_size might never be increased until -the rd_file_pointer is sufficiently increased. - -If the server has set use_prenotify_size and the user wishes to write -past prenotify_size, she may call io_prenotify, specifying the maximum -offset the user intends to write. 
The server should return when after -increasing prenotify_size, but is not obligated to extend it as far as -the user wishes. In addition, io_prenotify may return errors such as -ENOSPC, indicating that the prenotify_size cannot be increased. - -Users of seekable objects may modify the xx_file_pointer at will -(including pointing past read_size, file_size, or prenotify_size). -Users of non-seekable objects, however, may only increase the -rd_file_pointer and wr_file_pointer. In addition, they may not modify -them to point past the valid data as described above. Failing to -advance them at all may prevent the read_size or prenotify_size from -being increased. - -If the server sets eof_notify, then the user may attempt to have the -file_size to be increased by calling io_eofnotify after ``noticing'' the -current file size limit. io_eofnotify must return immediately, but need -not actually increase the file_size or clear user_file_size. (However, -if it is impossible for io_eofnotify to ever do anything, then the -server should not bother setting eof_notify.) - -@node Status notification -@subsection Status notification - -The flag do_sigio requests the user to call io_sigio every time she -changes the file pointers or the file_size. - -If the server sets use_postnotify_size, then the user should call -io_postnotify after writing data that extends past postnotify_size. The -server may buffer writes internally beyond postnotify_size for -arbitrarily long periods until io_postnotify is called, regardless of -the setting of the O_FSYNC bit. - -After modifying or reading the object contents, the user should set the -written or accessed fields respectively. (Users who fail to set these -fields will not thereby defeat the mtime/atime mechanism.) - -If the flag use_eof is set, then users should call io_eofnotify after -reading up to the file_size and noticing it. 
- -@node Behavior modification -@subsection Behavior modification - -The server flag append_mode is a copy of the O_APPEND open mode bit; if -it is set, then the user should do writes at file_size and set the file -pointer appropriately (this applies only if the user would be writing at -the file pointer in the first place). - -Servers should implement the flag O_FSYNC by using the postnotify_size -field. - -Servers should implement the io_async and O_ASYNC notifications by using -the do_sigio field. - -@node Violations -@subsection Violations - -Users who hold the conch for too long while conch_status is set to -USER_RELEASE_CONCH may have the conch stolen from them and their -conch_status unilaterally downgraded to USER_HAS_NOT_CONCH by the -server. Users who hold the spin lock for too long (where this ``too -long'' is much much shorter than the previous one) may have the spin -lock stolen from them by the server. - -Users who read or write outside the valid regions described above may -get memory faults and may not expect data written to be saved in any -fashion. - -Users who write the read object (when it is different from the write -object) may or may not get faults; they may not expect such data to be -saved in any fashion. - -Users who fail to call io_postnotify may cause data to be buffered for -arbitrarily long periods. - -Users who reduce rd_file_pointer, wr_file_pointer, or file_size will -have such modifications ignored. - -Users may not call any server functions (whether in the I/O protocol or -another) while holding the conch except for those specified in this -chapter. Such calls may block indefinitely or fail silently. - -@node Examples -@section Examples - -FIXME: Examples of the way different types of servers could implement -shared I/O +@node Trivfs Startup +@subsection Trivfs Startup + +In order to use the trivfs library, you will need to define the +appropriate callbacks (@pxref{Trivfs Callbacks}). 
As with all Hurd +servers, your trivfs-based translator should first parse any +command-line options, in case the user is just asking for help. Trivfs +uses argp (@pxref{Argp, , , libc, The GNU C Library Reference Manual}) +for parsing command-line arguments. + +Your translator should redefine the following functions and variables as +necessary, and then call @code{argp_parse} with the relevant arguments: + +@deftypevar {extern struct argp *} trivfs_runtime_argp +If this is defined or set to an argp structure, it will be used by the +default @code{trivfs_set_options} to handle runtime options parsing. +Redefining this is the normal way to add option parsing to a trivfs +program. +@end deftypevar + +@deftypefun error_t trivfs_set_options (@w{struct trivfs_control *@var{fsys}}, @w{char *@var{argz}}, @w{size_t @var{argz_len}}) +Set runtime options for @var{fsys} to @var{argz} and @var{argz_len}. +The default definition for this routine simply uses +@var{trivfs_runtime_argp} (supplying @var{fsys} as the argp input +field). +@end deftypefun + +@deftypefun error_t trivfs_append_args (@w{struct trivfs_control *@var{fsys}}, @w{char **@var{argz}}, @w{size_t *@var{argz_len}}) +Append to the malloced string @code{*@var{argz}} of length +@code{*@var{argz_len}} a NUL-separated list of the arguments to this +translator. +@end deftypefun + +After your translator parses its command-line arguments, it should fetch +its bootstrap port by using @code{task_get_bootstrap_port}. If this +port is @code{MACH_PORT_NULL}, then your program wasn't started as a +translator. 
Otherwise, you can use the bootstrap port to create a new +control structure (and advertise its port) with @code{trivfs_startup}: + +@deftypefun error_t trivfs_startup (@w{mach_port_t @var{bootstrap}}, @w{int @var{flags}}, @w{struct port_class *@var{control_class}}, @w{struct port_bucket *@var{control_bucket}}, @w{struct port_class *@var{protid_class}}, @w{struct port_bucket *@var{protid_bucket}}, @w{struct trivfs_control **@var{control}}) +@deftypefunx error_t trivfs_create_control (@w{mach_port_t @var{bootstrap}}, @w{struct port_class *@var{control_class}}, @w{struct port_bucket *@var{control_bucket}}, @w{struct port_class *@var{protid_class}}, @w{struct port_bucket *@var{protid_bucket}}, @w{struct trivfs_control **@var{control}}) +@code{trivfs_startup} creates a new trivfs control port, advertises it +to the underlying node @var{bootstrap} with @code{fsys_startup}, +returning the results of this call, and places its control structure in +@code{*@var{control}}. @code{trivfs_create_control} does the same +thing, except it doesn't advertise the control port to the underlying +node. @var{control_class} and @var{control_bucket} are passed to +@code{libports} to create the control port, and @var{protid_class} and +@var{protid_bucket} are used when creating ports representing opens of +this node; any of these may be zero, in which case an appropriate port +class/bucket is created. If @var{control} is non-null, the trivfs +control port is returned in it. @var{flags} (a bitmask of the +appropriate @code{O_*} constants) specifies how to open the underlying +node. +@end deftypefun + +If you did not supply zeros as the class and bucket arguments to +@code{trivfs_startup}, you will probably need to use the trivfs port +management functions (@pxref{Trivfs Ports}). 
+ +Once you have successfully called @code{trivfs_startup}, and have a +pointer to the control structure stored in, say, the @var{fsys} +variable, you are ready to call one of the +@code{ports_manage_port_operations_*} functions using +@code{@var{fsys}->pi.bucket} and @code{trivfs_demuxer}. This will +handle any incoming filesystem requests, invoking your callbacks when +necessary. + +@deftypefun int trivfs_demuxer (@w{mach_msg_header_t *@var{inp}}, @w{mach_msg_header_t *@var{outp}}) +Demultiplex incoming @code{libports} messages on trivfs ports. +@end deftypefun + +The following functions are not usually necessary, but they allow you to +use the trivfs library even when it is not possible to turn +message-handling over to @code{trivfs_demuxer} and @code{libports}: + +@deftypefun {struct trivfs_control *} trivfs_begin_using_control (@w{mach_port_t @var{port}}) +@deftypefunx {struct trivfs_protid *} trivfs_begin_using_protid (@w{mach_port_t @var{port}}) +These functions can be used as @code{intran} functions for a MiG port +type to have the stubs called with either the control or protid pointer. +@end deftypefun + +@deftypefun void trivfs_end_using_control (@w{struct trivfs_control *@var{port}}) +@deftypefunx void trivfs_end_using_protid (@w{struct trivfs_protid *@var{port}}) +These can be used as `destructor' functions for a MiG port type, to have +the stubs called with the control or protid pointer. +@end deftypefun + +@deftypefun error_t trivfs_open (@w{struct trivfs_control *@var{fsys}}, @w{struct iouser *@var{user}}, @w{unsigned @var{flags}}, @w{mach_port_t @var{realnode}}, @w{struct trivfs_protid **@var{cred}}) +Return a new protid (that is, a port representing an open of this node) +pointing to a new peropen in @var{cred}, with @var{realnode} as the +underlying node reference, with the identity given by @var{user}, and open flags in +@var{flags}. @var{fsys} is the trivfs control object.
+@end deftypefun + +@deftypefun error_t trivfs_protid_dup (@w{struct trivfs_protid *@var{cred}}, @w{struct trivfs_protid **@var{dup}}) +Return a duplicate of @var{cred} in @var{dup}, sharing the same peropen +and hook. A non-null protid @var{hook} indicates that +@var{trivfs_peropen_create_hook} created this protid (@pxref{Trivfs +Options}). +@end deftypefun + +@deftypefun error_t trivfs_set_atime (@w{struct trivfs_control *@var{cntl}}) +@deftypefunx error_t trivfs_set_mtime (@w{struct trivfs_control *@var{cntl}}) +Call these to set atime or mtime for the node to the current time. +@end deftypefun + + +@node Trivfs Callbacks +@subsection Trivfs Callbacks + +Like several other Hurd libraries, @code{libtrivfs} requires that you +define a number of application-specific callback functions and +configuration variables. You @emph{must} define the following variables +and functions: + +@deftypevar {extern int} trivfs_fstype +@deftypevarx {extern int} trivfs_fsid +These variables are returned in the @var{st_fstype} and @var{st_fsid} +fields of @code{struct stat}. @var{trivfs_fstype} should be chosen +from the @code{FSTYPE_*} constants found in @code{<hurd/hurd_types.h>}. +@end deftypevar + +@deftypevar {extern int} trivfs_allow_open +Set this to some bitwise OR combination of @code{O_READ}, +@code{O_WRITE}, and @code{O_EXEC}; trivfs will only allow opens of the +specified modes. +@end deftypevar + +@deftypevar {extern int} trivfs_support_read +@deftypevarx {extern int} trivfs_support_write +@deftypevarx {extern int} trivfs_support_exec +Set these to nonzero if trivfs should allow read, write, or execute of +the file. These variables are necessary because @var{trivfs_allow_open} +is used only to validate opens, not actual operations. 
+@end deftypevar + +@deftypefun void trivfs_modify_stat (@w{struct trivfs_protid *@var{cred}}, @w{struct stat *@var{stbuf}}) +This should modify a @code{struct stat} (as returned from the underlying +node) for presentation to callers of @code{io_stat}. It is permissible +for this function to do nothing, but it must still be defined. +@end deftypefun + +@deftypefun error_t trivfs_goaway (@w{struct trivfs_control *@var{cntl}}, @w{int @var{flags}}) +This function is called when someone wants the filesystem @var{cntl} to +go away. @var{flags} are from the set @code{FSYS_GOAWAY_*} found in +@code{<hurd/hurd_types.h>}. +@end deftypefun + + +@node Trivfs Options +@subsection Trivfs Options + +The functions and variables described in this subsection already have +default definitions in @code{libtrivfs}, so you are not forced to define +them; rather, they may be redefined on a case-by-case basis. + +@deftypevar {extern struct port_class *} trivfs_protid_portclasses[] +@deftypevarx {extern int} trivfs_protid_nportclasses +@deftypevarx {extern struct port_class *} trivfs_cntl_portclasses[] +@deftypevarx {extern int} trivfs_cntl_nportclasses +If you define these, they should be vectors (and the associated sizes) +of port classes that will be translated into control and protid pointers +for passing to RPCs. They are recognized in addition to the classes passed to or created by +@code{trivfs_create_control} (or @code{trivfs_startup}), which are +automatically recognized. +@end deftypevar + +@deftypefn {Variable} {error_t (*} trivfs_check_open_hook ) (@w{struct trivfs_control *@var{cntl}}, @w{struct iouser *@var{user}}, @w{int @var{flags}}) +If this variable is set, it is called every time an open happens. +@var{user} and @var{flags} are from the open; @var{cntl} identifies the +node being opened. This call need not check permissions on the +underlying node. This call can block as necessary, unless +@code{O_NONBLOCK} is set in @var{flags}.
Any desired error can be +returned, which will be reflected to the user and prevent the open from +succeeding. +@end deftypefn + +@deftypefn {Variable} {error_t (*} trivfs_protid_create_hook ) (@w{struct trivfs_protid *@var{prot}}) +@deftypefnx {Variable} {error_t (*} trivfs_peropen_create_hook ) (@w{struct trivfs_peropen *@var{perop}}) +If these variables are set, they are called every time a new protid or +peropen structure is created and initialized. +@end deftypefn + +@deftypefn {Variable} {void (*} trivfs_protid_destroy_hook ) (@w{struct trivfs_protid *@var{prot}}) +@deftypefnx {Variable} {void (*} trivfs_peropen_destroy_hook ) (@w{struct trivfs_peropen *@var{perop}}) +If these variables are set, they are called every time a protid or +peropen structure is about to be destroyed. +@end deftypefn + +@deftypefn {Variable} {error_t (*} trivfs_getroot_hook ) (@w{struct trivfs_control *@var{cntl}}, @w{mach_port_t @var{reply_port}}, @w{mach_msg_type_name_t @var{reply_port_type}}, @w{mach_port_t @var{dotdot}}, @w{uid_t *@var{uids}}, @w{u_int @var{nuids}}, @w{uid_t *@var{gids}}, @w{u_int @var{ngids}}, @w{int @var{flags}}, @w{retry_type *@var{do_retry}}, @w{char *@var{retry_name}}, @w{mach_port_t *@var{node}}, @w{mach_msg_type_name_t *@var{node_type}}) +If this variable is set, it is called by @code{trivfs_S_fsys_getroot} +before any other processing takes place; if the return value is +@code{EAGAIN}, normal trivfs getroot processing continues, otherwise the +RPC returns with that return value.
+@end deftypefn + + +@node Trivfs Ports +@subsection Trivfs Ports + +If you choose to allocate your own trivfs port classes and buckets, the +following functions may come in handy: + +@deftypefun error_t trivfs_add_port_bucket (@w{struct port_bucket **@var{bucket}}) +Add the port bucket @code{*@var{bucket}} to the list of dynamically +allocated port buckets; if @code{*@var{bucket}} is zero, an attempt is +made to allocate a new port bucket, which is then stored in +@code{*@var{bucket}}. +@end deftypefun + +@deftypefun void trivfs_remove_port_bucket (@w{struct port_bucket *@var{bucket}}) +Remove the previously added dynamic port bucket @var{bucket}, freeing it +if it was allocated by @code{trivfs_add_port_bucket}. +@end deftypefun + +@deftypefun error_t trivfs_add_control_port_class (@w{struct port_class **@var{class}}) +@deftypefunx error_t trivfs_add_protid_port_class (@w{struct port_class **@var{class}}) +Add the port class @code{*@var{class}} to the list of control or protid port +classes recognized by trivfs; if @code{*@var{class}} is zero, an attempt is +made to allocate a new port class, which is stored in @code{*@var{class}}. +@end deftypefun + +@deftypefun void trivfs_remove_control_port_class (@w{struct port_class *@var{class}}) +@deftypefunx void trivfs_remove_protid_port_class (@w{struct port_class *@var{class}}) +Remove the previously added dynamic control or protid port class +@var{class}, freeing it if it was allocated by +@code{trivfs_add_control_port_class} or +@code{trivfs_add_protid_port_class}. +@end deftypefun + +Even if you do not use the above allocation functions, you may still be +able to use the default trivfs cleanroutines: + +@deftypefun void trivfs_clean_cntl (@w{void *@var{port}}) +@deftypefunx void trivfs_clean_protid (@w{void *@var{port}}) +These functions should be installed as @code{libports} cleanroutines for +control port classes and protid port classes, respectively. 
+@end deftypefun + + +@node Fshelp Library +@section Fshelp Library +@scindex libfshelp +@scindex fshelp.h + +The fshelp library implements various things that are generic to most +implementors of the file protocol. It presumes that you are using the +iohelp library as well. @code{libfshelp} is divided into separate +facilities which may be used independently. These functions are +declared in @code{<hurd/fshelp.h>}. -@end ignore -@node File interface -@chapter File interface +@menu +* Passive Translator Linkage:: Invoking passive translators. +* Active Translator Linkage:: Managing active translators. +* Fshelp Locking:: Implementing file locking. +* Fshelp Permissions:: Standard file access permission policies. +* Fshelp Misc:: Useful standalone routines. +@end menu -This chapter documents the interface for operating on files. +@node Passive Translator Linkage +@subsection Passive Translator Linkage + +These routines are self-contained and start passive translators, +returning the control port. They do not require multithreading or the +ports library. + +@deftypefn {Typedef} {typedef error_t (*} fshelp_open_fn_t ) (@w{int @var{flags}}, @w{file_t *@var{node}}, @w{mach_msg_type_name_t *@var{node_type}}) +A callback used by the translator starting functions, which should be a +function that, given some open flags, opens the appropriate file and +returns the node port. +@end deftypefn + +@deftypefun error_t fshelp_start_translator_long (@w{fshelp_open_fn_t @var{underlying_open_fn}}, @w{char *@var{name}}, @w{char *@var{argz}}, @w{int @var{argz_len}}, @w{mach_port_t *@var{fds}}, @w{mach_msg_type_name_t @var{fds_type}}, @w{int @var{fds_len}}, @w{mach_port_t *@var{ports}}, @w{mach_msg_type_name_t @var{ports_type}}, @w{int @var{ports_len}}, @w{int *@var{ints}}, @w{int @var{ints_len}}, @w{int @var{timeout}}, @w{fsys_t *@var{control}}) +Start a passive translator @var{name} with arguments @var{argz} (length +@var{argz_len}).
Initialize the initports to @var{ports} (length +@var{ports_len}), the initints to @var{ints} (length @var{ints_len}), +and the file descriptor table to @var{fds} (length @var{fds_len}). +Return the control port in @code{*@var{control}}. If the translator doesn't +respond or die in @var{timeout} milliseconds (if @var{timeout} is +greater than zero), return an appropriate error. If the translator dies +before responding, return @code{EDIED}. +@end deftypefun + +@deftypefun error_t fshelp_start_translator (@w{fshelp_open_fn_t @var{underlying_open_fn}}, @w{char *@var{name}}, @w{char *@var{argz}}, @w{int @var{argz_len}}, @w{int @var{timeout}}, @w{fsys_t *@var{control}}) +Same as @code{fshelp_start_translator_long}, except the initports and +ints are copied from our own state, @var{fd[2]} is copied from our own +stderr, and the other fds are cleared. +@end deftypefun + +@node Active Translator Linkage +@subsection Active Translator Linkage + +These routines implement the linkage to active translators needed +by any filesystem which supports them. They require the threads +library and use the passive translator routines above, but they don't +require the ports library at all. + +This interface is complex, because creating the ports and state +necessary for @code{start_translator_long} is expensive. The caller to +@code{fshelp_fetch_root} should not need to create them on every call, +since usually there will be an existing active translator. + +@deftypefun void fshelp_transbox_init (@w{struct transbox *@var{transbox}}, @w{struct mutex *@var{lock}}, @w{void *@var{cookie}}) +Initialize a transbox, which contains state information for active +translators. +@end deftypefun + +@deftypefn {Typedef} {typedef error_t (*} fshelp_fetch_root_callback1_t ) (@w{void *@var{cookie1}}, @w{void *@var{cookie2}}, @w{uid_t *@var{uid}}, @w{gid_t *@var{gid}}, @w{char **@var{argz}}, @w{size_t *@var{argz_len}}) +This routine is called by @code{fshelp_fetch_root} to fetch more +information. 
Return the owner and group of the underlying translated +file in @code{*@var{uid}} and @code{*@var{gid}}; point +@code{*@var{argz}} at the entire passive translator specification for +the file (setting @code{*@var{argz_len}} to the length). If there is no +passive translator, then return @code{ENOENT}. @var{cookie1} is the +cookie passed in @code{fshelp_transbox_init}. @var{cookie2} is the +cookie passed in the call to @code{fshelp_fetch_root}. +@end deftypefn + +@deftypefn {Typedef} {typedef error_t (*} fshelp_fetch_root_callback2_t ) (@w{void *@var{cookie1}}, @w{void *@var{cookie2}}, @w{int @var{flags}}, @w{mach_port_t *@var{underlying}}, @w{mach_msg_type_name_t *@var{underlying_type}}) +This routine is called by @code{fshelp_fetch_root} to fetch more +information. Return an unauthenticated node for the file itself in +@code{*@var{underlying}} and @code{*@var{underlying_type}} (opened with +@var{flags}). @var{cookie1} is the cookie passed in +@code{fshelp_transbox_init}. @var{cookie2} is the cookie passed in the +call to @code{fshelp_fetch_root}. +@end deftypefn + +@deftypefun error_t fshelp_fetch_root (@w{struct transbox *@var{transbox}}, @w{void *@var{cookie}}, @w{file_t @var{dotdot}}, @w{struct iouser *@var{user}}, @w{int @var{flags}}, @w{fshelp_fetch_root_callback1_t @var{callback1}}, @w{fshelp_fetch_root_callback2_t @var{callback2}}, @w{retry_type *@var{retry}}, @w{char *@var{retryname}}, @w{mach_port_t *@var{root}}) +Fetch the root from @var{transbox}. @var{dotdot} is an unauthenticated +port for the directory in which we are looking; @var{user} specifies the +ids of the user responsible for the call. @var{flags} are as for +@code{dir_pathtrans} (but @code{O_CREAT} and @code{O_EXCL} are not +meaningful and are ignored). The transbox lock (as set by +@code{fshelp_transbox_init}) must be held before the call, and will be +held upon return, but may be released during the operation of the call. 
+@end deftypefun + +@deftypefun int fshelp_translated (@w{struct transbox *@var{box}}) +Return true if and only if there is an active translator on this box. +@end deftypefun + +@deftypefun error_t fshelp_set_active (@w{struct transbox *@var{box}}, @w{fsys_t @var{newactive}}, @w{int @var{excl}}) +Atomically replace the existing active translator port for this box with +@var{newactive}. If @var{excl} is non-zero then don't modify an +existing active transbox; return @code{EBUSY} instead. +@end deftypefun + +@deftypefun error_t fshelp_fetch_control (@w{struct transbox *@var{box}}, @w{mach_port_t *@var{control}}) +Fetch the control port to make a request on it. It's a bad idea to use +@code{fsys_getroot} with the result; use @code{fshelp_fetch_root} +instead. +@end deftypefun + +@deftypefun void fshelp_drop_transbox (@w{struct transbox *@var{box}}) +Clean transbox state so that deallocation or reuse is possible. +@end deftypefun + + +@node Fshelp Locking +@subsection Fshelp Locking + +The @code{flock} call is in flux, as the current Hurd interface (as of +version @value{VERSION}) is not suitable for implementing the POSIX +record-locking semantics. + + +@node Fshelp Permissions +@subsection Fshelp Permissions + +These functions are designed to aid with user permission checking. It +is a good idea to use these routines rather than to roll your own, so +that Hurd users see consistent handling of file and directory permission +bits. + +@deftypefun error_t fshelp_isowner (@w{struct stat *@var{st}}, @w{struct iouser *@var{user}}) +Check to see whether @var{user} should be considered the owner of the +file identified by @var{st}. If so, return zero; otherwise return an +appropriate error code. +@end deftypefun + +@deftypefun error_t fshelp_access (@w{struct stat *@var{st}}, @w{int @var{op}}, @w{struct iouser *@var{user}}) +Check to see whether the user @var{user} can operate on the file +identified by @var{st}. 
@var{op} is one of @code{S_IREAD}, +@code{S_IWRITE}, and @code{S_IEXEC}. If the access is permitted, return +zero; otherwise return an appropriate error code. +@end deftypefun + +@deftypefun error_t fshelp_checkdirmod (@w{struct stat *@var{dir}}, @w{struct stat *@var{st}}, @w{struct iouser *@var{user}}) +Check to see whether @var{user} is allowed to modify @var{dir} with respect to +existing file @var{st}. If there is no existing file, then @var{st} +should be set to zero. If the access is permissible, return zero; +otherwise return an appropriate error code. +@end deftypefun + + +@node Fshelp Misc +@subsection Fshelp Misc + +The following functions are completely standalone: + +@deftypefun error_t fshelp_delegate_translation (@w{char *@var{server_name}}, @w{mach_port_t @var{requestor}}, @w{char **@var{argv}}) +Try to hand off responsibility from a translator to the server located +on the node @var{server_name}. @var{requestor} is the translator's +bootstrap port, and @var{argv} is the command line. If +@var{server_name} is null, then a name is concocted by appending +@code{argv[0]} to @code{_servers}. +@end deftypefun + +@deftypefun error_t fshelp_exec_reauth (@w{int @var{suid}}, @w{uid_t @var{uid}}, @w{int @var{sgid}}, @w{gid_t @var{gid}}, @w{auth_t @var{auth}}, error_t (*@var{get_file_ids}) (@w{struct idvec *@var{uids}}, @w{struct idvec *@var{gids}}), @w{mach_port_t *@var{ports}}, @w{mach_msg_type_number_t @var{num_ports}}, @w{mach_port_t *@var{fds}}, @w{mach_msg_type_number_t @var{num_fds}}, @w{int *@var{secure}}) +If @var{suid} or @var{sgid} is true, adds @var{uid} and/or @var{gid} +respectively to the authentication in +@code{@var{ports}[INIT_PORT_AUTH]}, and replaces it with the result. +All the other ports in @var{ports} and @var{fds} are then +reauthenticated, using any privileges available through @var{auth}.
If +the auth port in @code{@var{ports}[INIT_PORT_AUTH]} is bogus, and +@var{get_file_ids} is non-null, it is called to get a list +of uids and gids from the file to use as a replacement. If @var{secure} +is non-null and any added ids are new, then the variable it points to is +set to nonzero, otherwise zero. If either the uid or gid case fails, +then the other may still apply. +@end deftypefun + +@deftypefun error_t fshelp_get_identity (@w{struct port_bucket *@var{bucket}}, @w{ino_t @var{fileno}}, @w{mach_port_t *@var{pt}}) +Return an identity port in @code{*@var{pt}} for the node numbered +@var{fileno}, suitable for returning from @code{io_identity}; exactly +one send right must be created from the returned value. @var{fileno} +should be the same value returned as the @var{fileno} out-parameter in +@code{io_identity}, and in the enclosing directory (except for mount +points), and in the @code{st_ino} stat field. @var{bucket} should be a +@code{libports} port bucket; fshelp requires the caller to make sure +port operations (for no-senders notifications) are used. +@end deftypefun + +@deftypefun error_t fshelp_return_malloced_buffer (@w{char *@var{buf}}, @w{size_t @var{len}}, @w{char **@var{rbuf}}, @w{mach_msg_type_number_t *@var{rlen}}) +Put data from the malloced buffer @var{buf}, @var{len} bytes long, into +@var{rbuf} (which is @var{rlen} bytes long), suitable for returning from +an RPC. If @var{len} is greater than zero, @var{buf} is freed, +regardless of whether an error is returned or not. +@end deftypefun + +@deftypefun error_t fshelp_set_options (@w{struct argp *@var{argp}}, @w{int @var{flags}}, @w{char *@var{argz}}, @w{size_t @var{argz_len}}, @w{void *@var{input}}) +Invoke @code{argp_parse} in the standard way, with data from @var{argz} +and @var{argz_len}. 
+@end deftypefun + +@deftypefun void fshelp_touch (@w{struct stat *@var{st}}, @w{unsigned @var{what}}, @w{volatile struct mapped_time_value *@var{maptime}}) +Change the stat times of @var{node} as indicated by @var{what} to +the current time. @var{what} is a bitmask of one or more of +the @code{TOUCH_ATIME}, @code{TOUCH_MTIME}, and @code{TOUCH_CTIME} +constants. +@end deftypefun + + +@node File Interface +@section File Interface +@scindex fs.defs + +This section documents the interface for operating on files. @menu -* File overview:: Basic concepts for the file interface -* Changing status:: Changing the owner (etc.) of a file -* Program execution:: Executing files -* File locking:: Implementing the flock call -* File frobbing:: Other active calls on files -* Opening files:: Looking up files in directories -* Modifying directories:: Creating and deleting nodes -* Notifications:: File and directory change callbacks -* Translators:: How to set and get translators +* File Overview:: Basic concepts for the file interface. +* Changing Status:: Changing the owner (etc.) of a file. +* Program Execution:: Executing files. +* File Locking:: Implementing the @code{flock} call. +* File Frobbing:: Other active calls on files. +* Opening Files:: Looking up files in directories. +* Modifying Directories:: Creating and deleting nodes. +* Notifications:: File and directory change callbacks. +* File Translators:: How to set and get translators. @end menu -@node File overview -@section File overview +@node File Overview +@subsection File Overview The file interface is a superset of the I/O interface (@pxref{I/O -interface}). Servers which provide the file interface are required to +Interface}). Servers which provide the file interface are required to support the I/O interface as well. All objects reachable in the filesystem are expected to provide the file interface, even if they do -not contain data. (The trivfs library make it easy to do so for -ordinary sorts of cases. 
@xref{Trivfs library}.) +not contain data. (The @code{trivfs} library makes it easy to do so for +ordinary sorts of cases. @xref{Trivfs Library}.) The interface definitions for the file interface are found in -<hurd/fs.defs>. +@code{<hurd/fs.defs>}. Files have various pieces of status information which are returned by -io_stat (@pxref{Information queries}). Most of this status information -can be directly changed by various calls in the file interface; some of -it should vary implicitly as the contents of the file change. +@code{io_stat} (@pxref{Information Queries}). Most of this status +information can be directly changed by various calls in the file +interface; some of it should vary implicitly as the contents of the file +change. Many of these calls have general rules associated with them describing -how security and privilege should operate. The diskfs (@pxref{Diskfs -library}) implements these rules for disk-based filesystems. (Trivfs -based servers generally have no need to implement these rules at all.) -We hope to move the implementation of these rules to the fshelp -library. +how security and privilege should operate. The @code{diskfs} library +(@pxref{Diskfs Library}) implements these rules for stored filesystems. +These rules have also been implemented in the fshelp library +(@pxref{Fshelp Library}). Trivfs-based servers generally have no need +to implement these rules at all. In special cases, there may be a reason to implement a different security check from that specified here, or to implement a call to do something slightly different. But such cases must be carefully -considered; make sure that you will not confuse blameless user programs +considered; make sure that you will not confuse innocent user programs through excessive cleverness. -If some operation cannot be implemented (for example, chauthor over -ftp), then the call should return EOPNOTSUPP. 
If it is merely difficult -to implement a call, it is much better to figure out a way to implement -it as a series of operations rather than returning errors to the user. +If some operation cannot be implemented (for example, @code{chauthor} +over FTP), then the call should return @code{EOPNOTSUPP}. If it is +merely difficult to implement a call, it is much better to figure out a +way to implement it as a series of operations rather than returning +errors to the user. -@node Changing status -@section Changing status +@node Changing Status +@subsection Changing Status -There are several RPC's avalaible for users to change much of the status +There are several RPCs available for users to change much of the status information associated with a file. (The information is returned by the -io_stat rpc; @ref{Information queries}.) +@code{io_stat} RPC; see @ref{Information Queries}.) All these operations are restricted to root and the owner of the file. -When attempted by another user, they should return EPERM. +When attempted by another user, they should return @code{EPERM}. -The file_chown RPC changes the owner and group of the file. Only root -should be able to change the owner, and changing the group to a group -the caller is not in should also be prohibited. Violating either of -these conditions should return EPERM. +@findex file_chown +The @code{file_chown} RPC changes the owner and group of the file. Only +root should be able to change the owner, and changing the group to a +group the caller is not in should also be prohibited. Violating either +of these conditions should return @code{EPERM}. -The file_chauthor RPC changes the author of the file. It should be -legitimate to change the author to any value without restriction. +@findex file_chauthor +The @code{file_chauthor} RPC changes the author of the file. It should +be legitimate to change the author to any value without restriction. -The file_chflags RPC changes the flags of the file. 
It should be +@findex file_chmod +The @code{file_chmod} RPC changes the file permission mode bits. + +@findex file_chflags +The @code{file_chflags} RPC changes the flags of the file. It should be legitimate to change the flags to any value without restriction. No standard meanings have been assigned to the flags yet, but we intend to do so. Do not assume that the flags format we choose will map -identically to that for some existing filesystem format. - -The file_utimes RPC changes the atime and mtime of the file. Making -this call must cause the ctime to be updated as well, even if no actual -change to either the mtime or the atime occurs. - -The file_set_size RPC is special; not only does it change the status -word specifing the size of the file, but it also changes the actual -contents of the file. If the file size is being reduced it should -release secondary storage associated with the previous contents of the -file. If the file is being extended, the new region added to the file -must be zero filled. Unlike the other RPC's in this section, -file_set_size should be permitted to any user who is allowed to write -the file. - -@node Program execution -@section Program execution - -Execution of programs on the Hurd is done through file servers with the -file_exec RPC. The fileserver is expected to verify that the user is -allowed to execute the file, make whatever modifications to the ports -are necessary for setuid execution, and then invoke the standard -execserver found on /servers/exec. - -This section specifically addresses what file servers are expected to -do, with minimal attention to the other parts of the process. - -The file must be opened for execution; if it is not, EBADF should be -returned In addition, at least one of the execute bits must be on. A -failure of this check should result in EACCES--not ENOEXEC. It is not -proper for the file server to ever respond ENOEXEC in response to the -file_exec RPC. 
+identically to that of some existing filesystem format. + +@findex file_utimes +The @code{file_utimes} RPC changes the @var{atime} and @var{mtime} of +the file. Making this call must cause the @var{ctime} to be updated as +well, even if no actual change to either the @var{mtime} or the +@var{atime} occurs. + +@findex file_set_size +The @code{file_set_size} RPC is special; not only does it change the +status word specifying the size of the file, but it also changes the +actual contents of the file. If the file size is being reduced it +should release secondary storage associated with the previous contents +of the file. If the file is being extended, the new region added to the +file must be zero-filled. Unlike the other RPCs in this section, +@code{file_set_size} should be permitted to any user who is allowed to +write the file. + + +@node Program Execution +@subsection Program Execution + +@findex file_exec +Execution of programs on the Hurd is done through fileservers with the +@code{file_exec} RPC. The fileserver is expected to verify that the +user is allowed to execute the file, make whatever modifications to the +ports are necessary for setuid execution, and then invoke the standard +execserver found on @file{/servers/exec}. + +This section specifically addresses what fileservers are expected to do, +with minimal attention to the other parts of the process. @xref{Running +Programs}, for more general information. + +The file must be opened for execution; if it is not, @code{EBADF} should +be returned. In addition, at least one of the execute bits must be on. A +failure of this check should result in @code{EACCES}---not +@code{ENOEXEC}. It is not proper for the fileserver ever to respond to +the @code{file_exec} RPC with @code{ENOEXEC}. If either the setuid or setgid bits are set, the server needs to -construct a new authentication handle with the additional new id's. -Then all the ports passed to file_exec need to be reauthenticated with -the new handle. 
If the fileserver is unable to make the new +construct a new authentication handle with the additional new IDs. +Then all the ports passed to @code{file_exec} need to be reauthenticated +with the new handle. If the fileserver is unable to make the new authentication handle (for example, because it is not running as root) it is not acceptable to return an error; in such a case the server should simply silently fail to implement the setuid/setgid semantics. If the setuid/setgid transformation adds a new uid or gid to the user's authentication handle that was not previously present (as opposed to -merely reordering them) then the EXEC_SECURE and EXEC_NEWTASK flags -should both be added in the call to exec_exec. +merely reordering them) then the @code{EXEC_SECURE} and +@code{EXEC_NEWTASK} flags should both be added in the call to +@code{exec_exec}. The server then needs to open a new port onto the executed file which -will not share any filepointers with the port the user passed in, opened -with O_READ. Finally, all the information (mutated appropriately for -setuid/setgid) should be sent to the execserver with exec_exec. -Whatever error code exec_exec returns should returned to the caller of -file_exec. +will not share any file pointers with the port the user passed in, +opened with @code{O_READ}. Finally, all the information (mutated +appropriately for setuid/setgid) should be sent to the execserver with +@code{exec_exec}. Whatever error code @code{exec_exec} returns should +be returned to the caller of @code{file_exec}. + +@node File Locking +@subsection File Locking -@node File locking -@section File locking +The @code{flock} call is in flux, as the current Hurd interface (as of +version @value{VERSION}) is not suitable for implementing the POSIX +record-locking semantics. 
-FIXME: Implementing the flock call +@findex file_lock +@findex file_lock_stat +You should ignore the @code{file_lock} and @code{file_lock_stat} calls +until the new record-locking interface is implemented. -@node File frobbing -@section File frobbing + +@node File Frobbing +@subsection File Frobbing FIXME: Other active calls on files -@node Opening files -@section Opening files +@code{file_sync} + +@code{file_getfh} + +@code{file_getlinknode} + +@code{file_check_access} + +These manipulate meta-information: + +@code{file_reparent} + +@code{file_statfs} + +@code{file_syncfs} + +@code{file_getcontrol} + +@code{file_get_storage_info} + +@code{file_get_fs_options} + + +@node Opening Files +@subsection Opening Files FIXME: Looking up files in directories -@node Modifying directories -@section Modifying directories +@code{dir_lookup} + +@code{dir_readdir} + +@node Modifying Directories +@subsection Modifying Directories FIXME: Creating and deleting nodes +@code{dir_mkfile} + +@code{dir_mkdir} + +@code{dir_rmdir} + +@code{dir_unlink} + +@code{dir_link} + +@code{dir_rename} + @node Notifications -@section Notifications +@subsection Notifications FIXME: File and directory change callbacks -@node Translators -@section Translators +File change notifications are not yet implemented, but directory +notifications are. + +@code{file_notice_changes} + +@code{dir_notice_changes} + +@node File Translators +@subsection File Translators FIXME: How to set and get translators -@node Filesystem interface -@chapter Filesystem interface +@code{file_set_translator} + +@code{file_get_translator} + +@code{file_get_translator_cntl} + + +@node Filesystem Interface +@section Filesystem Interface +@scindex fsys.defs + +The filesystem interface (described in @code{<hurd/fsys.defs>}) is +supported by translator control ports. + +FIXME: finish + + +@node Special Files +@chapter Special Files + +In Unix, any file that does not act as a general-purpose unit of storage +is called a @dfn{special file}. 
These are FIFOs, Unix-domain sockets, +and device nodes. In the Hurd, there is no need for the ``special +file'' distinction, since they are implemented by translators, just as +regular files are. + +Nevertheless, the Hurd maintains this distinction, in order to provide +backward-compatibility for Unix programs (which do not know about +translators). Studying the implementation of Hurd special files is a +good way to introduce the idea of translators to people who are familiar +with Unix. + +This chapter does not discuss @file{/dev/zero} or any of the +microkernel-based devices, since these are translated by the generalized +storeio server (FIXME xref). + +FIXME: finish + +@section fifo +@section ifsock +@section magic +@section null + + +FIXME: a chapter on libtreefs and libdirmgt will probably go here + + +@node Stores +@chapter Stores + +A @dfn{store} is a fixed-size block of storage, which can be read and +perhaps written to. A store is more general than a file: it refers to +any type of storage such as devices, files, memory, tasks, etc. Stores +can also be representations of other stores, which may be combined and +filtered in various ways. + +@menu +* Store Library:: An abstract interface to storage systems. +@end menu + +@section storeinfo, storecat, storeread +@section storeio + +FIXME: finish + +@node Store Library +@section Store Library +@scindex libstore +@scindex store.h + +The store library (which is declared in @code{<hurd/store.h>}) +implements many different backends which support the store abstraction. +Hurd programs use @code{libstore} so that new storage types can be +implemented with minimum impact. + +@menu +* Store Arguments:: Parsing store command-line arguments. +* Store Management:: Creating and manipulating stores. +* Store I/O:: Reading and writing data to stores. +* Store Classes:: Ready-to-use storage backends. +* Store RPC Encoding:: Transferring store descriptors via RPC. 
+@end menu + + +@node Store Arguments +@subsection Store Arguments + +FIXME: describe startup sequence + +@deftypevr {Structure} struct store_parsed +The result of parsing a store, which should be enough information to +open it, or return the arguments. +@end deftypevr + +@deftypefn {Structure} struct store_argp_params @{ @w{struct store_parsed *@var{result}}; @w{const char *@var{default_type}}; @w{const struct store_class *const *@var{classes}}; @} +This is the structure used to pass args back and forth from +@var{store_argp}. @var{result} is the resulting parsed result. If +@samp{--store-type} isn't specified, then @var{default_type} should be +used as the store type; zero is equivalent to @code{"query"}. +@var{classes} is set of classes used to validate store types and +argument syntax. +@end deftypefn + +@deftypevar {extern struct argp} store_argp +This is an argument parser that may be used for parsing a simple command +line specification for stores. The accompanying input parameter must be +a pointer to a @code{struct store_argp_params}. +@end deftypevar + +@deftypefun void store_parsed_free (@w{struct store_parsed *@var{parsed}}) +Free all resources used by @var{parsed}. +@end deftypefun + +@deftypefun error_t store_parsed_open (@w{const struct store_parsed *@var{parsed}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Open the store specified by @var{parsed}, and return it in @var{store}. +@end deftypefun + +@deftypefun error_t store_parsed_append_args (@w{const struct store_parsed *@var{parsed}}, @w{char **@var{argz}}, @w{size_t *@var{argz_len}}) +Add the arguments used to create @var{parsed} to @var{argz} and +@var{argz_len}. +@end deftypefun + +@deftypefun error_t store_parsed_name (@w{const struct store_parsed *@var{parsed}}, @w{char **@var{name}}) +Make an option string describing @var{parsed}, and return it in malloced +storage in @var{name}. 
+@end deftypefun + + +@node Store Management +@subsection Store Management + +The following functions provide basic management of stores: + +@deftypefun error_t store_create (@w{file_t @var{source}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Return a new store in @var{store}, which refers to the storage +underlying @var{source}. @var{classes} is used to select classes +specified by the provider; if zero, @var{store_std_classes} is used. +@var{flags} is set with @code{store_set_flags}, with the exception of +@code{STORE_INACTIVE}, which merely indicates that no attempt should be +made to activate an inactive store; if @code{STORE_INACTIVE} is not +specified, and the store returned for @var{source} is inactive, an attempt is +made to activate it (failure of which causes an error to be returned). +A reference to @var{source} is created (but may be destroyed with +@code{store_close_source}). + +It is usually better to use a specific store open or create function +such as @code{store_open} (@pxref{Store Classes}), since they are +tailored to the needs of a specific store. Generally, you should only +use @code{store_create} if you are defining your own store class, or you +need options that are not provided by a more specific store creation +function. +@end deftypefun + +@deftypefun void store_close_source (@w{struct store *@var{store}}) +If @var{store} was created using @code{store_create}, remove the +reference to the source from which it was created. +@end deftypefun + +@deftypefun void store_free (@w{struct store *@var{store}}) +Clean up and deallocate @var{store}'s underlying stores. +@end deftypefun + +@deftypefn {Structure} struct store_run @{ @w{off_t @var{start}}, @var{length}; @} +A @code{struct store_run} represents a contiguous region in a store's +address range. These are used to designate active portions of a store. 
+If @var{start} is -1, then the region is a @dfn{hole} (it is zero-filled +and doesn't correspond to any real addresses). +@end deftypefn + +@deftypefun error_t store_set_runs (@w{struct store *@var{store}}, @w{const struct store_run *@var{runs}}, @w{size_t @var{num_runs}}) +Set @var{store}'s current runs list to (a copy of) @var{runs} and +@var{num_runs}. +@end deftypefun + +@deftypefun error_t store_set_children (@w{struct store *@var{store}}, @w{struct store *const *@var{children}}, @w{size_t @var{num_children}}) +Set @var{store}'s current children to (a copy of) @var{children} and +@var{num_children} (note that just the vector @var{children} is copied, +not the actual children). +@end deftypefun + +@deftypefun error_t store_children_name (@w{const struct store *@var{store}}, @w{char **@var{name}}) +Try to come up with a name for the children in @var{store}, combining +the names of each child in a way that could be used to parse them with +@code{store_open_children}. This is done heuristically, and so may not +succeed. If a child doesn't have a name, @code{EINVAL} is returned. +@end deftypefun + +@deftypefun error_t store_set_name (@w{struct store *@var{store}}, @w{const char *@var{name}}) +Sets the name associated with @var{store} to a copy of @var{name}. +@end deftypefun + +@deftypefun error_t store_set_flags (@w{struct store *@var{store}}, @w{int @var{flags}}) +Add @var{flags} to @var{store}'s currently set flags. +@end deftypefun + +@deftypefun error_t store_clear_flags (@w{struct store *@var{store}}, @w{int @var{flags}}) +Remove @var{flags} from @var{store}'s currently set flags. +@end deftypefun + +@deftypefun error_t store_set_child_flags (@w{struct store *@var{store}}, @w{int @var{flags}}) +Set @var{flags} in all children of @var{store}, and if successful, add +@var{flags} to @var{store}'s flags. 
+@end deftypefun + +@deftypefun error_t store_clear_child_flags (@w{struct store *@var{store}}, @w{int @var{flags}}) +Clear @var{flags} in all children of @var{store}, and if successful, +remove @var{flags} from @var{store}'s flags. +@end deftypefun + +@deftypefun int store_is_securely_returnable (@w{struct store *@var{store}}, @w{int @var{open_flags}}) +Returns true if @var{store} can safely be returned to a user who has +accessed it via a node using @var{open_flags}, without compromising +security. +@end deftypefun + +@deftypefun error_t store_clone (@w{struct store *@var{from}}, @w{struct store **@var{to}}) +Return a copy of @var{from} in @var{to}. +@end deftypefun + +@deftypefun error_t store_remap (@w{struct store *@var{source}}, @w{const struct store_run *@var{runs}}, @w{size_t @var{num_runs}}, @w{struct store **@var{store}}) +Return a store in @var{store} that reflects the blocks in @var{runs} and +@var{num_runs} from @var{source}; @var{source} is consumed, but not +@var{runs}. Unlike the @code{store_remap_create} function, this may +simply modify @var{source} and return it. +@end deftypefun + + +@node Store I/O +@subsection Store I/O + +The following functions allow you to read and modify the contents of a +store: + +@deftypefun error_t store_map (@w{const struct store *@var{store}}, @w{vm_prot_t @var{prot}}, @w{mach_port_t *@var{memobj}}) +Return a memory object paging on @var{store}. +@ignore @c FIXME: update if/when there are more pager-related functions +If this call fails with @code{EOPNOTSUPP}, you can try calling some of +the routines below to get a pager. +@end ignore +@end deftypefun + +@deftypefun error_t store_read (@w{struct store *@var{store}}, @w{off_t @var{addr}}, @w{size_t @var{amount}}, @w{void **@var{buf}}, @w{size_t *@var{len}}) +Read @var{amount} bytes from @var{store} at @var{addr} into @var{buf} +and @var{len} (which follow the usual Mach buffer-return semantics). 
@var{addr} is in @var{blocks} (as defined by +@code{@var{store}->block_size}). Note that @var{len} is in bytes. +@end deftypefun + +@deftypefun error_t store_write (@w{struct store *@var{store}}, @w{off_t @var{addr}}, @w{void *@var{buf}}, @w{size_t @var{len}}, @w{size_t *@var{amount}}) +Write @var{len} bytes from @var{buf} to @var{store} at @var{addr}. +Returns the amount written in @var{amount} (in bytes). @var{addr} is in +@var{blocks} (as defined by @code{@var{store}->block_size}). +@end deftypefun + + +@node Store Classes +@subsection Store Classes + +The store library comes with a number of standard store class +implementations: + +@deftypevar {extern const struct store_class *const} store_std_classes[] +This is a null-terminated vector of the standard store classes +implemented by @code{libstore}. +@end deftypevar + +If you are building your own class vectors, the following function may +be useful: + +@deftypevar error_t store_concat_class_vectors (@w{struct store_class **@var{cv1}}, @w{struct store_class **@var{cv2}}, @w{struct store_class ***@var{concat}}) +Concatenate the store class vectors in @var{cv1} and @var{cv2}, and +return a new (malloced) vector in @var{concat}. +@end deftypevar + +@subsubsection @code{query} store +@cindex @code{query} store + +@deftypevar {extern const struct store_class} store_query_class +This store is a virtual store which queries a filesystem node, and +delegates control to an appropriate store class. +@end deftypevar + +@deftypefun error_t store_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Open the file @var{name}, and return a new store in @var{store}, which +refers to the storage underlying it. @var{classes} is used to select +classes specified by the provider; if it is zero, then +@var{store_std_classes} is used. @var{flags} is set with +@code{store_set_flags}. 
A reference to the open file is created (but +may be destroyed with @code{store_close_source}). +@end deftypefun + +@subsubsection @code{typed_open} store +@cindex @code{typed_open} store + +@deftypevar {extern const struct store_class} store_typed_open_class +This store is special in that it doesn't correspond to any specific +store functions, rather it provides a way to interpret character strings +as specifications for other stores. +@end deftypevar + +@deftypefun error_t store_typed_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Open the store indicated by @var{name}, which should consist of a store +type name followed by a @samp{:} and any type-specific name, returning the +new store in @var{store}. @var{classes} is used to select classes +specified by the type name; if it is zero, @var{store_std_classes} is +used. +@end deftypefun + +@deftypefun error_t store_open_children (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store ***@var{stores}}, @w{size_t *@var{num_stores}}) +Parse multiple store names in @var{name}, and open each individually, +returning all in the vector @var{stores}, and the number in +@var{num_stores}. The syntax of @var{name} is a single non-alphanumeric +separator character, followed by each child store name separated by the +same separator; each child name is @samp{@var{type}:@var{name}} notation +as parsed by @code{store_typed_open}. If every child uses the same +@samp{@var{type}:} prefix, then it may be factored out and put before +the child list instead (the two notations are differentiated by whether +or not the first character of @var{name} is alphanumeric). 
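As a rough illustration of the @var{name} syntax described for @code{store_open_children}, the sketch below splits a child list into fully qualified @samp{@var{type}:@var{name}} strings. @code{split_child_names} and @code{MAX_CHILD_NAME} are hypothetical names invented for this example; this is not the parser that @code{libstore} uses, and rejecting empty child names is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILD_NAME 128   /* hypothetical limit, for the sketch only */

/* Hypothetical helper, not part of libstore.  Split NAME into at most MAX
   child names of the form "type:name", stored in OUT.  Return the number
   of children, or -1 if NAME is malformed or does not fit.  */
int
split_child_names (const char *name, char out[][MAX_CHILD_NAME], int max)
{
  const char *prefix = "";
  size_t prefix_len = 0;
  int count = 0;

  assert (name != NULL && out != NULL);

  if (isalnum ((unsigned char) *name))
    {
      /* Factored-out form: a shared "type:" comes before the list.  */
      const char *colon = strchr (name, ':');
      if (colon == NULL)
        return -1;
      prefix = name;
      prefix_len = (size_t) (colon - name) + 1;   /* keep the ':' */
      name = colon + 1;
    }

  /* The list itself starts with its non-alphanumeric separator.  */
  if (*name == '\0' || isalnum ((unsigned char) *name))
    return -1;
  char sep = *name++;

  while (*name != '\0')
    {
      const char *end = strchr (name, sep);
      size_t len = end != NULL ? (size_t) (end - name) : strlen (name);

      /* Assumption of this sketch: empty child names are an error.  */
      if (len == 0 || count >= max || prefix_len + len >= MAX_CHILD_NAME)
        return -1;

      memcpy (out[count], prefix, prefix_len);
      memcpy (out[count] + prefix_len, name, len);
      out[count][prefix_len + len] = '\0';
      count++;

      name += len;
      if (*name == sep)
        name++;
    }
  return count;
}
```

With this sketch, @samp{,file:/a,device:hd0} and @samp{file:,/a,/b} both yield two children; the second form expands to @samp{file:/a} and @samp{file:/b}.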
+@end deftypefun + +@subsubsection @code{device} store +@cindex @code{device} store + +@cindex @code{device drivers} +@deftypevar {extern const struct store_class} store_device_class +This store is a simple wrapper for a microkernel device +driver.@footnote{It is important to note that device drivers are not +provided by the Hurd, but by the underlying microkernel. Hurd `devices' +are just storeio-translated nodes which make the microkernel device +drivers obey Hurd semantics. If you wish to implement a new device +driver, you will need to consult the appropriate microkernel +documentation.} +@end deftypevar + +@deftypefun error_t store_device_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Open the device named @var{name}, and return the corresponding store in +@var{store}. +@end deftypefun + +@deftypefun error_t store_device_create (@w{device_t @var{device}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} referring to the microkernel device +@var{device}. Consumes the @var{device} send right. +@end deftypefun + +@subsubsection @code{file} store +@cindex @code{file} store + +@deftypevar {extern const struct store_class} store_file_class +This store reads and writes the contents of a Hurd file. +@end deftypevar + +@deftypefun error_t store_file_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Open the file @var{name}, and return the corresponding store in @var{store}. +@end deftypefun + +@deftypefun error_t store_file_create (@w{file_t @var{file}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} referring to the file @var{file}. +Unlike @code{store_create}, this will always use file I/O, even if it would +be possible to be more direct. This may work in more cases, for instance +if the file has holes. Consumes the @var{file} send right. 
+@end deftypefun + +@subsubsection @code{task} store +@cindex @code{task} store + +@deftypevar {extern const struct store_class} store_task_class +This store provides access to the contents of a microkernel task. +@end deftypevar + +@deftypevar error_t store_task_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Open the task @var{name} (@var{name} should be the task's pid), and +return the corresponding store in @var{store}. +@end deftypevar + +@deftypevar {error_t} store_task_create (@w{task_t @var{task}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} referring to the task @var{task}, +consuming the @var{task} send right. +@end deftypevar + +@subsubsection @code{zero} store +@cindex @code{zero} store + +@deftypevar {extern const struct store_class} store_zero_class +Reads to this store always return zero-filled buffers, no matter what +has been written into it. This store corresponds to the Unix +@file{/dev/zero} device node. +@end deftypevar + +@deftypefun error_t store_zero_create (@w{off_t @var{size}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new zero store @var{size} bytes long in @var{store}. +@end deftypefun + +@subsubsection @code{copy} store +@cindex @code{copy} store + +@deftypevar {extern const struct store_class} store_copy_class +This store provides a temporary copy of another store. This is useful +if you want to provide writable data, but do not wish to modify the +underlying store. All changes to a copy store are lost when it is +closed. +@end deftypevar + +@deftypefun error_t store_copy_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Open the copy store @var{name} (which consists of another store class +name, a @samp{:}, and a name for the store class to open) and return the +corresponding store in @var{store}. 
@var{classes} is used to select +classes specified by the type name; if it is zero, +@var{store_std_classes} is used. +@end deftypefun + +@deftypefun error_t store_copy_create (@w{struct store *@var{from}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} which contains a snapshot of the +contents of the store @var{from}; @var{from} is consumed. +@end deftypefun + +@deftypefun error_t store_buffer_create (@w{void *@var{buf}}, @w{size_t @var{buf_len}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} which contains the memory buffer +@var{buf}, of length @var{buf_len}. @var{buf} must be allocated with +@code{vm_allocate}, and will be consumed. +@end deftypefun + +@subsubsection @code{gunzip} store +@cindex @code{gunzip} store + +@deftypevar {extern const struct store_class} store_gunzip_class +This store provides transparent GNU zip decompression of a substore. +Unfortunately, this store is currently read-only. +@end deftypevar + +@deftypevar error_t store_gunzip_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Open the gunzip store @var{name} (which consists of another store class +name, a @samp{:}, and a name for that store class to open), and return +the corresponding store in @var{store}. @var{classes} is used to select +classes specified by the type name; if it is zero, +@var{store_std_classes} is used. +@end deftypevar + +@deftypevar error_t store_gunzip_create (@w{struct store *@var{from}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} which contains a snapshot of the +uncompressed contents of the store @var{from}; @var{from} is consumed. +@var{block_size} is the desired block size of the result. 
+@end deftypevar + +@subsubsection @code{concat} store +@cindex @code{concat} store + +@cindex linear concatenation +@cindex appending disks +@cindex disks, appending +@cindex disk concatenation +@cindex concatenation, disk +@deftypevar {extern const struct store_class} store_concat_class +This class provides a linear concatenation storage mode. It creates a +new virtual store which consists of several different substores appended +to one another. + +This mode is designed to increase storage capacity, so that when one +substore is filled, new data is transparently written to the next +substore. Concatenation requires robust hardware, since a failure in +any single substore will wipe out a large section of the data. +@end deftypevar + +@deftypefun error_t store_concat_open (@w{const char *@var{name}}, @w{int @var{flags}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Return a new store that concatenates the stores created by opening all +the individual stores described in @var{name}; for the syntax of +@var{name}, see @code{store_open_children}. +@end deftypefun + +@deftypefun error_t store_concat_create (@w{struct store * const *@var{stores}}, @w{size_t @var{num_stores}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} that concatenates all the stores in +@var{stores} (@var{num_stores} of them). The stores in @var{stores} are +consumed; that is, they will be freed when this store is freed. The +@var{stores} @emph{array}, however, is copied, and so should be freed by +the caller. 
+@end deftypefun + +@subsubsection @code{ileave} store +@cindex @code{ileave} store + +@cindex RAID-0 +@cindex striping, disk +@cindex disk striping +@cindex interleaving disks +@cindex disks, interleaving +@deftypevar {extern const struct store_class} store_ileave_class +This class provides a RAID-0@footnote{RAID is a @dfn{Redundant Array of +Independent Disks}, which refers to the idea of using several disks in +parallel in order to achieve increased capacity, redundancy and/or +performance.} storage mode (also called @dfn{disk striping}). It +creates a new virtual store by interleaving the contents of several +different substores. + +This RAID mode is designed to increase storage performance, since I/O +will probably occur in parallel if the substores reside on different +physical devices. Interleaving works best with evenly-yoked +substores@dots{} if the stores are different sizes, some space will +not be used at the end of the larger stores; if the stores are different +speeds, then I/O will have to wait for the slowest store; if some stores +are not as reliable as others, failures will wipe out every @var{n}th +storage block, where @var{n} is the number of substores. +@end deftypevar + +@deftypefun error_t store_ileave_create (@w{struct store * const *@var{stripes}}, @w{size_t @var{num_stripes}}, @w{off_t @var{interleave}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} that interleaves all the stores in +@var{stripes} (@var{num_stripes} of them) every @var{interleave} bytes; +@var{interleave} must be an integer multiple of each stripe's block +size. The stores in @var{stripes} are consumed; that is, they will be +freed when this store is freed. The @var{stripes} @emph{array}, +however, is copied, and so should be freed by the caller.
+@end deftypefun + +@subsubsection @code{mvol} store +@cindex @code{mvol} store + +@deftypevar {extern const struct store_class} store_mvol_class +This store provides access to multiple volumes using a single-volume +device. One use of this store would be to provide a store which +consists of multiple floppy disks when there is only a single disk +drive. It works by remapping a single linear address range to multiple +address ranges, and keeping track of the currently active range. +Whenever a request maps to a range that is not active, a callback is +made in order to switch to the new range. + +This class is not included in @var{store_std_classes}, because it +requires an application-specific callback. +@end deftypevar + +@deftypefun error_t store_mvol_create (@w{struct store *@var{phys}}, error_t (*@var{swap_vols}) (@w{struct store *@var{store}}, @w{size_t @var{new_vol}}, @w{ssize_t @var{old_vol}}), @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} that multiplexes multiple physical +volumes from @var{phys} as one larger virtual volume. @var{swap_vols} +is a function that will be called whenever reads or writes refer to a +block which is not addressable on the currently active volume. +@var{phys} is consumed. +@end deftypefun + +@subsubsection @code{remap} store +@cindex @code{remap} store + +@deftypevar {extern const struct store_class} store_remap_class +This store translates I/O requests into different addresses on a +different store. +@end deftypevar + +@deftypefun error_t store_remap_create (@w{struct store *@var{source}}, @w{const struct store_run *@var{runs}}, @w{size_t @var{num_runs}}, @w{int @var{flags}}, @w{struct store **@var{store}}) +Return a new store in @var{store} that reflects the blocks in @var{runs} +(@var{num_runs} of them) from @var{source}; @var{source} is consumed, but +@var{runs} is not.
Unlike the @code{store_remap} function, this +function always operates by creating a new store of type @samp{remap} +which has @var{source} as a child, and so may be less efficient than +@code{store_remap} for some types of stores. +@end deftypefun + + +@node Store RPC Encoding +@subsection Store RPC Encoding + +The store library also provides some functions which help transfer +stores between tasks via RPC: + +@deftypevr {Structure} struct store_enc +This structure is used to hold the various bits that make up the +representation of a store for transmission via RPC. See +@code{<hurd/hurd_types.h>} for an explanation of the encodings for the +various storage types. +@end deftypevr + +@deftypefun void store_enc_init (@w{struct store_enc *@var{enc}}, @w{mach_port_t *@var{ports}}, @w{mach_msg_type_number_t @var{num_ports}}, @w{int *@var{ints}}, @w{mach_msg_type_number_t @var{num_ints}}, @w{off_t *@var{offsets}}, @w{mach_msg_type_number_t @var{num_offsets}}, @w{char *@var{data}}, @w{mach_msg_type_number_t @var{data_len}}) +Initialize @var{enc}. The given vectors and sizes will be used for the +encoding if they are big enough (otherwise new ones will be +automatically allocated). +@end deftypefun + +@deftypefun void store_enc_dealloc (@w{struct store_enc *@var{enc}}) +Deallocate storage used by the fields in @var{enc} (but nothing is done +with @var{enc} itself). +@end deftypefun + +@deftypefun void store_enc_return (@w{struct store_enc *@var{enc}}, @w{mach_port_t **@var{ports}}, @w{mach_msg_type_number_t *@var{num_ports}}, @w{int **@var{ints}}, @w{mach_msg_type_number_t *@var{num_ints}}, @w{off_t **@var{offsets}}, @w{mach_msg_type_number_t *@var{num_offsets}}, @w{char **@var{data}}, @w{mach_msg_type_number_t *@var{data_len}}) +Copy out the parameters from @var{enc} into the given variables suitably +for returning from a @code{file_get_storage_info} RPC, and deallocate +@var{enc}.
+@end deftypefun + +@deftypefun error_t store_return (@w{const struct store *@var{store}}, @w{mach_port_t **@var{ports}}, @w{mach_msg_type_number_t *@var{num_ports}}, @w{int **@var{ints}}, @w{mach_msg_type_number_t *@var{num_ints}}, @w{off_t **@var{offsets}}, @w{mach_msg_type_number_t *@var{num_offsets}}, @w{char **@var{data}}, @w{mach_msg_type_number_t *@var{data_len}}) +Encode @var{store} into the given return variables, suitably for +returning from a @code{file_get_storage_info} RPC. +@end deftypefun + +@deftypefun error_t store_encode (@w{const struct store *@var{store}}, @w{struct store_enc *@var{enc}}) +Encode @var{store} into @var{enc}, which should have been prepared with +@code{store_enc_init}, or return an error. The contents of @var{enc} +may then be returned as the value of @code{file_get_storage_info}; if +for some reason this can't be done, @code{store_enc_dealloc} may be used +to deallocate the memory used by the unsent vectors. +@end deftypefun + +@deftypefun error_t store_decode (@w{struct store_enc *@var{enc}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{store}}) +Decode @var{enc}, either returning a new store in @var{store}, or an +error. @var{classes} is the mapping from Hurd storage class ids to store +classes; if it is zero, @var{store_std_classes} is used. If nothing +else is to be done with @var{enc}, its contents may then be freed using +@code{store_enc_dealloc}. +@end deftypefun + +@deftypefun error_t store_allocate_child_encodings (@w{const struct store *@var{store}}, @w{struct store_enc *@var{enc}}) +Calls the @code{allocate_encoding} method in each child store of +@var{store}, propagating any errors. If any child does not have such a +method, @code{EOPNOTSUPP} is returned. +@end deftypefun + +@deftypefun error_t store_encode_children (@w{const struct store *@var{store}}, @w{struct store_enc *@var{enc}}) +Calls the encode method in each child store of @var{store}, propagating +any errors.
If any child does not have such a method, @code{EOPNOTSUPP} +is returned. +@end deftypefun + +@deftypefun error_t store_decode_children (@w{struct store_enc *@var{enc}}, @w{int @var{num_children}}, @w{const struct store_class *const *@var{classes}}, @w{struct store **@var{children}}) +Decodes @var{num_children} from @var{enc}, storing the results into +successive positions in @var{children}. +@end deftypefun + +@deftypefun error_t store_with_decoded_runs (@w{struct store_enc *@var{enc}}, @w{size_t @var{num_runs}}, error_t (*@var{fun}) (@w{const struct store_run *@var{runs}}, @w{size_t @var{num_runs}})) +Call @var{fun} with the vector @var{runs} of length @var{num_runs} +extracted from @var{enc}. +@end deftypefun + +@deftypefun error_t store_std_leaf_allocate_encoding (@w{const struct store *@var{store}}, @w{struct store_enc *@var{enc}}) +@deftypefunx error_t store_std_leaf_encode (@w{const struct store *@var{store}}, @w{struct store_enc *@var{enc}}) +Standard encoding used for most data-providing (as opposed to filtering) +store classes. +@end deftypefun + +@deftypefn {Typedef} {typedef error_t (*} store_std_leaf_create_t )(@w{mach_port_t @var{port}}, @w{int @var{flags}}, @w{size_t @var{block_size}}, @w{const struct store_run *@var{runs}}, @w{size_t @var{num_runs}}, @w{struct store **@var{store}}) +Creation function used by @code{store_std_leaf_decode}. +@end deftypefn + +@deftypefun error_t store_std_leaf_decode (@w{struct store_enc *@var{enc}}, @w{store_std_leaf_create_t @var{create}}, @w{struct store **@var{store}}) +Decodes the standard leaf encoding which is common to various builtin +formats, and calls @var{create} to actually create the store. +@end deftypefun + + +@node Stored Filesystems +@chapter Stored Filesystems +@cindex disk-based filesystems +@cindex filesystems, disk-based + +Stored filesystems allow users to save and load persistent data from any +random-access storage media, such as hard disks, floppy diskettes, and +CD-ROMs.
Stored filesystems are required for bootstrapping standalone +workstations, as well. + +@menu +* Repairing Filesystems:: Recovering from minor filesystem crashes. +* Linux Extended 2 FS:: The popular Linux filesystem format. +* BSD Unix FS:: The BSD Unix 4.x Fast File System. +* ISO-9660 CD-ROM FS:: Standard CD-ROM format. +* Diskfs Library:: Implementing new filesystem servers. +@end menu + + +@node Repairing Filesystems +@section Repairing Filesystems +@pindex fsck + +FIXME: finish + -FIXME: interfaces supported to control file-servers +@node Linux Extended 2 FS +@section Linux Extended 2 FS +@pindex ext2fs -@node Socket interface -@chapter Socket interface +FIXME: finish -FIXME: Interfaces used for manipulating sockets -@node Ports library -@chapter Ports library +@node BSD Unix FS +@section BSD Unix FS +@scindex ufs -FIXME: A library to manage port rights for servers +FIXME: finish -@node Iohelp library -@chapter Iohelp library -FIXME: A library to implement some common parts of the I/O and shared -I/O interfaces +@node ISO-9660 CD-ROM FS +@section ISO-9660 CD-ROM FS +@pindex isofs -@node Fshelp library -@chapter Fshelp library +FIXME: finish -FIXME: A library to implement some common parts of the file interface -@node Pager library -@chapter Pager library +@node Diskfs Library +@section Diskfs Library +@scindex libdiskfs +@scindex diskfs.h + +The diskfs library is declared in @code{<hurd/diskfs.h>}, and does a lot +of the work of implementing stored filesystems. @code{libdiskfs} +requires the threads, ports, iohelp, fshelp, and store libraries. You +should understand all these libraries before you attempt to use diskfs, +and you should also be familiar with the pager library (@pxref{Pager +Library}). + +@scindex libstorefs +For historical reasons, the library for implementing stored filesystems +is called @code{libdiskfs} instead of @code{libstorefs}. 
Keep in mind, +however, that diskfs is useful for filesystems which are implemented on +any block-addressed storage device, since it uses the store library to +do I/O. + +Note that stored filesystems can be tricky to implement, since the +diskfs callback interfaces are not trivial. It really is best if you +examine the source code of a similar existing filesystem server, and +follow its example rather than trying to write your own from scratch. + +@menu +* Diskfs Startup:: Initializing stored filesystems. +* Diskfs Arguments:: Parsing command-line arguments. +* Diskfs Globals:: Global behaviour modification. +* Diskfs Node Management:: Allocation, reference counting, I/O, + caching, and other disk node routines. +* Diskfs Callbacks:: Mandatory user-defined diskfs functions. +* Diskfs Options:: Optional user-defined diskfs functions. +* Diskfs Internals:: Reimplementing small pieces of diskfs. +@end menu + + +@node Diskfs Startup +@subsection Diskfs Startup + +This subsection gives an outline of the general steps involved in +implementing a filesystem server, to help refresh your memory and to +offer explanations rather than to serve as a tutorial. + +The first thing a filesystem server should do is parse its command-line +arguments (@pxref{Diskfs Arguments}). Then, the standard output and +error streams should be redirected to the console, so that error +messages are not lost if this is the bootstrap filesystem: + +@deftypefun void diskfs_console_stdio (void) +Redirect error messages to the console, so that they can be seen by +users. +@end deftypefun + +The following is a list of the relevant functions which would be called +during the rest of the server initialization. Again, you should refer +to the implementation of an already-working filesystem if you have any +questions about how these functions should be used: + +@deftypefun error_t diskfs_init_diskfs (void) +Call this function after arguments have been parsed to initialize the +library. 
You must call this before calling any other diskfs functions, +and after parsing diskfs options. +@end deftypefun + +@deftypefun void diskfs_spawn_first_thread (void) +Call this after all format-specific initialization is done (except for +setting @code{diskfs_root_node}); at this point the pagers should be +ready to go. +@end deftypefun + +@deftypefun mach_port_t diskfs_startup_diskfs (@w{mach_port_t @var{bootstrap}}, @w{int @var{flags}}) +Call this once the filesystem is fully initialized, to advertise the new +filesystem control port to our parent filesystem. If @var{bootstrap} is set, +diskfs will call @code{fsys_startup} on that port as appropriate and return +the @var{realnode} from that call; otherwise we call +@code{diskfs_start_bootstrap} and return @code{MACH_PORT_NULL}. +@var{flags} specifies how to open @var{realnode} (from the O_* set). +@end deftypefun + +You should not need to call the following function directly, since +@code{diskfs_startup_diskfs} will do it for you, when appropriate: + +@deftypefun void diskfs_start_bootstrap (void) +Start the Hurd bootstrap sequence as if we are the bootstrap filesystem +(that is, @code{diskfs_boot_flags} is nonzero). All filesystem +initialization must be complete before you call this function. +@end deftypefun + + +@node Diskfs Arguments +@subsection Diskfs Arguments + +The following functions implement standard diskfs command-line and +runtime argument parsing, using argp (@pxref{Argp, , , libc, The GNU C +Library Reference Manual}): + +@deftypefun error_t diskfs_set_options (@w{char *@var{argz}}, @w{size_t @var{argz_len}}) +Parse and execute the runtime options specified by @var{argz} and +@var{argz_len}. @code{EINVAL} is returned if some option is +unrecognized. The default definition of this routine will parse them +using @code{diskfs_runtime_argp}. 
+@end deftypefun + +@deftypefun error_t diskfs_append_args (@w{char **@var{argz}}, @w{unsigned *@var{argz_len}}) +Append to the malloced string @code{*@var{argz}} of length +@code{*@var{argz_len}} a NUL-separated list of the arguments to this +translator. The default definition of this routine simply calls +@code{diskfs_append_std_options}. +@end deftypefun + +@deftypefun error_t diskfs_append_std_options (@w{char **@var{argz}}, @w{unsigned *@var{argz_len}}) +@emph{Appends} NUL-separated options describing the standard diskfs +option state to @var{argz} and increments @var{argz_len} appropriately. +Note that unlike @code{diskfs_get_options}, @var{argz} and +@var{argz_len} must already have sane values. +@end deftypefun + +@deftypevar {struct argp *} diskfs_runtime_argp +If this is defined or set to an argp structure, it will be used by the +default @code{diskfs_set_options} to handle runtime option parsing. The +default definition is initialized to a pointer to +@code{diskfs_std_runtime_argp}. +@end deftypevar + +@deftypevar {const struct argp} diskfs_std_runtime_argp +An argp for the standard diskfs runtime options. The default definition +of @code{diskfs_runtime_argp} points to this, although the user can +redefine that to chain this onto his own argp. +@end deftypevar + +@deftypevar {const struct argp} diskfs_startup_argp +An argp structure for the standard diskfs command line arguments. The +user may call @code{argp_parse} on this to parse the command line, chain +it onto the end of his own argp structure, or ignore it completely. +@end deftypevar + +@deftypevar {const struct argp} diskfs_store_startup_argp +An argp structure for the standard diskfs command line arguments plus a +store specification. The address of a location in which to return the +resulting @code{struct store_parsed} structure should be passed as the +input argument to @code{argp_parse}; FIXME xref the declaration for +STORE_ARGP. 
+@end deftypevar + + +@node Diskfs Globals +@subsection Diskfs Globals + +The following functions and variables control the overall behaviour of +the library. Your callback functions may need to refer to these, but +you should not need to modify or redefine them. + +@deftypevar mach_port_t diskfs_default_pager +@deftypevarx mach_port_t diskfs_exec_ctl +@deftypevarx mach_port_t diskfs_exec +@deftypevarx auth_t diskfs_auth_server_port +These are the respective send rights to the default pager, execserver +control port, execserver itself, and authserver. +@end deftypevar + +@deftypevar mach_port_t diskfs_fsys_identity +The @code{io_identity} identity port for the filesystem. +@end deftypevar + +@deftypevar {char **} diskfs_argv +The command line diskfs was started with, set by the default argument parser. +If you don't use it, set this yourself. This is only used for bootstrap +file systems, to give to the procserver. +@end deftypevar + +@deftypevar {char *} diskfs_boot_flags +When this is a bootstrap filesystem, the command line options passed from +the kernel. If not a bootstrap filesystem, it is zero, so it can be used to +distinguish between the two cases. +@end deftypevar + +@deftypevar {struct rwlock} diskfs_fsys_lock +Hold this lock while doing filesystem-level operations. Innocuous users +can just hold a reader lock, but operations that might corrupt other +threads should hold a writer lock. +@end deftypevar + +@deftypevar {volatile struct mapped_time_value *} diskfs_mtime +The current system time, as used by the diskfs routines. This is +converted into a @code{struct timeval} by the @code{maptime_read} +C library function (FIXME xref). +@end deftypevar + +@deftypevar int diskfs_synchronous +True if and only if we should do every operation synchronously. It +is the format-specific code's responsibility to keep allocation +information permanently in sync if this is set; the rest will +be done by format-independent code.
+@end deftypevar + +@deftypefun error_t diskfs_set_sync_interval (@w{int @var{interval}}) +Establish a thread to sync the filesystem every @var{interval} seconds, +or never, if @var{interval} is zero. If an error occurs creating the +thread, it is returned, otherwise zero. Subsequent calls will create a +new thread and (eventually) get rid of the old one; the old thread won't +do any more syncs, regardless. +@end deftypefun + +@deftypevar spin_lock_t diskfs_node_refcnt_lock +Pager reference count lock. +@end deftypevar + +@deftypevar int diskfs_readonly +Set to zero if the filesystem is currently writable. +@end deftypevar + +@deftypefun error_t diskfs_set_readonly (@w{int @var{readonly}}) +Change an active filesystem between read-only and writable modes, +setting the global variable @var{diskfs_readonly} to reflect the current +mode. If an error is returned, nothing will have changed. +@var{diskfs_fsys_lock} should be held while calling this routine. +@end deftypefun + +@deftypefun int diskfs_check_readonly (void) +Check if the filesystem is readonly before an operation that writes it. +Return nonzero if readonly, otherwise zero. +@end deftypefun + +@deftypefun error_t diskfs_remount (void) +Reread all in-core data structures from disk. This function can only be +successful if @var{diskfs_readonly} is true. @var{diskfs_fsys_lock} +should be held while calling this routine. +@end deftypefun + +@deftypefun error_t diskfs_shutdown (@w{int @var{flags}}) +Shutdown the filesystem; @var{flags} are as for @code{fsys_shutdown}. +@end deftypefun + + +@node Diskfs Node Management +@subsection Diskfs Node Management + +Every file or directory is a diskfs @dfn{node}. The following functions +help your diskfs callbacks manage nodes and their references: + +@deftypefun void diskfs_drop_node (@w{struct node *@var{np}}) +Node @var{np} now has no more references; clean all state. The +@var{diskfs_node_refcnt_lock} must be held, and will be released upon +return. 
@var{np} must be locked. +@end deftypefun + +@deftypefun void diskfs_node_update (@w{struct node *@var{np}}, @w{int @var{wait}}) +Set disk fields from @code{@var{np}->dn_stat}; update ctime, atime, and mtime +if necessary. If @var{wait} is true, then return only after the +physical media has been completely updated. +@end deftypefun + +@deftypefun void diskfs_nref (@w{struct node *@var{np}}) +Add a hard reference to node @var{np}. If there were no hard references +previously, then the node cannot be locked (because you must hold a hard +reference to hold the lock). +@end deftypefun + +@deftypefun void diskfs_nput (@w{struct node *@var{np}}) +Unlock node @var{np} and release a hard reference; if this is the last +hard reference and there are no links to the file then request light +references to be dropped. +@end deftypefun + +@deftypefun void diskfs_nrele (@w{struct node *@var{np}}) +Release a hard reference on @var{np}. If @var{np} is locked by anyone, +then this cannot be the last hard reference (because you must hold a +hard reference in order to hold the lock). If this is the last hard +reference and there are no links, then request light references to be +dropped. +@end deftypefun + +@deftypefun void diskfs_nref_light (@w{struct node *@var{np}}) +Add a light reference to a node. +@end deftypefun + +@deftypefun void diskfs_nput_light (@w{struct node *@var{np}}) +Unlock node @var{np} and release a light reference. +@end deftypefun + +@deftypefun void diskfs_nrele_light (@w{struct node *@var{np}}) +Release a light reference on @var{np}. If @var{np} is locked by anyone, +then this cannot be the last reference (because you must hold a hard +reference in order to hold the lock). 
+@end deftypefun + +@deftypefun error_t diskfs_node_rdwr (@w{struct node *@var{np}}, @w{char *@var{data}}, @w{off_t @var{off}}, @w{size_t @var{amt}}, @w{int @var{direction}}, @w{struct protid *@var{cred}}, @w{size_t *@var{amtread}}) +This is called by other filesystem routines to read or write files, and +extends them automatically, if necessary. @var{np} is the node to be +read or written, and must be locked. @var{data} will be written or +filled. @var{off} identifies where in the file the I/O is to take place +(negative values are not allowed). @var{amt} is the size of @var{data} +and tells how much to copy. @var{direction} is zero for reading or nonzero +for writing. @var{cred} is the user doing the access (only used to +validate attempted file extension). For reads, @code{*@var{amtread}} is +filled with the amount actually read. +@end deftypefun + +@deftypefun void diskfs_notice_dirchange (@w{struct node *@var{dp}}, @w{enum dir_changed_type @var{type}}, @w{char *@var{name}}) +Send notifications to users who have requested them for directory +@var{dp} with @code{dir_notice_changes}. The type of modification and +affected name are @var{type} and @var{name} respectively. This should +be called by @code{diskfs_direnter}, @code{diskfs_dirremove}, +@code{diskfs_dirrewrite}, and anything else that changes the directory, +after the change is fully completed. +@end deftypefun + +@deftypefun {struct node *} diskfs_make_node (@w{struct disknode *@var{dn}}) +Create a new node structure with @var{dn} as its physical disknode. The +new node will have one hard reference and no light references. +@end deftypefun + +These next node manipulation functions are not generally useful, but may +come in handy if you need to redefine any diskfs functions. + +@deftypefun error_t diskfs_create_node (@w{struct node *@var{dir}}, @w{char *@var{name}}, @w{mode_t @var{mode}}, @w{struct node **@var{newnode}}, @w{struct protid *@var{cred}}, @w{struct dirstat *@var{ds}}) +Create a new node.
Give it @var{mode}: if @var{mode} includes +@code{IFDIR}, also initialize @file{.} and @file{..} in the new +directory. Return the node in @var{newnode}. @var{cred} identifies the +user responsible for the call. If @var{name} is nonzero, then link the +new node into @var{dir} with name @var{name}; @var{ds} is the result of +a prior @code{diskfs_lookup} for creation (and @var{dir} has been held +locked since). @var{dir} must always be provided as at least a hint for +disk allocation strategies. +@end deftypefun + +@deftypefun void diskfs_set_node_times (@w{struct node *@var{np}}) +If @code{@var{np}->dn_set_ctime} is set, then modify +@code{@var{np}->dn_stat.st_ctime} appropriately; do the analogous +operations for atime and mtime as well. +@end deftypefun + +@deftypefun {struct node *} diskfs_check_lookup_cache (@w{struct node *@var{dir}}, @w{char *@var{name}}) +Scan the cache looking for @var{name} inside @var{dir}. If we don't +know any entries at all, then return zero. If the entry is confirmed to +not exist, then return -1. Otherwise, return @var{np} for the entry, +with a newly-allocated reference. +@end deftypefun + +@deftypefun error_t diskfs_cached_lookup (@w{int @var{cache_id}}, @w{struct node **@var{npp}}) +Return the node corresponding to @var{cache_id} in @code{*@var{npp}}. +@end deftypefun + +@deftypefun void diskfs_enter_lookup_cache (@w{struct node *@var{dir}}, @w{struct node *@var{np}}, @w{char *@var{name}}) +Node @var{np} has just been found in @var{dir} with @var{name}. If +@var{np} is null, that means that this name has been confirmed as absent +in the directory. +@end deftypefun + +@deftypefun void diskfs_purge_lookup_cache (@w{struct node *@var{dp}}, @w{struct node *@var{np}}) +Purge all references in the cache to @var{np} as a node inside directory +@var{dp}.
+@end deftypefun + + +@node Diskfs Callbacks +@subsection Diskfs Callbacks + +Like several other Hurd libraries, @code{libdiskfs} depends on you to +implement application-specific callback functions. You @emph{must} +define the following functions and variables, but you should also look +at @ref{Diskfs Options}, as there are several defaults which should be +modified to provide good filesystem support: + +@deftypevr {Structure} struct dirstat +You must define this type, which will hold information between a call to +@code{diskfs_lookup} and a call to one of @code{diskfs_direnter}, +@code{diskfs_dirremove}, or @code{diskfs_dirrewrite}. It must contain +enough information so that those calls work as described below. +@end deftypevr + +@deftypevar size_t diskfs_dirstat_size +This must be the size in bytes of a @code{struct dirstat}. +@end deftypevar + +@deftypevar int diskfs_link_max +This is the maximum number of links to any one file, which must be a +positive integer. The implementation of @code{dir_rename} does not know +how to succeed if only one link is allowed; on such formats you +need to reimplement @code{dir_rename} yourself. +@end deftypevar + +@deftypevar int diskfs_maxsymlinks +This variable is a positive integer which is the maximum number of +symbolic links which can be traversed within a single call to +@code{dir_pathtrans}. If this is exceeded, @code{dir_pathtrans} will +return @code{ELOOP}. +@end deftypevar + +@deftypevar {struct node *} diskfs_root_node +Set this to be the node of the root of the filesystem. +@end deftypevar + +@deftypevar {char *} diskfs_server_name +Set this to the name of the filesystem server. +@end deftypevar + +@deftypevar {char *} diskfs_server_version +Set this to be the server version string. +@end deftypevar + +@deftypevar {char *} diskfs_disk_name +This should be a string that somehow identifies the particular disk this +filesystem is interpreting.
It is generally only used to print messages +or to distinguish instances of the same filesystem type from one +another. If this filesystem accesses no external media, then define +this to be zero. +@end deftypevar + +@deftypefun error_t diskfs_set_statfs (@w{fsys_statfsbuf_t *@var{statfsbuf}}) +Set @code{*@var{statfsbuf}} with appropriate values to reflect the +current state of the filesystem. +@end deftypefun + +@deftypefun error_t diskfs_lookup (@w{struct node *@var{dp}}, @w{char *@var{name}}, @w{enum lookup_type @var{type}}, @w{struct node **@var{np}}, @w{struct dirstat *@var{ds}}, @w{struct protid *@var{cred}}) +@deftypefunx error_t diskfs_lookup_hard (@w{struct node *@var{dp}}, @w{char *@var{name}}, @w{enum lookup_type @var{type}}, @w{struct node **@var{np}}, @w{struct dirstat *@var{ds}}, @w{struct protid *@var{cred}}) +You should not define @code{diskfs_lookup}, because it is simply a +wrapper for @code{diskfs_lookup_hard}, and is already defined in +@code{libdiskfs}. + +Lookup in directory @var{dp} (which is locked) the name @var{name}. +@var{type} will either be @code{LOOKUP}, @code{CREATE}, @code{RENAME}, +or @code{REMOVE}. @var{cred} identifies the user making the call. + +If the name is found, return zero, and (if @var{np} is nonzero) set +@code{*@var{np}} to point to the node for it, which should be locked. +If the name is not found, return @code{ENOENT}, and (if @var{np} is +nonzero) set @code{*@var{np}} to zero. If @var{np} is zero, then the +node found must not be locked, not even transitorily. Lookups for +@code{REMOVE} and @code{RENAME} (which must often check permissions on +the node being found) will always set @var{np}. + +If @var{ds} is nonzero then the behaviour varies depending on the +requested lookup @var{type}: + +@table @code +@item LOOKUP +Set @code{*@var{ds}} to be ignored by @code{diskfs_drop_dirstat} + +@item CREATE +On success, set @code{*@var{ds}} to be ignored by +@code{diskfs_drop_dirstat}. 
@* +On failure, set @code{*@var{ds}} for a future call to +@code{diskfs_direnter}. + +@item RENAME +On success, set @code{*@var{ds}} for a future call to +@code{diskfs_dirrewrite}. @* +On failure, set @code{*@var{ds}} for a future call to +@code{diskfs_direnter}. + +@item REMOVE +On success, set @code{*@var{ds}} for a future call to +@code{diskfs_dirremove}. @* +On failure, set @code{*@var{ds}} to be ignored by +@code{diskfs_drop_dirstat}. +@end table + +The caller of this function guarantees that if @var{ds} is nonzero, then +either the appropriate call listed above or @code{diskfs_drop_dirstat} +will be called with @var{ds} before the directory @var{dp} is unlocked, +and guarantees that no lookup calls will be made on this directory +between this lookup and the use (or destruction) of @code{*@var{ds}}. + +If you use the library's versions of @code{diskfs_rename_dir}, +@code{diskfs_clear_directory}, and @code{diskfs_init_dir}, then lookups +for @file{..} might have the flag @code{SPEC_DOTDOT} ORed in. This has a +special meaning depending on the requested lookup @var{type}: + +@table @code +@item LOOKUP +@var{dp} should be unlocked and its reference dropped before returning. + +@item CREATE +Ignore this case, because @code{SPEC_DOTDOT} is guaranteed not to be +given. + +@item RENAME +@itemx REMOVE +In both of these cases, the node being found (@code{*@var{np}}) is +already held locked, so don't lock it or add a reference to it. +@end table + +Return @code{ENOENT} if @var{name} isn't in the directory. Return +@code{EAGAIN} if @var{name} refers to the @file{..} of this filesystem's +root. Return @code{EIO} if appropriate.
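The dirstat rules for each lookup type can be sketched in C as a toy, in-memory lookup. This is only an illustration under stated assumptions: the @code{toy_*} names, the fixed-size name table, and the @code{DS_*} markers are hypothetical stand-ins, not libdiskfs definitions, and locking, credentials, and @code{SPEC_DOTDOT} are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the libdiskfs types. */
enum toy_lookup_type { TOY_LOOKUP, TOY_CREATE, TOY_RENAME, TOY_REMOVE };

/* Which follow-up call the dirstat has been prepared for. */
enum toy_dirstat_use { DS_IGNORE, DS_ENTER, DS_REWRITE, DS_REMOVE };

struct toy_dirstat
{
  enum toy_dirstat_use use;
  int slot;                     /* directory slot the follow-up call acts on */
};

#define TOY_NENTRIES 4
struct toy_dir
{
  const char *names[TOY_NENTRIES];      /* NULL marks a free slot */
};

/* Look NAME up in DIR.  Return 0 if found, ENOENT otherwise.  If DS is
   nonzero, fill it in according to the per-type rules described above.  */
int
toy_lookup (const struct toy_dir *dir, const char *name,
            enum toy_lookup_type type, struct toy_dirstat *ds)
{
  int found = -1, free_slot = -1;

  assert (dir != NULL && name != NULL);
  for (int i = 0; i < TOY_NENTRIES; i++)
    {
      if (dir->names[i] == NULL)
        {
          if (free_slot < 0)
            free_slot = i;
        }
      else if (found < 0 && strcmp (dir->names[i], name) == 0)
        found = i;
    }

  if (ds != NULL)
    {
      /* Default: nothing for the drop routine to release. */
      ds->use = DS_IGNORE;
      ds->slot = -1;
      if (found >= 0)
        {
          if (type == TOY_RENAME)
            {
              ds->use = DS_REWRITE;
              ds->slot = found;
            }
          else if (type == TOY_REMOVE)
            {
              ds->use = DS_REMOVE;
              ds->slot = found;
            }
        }
      else if (type == TOY_CREATE || type == TOY_RENAME)
        {
          /* Failed CREATE or RENAME: prepare for a later "direnter". */
          ds->use = DS_ENTER;
          ds->slot = free_slot;
        }
    }
  return found >= 0 ? 0 : ENOENT;
}
```

A real @code{diskfs_lookup_hard} would also have to honor the locking and reference rules given above; the sketch only shows how the four lookup types map onto what the dirstat is prepared for.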
+@end deftypefun + +@deftypefun error_t diskfs_direnter (@w{struct node *@var{dp}}, @w{char *@var{name}}, @w{struct node *@var{np}}, @w{struct dirstat *@var{ds}}, @w{struct protid *@var{cred}}) +@deftypefunx error_t diskfs_direnter_hard (@w{struct node *@var{dp}}, @w{char *@var{name}}, @w{struct node *@var{np}}, @w{struct dirstat *@var{ds}}, @w{struct protid *@var{cred}}) +You should not define @code{diskfs_direnter}, because it is simply a +wrapper for @code{diskfs_direnter_hard}, and is already defined in +@code{libdiskfs}. + +Add @var{np} to directory @var{dp} under the name @var{name}. This will +only be called after an unsuccessful call to @code{diskfs_lookup} of type +@code{CREATE} or @code{RENAME}; @var{dp} has been locked continuously +since that call and @var{ds} is as that call set it, @var{np} is locked. +@var{cred} identifies the user responsible for the call (to be used only +to validate directory growth). +@end deftypefun + +@deftypefun error_t diskfs_dirrewrite (@w{struct node *@var{dp}}, @w{struct node *@var{oldnp}}, @w{struct node *@var{np}}, @w{char *@var{name}}, @w{struct dirstat *@var{ds}}) +@deftypefunx error_t diskfs_dirrewrite_hard (@w{struct node *@var{dp}}, @w{struct node *@var{np}}, @w{struct dirstat *@var{ds}}) +You should not define @code{diskfs_dirrewrite}, because it is simply a +wrapper for @code{diskfs_dirrewrite_hard}, and is already defined in +@code{libdiskfs}. + +This will only be called after a successful call to @code{diskfs_lookup} +of type @code{RENAME}; this call should change the name found in +directory @var{dp} to point to node @var{np} instead of its previous +referent. @var{dp} has been locked continuously since the call to +@code{diskfs_lookup} and @var{ds} is as that call set it; @var{np} is +locked. + +@code{diskfs_dirrewrite} has some additional specifications: @var{name} +is the name within @var{dp} which used to correspond to the previous +referent, @var{oldnp}; it is this reference which is being rewritten. 
+@code{diskfs_dirrewrite} also calls @code{diskfs_notice_dirchange} if +@code{@var{dp}->dirmod_reqs} is nonzero. +@end deftypefun + +@deftypefun error_t diskfs_dirremove (@w{struct node *@var{dp}}, @w{struct node *@var{np}}, @w{char *@var{name}}, @w{struct dirstat *@var{ds}}) +@deftypefunx error_t diskfs_dirremove_hard (@w{struct node *@var{dp}}, @w{struct dirstat *@var{ds}}) +You should not define @code{diskfs_dirremove}, because it is simply a +wrapper for @code{diskfs_dirremove_hard}, and is already defined in +@code{libdiskfs}. + +This will only be called after a successful call to @code{diskfs_lookup} +of type @code{REMOVE}; this call should remove the name found from the +directory @var{dp}. @var{dp} has been locked continuously since the +call to @code{diskfs_lookup} and @var{ds} is as that call set it. + +@code{diskfs_dirremove} has some additional specifications: this routine +should call @code{diskfs_notice_dirchange} if +@code{@var{dp}->dirmod_reqs} is nonzero. The entry being removed has +name @var{name} and refers to @var{np}. +@end deftypefun + +@deftypefun error_t diskfs_drop_dirstat (@w{struct node *@var{dp}}, @w{struct dirstat *@var{ds}}) +@var{ds} has been set by a previous call to @code{diskfs_lookup} on +directory @var{dp}; this function is guaranteed to be called if none of +@code{diskfs_direnter}, @code{diskfs_dirrewrite}, or +@code{diskfs_dirremove} has been called, and should free any state +retained by a @code{struct dirstat}. @var{dp} has been locked +continuously since the call to @code{diskfs_lookup}. +@end deftypefun + +@deftypefun void diskfs_null_dirstat (@w{struct dirstat *@var{ds}}) +Initialize @var{ds} such that @code{diskfs_drop_dirstat} will ignore it.
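One way to make a dirstat ignorable is to give it an explicit "nothing pending" state that the drop routine checks first. The following C sketch assumes a hypothetical @code{struct sketch_dirstat} with a heap-allocated scratch buffer; the names and layout are illustrative only, since each filesystem format defines its own @code{struct dirstat}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a format-specific struct dirstat. */
enum sketch_ds_kind { SKETCH_DS_NOTHING, SKETCH_DS_PENDING };

struct sketch_dirstat
{
  enum sketch_ds_kind kind;     /* SKETCH_DS_NOTHING means "ignore me" */
  void *scratch;                /* state retained from the lookup, if any */
};

/* Mark DS so that a later drop has nothing to release. */
void
sketch_null_dirstat (struct sketch_dirstat *ds)
{
  assert (ds != NULL);
  ds->kind = SKETCH_DS_NOTHING;
  ds->scratch = NULL;
}

/* Release whatever the lookup left in DS; harmless on a nulled DS. */
int
sketch_drop_dirstat (struct sketch_dirstat *ds)
{
  assert (ds != NULL);
  if (ds->kind != SKETCH_DS_NOTHING)
    {
      free (ds->scratch);
      sketch_null_dirstat (ds);
    }
  return 0;
}
```

Resetting the structure after the free makes a second drop harmless, which keeps error paths in callers simple.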
+@end deftypefun + +@deftypefun error_t diskfs_get_directs (@w{struct node *@var{dp}}, @w{int @var{entry}}, @w{int @var{n}}, @w{char **@var{data}}, @w{u_int *@var{datacnt}}, @w{vm_size_t @var{bufsiz}}, @w{int *@var{amt}}) +Return @var{n} directory entries starting at @var{entry} from locked +directory node @var{dp}. Fill @code{*@var{data}}, which currently +points to @code{*@var{datacnt}} bytes, with the entries. If it isn't big +enough, @code{vm_allocate} into @code{*@var{data}}. Set +@code{*@var{datacnt}} with the total size used. Fill @code{*@var{amt}} with the +number of entries copied. Regardless, never copy more than @var{bufsiz} +bytes. If @var{bufsiz} is zero, then there is no limit on +@code{*@var{datacnt}}; if @var{n} is -1, then there is no limit on +@code{*@var{amt}}. +@end deftypefun + +@deftypefun int diskfs_dirempty (@w{struct node *@var{dp}}, @w{struct protid *@var{cred}}) +Return nonzero if locked directory @var{dp} is empty. If the user has +not redefined @code{diskfs_clear_directory} and +@code{diskfs_init_dir}, then `empty' means `only possesses entries +labelled @file{.} and @file{..}'. @var{cred} identifies the user making +the call; if this user cannot search the directory, then this +routine should fail. +@end deftypefun + +@deftypefun error_t diskfs_get_translator (@w{struct node *@var{np}}, @w{char **@var{namep}}, @w{u_int *@var{namelen}}) +For locked node @var{np} (for which @code{diskfs_node_translated} is +true) look up the name of its translator. Store the name into newly +malloced storage, return it in @code{*@var{namep}}, and set +@code{*@var{namelen}} to the total length. +@end deftypefun + +@deftypefun error_t diskfs_set_translator (@w{struct node *@var{np}}, @w{char *@var{name}}, @w{u_int @var{namelen}}, @w{struct protid *@var{cred}}) +For locked node @var{np}, set the name of the translating program to be +@var{name}, which is @var{namelen} bytes long. @var{cred} identifies +the user responsible for the call.
+@end deftypefun + +@deftypefun error_t diskfs_truncate (@w{struct node *@var{np}}, @w{off_t @var{size}}) +Truncate locked node @var{np} to be @var{size} bytes long. If @var{np} +is already less than or equal to @var{size} bytes long, do nothing. If +this is a symlink (and @code{diskfs_shortcut_symlink} is set) then this +should clear the symlink, even if @code{diskfs_create_symlink_hook} +stores the link target elsewhere. +@end deftypefun + +@deftypefun error_t diskfs_grow (@w{struct node *@var{np}}, @w{off_t @var{size}}, @w{struct protid *@var{cred}}) +Grow the disk allocated to locked node @var{np} to be at least +@var{size} bytes, and set @code{@var{np}->allocsize} to the actual +allocated size. If the allocated size is already @var{size} bytes, do +nothing. @var{cred} identifies the user responsible for the call. +@end deftypefun + +@deftypefun error_t diskfs_node_reload (@w{struct node *@var{node}}) +This function must reread all data specific to @var{node} from disk, +without writing anything. It is always called with +@var{diskfs_readonly} set to true. +@end deftypefun + +@deftypefun error_t diskfs_reload_global_state (void) +This function must invalidate all cached global state, and reread it as +necessary from disk, without writing anything. It is always called with +@var{diskfs_readonly} set to true. @code{diskfs_node_reload} is +subsequently called on all active nodes, so this call doesn't need to +reread any node-specific data. +@end deftypefun + +@deftypefun error_t diskfs_node_iterate (error_t (*@var{fun}) (@w{struct node *@var{np}})) +For each active node @var{np}, call @var{fun}. The node is to be locked +around the call to @var{fun}. If @var{fun} returns nonzero for any +node, then stop immediately, and return that value. 
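The iteration contract (lock each node around the callback, stop at the first nonzero return) can be sketched in C as follows. This assumes a hypothetical singly linked list of active nodes and an integer flag in place of a real lock; the @code{iter_*} and @code{sketch_*} names are not part of libdiskfs, and plain @code{int} stands in for @code{error_t}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an active node; LOCKED replaces a real lock. */
struct iter_node
{
  int id;
  int locked;
  struct iter_node *next;
};

typedef int (*iter_fun) (struct iter_node *np);

/* Number of nodes the sample callback has seen (for demonstration). */
int iter_visited = 0;

/* Call FUN on each node in the list starting at HEAD, holding the node's
   "lock" around the call.  Stop at the first nonzero result and return it.  */
int
sketch_node_iterate (struct iter_node *head, iter_fun fun)
{
  assert (fun != NULL);
  for (struct iter_node *np = head; np != NULL; np = np->next)
    {
      int err;

      np->locked = 1;           /* stands in for locking the node */
      err = fun (np);
      np->locked = 0;           /* unlock before deciding whether to stop */
      if (err)
        return err;
    }
  return 0;
}

/* Sample callback: fails with code 5 on the node whose id is 2. */
int
iter_fail_on_two (struct iter_node *np)
{
  assert (np->locked);
  iter_visited++;
  return np->id == 2 ? 5 : 0;
}
```

Unlocking before the early return keeps every node unlocked when the iteration ends, whether or not it ran to completion.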
+@end deftypefun + +@deftypefun error_t diskfs_alloc_node (@w{struct node *@var{dp}}, @w{mode_t @var{mode}}, @w{struct node **@var{np}}) +Allocate a new node to be of mode @var{mode} in locked directory +@var{dp}, but don't actually set the mode or modify the directory, since +that will be done by the caller. The user responsible for the request +can be identified with @var{cred}. Set @code{*@var{np}} to be the newly +allocated node. +@end deftypefun + +@deftypefun void diskfs_free_node (@w{struct node *@var{np}}, @w{mode_t @var{mode}}) +Free node @var{np}; the on-disk copy has already been synchronized with +@code{diskfs_node_update} (where @code{@var{np}->dn_stat.st_mode} was +zero). @var{np}'s mode used to be @var{mode}. +@end deftypefun + +@deftypefun void diskfs_lost_hardrefs (@w{struct node *@var{np}}) +Locked node @var{np} has some light references but has just lost its +last hard reference. +@end deftypefun + +@deftypefun void diskfs_new_hardrefs (@w{struct node *@var{np}}) +Locked node @var{np} has just acquired a hard reference where it had +none previously. Therefore, it is okay again to have light references +without real users. +@end deftypefun + +@deftypefun void diskfs_try_dropping_softrefs (@w{struct node *@var{np}}) +Node @var{np} has some light references, but has just lost its last hard +reference. Take steps so that if any light references can be freed, +they are. Both @var{diskfs_node_refcnt_lock} and @var{np} are locked. +This function will be called after @code{diskfs_lost_hardrefs}. +@end deftypefun + +@deftypefun void diskfs_node_norefs (@w{struct node *@var{np}}) +Node @var{np} has no more references; free local state, including +@code{*@var{np}} if it shouldn't be retained. +@var{diskfs_node_refcnt_lock} is held. +@end deftypefun + +@deftypefun error_t diskfs_set_hypermetadata (@w{int @var{wait}}, @w{int @var{clean}}) +Write any non-paged metadata from format-specific buffers to disk, +asynchronously unless @var{wait} is nonzero.
If @var{clean} is nonzero, +then after this is written the filesystem will be absolutely clean, and +it must be possible for the non-paged metadata to indicate that fact. +@end deftypefun + +@deftypefun void diskfs_write_disknode (@w{struct node *@var{np}}, @w{int @var{wait}}) +Write the information in @code{@var{np}->dn_stat} and any associated +format-specific information to the disk. If @var{wait} is true, then +return only after the physical media has been completely updated. +@end deftypefun + +@deftypefun void diskfs_file_update (@w{struct node *@var{np}}, @w{int @var{wait}}) +Write the contents and all associated metadata of file @var{np} to disk. +Generally, this will involve calling @code{diskfs_node_update} for much +of the metadata. If @var{wait} is true, then return only after the +physical media has been completely updated. +@end deftypefun + +@deftypefun mach_port_t diskfs_get_filemap (@w{struct node *@var{np}}, @w{vm_prot_t @var{prot}}) +Return a memory object port (send right) for the file contents of +@var{np}. @var{prot} is the maximum allowable access. On errors, +return @code{MACH_PORT_NULL} and set @code{errno}. +@end deftypefun + +@deftypefun {struct pager *} diskfs_get_filemap_pager_struct (@w{struct node *@var{np}}) +Return a @code{struct pager *} that refers to the pager returned by +@code{diskfs_get_filemap} for locked node @var{np}, suitable for use as an argument +to @code{pager_memcpy}. +@end deftypefun + +@deftypefun vm_prot_t diskfs_max_user_pager_prot (void) +Return the bitwise OR of the maximum @code{prot} parameter (the second +argument to @code{diskfs_get_filemap}) for all active user pagers. +@end deftypefun + +@deftypefun int diskfs_pager_users (void) +Return nonzero if there are pager ports exported that might be in use by +users. Further pager creation should be blocked before this function +returns zero.
+@end deftypefun + +@deftypefun void diskfs_sync_everything (@w{int @var{wait}}) +Sync all the pagers and write any data belonging on disk except for the +hypermetadata. If @var{wait} is true, then return only after the +physical media has been completely updated. +@end deftypefun + +@deftypefun void diskfs_shutdown_pager (void) +Shut down all pagers. This is irreversible, and is done when the +filesystem is exiting. +@end deftypefun + + +@node Diskfs Options +@subsection Diskfs Options + +The functions and variables described in this subsection already have +default definitions in @code{libdiskfs}, so you are not forced to define +them; rather, they may be redefined on a case-by-case basis. + +You should set the value of any option variables as soon as your program +starts (before you make any calls to diskfs, such as argument parsing). + +@deftypevar int diskfs_hard_readonly +You should set this variable to nonzero if the filesystem media can +never be made writable. +@end deftypevar + +@deftypevar {char *} diskfs_extra_version +Set this to be any additional version specification that should be +printed for @samp{--version}. +@end deftypevar + +@deftypevar int diskfs_shortcut_symlink +This should be nonzero if and only if the filesystem format supports +shortcutting symbolic link translation. The library guarantees that +users will not be able to read or write the contents of the node +directly, and the library will only do so if the symlink hook functions +(@code{diskfs_create_symlink_hook} and @code{diskfs_read_symlink_hook}) +return @code{EINVAL} or are not defined. The library knows that the +@code{dn_stat.st_size} field is the length of the symlink, even if the +hook functions are used.
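The fallback rule for the symlink hooks can be sketched in C. This is a minimal illustration under stated assumptions: @code{struct link_node}, @code{sketch_set_symlink}, and the @code{sketch_*} variables are hypothetical stand-ins for the library's node type, its internal symlink code, and the option variables; plain @code{int} stands in for @code{error_t}.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a node: DATA plays the role of the file
   contents and SIZE the role of dn_stat.st_size.  */
struct link_node
{
  char data[64];
  size_t size;
};

/* Stand-ins for the option variable and the hook pointer. */
int sketch_shortcut_symlink = 1;
int (*sketch_create_symlink_hook) (struct link_node *np, const char *target) = NULL;

/* Set NP to be a symlink to TARGET.  Try the hook first; if it is unset
   or returns EINVAL, fall back to writing the target into the file data.
   Any other hook error is passed back to the caller.  */
int
sketch_set_symlink (struct link_node *np, const char *target)
{
  int err = EINVAL;
  size_t len;

  assert (np != NULL && target != NULL);
  len = strlen (target);

  if (sketch_shortcut_symlink && sketch_create_symlink_hook != NULL)
    err = sketch_create_symlink_hook (np, target);

  if (err == EINVAL)
    {
      /* Normal method: store the target in the file contents. */
      if (len >= sizeof np->data)
        return ENAMETOOLONG;
      memcpy (np->data, target, len + 1);
      err = 0;
    }

  if (err == 0)
    np->size = len;     /* the size is the link length on either path */
  return err;
}

/* Sample hooks: one declines (forcing the fallback), one fails outright. */
int
sketch_hook_declines (struct link_node *np, const char *target)
{
  (void) np;
  (void) target;
  return EINVAL;
}

int
sketch_hook_fails (struct link_node *np, const char *target)
{
  (void) np;
  (void) target;
  return EIO;
}
```

Treating @code{EINVAL} as "not handled here" lets a format store short targets in its own metadata and decline long ones without a separate capability flag.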
+@end deftypevar + +@deftypevar int diskfs_shortcut_chrdev +@deftypevarx int diskfs_shortcut_blkdev +@deftypevarx int diskfs_shortcut_fifo +@deftypevarx int diskfs_shortcut_ifsock +These variables should be nonzero if and only if the filesystem format +supports shortcutting character device node, block device node, FIFO, or +Unix-domain socket translation, respectively. +@end deftypevar + +@deftypevar int diskfs_default_sync_interval +@code{diskfs_set_sync_interval} is called with this value when the first +diskfs thread is started up (in @code{diskfs_spawn_first_thread}). This +variable has a default value of 30, which causes disk buffers to +be flushed at least every 30 seconds. +@end deftypevar + +@deftypefun error_t diskfs_validate_mode_change (@w{struct node *@var{np}}, @w{mode_t @var{mode}}) +@deftypefunx error_t diskfs_validate_owner_change (@w{struct node *@var{np}}, @w{uid_t @var{uid}}) +@deftypefunx error_t diskfs_validate_group_change (@w{struct node *@var{np}}, @w{gid_t @var{gid}}) +@deftypefunx error_t diskfs_validate_author_change (@w{struct node *@var{np}}, @w{uid_t @var{author}}) +@deftypefunx error_t diskfs_validate_flags_change (@w{struct node *@var{np}}, @w{int @var{flags}}) +@deftypefunx error_t diskfs_validate_rdev_change (@w{struct node *@var{np}}, @w{dev_t @var{rdev}}) +Return zero if the node @var{np} can be changed as requested. That +is, if @var{np}'s mode can be changed to @var{mode}, owner to @var{uid}, +group to @var{gid}, author to @var{author}, flags to @var{flags}, or raw +device number to @var{rdev}, respectively. Otherwise, return an error +code. + +It must always be possible to clear the mode or the flags; diskfs will +not ask for permission before doing so. +@end deftypefun + +@deftypefun void diskfs_readonly_changed (@w{int @var{readonly}}) +This is called when the disk has been changed from read-only to +read-write mode or vice versa.
@var{readonly} is the new state (which +is also reflected in @var{diskfs_readonly}). This function is also +called during initial startup if the filesystem is to be writable. +@end deftypefun + +@deftypefn {Variable} {error_t (*} diskfs_create_symlink_hook ) (@w{struct node *@var{np}}, @w{char *@var{target}}) +If this function pointer is nonzero (and @code{diskfs_shortcut_symlink} +is set) it is called to set a symlink. If it returns @code{EINVAL} or +isn't set, then the normal method (writing the contents into the file +data) is used. If it returns any other error, it is returned to the +user. +@end deftypefn + +@deftypefn {Variable} {error_t (*} diskfs_read_symlink_hook ) (@w{struct node *@var{np}}, @w{char *@var{target}}) +If this function pointer is nonzero (and @code{diskfs_shortcut_symlink} +is set) it is called to read the contents of a symlink. If it returns +@code{EINVAL} or isn't set, then the normal method (reading from the +file data) is used. If it returns any other error, it is returned to +the user. +@end deftypefn + +@deftypefun error_t diskfs_rename_dir (@w{struct node *@var{fdp}}, @w{struct node *@var{fnp}}, @w{char *@var{fromname}}, @w{struct node *@var{tdp}}, @w{char *@var{toname}}, @w{struct protid *@var{fromcred}}, @w{struct protid *@var{tocred}}) +Rename directory node @var{fnp} (whose parent is @var{fdp}, and which +has name @var{fromname} in that directory) to have name @var{toname} +inside directory @var{tdp}. None of these nodes are locked, and none +should be locked upon return. This routine is serialized, so it doesn't +have to be reentrant. Directories will never be renamed except by this +routine. @var{fromcred} is the user responsible for @var{fdp} and +@var{fnp}. @var{tocred} is the user responsible for @var{tdp}. This +routine assumes the usual convention where @file{.} and @file{..} are +represented by ordinary links; if that is not true for your format, you +have to redefine this function. 
+@end deftypefun + +@deftypefun error_t diskfs_clear_directory (@w{struct node *@var{dp}}, @w{struct node *@var{pdp}}, @w{struct protid *@var{cred}}) +Clear the @file{.} and @file{..} entries from directory @var{dp}. Its +parent is @var{pdp}, and the user responsible for this is identified by +@var{cred}. Both directories must be locked. This routine assumes the +usual convention where @file{.} and @file{..} are represented by +ordinary links; if that is not true for your format, you have to +redefine this function. +@end deftypefun + +@deftypefun error_t diskfs_init_dir (@w{struct node *@var{dp}}, @w{struct node *@var{pdp}}, @w{struct protid *@var{cred}}) +Locked node @var{dp} is a new directory; add whatever links are +necessary to give it structure; its parent is the (locked) node +@var{pdp}. This routine may not call @code{diskfs_lookup} on @var{pdp}. +The new directory must be clear within the meaning of +@code{diskfs_dirempty}. This routine assumes the usual convention where +@file{.} and @file{..} are represented by ordinary links; if that is not +true for your format, you have to redefine this function. @var{cred} +identifies the user making the call. +@end deftypefun + + +@node Diskfs Internals +@subsection Diskfs Internals + +The library also exports the following functions, but they are not +generally useful unless you are redefining other functions the library +provides. + +@deftypefun error_t diskfs_create_protid (@w{struct peropen *@var{po}}, @w{struct iouser *@var{user}}, @w{struct protid **@var{cred}}) +Create and return a protid for an existing peropen @var{po} in +@var{cred}, referring to user @var{user}. The node @code{@var{po}->np} +must be locked. +@end deftypefun + +@deftypefun error_t diskfs_start_protid (@w{struct peropen *@var{po}}, @w{struct protid **@var{cred}}) +Build and return in @var{cred} a protid which has no user +identification, for peropen @var{po}. The node @code{@var{po}->np} must +be locked. 
+@end deftypefun + +@deftypefun void diskfs_finish_protid (@w{struct protid *@var{cred}}, @w{struct iouser *@var{user}}) +Finish building protid @var{cred} started with @code{diskfs_start_protid}; +the user to install is @var{user}. +@end deftypefun + +@deftypefun void diskfs_protid_rele (@w{void *@var{arg}}) +Called when a protid (passed as @var{arg}) has no more references. Because +references to protids are maintained by the port management library, +this is installed in the clean routines list. The ports library will +free the structure. +@end deftypefun + +@deftypefun {struct peropen *} diskfs_make_peropen (@w{struct node *@var{np}}, @w{int @var{flags}}, @w{struct peropen *@var{context}}) +Create and return a new peropen structure on node @var{np} with open +flags @var{flags}. The initial values for the @code{root_parent}, +@code{shadow_root}, and @code{shadow_root_parent} fields are copied from +@var{context} if it is nonzero; otherwise, each of these values is +set to zero. +@end deftypefun + +@deftypefun void diskfs_release_peropen (@w{struct peropen *@var{po}}) +Decrement the reference count on @var{po}. +@end deftypefun + +@deftypefun error_t diskfs_execboot_fsys_startup (@w{mach_port_t @var{port}}, @w{int @var{flags}}, @w{mach_port_t @var{ctl}}, @w{mach_port_t *@var{real}}, @w{mach_msg_type_name_t *@var{realpoly}}) +This function is called by @code{S_fsys_startup} for execserver +bootstrap. The execserver is able to function without a real node, +hence this fraud. Arguments are as for @code{fsys_startup} in +@code{<hurd/fsys.defs>}. +@end deftypefun + +@deftypefun int diskfs_demuxer (@w{mach_msg_header_t *@var{inp}}, @w{mach_msg_header_t *@var{outp}}) +Demultiplex incoming @code{libports} messages on diskfs ports. +@end deftypefun + +@findex diskfs_S_* +The diskfs library also provides functions to demultiplex the fs, io, +fsys, interrupt, and notify interfaces. All the server routines have +the prefix @code{diskfs_S_}.
For those routines, @code{in} arguments of +type @code{file_t} or @code{io_t} appear as @code{struct protid *} to +the stub. + + +@node Twisted Filesystems +@chapter Twisted Filesystems + +In the Hurd, translators are capable of redirecting filesystem requests +to other translators, which makes it possible to implement alternative +views of the same underlying data. The translators described in this +chapter do not provide direct access to any data; rather, they are +organizational tools to help you simplify an existing physical +filesystem layout. + +Be prudent with these translators: you may accidentally injure people +who want their filesystems to be rigidly tree-structured.@footnote{You +are lost in a maze of twisty little filesystems, all alike@dots{}.} + +FIXME: finish + +@section symlink, firmlink +@section hostmux, usermux +@section shadowfs + + +@node Distributed Filesystems +@chapter Distributed Filesystems + +Distributed filesystems are designed to share files between separate +machines via a network connection of some sort. Their design is +significantly different than stored filesystems (@pxref{Stored +Filesystems}): they need to deal with the problems of network delays and +failures, and may require complex authentication and replication +protocols involving multiple file servers. + +@menu +* File Transfer Protocol:: A distributed filesystem based on FTP. +* Network File System:: Sun's NFS: a lousy, but common filesystem. +@end menu + + +@node File Transfer Protocol +@section File Transfer Protocol +@cindex FTP + +FIXME: finish + +@menu +* FTP Connection Library:: Managing remote FTP server connections. 
+@end menu + +@subsection ftpcp, ftpdir +@subsection ftpfs + +@node FTP Connection Library +@subsection FTP Connection Library +@scindex libftpconn +@scindex ftpconn.h + +FIXME: finish + + +@node Network File System +@section Network File System +@cindex NFS + +FIXME: finish + +@subsection nfsd +@subsection nfs + + +@node Networking +@chapter Networking + +FIXME: this subsystem is in flux @c Thomas, 26-03-1998 + +@menu +* Socket Interface:: Network communication I/O protocol. +@end menu + + +@section pfinet +@section pflocal +@section libpipe + +@node Socket Interface +@section Socket Interface +@scindex socket.defs + +FIXME: net frobbing stuff may be added to socket.defs +@c Thomas, 26-03-1998 + + +@node Terminal Handling +@chapter Terminal Handling + +FIXME: finish + +@section term +@section term.defs + + +@node Running Programs +@chapter Running Programs + +FIXME: finish + +@section ps, w +@section libps +@section exec +@section proc +@section crash + + +@node Authentication +@chapter Authentication + +FIXME: finish + +@menu +* Auth Interface:: Auth ports implement the auth interface. +@end menu + +@section addauth, rmauth, setauth +@section su, sush, unsu +@section login, loginpr +@section auth + +@node Auth Interface +@section Auth Interface +@scindex auth.defs + +FIXME: finish + +@menu +* Auth Protocol:: Bidirectional authentication. +@end menu -FIXME: A library to implement complex multi-threaded pagers +@node Auth Protocol +@subsection Auth Protocol -@node Diskfs library -@chapter Diskfs library +FIXME: finish -FIXME: A library to do almost all the work of implementing a disk-based -filesystem -@node Trivfs library -@chapter Trivfs library +@node Index +@unnumbered Index -FIXME: A library to do the work of handling the file protocol for -directory-less filesystems +@printindex cp +@summarycontents +@contents @bye -@c Local variables: -@c texinfo-column-for-description: 28 -@c End: |