Diffstat (limited to 'doc')
-rw-r--r-- | doc/hurd.texi | 554 |
1 files changed, 554 insertions, 0 deletions
diff --git a/doc/hurd.texi b/doc/hurd.texi new file mode 100644 index 00000000..e7cc9e92 --- /dev/null +++ b/doc/hurd.texi @@ -0,0 +1,554 @@ +\input texinfo @c -*-texinfo-*- +@setfilename hurd.texi + +@ifinfo +@format +START-INFO-DIR-ENTRY +* Hurd: (hurd). The interfaces of the GNU Hurd. +END-INFO-DIR-ENTRY +@end format +@end ifinfo + +@ifinfo +Copyright @copyright{} 1994 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through TeX and print the +results, provided the printed document carries a copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). + +@end ignore + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@end ifinfo + +@setchapternewpage odd +@settitle Hurd Interface Manual +@titlepage +@finalout +@title The GNU Hurd Interface Manual +@author Michael I. Bushnell +@page + +@vskip 0pt plus 1filll +Copyright @copyright{} 1994 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. 
+ +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@end titlepage + +@node Top +@top Introduction + +This manual describes the interfaces that make up the GNU Hurd. It is +assumed that the reader is familiar with the features of the Mach +kernel, with using the Hurd interfaces as a user, and with all of the +associated C library calls. It concentrates on requirements and advice +for the writing of Hurd servers, as well as describing the libraries +that come with the GNU Hurd. + +@menu +* I/O interface:: The interface for reading and writing + I/O channels +* Shared I/O:: The interface for doing input and output + using shared memory +* File interface:: The interface for modifying file-specific + characteristics +* Filesystem interface:: Interfaces supported to control file-servers +* Socket interface:: Interfaces used for manipulating sockets + +* Ports library:: A library to manage port rights for servers +* Iohelp library:: A library to implement some common parts + of the I/O and shared I/O interfaces. +* Fshelp library:: A library to implement some common parts + of the file interface. +* Pager library:: A library to implement complex + multi-threaded pagers. +* Diskfs library:: A library to do almost all the work of + implementing a disk-based filesystem. +* Trivfs library:: A library to do the work of handling the + file protocol for directory-less + filesystems. +* Mapped data:: Getting memory objects referring to the + data of an I/O object. +@end menu + +@node I/O interface +@chapter I/O interface + +The I/O interface is used to interact with almost all servers in the +GNU Hurd. It provides facilities for reading and writing I/O streams. +The I/O interface facilities are described in <hurd/io.defs> and +<hurd/shared.h>. The latter portion of <hurd/io.defs> and all of +<hurd/shared.h> describe how to implement shared-memory I/O operations, +and are described later. 
The present chapter is concerned with +RPC-based I/O operations. + +@menu +* I/O object ports:: How ports to I/O objects work +* Simple operations:: Read, write, and seek +* Open modes:: State bits that affect pieces of + operation +* Asynchronous I/O:: How to get notified when I/O is possible +* Information queries:: How to implement io_stat and + io_server_version +@end menu + +@node I/O object ports +@section I/O object ports + +Each port to an I/O server should be associated with a particular set of +uids and gids, identifying the user who is responsible for operations on +the port. Every port to an I/O server should also support either the +file protocol or the socket protocol; naked I/O ports are not allowed. + +In addition, each port is associated with a default file pointer, a set +of open mode bits, a pid (called the ``owner''), and some underlying +object which can absorb data (for write) or provide data (for read). + +The uid and gid sets associated with a port may not be visibly shared +with other ports; nor may they ever change. The identification of a set +of uids and gids with a particular port must be fixed at the moment of +the port's creation. The other characteristics of an I/O port may be +shared with other users. The manner in which these characteristics are +shared is not part of the I/O server interface; however, the file and +socket interfaces make further requirements about what sharing is +expected and what sharing is prohibited. + +In general, users get send-rights to I/O ports by some mechanism that is +external to the I/O protocol. (For example, file servers give out I/O +ports in response to the dir_pathtrans and fsys_getroot calls. Socket +servers give out ports in response to the socket_create and +socket_accept calls.) However, the I/O protocol provides methods of +obtaining new ports that refer to the same underlying object as another +port. 
In response to all of these calls, all underlying state +(including, but not limited to, the default file pointer, open mode +bits, and underlying object) must be shared between the old and new +ports. In the following descriptions of these calls, this is what is +meant by saying that the new port is "identical" to the old port. These +calls must all return send-rights to a newly-constructed Mach port. + +The io_duplicate call simply returns another port which is identical +to an existing port and has the same uid and gid set. + +The io_restrict_auth call should return another port, identical to the +provided port, but which has a smaller associated uid and gid set. The +uid and gid sets of the new port should be the intersection of the set +on the existing port and the lists of uids and gids provided in the +call. + +The io_reauthenticate call is used when users wish to have an entirely +new set of uids or gids associated with a port. When such a call is +received, the server must create a new port, and then make the call +auth_server_authenticate to the auth server. The rendezvous port for +the auth_server_authenticate call is the I/O port on which the +io_reauthenticate call was made. The rend_int parameter should be copied from +the io_reauthenticate call. The I/O server also gives the auth server a +new port; this should be a newly created port identical to the old port. +The auth server will return the set of uids and gids associated with the +user, and guarantees that the new port will go directly to the user that +possessed the associated authentication port. + +@node Simple operations +@section Simple operations + +Users write to I/O ports by calling the io_write RPC. They specify an +offset parameter; if the object supports writing at arbitrary offsets, +this should be honored. If -1 is passed as the offset, then the default +file pointer should be used. The server should return the amount of +data which was successfully written. 
If the operation was interrupted +after some but not all of the data was written, then it is considered to +have succeeded and should return the amount written. If the port is not +an I/O port at all, the error EOPNOTSUPP should be returned. If the +port is an I/O port, but does not happen to support writing, then EBADF +should be returned. + +Users read from I/O ports by calling the io_read RPC. They specify the +amount of data they wish to read and the offset. The offset has the +same meaning as for io_write above. The server should return the data +read. If the call is interrupted after some data has been read (and the +operation is not idempotent) then the server should return the amount +read, even if less than the amount requested. The server should return +as much data as possible, but never more than requested by the user. If +there is no data, but there might be later, the call should block until +data becomes available. End-of-file conditions are indicated by +returning zero bytes. If the call is interrupted after some data has +been read, but the call is idempotent, then the server may return EINTR +rather than actually filling the buffer (taking care that any +modifications of the default file pointer have been reversed). + +Objects are divided into two categories: seekable and non-seekable. +Seekable objects are required to accept arbitrary offset parameters in +the io_read and io_write calls, and to implement the io_seek call. +Nonseekable objects must ignore the offset parameters to io_read and +io_write, and should return ESPIPE to the io_seek call. + +On seekable objects, io_seek is used to change the default file pointer +for reads and writes. (See the C library manual for the interpretation +of the WHENCE and OFFSET arguments, and why the grammatically incorrect +term `whence' is used.) It returns the new offset as modified by +io_seek. + +The io_readable interface should return the amount of data which can be +immediately read. 
For the special technical meaning of "immediately", +see the description of asynchronous I/O. (*Note: Asynchronous I/O.) + +@node Open modes +@section Open modes + +Each port is identified with a set of bits that affect its operation. +These bits are modified with the io_set_all_openmodes call and fetched +with the io_get_openmodes call. In addition, the io_set_some_openmodes and +io_clear_some_openmodes calls atomically read and modify the open modes. + +The O_APPEND bit, when set, changes the behavior of io_write when it +uses the default file pointer on seekable objects. When io_write is +done on a port with the O_APPEND bit set, it must set the file pointer to +one more than the "maximum correct value" (described below) before doing +the write (which would then increment the file pointer as usual). This +update must be atomically bound to the actual data write with respect to +other users of io_read, io_write, and io_seek. + +A "correct value" for the file pointer is one which, when provided to io_read, +will successfully return at least one byte of data and not end-of-file. +The "maximum correct value" referred to in the description of O_APPEND +is the maximum such correct value. (For ordinary files [see the +description of the file protocol for more information] this is the same +as the current file size.) + +The O_FSYNC bit, when set, should cause io_write not to delay writing +data to underlying media in any fashion. + +The O_NONBLOCK bit, when set, should prevent read and write from +blocking. They should copy such data as is immediately available. If +no data is immediately available they should return EWOULDBLOCK. + +The definition of "immediate" is more or less server dependent. Some +servers (disk-based file servers, most notably) regard all data as +immediately available. The one criterion is that something which must +happen immediately may not wait for any user-synchronizable event. + +The O_ASYNC bit is deprecated; its use is documented in the following +section. 
This bit must be shared between all users of the same +underlying object. + +@node Asynchronous I/O +@section Asynchronous I/O + +Users may wish to be notified when I/O can be done without blocking; +they use the io_async call to indicate this to the server. In the +io_async call the user provides a port to which sig_post messages will +be sent as I/O becomes possible. The server should return a port which +will be used as a reference port in sig_post messages. Each io_async +call should generate a new reference port. (See the C library manual +for information on how to send sig_post messages.) + +The server should send one SIGIO signal to each registered async user +every time I/O becomes possible. I/O is possible if at least one byte +can be read or written immediately. (The definition of ``immediately'' +must be the same as for the implementation of the O_NONBLOCK flag.) +Every time io_read or io_write is called, another signal should be sent +to each user if I/O is still possible. + +Some objects may also define "urgent" conditions. Servers for such objects should +send the SIGURG signal to each registered async user any time an urgent +condition appears. After any RPC that has the possibility of clearing +the urgent condition, the signal should again be sent to all registered +users if the urgent condition is still present. + +A more fine-grained mechanism for doing async I/O is the io_select call. +The user specifies the kind of access desired, and a send-once right. +If I/O of the kind the user desires is immediately possible, then the +server should return so indicating, and destroy the send-once right. If +I/O is not immediately possible, the server should save the send-once +right, and send a select_done message as soon as I/O becomes immediately +possible. (Again, the definition of ``immediate'' must be the same for +io_select, io_async, and O_NONBLOCK.) + +For compatibility, a deprecated feature (known as icky async I/O) is +provided. 
The calls io_mod_owner and io_get_owner are used to set and get the +``owner'' of the object; either a pid or a pgrp (negative) is provided. +Whenever the I/O server is sending messages to all the io_async users, +if the O_ASYNC bit is set for any user of the object, it should also +send a signal to the owning pid/pgrp. The ID port for this call should +be different from all the io_async id ports given to users. Users may +find out what ID port will be used by calling io_get_icky_async_id. + +@node Information queries +@section Information queries + +Users may call io_stat to find out information about the I/O object. +Most of the fields of a struct stat are meaningful only for files. All +objects, however, are required to support the fields st_fstype, st_fsid, +st_ino, st_atime, st_atime_usec, st_mtime, st_mtime_usec, st_ctime, +st_ctime_usec, and st_blksize. + +st_fstype, st_fsid, and st_ino must be unique for the underlying object +across the entire system. + +st_atime and st_atime_usec hold the seconds and microseconds, +respectively, of the system clock at the last time the object was +read with io_read. + +st_mtime and st_mtime_usec hold the seconds and microseconds, +respectively, of the system clock at the last time the object was +written with io_write. + +st_ctime and st_ctime_usec hold the seconds and microseconds, +respectively, of the system clock at the last time permanent meta-data +associated with the object was changed. The exact operations which +cause such an update are server-dependent, but must include the creation +of the object. + +st_blksize gives the optimal I/O size for io_read and io_write; users +should endeavor to read and write amounts which are multiples of the +optimal size, and to use offsets which are multiples of the optimal +size. + +In addition, objects which are seekable should set st_size to the +"maximum correct value" described above in the description of the +O_APPEND flag. 
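As a rough illustration of the st_blksize advice above, a user might round offsets down and amounts up to multiples of the optimal size before issuing io_read or io_write. The helper names below are hypothetical and are not part of any Hurd interface; this is only a sketch of the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round OFFSET down to a multiple of BLKSIZE,
   where BLKSIZE would come from the st_blksize field.  */
static size_t
blk_round_down (size_t offset, size_t blksize)
{
  assert (blksize > 0);
  return offset - offset % blksize;
}

/* Hypothetical helper: round AMOUNT up to a multiple of BLKSIZE.
   Overflow near SIZE_MAX is not handled in this sketch.  */
static size_t
blk_round_up (size_t amount, size_t blksize)
{
  size_t rem;

  assert (blksize > 0);
  rem = amount % blksize;
  return rem == 0 ? amount : amount + (blksize - rem);
}
```

A caller that wants 5000 bytes starting at offset 5000 from an object whose st_blksize is 4096 would, under this scheme, start at blk_round_down (5000, 4096) and request a multiple of 4096 bytes.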
+ +The st_uid and st_gid fields are unrelated to the ``owner'' as described +above for icky async I/O. + +Users may find out the version of the server they are talking to by +calling io_server_version; this should return strings and integers +describing the version number of the server, as well as its name. + +@node Mapped data +@section Mapped data + +Servers may optionally implement the io_map call; they may do so even if +they do not implement the facilities described in the following chapter. +The ports returned by io_map must implement the XP kernel interface and +be suitable as arguments to vm_map. + +Seekable objects must allow access from 0 to the "maximum correct value" +described for O_APPEND. Whether they provide access beyond such a point +is server dependent; in addition, the meaning of such a memory object for a +non-seekable object is server dependent. However, servers which +implement the facilities of the next chapter are bound to certain +requirements about which addresses in the memory objects provided by +io_map must be valid. Simply put, any user following the rules +described in the next chapter should not get any memory faults except as +explicitly permitted by the next chapter. + +@node Shared I/O +@chapter Shared I/O + +I/O servers may, optionally, provide the services described in this +chapter in addition to the generic services described in the previous +chapter. These facilities allow users to read and write I/O objects +without making RPCs to the server in most circumstances. + +@menu +* Rules:: The rules users must obey in using + shared I/O. +* Examples:: Examples of the way different types + of servers could implement shared I/O. +@end menu + +@node Rules +@section Rules + +Any server implementing the facilities of this chapter must also support +the io_map call as described in the previous chapter. + +Users of the shared I/O facilities must call io_map_cntl; this will +return a memory object, called the shared page object. 
One page of this +object should be mapped from offset zero into the user's address space. +At the front of this page is a struct shared_io as described in +<hurd/shared.h>. Frequent reference will be made to the members of this +structure in this chapter, without further qualification. + +Only one shared user can be active on a given port at a time. If +io_map_cntl is called for a port on which a user is already active, the +server should return EBUSY, at which point the user should call +io_duplicate to obtain a new port, and call io_map_cntl there. + +@menu +* Conch:: How access to the shared page is mediated +* Access rules:: Where in the io_map memory objects users + may peek and poke +* Behavior modification:: Modifications of behavior +* Status notifications:: Calls users should make at certain + times to keep the server abreast of the + current state of the object +* Violations:: When the rules are broken + +@end menu + +@node Conch +@subsection Conch + +Access to the shared page is mediated through a facility known as the +``conch''. The ``lock'' field of the shared page protects the +conch_status field; this lock must be acquired with spin_lock before +conch_status may be modified or examined. + +If the conch_status field is USER_HAS_CONCH or USER_RELEASE_CONCH, then +the user has the conch, and may access the shared page after releasing +the spin lock. If the conch_status field is USER_COULD_HAVE_CONCH, +then the user may immediately set conch_status to USER_HAS_CONCH, and +proceed to access the shared page after releasing the spin lock. If the +conch_status is USER_HAS_NOT_CONCH, then the user should release the +spin lock, and call io_get_conch. Upon return from io_get_conch, the +user should reacquire the spin lock and check the conch_status again. + +When the user is through accessing the shared page, the user should +acquire the spin lock and examine the conch_status field. 
If it has +been set to USER_RELEASE_CONCH, then the user should release the spin +lock and call io_release_conch. + +The implementation of io_read and io_write must not modify the file +contents except when the server is holding the conch; users who wish to +be atomic with respect to those functions should be similarly reticent. + +The server should guarantee that at most one user has the conch at a +time; the server may only have the conch if no user does. The server +may not modify conch_status or the shared page if the status is +USER_HAS_CONCH except to set it to USER_RELEASE_CONCH, thus requesting a +call to io_release_conch. + +@node Access rules +@subsection Access rules + +The conch fields file_size, read_size, and prenotify_size affect which +areas of the data objects may be accessed. In addition, for +non-seekable objects, the file pointers rd_file_pointer, +wr_file_pointer, and xx_file_pointer affect which areas may be accessed. + +For seekable objects, the read object may be read from offset 0 through +the minimum of file_size and read_size. + +For seekable objects, the write object may be modified from offset 0 +through prenotify_size. + +For nonseekable objects, the read object may be read from +rd_file_pointer through the minimum of file_size and read_size. + +For nonseekable objects, the write object may be modified from +wr_file_pointer through prenotify_size. + +The server may permit access outside these regions, but data will not +necessarily be preserved for any length of time if so written. If the +server wishes to deny such access, it should fault with EIO. Servers +may also issue faults on modifications of the write object for reasons +such as EDQUOT and ENOSPC, as well as reporting hardware errors with +EIO. Servers may only fault valid addresses in the read object with EIO +to indicate hardware failure. 
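The four region rules above can be sketched as a pair of small C functions. This is only an illustration under stated assumptions: struct access_fields is a hypothetical, simplified stand-in for the relevant members of struct shared_io (the real layout is in <hurd/shared.h>), the function names are invented for this example, and the upper bound is treated as exclusive, which the text does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared page members named in the
   access rules; not the real struct shared_io.  */
struct access_fields
{
  size_t file_size;
  size_t read_size;
  size_t prenotify_size;
  size_t rd_file_pointer;
  size_t wr_file_pointer;
};

/* Half-open byte range [start, end); the exclusive end is an
   assumption of this sketch.  */
struct range
{
  size_t start;
  size_t end;
};

/* Region of the read object that may be read: from 0 (seekable) or
   rd_file_pointer (nonseekable) through min (file_size, read_size).  */
static struct range
readable_range (const struct access_fields *f, bool seekable)
{
  struct range r;

  assert (f != NULL);
  r.start = seekable ? 0 : f->rd_file_pointer;
  r.end = f->file_size < f->read_size ? f->file_size : f->read_size;
  if (r.end < r.start)
    r.end = r.start;            /* Nothing is currently readable.  */
  return r;
}

/* Region of the write object that may be modified: from 0 (seekable)
   or wr_file_pointer (nonseekable) through prenotify_size.  */
static struct range
writable_range (const struct access_fields *f, bool seekable)
{
  struct range r;

  assert (f != NULL);
  r.start = seekable ? 0 : f->wr_file_pointer;
  r.end = f->prenotify_size;
  if (r.end < r.start)
    r.end = r.start;            /* Nothing is currently writable.  */
  return r;
}
```

The sketch assumes all three size limits are in force; it does not model the case where a limit is disabled.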
+ +A field foo should be ignored if the corresponding flag use_foo is clear in the +shared page; this may result in there being no maximum valid address for +a particular access. In that case, the object may be accessed to the +end of its virtual address space. + +If use_file_size is set, the user may increase the file_size, but may +not decrease it, to indicate the new "maximum correct value" as +described for O_APPEND. Normally, writes which extend beyond the current +file_size should extend it to the end of the write. + +The xx_file_pointer for seekable objects is the same as the default file +pointer used by io_read and io_write. + +If use_read_size is set and the user wishes to read past read_size, she +may call io_readsleep, which will return as soon as read_size is +increased. If read_block_reason is set to RBR_BUFFER_FULL, then the +read_size will not be increased until the rd_file_pointer is increased. + +If use_prenotify_size is set and the user wishes to write past +prenotify_size, she may call io_prenotify, specifying the maximum offset +the user intends to write. The server should return when prenotify_size +has been increased, but is not obligated to extend it as far as the user +wishes. In addition, io_prenotify may return errors such as ENOSPC, +indicating that the prenotify_size cannot be increased. + +Users of seekable objects may modify the xx_file_pointer at will (including +pointing past read_size, file_size, or prenotify_size). Users of non-seekable +objects, however, may only increase the rd_file_pointer and +wr_file_pointer. In addition, they may not modify them to point past +the valid data as described above. Failing to advance them may prevent +the read_size or prenotify_size from being increased. + +If eof_notify is set, then the user may attempt to have the file_size +increased by calling io_eofnotify after "noticing" the current file +size limit. io_eofnotify must return immediately, but need not increase +the file_size or clear use_file_size. 
(However, if it is impossible +for io_eofnotify to ever do anything, then the server should not set +eof_notify.) + +@node Behavior modification +@subsection Behavior modification + +The server flag append_mode is a copy of the O_APPEND open mode bit; if +it is set, then the user should do writes at file_size and set the file +pointer appropriately (this applies only if the user would be writing at +the file pointer in the first place). + +@node Status notifications +@subsection Status notifications + +The flag do_sigio requests that the user call io_sigio every time the file +pointers or the file_size have been changed. + +If use_postnotify_size is set, then the user should call io_postnotify +after writing data that extends past postnotify_size. Writes beyond +postnotify_size may be buffered internally to the server for arbitrarily +long periods until io_postnotify is called, regardless of the setting +of the O_FSYNC bit. + +After modifying or reading the object contents, the user should set the +written or accessed fields respectively. (Users who fail to set these +fields will not thereby defeat the mtime/atime mechanism.) + +If the flag use_eof is set, then users should call io_eofnotify after +reading up to the file_size and noticing it. + +@node Violations +@subsection Violations + +Users who hold the conch for too long while conch_status is set to +USER_RELEASE_CONCH may have the conch stolen from them and their +conch_status unilaterally downgraded to USER_HAS_NOT_CONCH. Users who +hold the spin lock for too long (where this ``too long'' is much, much +shorter than the previous one) will have the spin lock stolen from them. + +Users who read or write outside the valid regions described above may +get memory faults and may not expect data written to be saved in any +fashion. + +Users who write the read object (when it is different from the write +object) may or may not get faults; they may not expect such data to be +saved in any fashion. 
+ +Users who fail to call io_postnotify may cause data to be buffered for +arbitrarily long periods. + +Users who reduce rd_file_pointer, wr_file_pointer, or file_size will +have such modifications ignored. + +Users may not call any server functions (whether in the I/O protocol or +another) while holding the conch except for those specified in this +chapter. Such calls may block or fail silently. + +
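The user-side conch protocol from the Conch subsection can be sketched as a small decision function. This is a simplified model, not the real interface: the enum below is a local stand-in for the conch_status values named in the text (the real definitions live in <hurd/shared.h> and may differ), the function and action names are hypothetical, and the spin lock and the io_get_conch and io_release_conch RPCs are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the conch_status values named in the text.  */
enum conch_status
{
  USER_HAS_NOT_CONCH,
  USER_COULD_HAVE_CONCH,
  USER_HAS_CONCH,
  USER_RELEASE_CONCH
};

/* Hypothetical result of inspecting conch_status.  */
enum conch_action
{
  CONCH_ACCESS_PAGE,    /* Release the spin lock, then use the page.  */
  CONCH_CALL_GET_CONCH  /* Release the lock, call io_get_conch, retry.  */
};

/* One acquisition step.  The caller is assumed to hold the spin lock
   that protects *STATUS for the duration of this call.  */
static enum conch_action
conch_acquire_step (enum conch_status *status)
{
  assert (status != NULL);
  switch (*status)
    {
    case USER_HAS_CONCH:
    case USER_RELEASE_CONCH:
      /* The user already has the conch.  */
      return CONCH_ACCESS_PAGE;
    case USER_COULD_HAVE_CONCH:
      /* The user may take the conch without asking the server.  */
      *status = USER_HAS_CONCH;
      return CONCH_ACCESS_PAGE;
    case USER_HAS_NOT_CONCH:
    default:
      return CONCH_CALL_GET_CONCH;
    }
}

/* After finishing with the page and reacquiring the spin lock: true
   if the server has asked for the conch back, so that the user should
   release the lock and call io_release_conch.  */
static bool
conch_release_needed (enum conch_status status)
{
  return status == USER_RELEASE_CONCH;
}
```

A user would loop on conch_acquire_step, calling io_get_conch between iterations, until it returns CONCH_ACCESS_PAGE.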