\input texinfo  @c -*-texinfo-*-
@setfilename hurd.info

@ifinfo
@format
START-INFO-DIR-ENTRY
* Hurd: (hurd).                 The interfaces of the GNU Hurd.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@ifinfo
Copyright @copyright{} 1994 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions.
@end ifinfo

@setchapternewpage odd
@settitle Hurd Interface Manual
@titlepage
@finalout
@title The GNU Hurd Interface Manual
@author Michael I. Bushnell
@page

@vskip 0pt plus 1filll
Copyright @copyright{} 1994 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions.
@end titlepage

@node Top
@top Introduction

This manual describes the interfaces that make up the GNU Hurd.  It
assumes that the reader is familiar with the features of the Mach
kernel, with using the Hurd interfaces as a user, and with all of the
associated C library calls.  It concentrates on requirements and advice
for the writing of Hurd servers, as well as describing the libraries
that come with the GNU Hurd.

@menu
* I/O interface::           The interface for reading and writing
                            I/O channels
* Shared I/O::              The interface for doing input and output
                            using shared memory
* File interface::          The interface for modifying file-specific
                            characteristics
* Filesystem interface::    Interfaces supported to control file-servers
* Socket interface::        Interfaces used for manipulating sockets

* Ports library::           A library to manage port rights for servers
* Iohelp library::          A library to implement some common parts
                            of the I/O and shared I/O interfaces.
* Fshelp library::          A library to implement some common parts
                            of the file interface.
* Pager library::           A library to implement complex
                            multi-threaded pagers.
* Diskfs library::          A library to do almost all the work of
                            implementing a disk-based filesystem.
* Trivfs library::          A library to do the work of handling the
                            file protocol for directory-less
                            filesystems.
* Mapped data::             Getting memory objects referring to the
                            data of an I/O object.
@end menu

@node I/O interface
@chapter I/O interface
                           
The I/O interface is used to interact with almost all servers in the
GNU Hurd.  It provides facilities for reading and writing I/O streams.
The I/O interface facilities are described in <hurd/io.defs> and
<hurd/shared.h>.  The latter portion of <hurd/io.defs> and all of
<hurd/shared.h> describe how to implement shared-memory I/O operations,
and are described later.  The present chapter is concerned with
RPC-based I/O operations.

@menu
* I/O object ports::              How ports to I/O objects work
* Simple operations::             Read, write, and seek
* Open modes::                    State bits that affect pieces of
                                  operation
* Asynchronous I/O::              How to get notified when I/O is possible
* Information queries::           How to implement io_stat and 
                                  io_server_version
* Mapped data::                   Getting memory objects referring to the
                                  data of an I/O object
@end menu

@node I/O object ports
@section I/O object ports

Each port to an I/O server should be associated with a particular set of
uids and gids, identifying the user who is responsible for operations on
the port.  Every port to an I/O server should also support either the
file protocol or the socket protocol; naked I/O ports are not allowed.

In addition, each port is associated with a default file pointer, a set
of open mode bits, a pid (called the ``owner''), and some underlying
object which can absorb data (for write) or provide data (for read).

The uid and gid sets associated with a port may not be visibly shared
with other ports; nor may they ever change.  The identification of a set
of uids and gids with a particular port must be fixed at the moment of
the port's creation.  The other characteristics of an I/O port may be
shared with other users.  The manner in which these characteristics are
shared is not part of the I/O server interface; however, the file and
socket interfaces make further requirements about what sharing is
expected and prohibited from occurring.

In general, users get send-rights to I/O ports by some mechanism that is
external to the I/O protocol.  (For example, file servers give out I/O
ports in response to the dir_pathtrans and fsys_getroot calls.  Socket
servers give out ports in response to the socket_create and
socket_accept calls.)  However, the I/O protocol provides methods of
obtaining new ports that refer to the same underlying object as another
port.  In response to all of these calls, all underlying state
(including, but not limited to, the default file pointer, open mode
bits, and underlying object) must be shared between the old and new
ports.  In the following descriptions of these calls, this is what is
meant by saying that the new port is ``identical'' to the old port.
Each of these calls must return send-rights to a newly constructed Mach
port.

The io_duplicate call simply returns another port which is identical
to an existing port and has the same uid and gid set.

The io_restrict_auth call should return another port, identical to the
provided port, but which has a smaller associated uid and gid set.  The
uid and gid sets of the new port should be the intersection of the set
on the existing port and the lists of uids and gids provided in the
call.
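
The intersection rule for io_restrict_auth might be sketched as
follows.  This is only an illustration: the names id_type,
id_in_list, and restrict_ids are hypothetical and belong to no Hurd
library, and ids are modelled as plain unsigned integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical id type; real servers would use uid_t and gid_t.  */
typedef unsigned int id_type;

/* Return nonzero if ID occurs in LIST[0..N-1].  */
int
id_in_list (id_type id, const id_type *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (list[i] == id)
      return 1;
  return 0;
}

/* Store in OUT those ids of PORT_IDS that also occur in REQUESTED,
   preserving their order.  OUT must have room for NPORT entries.
   Returns the number of ids stored.  */
size_t
restrict_ids (const id_type *port_ids, size_t nport,
              const id_type *requested, size_t nreq,
              id_type *out)
{
  size_t count = 0;

  assert (out != NULL || nport == 0);
  for (size_t i = 0; i < nport; i++)
    if (id_in_list (port_ids[i], requested, nreq))
      out[count++] = port_ids[i];
  return count;
}
```

A server following this sketch would run the helper once for the uid
set and once for the gid set, and attach the two results to the newly
created port.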

The io_reauthenticate call is used when users wish to have an entirely
new set of uids or gids associated with a port.  When such a call is
received, the server must create a new port, and then make the call
auth_server_authenticate to the auth server.  The rendezvous port for
the auth_server_authenticate call is the I/O port on which the
io_reauthenticate call was made.  The rend_int parameter should be
copied from the io_reauthenticate call.  The I/O server also gives the
auth server a new port; this should be a newly created port identical to
the old port.
The auth server will return the set of uids and gids associated with the
user, and guarantees that the new port will go directly to the user that
possessed the associated authentication port.

@node Simple operations
@section Simple operations

Users write to I/O ports by calling the io_write RPC.  They specify an
offset parameter; if the object supports writing at arbitrary offsets,
this should be honored.  If -1 is passed as the offset, then the default
file pointer should be used.  The server should return the amount of
data which was successfully written.  If the operation was interrupted
after some but not all of the data was written, then it is considered to
have succeeded and should return the amount written.  If the port is not
an I/O port at all, the error EOPNOTSUPP should be returned.  If the
port is an I/O port, but does not happen to support writing, then EBADF
should be returned.  

Users read from I/O ports by calling the io_read RPC.  They specify the
amount of data they wish to read and the offset.  The offset has the
same meaning as for io_write above.  The server should return the data
read.  If the call is interrupted after some data has been read (and the
operation is not idempotent) then the server should return the amount
read, even if less than the amount requested.  The server should return
as much data as possible, but never more than requested by the user.  If
there is no data, but there might be later, the call should block until
data becomes available.  End-of-file conditions are indicated by
returning zero bytes.  If the call is interrupted after some data has
been read, but the call is idempotent, then the server may return EINTR
rather than actually filling the buffer (taking care that any
modifications of the default file pointer have been reversed).

Objects are divided into two categories: seekable and non-seekable.
Seekable objects are required to accept arbitrary offset parameters in
the io_read and io_write calls, and to implement the io_seek call.
Non-seekable objects must ignore the offset parameters to io_read and
io_write, and should return ESPIPE to the io_seek call.  
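
As an illustration of the offset rules for a seekable object, here is
a minimal sketch of a read routine over an in-memory buffer.  The
struct mem_object and the function mem_read are hypothetical stand-ins,
not the real RPC stub; the sketch also assumes that only reads through
the default file pointer advance it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory object; not a Hurd data structure.  */
struct mem_object
{
  const char *data;     /* Object contents.  */
  size_t size;          /* Number of valid bytes in DATA.  */
  size_t file_pointer;  /* Default file pointer.  */
};

/* Copy up to AMOUNT bytes into BUF, starting at OFFSET, or at the
   default file pointer if OFFSET is -1.  Returns the number of bytes
   copied; zero indicates end-of-file.  Assumption: only a read
   through the default file pointer advances it.  */
size_t
mem_read (struct mem_object *obj, long offset, char *buf, size_t amount)
{
  size_t start, avail;

  assert (offset >= -1);
  start = (offset == -1) ? obj->file_pointer : (size_t) offset;
  if (start >= obj->size)
    return 0;                   /* End-of-file.  */

  avail = obj->size - start;
  if (amount > avail)
    amount = avail;             /* Never more than is present.  */
  memcpy (buf, obj->data + start, amount);

  if (offset == -1)
    obj->file_pointer = start + amount;
  return amount;
}
```

A non-seekable object would ignore the offset argument entirely and
always consume data from the front of its stream.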

On seekable objects, io_seek is used to change the default file pointer
for reads and writes.  (See the C library manual for the interpretation
of the WHENCE and OFFSET arguments, and why the grammatically incorrect
term `whence' is used.)  It returns the new offset as modified by
io_seek.  
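
The seek behavior described above could be sketched like this.  The
struct seek_object and the function object_seek are hypothetical, and
returning EINVAL for an unknown whence value or a negative resulting
offset is an assumption modelled on the usual C library conventions,
not something this manual specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical object state; not a Hurd data structure.  */
struct seek_object
{
  int seekable;       /* Nonzero if the object supports seeking.  */
  long size;          /* Current size of the object.  */
  long file_pointer;  /* Default file pointer.  */
};

/* Move the default file pointer of OBJ.  Returns 0 and stores the new
   offset in *NEWP on success, or an errno value on failure.  */
int
object_seek (struct seek_object *obj, long offset, int whence, long *newp)
{
  long base;

  assert (newp != NULL);
  if (!obj->seekable)
    return ESPIPE;              /* Non-seekable objects refuse.  */

  switch (whence)
    {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = obj->file_pointer; break;
    case SEEK_END: base = obj->size; break;
    default: return EINVAL;     /* Assumed error for a bad WHENCE.  */
    }

  if (offset < -base)
    return EINVAL;              /* Would move before the start.  */

  obj->file_pointer = base + offset;
  *newp = obj->file_pointer;
  return 0;
}
```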

The io_readable interface should return the amount of data which can be
immediately read.  For the special technical meaning of ``immediately'',
see the description of asynchronous I/O.  (@xref{Asynchronous I/O}.)

@node Open modes
@section Open modes

Each port is identified with a set of bits that affect its operation.
These bits are modified with the io_set_all_openmodes call and fetched
with the io_get_openmodes call.  In addition, the io_set_some_openmodes
and io_clear_some_openmodes calls atomically set or clear individual
bits without disturbing the others.

The O_APPEND bit, when set, changes the behavior of io_write when it
uses the default file pointer on seekable objects.  When io_write is
done on a port with the O_APPEND bit set, it must set the file pointer to
one more than the ``maximum correct value'' (described below) before doing
the write (which would then increment the file pointer as usual).  This
update must be atomically bound to the actual data write with respect to
other users of io_read, io_write, and io_seek.

A ``correct value'' for the file pointer is one which, when provided to
io_read, will successfully return at least one byte of data and not
end-of-file.  The ``maximum correct value'' referred to in the
description of O_APPEND is the maximum such correct value.  (For ordinary
files [see the description of the file protocol for more information]
this is the same as the current file size.)
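
A minimal sketch of the O_APPEND rule for an in-memory object follows.
The struct mem_file and the function mem_write_default are
hypothetical, and the sketch assumes that for a plain byte array the
position one past the maximum correct value is simply the number of
bytes stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAPACITY 64

/* Hypothetical in-memory seekable object.  */
struct mem_file
{
  char data[BUF_CAPACITY];
  size_t size;          /* Bytes currently stored.  */
  size_t file_pointer;  /* Default file pointer.  */
  int append;           /* Nonzero models the O_APPEND bit.  */
};

/* Write LEN bytes at the default file pointer.  Returns the number of
   bytes written, which may be short if the buffer fills up.  A real
   server would hold a lock around the whole function so that the
   pointer update and the data write are atomic with respect to other
   readers, writers, and seekers.  */
size_t
mem_write_default (struct mem_file *f, const char *buf, size_t len)
{
  size_t room;

  if (f->append)
    f->file_pointer = f->size;  /* Move to the end before writing.  */

  assert (f->file_pointer <= BUF_CAPACITY);
  room = BUF_CAPACITY - f->file_pointer;
  if (len > room)
    len = room;

  memcpy (f->data + f->file_pointer, buf, len);
  f->file_pointer += len;       /* Usual post-write increment.  */
  if (f->file_pointer > f->size)
    f->size = f->file_pointer;
  return len;
}
```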

The O_FSYNC bit, when set, should cause io_write not to delay writing
data to underlying media in any fashion.  

The O_NONBLOCK bit, when set, should prevent read and write from
blocking.  They should copy such data as is immediately available.  If
no data is immediately available they should return EWOULDBLOCK.  

The definition of ``immediate'' is more or less server dependent.  Some
servers (disk-based file servers, most notably) regard all data as
immediately available.  The one criterion is that something which must
happen immediately may not wait for any user-synchronizable event.  
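
The O_NONBLOCK rule for reads might look like the following sketch.
The struct byte_source and the function nonblocking_read are
hypothetical; the point is only the distinction between ``no data
yet'' (EWOULDBLOCK) and end-of-file (success with zero bytes).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source of bytes; AVAIL counts what is ready now.  */
struct byte_source
{
  const char *data;  /* Next byte to hand out.  */
  size_t avail;      /* Bytes available without waiting.  */
  int at_eof;        /* Nonzero once no more data can ever arrive.  */
};

/* Read without blocking.  Returns 0 and sets *NREAD on success, or
   EWOULDBLOCK if nothing is immediately available.  */
int
nonblocking_read (struct byte_source *src, char *buf, size_t amount,
                  size_t *nread)
{
  assert (nread != NULL);
  if (src->avail == 0)
    {
      if (!src->at_eof)
        return EWOULDBLOCK;     /* Data may come later; do not wait.  */
      *nread = 0;               /* End-of-file: zero bytes, no error.  */
      return 0;
    }

  if (amount > src->avail)
    amount = src->avail;        /* Copy only what is ready.  */
  memcpy (buf, src->data, amount);
  src->data += amount;
  src->avail -= amount;
  *nread = amount;
  return 0;
}
```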

The O_ASYNC bit is deprecated; its use is documented in the following
section.  This bit must be shared between all users of the same
underlying object.

@node Asynchronous I/O
@section Asynchronous I/O

Users may wish to be notified when I/O can be done without blocking;
they use the io_async call to indicate this to the server.  In the
io_async call the user provides a port to which sig_post messages will
be sent as I/O becomes possible.  The server should return a port which
will be used as a reference port in sig_post messages.  Each io_async
call should generate a new reference port.  (See the C library manual
for information on how to send sig_post messages.)

The server should send one SIGIO signal to each registered async user
every time I/O becomes possible.  I/O is possible if at least one byte
can be read or written immediately.  (The definition of ``immediately''
must be the same as for the implementation of the O_NONBLOCK flag.)
Every time io_read or io_write is called, another signal should be sent
to each user if I/O is still possible.

Some objects may also define ``urgent'' conditions.  Such servers should
send the SIGURG signal to each registered async user any time an urgent
condition appears.  After any RPC that has the possibility of clearing
the urgent condition, the signal should again be sent to all registered
users if the urgent condition is still present.

A more fine-grained mechanism for doing async I/O is the io_select call.
The user specifies the kind of access desired, and a send-once right.
If I/O of the kind the user desires is immediately possible, then the
server should return so indicating, and destroy the send-once right.  If
I/O is not immediately possible, the server should save the send-once
right, and send a select_done message as soon as I/O becomes immediately
possible.  (Again, the definition of ``immediate'' must be the same for
io_select, io_async, and O_NONBLOCK.)
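
The server-side bookkeeping for io_select could be sketched as below.
Everything here is an assumption made for illustration: the
SELECT_READ and SELECT_WRITE values, the fixed-size waiter table, and
the use of a plain integer to stand for a saved send-once right.

```c
#include <assert.h>
#include <stddef.h>

#define SELECT_READ  1   /* Illustrative access kinds.  */
#define SELECT_WRITE 2
#define MAX_WAITERS  8

/* One saved request; REPLY_ID stands in for a send-once right.  */
struct waiter
{
  int wanted;
  int reply_id;
};

struct select_state
{
  int ready;                           /* Kinds of I/O possible now.  */
  struct waiter waiters[MAX_WAITERS];
  size_t nwaiters;
};

/* Handle a select request.  Returns the kinds of WANTED that are
   immediately possible (the caller then replies at once and destroys
   the right), 0 if the request was saved, or -1 if the table is
   full.  */
int
select_request (struct select_state *s, int wanted, int reply_id)
{
  int now = s->ready & wanted;

  if (now != 0)
    return now;
  if (s->nwaiters == MAX_WAITERS)
    return -1;
  s->waiters[s->nwaiters].wanted = wanted;
  s->waiters[s->nwaiters].reply_id = reply_id;
  s->nwaiters++;
  return 0;
}

/* Record that the kinds in READY are now possible.  Stores in DONE
   the reply ids of satisfied waiters (room for MAX_WAITERS), removes
   them from the table, and returns how many there were.  The caller
   would send a select_done message for each.  */
size_t
select_update (struct select_state *s, int ready, int *done)
{
  size_t ndone = 0, kept = 0;

  assert (done != NULL);
  s->ready = ready;
  for (size_t i = 0; i < s->nwaiters; i++)
    {
      if (s->waiters[i].wanted & ready)
        done[ndone++] = s->waiters[i].reply_id;
      else
        s->waiters[kept++] = s->waiters[i];
    }
  s->nwaiters = kept;
  return ndone;
}
```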

For compatibility, a deprecated feature (known as icky async I/O) is
provided.  The calls io_mod_owner and io_get_owner are used to set the
``owner'' of the object; either a pid or a pgrp (negative) is provided.
Whenever the I/O server is sending messages to all the io_async users,
if the O_ASYNC bit is set for any user of the object, it should also
send a signal to the owning pid/pgrp.  The ID port for this call should
be different from all the io_async id ports given to users.  Users may
find out what ID port will be used by calling io_get_icky_async_id.

@node Information queries
@section Information queries

Users may call io_stat to find out information about the I/O object.
Most of the fields of a struct stat are meaningful only for files.  All
objects, however, are required to support the fields st_fstype, st_fsid,
st_ino, st_atime, st_atime_usec, st_mtime, st_mtime_usec, st_ctime,
st_ctime_usec, and st_blksize.

st_fstype, st_fsid, and st_ino must be unique for the underlying object
across the entire system.

st_atime and st_atime_usec hold the seconds and microseconds,
respectively, of the system clock at the last time the object was
read with io_read.

st_mtime and st_mtime_usec hold the seconds and microseconds,
respectively, of the system clock at the last time the object was
written with io_write.

st_ctime and st_ctime_usec hold the seconds and microseconds,
respectively, of the system clock at the last time permanent meta-data
associated with the object was changed.  The exact operations which
cause such an update are server-dependent, but must include the creation
of the object.

st_blksize gives the optimal I/O size for io_read and io_write; users
should endeavor to read and write amounts which are multiples of the
optimal size, and to use offsets which are multiples of the optimal
size.  
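
For users who want to follow the st_blksize advice, the arithmetic
might be sketched as follows.  The helper names are hypothetical; the
block size would come from the st_blksize field returned by io_stat.

```c
#include <assert.h>
#include <stddef.h>

/* Largest multiple of BLKSIZE that is not greater than VALUE.  */
size_t
round_down_to_block (size_t value, size_t blksize)
{
  assert (blksize > 0);
  return value - value % blksize;
}

/* Smallest multiple of BLKSIZE that is not less than VALUE.  */
size_t
round_up_to_block (size_t value, size_t blksize)
{
  size_t rem;

  assert (blksize > 0);
  rem = value % blksize;
  return rem == 0 ? value : value + (blksize - rem);
}
```

A user wishing to read an arbitrary byte range could round its start
down and its end up with these helpers, so that both the offset and
the amount requested are multiples of the optimal size.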

In addition, objects which are seekable should set st_size to the
"maximum correct value" described above in the description of the
O_APPEND flag.

The st_uid and st_gid fields are unrelated to the ``owner'' as described
above for icky async I/O.  

Users may find out the version of the server they are talking to by
calling io_server_version; this should return strings and integers
describing the version number of the server, as well as its name.

@node Mapped data
@section Mapped data

Servers may optionally implement the io_map call; they may do so even if
they do not implement the facilities described in the following chapter.
The ports returned by io_map must implement the XP kernel interface and
be suitable as arguments to vm_map.

Seekable objects must allow access from 0 to the "maximum correct value"
described for O_APPEND.  Whether they provide access beyond such a point
is server dependent; in addition, the meaning of such an object for a
non-seekable object is server dependent.  However, servers which
implement the facilities of the next chapter are bound to certain
requirements about which addresses in the memory objects provided by
io_map must be valid.  Simply put, any user following the rules
described in the next chapter should not get any memory faults except as
explicitly permitted by the next chapter.

@node Shared I/O
@chapter Shared I/O

I/O servers may, optionally, provide the services described in this
chapter in addition to the generic services described in the previous
chapter.  These facilities allow users to read and write I/O objects
without making RPCs to the server in most circumstances.  

@menu
* Rules::               The rules users must obey in using
                        shared I/O.
* Examples::            Examples of the way different types
                        of servers could implement shared I/O.
@end menu

@node Rules
@section Rules

Any server implementing the facilities of this chapter must also support
the io_map call as described in the previous chapter.

Users of the shared I/O facilities must call io_map_cntl; this will
return a memory object, called the shared page object.  One page of this
object should be mapped from offset zero into the user's address space.
At the front of this page is a struct shared_io as described in
<hurd/shared.h>.  Frequent reference will be made to the members of this
structure in this chapter, without further qualification.  

Only one shared user can be active on a given port at a time.  If
io_map_cntl is called for a port on which a user is already active, the
server should return EBUSY, at which point the user should call
io_duplicate to obtain a new port, and call io_map_cntl there.

@menu
* Conch::                       How access to the shared page is mediated
* Access rules::                Where in the io_map memory objects users
                                may peek and poke
* Behavior modification::       Modifications of behavior
* Status notification::         Calls users should make at certain
                                times to keep the server abreast of the
                                current state of the object
* Violations::                  When the rules are broken

@end menu

@node Conch
@subsection Conch

Access to the shared page is mediated through a facility known as the
``conch''.  The ``lock'' field of the shared page protects the
conch_status field; this lock must be acquired with spin_lock before
conch_status may be modified or examined.  

If the conch_status field is USER_HAS_CONCH or USER_RELEASE_CONCH, then
the user has the conch, and may access the shared page after releasing
the spin lock.  If the conch_status field is USER_COULD_HAVE_CONCH,
then the user may immediately set conch_status to USER_HAS_CONCH, and
proceed to access the shared page after releasing the spin lock.  If
conch_status is USER_HAS_NOT_CONCH, then the user should release the
spin lock, and call io_get_conch.  Upon return from io_get_conch, the
user should reacquire the spin lock and check conch_status again.

When the user is through accessing the shared page, the user should
acquire the spin lock and examine the conch_status field.  If it has
been set to USER_RELEASE_CONCH, then the user should release the spin
lock and call io_release_conch.

The implementation of io_read and io_write must not modify the file
contents except when the server is holding the conch; users who wish to
be atomic with respect to those functions should be similarly reticent.

The server should guarantee that at most one user has the conch at a
time; the server may only have the conch if no user does.  The server
may not modify conch_status or the shared page if the status is
USER_HAS_CONCH except to set it to USER_RELEASE_CONCH, thus requesting a
call to io_release_conch.

@node Access rules
@subsection Access rules

The conch fields file_size, read_size, and prenotify_size affect which
areas of the data objects may be accessed.  In addition, for
non-seekable objects, the file pointers rd_file_pointer,
wr_file_pointer, and xx_file_pointer affect which areas may be accessed.

For seekable objects, the read object may be read from offset 0 through
the minimum of file_size and read_size.

For seekable objects, the write object may be modified from offset 0
through the prenotify_size.  

For non-seekable objects, the read object may be read from
rd_file_pointer through the minimum of file_size and read_size. 

For non-seekable objects, the write object may be modified from
wr_file_pointer through prenotify_size.

The server may permit access outside these regions, but data will not
necessarily be preserved for any length of time if so written.  If the
server wishes to deny such access, it should fault with EIO.  Servers
may also issue faults on modifications of the write object for reasons
such as EDQUOT and ENOSPC, as well as reporting hardware errors with
EIO.  Servers may only fault valid addresses in the read object with EIO
to indicate hardware failure.

Each such field foo should be ignored if the corresponding flag use_foo
is clear in the shared page (for example, read_size is ignored if
use_read_size is clear); this may result in there being no maximum valid
address for a particular access.  In that case, the object may be
accessed to the end of its virtual address space.
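
The access rules above can be condensed into a pair of range
computations.  This is a sketch under stated assumptions: the struct
page_limits is a hypothetical subset of the shared page (the real
layout in <hurd/shared.h> may differ), the upper bound is treated as
exclusive, and an unbounded range is represented by SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the shared page fields named in the text.  */
struct page_limits
{
  int seekable;
  int use_file_size, use_read_size, use_prenotify_size;
  size_t file_size, read_size, prenotify_size;
  size_t rd_file_pointer, wr_file_pointer;
};

/* Offsets START up to, but not including, END may be accessed.  */
struct range
{
  size_t start, end;
};

/* Region of the read object that the user may read.  */
struct range
readable_range (const struct page_limits *p)
{
  struct range r;

  assert (p != NULL);
  r.start = p->seekable ? 0 : p->rd_file_pointer;
  r.end = SIZE_MAX;             /* No limit unless a flag says so.  */
  if (p->use_file_size && p->file_size < r.end)
    r.end = p->file_size;
  if (p->use_read_size && p->read_size < r.end)
    r.end = p->read_size;
  return r;
}

/* Region of the write object that the user may modify.  */
struct range
writable_range (const struct page_limits *p)
{
  struct range r;

  assert (p != NULL);
  r.start = p->seekable ? 0 : p->wr_file_pointer;
  r.end = p->use_prenotify_size ? p->prenotify_size : SIZE_MAX;
  return r;
}
```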

If use_file_size is set, the user may increase the file_size, but may
not decrease it, to indicate the new "maximum correct value" as
described for O_APPEND.  Normally writes which extend beyond the current
file_size should extend it to the end of the write.

The xx_file_pointer for seekable objects is the same as the default file
pointer used by io_read and io_write.

If use_read_size is set and the user wishes to read past read_size, she
may call io_readsleep, which will return as soon as read_size is
increased.  If read_block_reason is set to RBR_BUFFER_FULL, then the
read_size will not be increased until the rd_file_pointer is increased.

If use_prenotify_size is set and the user wishes to write past
prenotify_size, she may call io_prenotify, specifying the maximum offset
the user intends to write.  The server should return when prenotify_size
has been increased, but is not obligated to extend it as far as the user
wishes.  In addition, io_prenotify may return errors such as ENOSPC,
indicating that the prenotify_size cannot be increased.

Users of seekable objects may modify the xx_file_pointer at will
(including pointing past read_size, file_size, or prenotify_size).
Users of non-seekable objects, however, may only increase the
rd_file_pointer and wr_file_pointer.  In addition, they may not modify
them to point past the valid data as described above.  Failing to
advance them may prevent the read_size or prenotify_size from being
increased.

If eof_notify is set, then the user may attempt to have the file_size
increased by calling io_eofnotify after ``noticing'' the current file
size limit.  io_eofnotify must return immediately, but need not increase
the file_size or clear use_file_size.  (However, if it is impossible
for io_eofnotify to ever do anything, then the server should not set
eof_notify.)

@node Behavior modification
@subsection Behavior modification

The server flag append_mode is a copy of the O_APPEND open mode bit; if
it is set, then the user should do writes at file_size and set the file
pointer appropriately (this applies only if the user would be writing at
the file pointer in the first place).

@node Status notification
@subsection Status notification

The flag do_sigio requests the user to call io_sigio every time the file
pointers or the file_size have been changed.

If use_postnotify_size is set, then the user should call io_postnotify
after writing data that extends past postnotify_size.  Writes beyond
postnotify_size may be buffered internally to the server for arbitrarily
long periods until io_postnotify is called, regardless of the setting
of the O_FSYNC bit.

After modifying or reading the object contents, the user should set the
written or accessed fields respectively.  (Users who fail to set these
fields will not thereby defeat the mtime/atime mechanism.)

If the flag use_eof is set, then users should call io_eofnotify after
reading up to the file_size and noticing it.

@node Violations
@subsection Violations

Users who hold the conch for too long while conch_status is set to
USER_RELEASE_CONCH may have the conch stolen from them and their
conch_status unilaterally downgraded to USER_HAS_NOT_CONCH.  Users who
hold the spin lock for too long (where this ``too long'' is much, much
shorter than the previous one) will have the spin lock stolen from them. 

Users who read or write outside the valid regions described above may
get memory faults and should not expect data written to be saved in any
fashion.

Users who write the read object (when it is different from the write
object) may or may not get faults; they should not expect such data to
be saved in any fashion.

Users who fail to call io_postnotify may cause data to be buffered for
arbitrarily long periods.

Users who reduce rd_file_pointer, wr_file_pointer, or file_size will
have such modifications ignored.

Users may not call any server functions (whether in the I/O protocol or
another) while holding the conch except for those specified in this
chapter.  Such calls may block or fail silently.