-*- Mode: outline -*-
* Interfaces
** All interfaces specified by IEEE Std 1003.1-2001 are present. However,
pthread_kill and pthread_sigmask are declared in <pthread.h> rather than
in <signal.h>, where they belong. Once we are compiled with glibc,
this should be easier to fix.
* Test cases. Can never have enough.
* Ports
Port to other kernels (e.g. Linux and FreeBSD) and test on other
platforms.
* Implementation details
** pthread_atfork
This cannot be implemented without either changing glibc to export
some hooks (c.f. libc/sysdeps/mach/hurd/fork.c) or providing a
custom fork implementation that wraps the original using dlopen et
al.
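A minimal sketch of the wrapping approach, assuming a fixed-size
handler table and no locking. The names atfork_register and
fork_with_handlers are hypothetical and exist in neither glibc nor
libpthread. The real_fork argument stands in for the original fork,
however it is obtained (for instance via dlsym). Running the parent
handlers when fork fails is a choice made by this sketch.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names throughout; nothing here is an existing
   libpthread or glibc interface.  */

#define MAX_ATFORK 32

struct atfork_entry
{
  void (*prepare) (void);
  void (*parent) (void);
  void (*child) (void);
};

static struct atfork_entry atfork_table[MAX_ATFORK];
static int atfork_count;

/* Record one handler triple.  Returns 0 on success and -1 if the
   table is full.  A real implementation would need a lock here.  */
int
atfork_register (void (*prepare) (void), void (*parent) (void),
                 void (*child) (void))
{
  if (atfork_count == MAX_ATFORK)
    return -1;
  atfork_table[atfork_count].prepare = prepare;
  atfork_table[atfork_count].parent = parent;
  atfork_table[atfork_count].child = child;
  atfork_count++;
  return 0;
}

/* Fork wrapper.  REAL_FORK is the original fork implementation.  */
pid_t
fork_with_handlers (pid_t (*real_fork) (void))
{
  int i;
  pid_t pid;

  /* POSIX: prepare handlers run in reverse order of registration.  */
  for (i = atfork_count - 1; i >= 0; i--)
    if (atfork_table[i].prepare)
      atfork_table[i].prepare ();

  pid = real_fork ();

  /* Parent and child handlers run in registration order.  A failed
     fork returns in the parent, so it takes the parent path.  */
  for (i = 0; i < atfork_count; i++)
    {
      void (*handler) (void)
        = pid == 0 ? atfork_table[i].child : atfork_table[i].parent;
      if (handler)
        handler ();
    }

  return pid;
}
```

Any of the three handlers may be null, as with pthread_atfork itself.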
** Scheduling and priorities
We do not currently support scheduling in any way whatsoever.
This affects:
pthread_attr_getinheritsched
pthread_attr_setinheritsched
pthread_attr_getschedparam
pthread_attr_setschedparam
pthread_attr_getschedpolicy
pthread_attr_setschedpolicy
pthread_attr_getscope
pthread_attr_setscope
pthread_mutexattr_getprioceiling
pthread_mutexattr_setprioceiling
pthread_mutexattr_getprotocol
pthread_mutexattr_setprotocol
pthread_mutex_getprioceiling
pthread_mutex_setprioceiling
pthread_setschedprio
pthread_getschedparam
pthread_setschedparam
** Alternate stacks
Supporting alternate stacks (via pthread_attr_getstackaddr,
pthread_attr_setstackaddr, pthread_attr_getstack,
pthread_attr_setstack, pthread_attr_getstacksize and
pthread_attr_setstacksize) is no problem as long as they are of the
correct size and have the correct alignment. These restrictions are
due to limitations in the Hurd TSD implementation
(c.f. <hurd/threadvar.h>).
** Cancelation
*** Cancelation points
The only cancelation points are pthread_join, pthread_cond_wait,
pthread_cond_timedwait and pthread_testcancel. We need to explore whether
the hurd_sigstate->cancel_hook (c.f. <hurd/signal.h>) provides the
desired semantics. If not, we must either wrap some functions
using dlsym or wait until integration with glibc.
*** Async cancelation
We inject a new IP into the cancelled (running) thread and then
run the cancelation handlers
(c.f. sysdeps/mach/hurd/pt-docancel.c). The handlers need to have
access to the stack as they may use local variables. I think that
this method may leave the frame pointer in a corrupted state if
the thread was in, for instance, the middle of a function call.
The robustness needs to be confirmed.
** Process Shared Attribute
Currently, there is no real support for the process shared
attribute. Spinlocks work because we just use a test and set loop.
Barriers, conditions, mutexes and rwlocks, however, signal
wakeups via ports whose names are process local.
We could have some process local data that is hashed to via the
address of the data structure. Then the first thread that blocks
per process would spin on the shared memory area and all others
would then block as normal. When the resource became available,
the first thread would signal the other local threads as necessary.
Alternatively, there could be a server; however, this raises a
new question: what can we use as an authentication agent?
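The process local table could look roughly like the sketch below. It
only shows the lookup keyed on the object's address and the
first-waiter decision; the spinning and the local wakeups are left
out, and there is no locking. All names (local_waitq, local_waitq_get,
local_waitq_enter) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LOCAL_BUCKETS 64        /* Must be a power of two.  */

/* Process local bookkeeping for one process shared object.  */
struct local_waitq
{
  const void *shared_addr;      /* Address of the object in this process.  */
  int nwaiters;                 /* Local threads currently blocked.  */
  struct local_waitq *next;     /* Hash chain.  */
};

static struct local_waitq *local_buckets[LOCAL_BUCKETS];

/* Hash an address to a bucket.  The low bits are dropped because
   objects are usually at least word aligned.  */
static size_t
hash_addr (const void *addr)
{
  uintptr_t x = (uintptr_t) addr;
  return (size_t) ((x >> 4) & (LOCAL_BUCKETS - 1));
}

/* Find the local data for ADDR, creating it on first use.  Returns
   NULL if memory is exhausted.  Not thread safe as written.  */
struct local_waitq *
local_waitq_get (const void *addr)
{
  size_t b = hash_addr (addr);
  struct local_waitq *q;

  for (q = local_buckets[b]; q; q = q->next)
    if (q->shared_addr == addr)
      return q;

  q = malloc (sizeof *q);
  if (! q)
    return NULL;
  q->shared_addr = addr;
  q->nwaiters = 0;
  q->next = local_buckets[b];
  local_buckets[b] = q;
  return q;
}

/* Register the caller as a waiter.  Returns 1 if it is the first
   local waiter (and so should spin on the shared memory) and 0 if it
   should block as normal.  */
int
local_waitq_enter (struct local_waitq *q)
{
  return q->nwaiters++ == 0;
}
```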
** Locking algorithm
When a thread blocks, it puts itself on a queue and then waits for
a message on a thread local port. The thread which eventually does
the wakeup sends a message to the waiter thereby waking it up. If
the wakeup is a broadcast wakeup (e.g. pthread_cond_broadcast,
pthread_barrier_wait and pthread_rwlock_unlock), the thread must
send O(N) messages where N is the number of waiting threads. If
all the threads instead received on a lock local (rather than
thread local) port, then the thread which eventually does the wakeup
would need just one operation, mach_port_destroy, and all of the
waiting threads would wake up and get MACH_RCV_PORT_DIED back from
mach_msg. Note that the trade off is that the port must be
recreated. This needs to be benchmarked.
A possible problem with this is scheduling priorities. There may
be a preference for certain threads to wakeup before others
(especially if we are not doing a broadcast, for instance,
pthread_mutex_unlock and pthread_cond_signal). If we take this
approach, the kernel chooses which threads are awakened. If we
find that the kernel makes the wrong choices, we can still overcome
this by merging the two algorithms: have a list of ports sorted in
priority order and the waker does a mach_port_destroy on each as
appropriate.
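The merged scheme needs little more than a waiter list kept in
priority order. A rough sketch follows, assuming that a larger number
means a more urgent waiter and that waiters of equal priority are
served first come, first served. The port type is a stand-in and the
names are hypothetical.

```c
#include <stddef.h>

typedef unsigned int port_name_t;   /* Stand-in for the real port type.  */

struct waiter
{
  port_name_t port;     /* Port the waiter is receiving on.  */
  int priority;         /* Assumption: larger means more urgent.  */
  struct waiter *next;
};

/* Insert W so that the list stays sorted by descending priority.
   Waiters of equal priority keep their arrival order.  */
void
waitq_insert (struct waiter **head, struct waiter *w)
{
  while (*head && (*head)->priority >= w->priority)
    head = &(*head)->next;
  w->next = *head;
  *head = w;
}

/* Remove and return the most urgent waiter, or NULL if there is
   none.  The waker would then destroy (or message) its port.  */
struct waiter *
waitq_pop (struct waiter **head)
{
  struct waiter *w = *head;
  if (w)
    {
      *head = w->next;
      w->next = NULL;
    }
  return w;
}
```

A non-broadcast wakeup pops one waiter; a broadcast pops until the
list is empty.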
** Barriers
Barriers can be very slow and the contention can be very high. The
above algorithm is very appealing; however, it may be augmented
with an initial number of spins and yields. It is expected that
all of the threads reach the barrier in close succession, thus
queuing a message may be more expensive. This needs to be
benchmarked.
** Clocks
*** pthread_condattr_setclock allows a process to specify a clock for
use with pthread_cond_timedwait. What is the correct default for
this? Right now, we use CLOCK_REALTIME; however, we are really
using the system clock which, if I understand correctly, is
completely different.
*** Could we even use other clocks? mach_msg uses a relative time against
the system clock.
*** pthread_getcpuclockid just returns CLOCK_THREAD_CPUTIME_ID if defined.
Is this the correct behavior?
** Timed Blocking
*** pthread_cond_timedwait, pthread_mutex_timedlock, pthread_rwlock_timedrdlock
and pthread_rwlock_timedwrlock all take absolute times. We need
to convert them to relative times for mach_msg. Is there a way
around this? How will clock skew affect us?
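The conversion itself is simple arithmetic. A sketch, assuming the
timeout is wanted in milliseconds and that the caller has already
read the current time from the relevant clock; the name abs_to_rel_ms
is hypothetical. It rounds up so that the wait never ends early, and
returns 0 for a deadline that has already passed. It does nothing
about the clock being stepped while the thread is blocked, so the
caller would presumably have to recheck the deadline after waking.

```c
#include <limits.h>
#include <time.h>

/* Convert the absolute deadline ABSTIME into a relative timeout in
   milliseconds, measured from NOW.  Rounds up to the next
   millisecond and returns 0 if the deadline has already passed.  */
long
abs_to_rel_ms (const struct timespec *abstime, const struct timespec *now)
{
  long long sec = (long long) abstime->tv_sec - (long long) now->tv_sec;
  long long nsec = (long long) abstime->tv_nsec - (long long) now->tv_nsec;
  long long ms;

  /* Borrow a second if the nanosecond difference is negative.  */
  if (nsec < 0)
    {
      sec--;
      nsec += 1000000000LL;
    }

  if (sec < 0)
    return 0;

  ms = sec * 1000 + (nsec + 999999) / 1000000;
  return ms > LONG_MAX ? LONG_MAX : (long) ms;
}
```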
** Weak aliases
Use them consistently and correctly and start by reading
http://sources.redhat.com/ml/libc-alpha/2002-08/msg00278.html.
* L4 Specific Issues
** Stack
*** Size
The stack size is defined to be a single page in
sysdeps/l4/hurd/pt-sysdep.h. Once we are able to set up regions,
this can be expanded to two megabytes as suggested by the Mach version.
Until then, however, a larger stack would require allocating too much
physical memory.
*** Deallocation
__thread_stack_dealloc currently does not deallocate the stack.
For a proper implementation, we need a working memory manager.
** Scheduling
*** yield
[L4] We cannot use yield for spin locks as L4 only yields to threads
whose priority is greater than or equal to that of the yielding thread.
If there are threads of lower priority, they are not considered;
the yielding thread is just placed back on the processor. This
introduces priority inversion quite quickly. L4 will not add a
priority suppression function call. As such, we need to do
an IPC with a small timeout and then use exponential backoff to
do the actual waiting. This sucks.
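A sketch of the backoff loop, written against C11 atomics rather than
the L4 interfaces. The sleep_us callback stands in for the IPC with a
timeout, and the bounds of 1 and 1024 microseconds are made-up
values. backoff_next and spin_lock_backoff are hypothetical names.

```c
#include <stdatomic.h>

/* Return the next delay: start at MIN_US, double each time, and
   never exceed MAX_US.  */
unsigned int
backoff_next (unsigned int cur_us, unsigned int min_us, unsigned int max_us)
{
  if (cur_us < min_us)
    return min_us;
  if (cur_us > max_us / 2)
    return max_us;
  return cur_us * 2;
}

/* Acquire LOCK, sleeping for exponentially longer periods while it
   is held by someone else.  SLEEP_US is only called on contention;
   on L4 it would be the IPC with a timeout.  */
void
spin_lock_backoff (atomic_flag *lock, void (*sleep_us) (unsigned int))
{
  unsigned int delay = 0;

  while (atomic_flag_test_and_set_explicit (lock, memory_order_acquire))
    {
      delay = backoff_next (delay, 1, 1024);
      sleep_us (delay);
    }
}

/* Release LOCK.  */
void
spin_unlock (atomic_flag *lock)
{
  atomic_flag_clear_explicit (lock, memory_order_release);
}
```

Capping the delay bounds how long a waiter can sleep past the moment
the lock is released.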
** Stub code
[L4] We include <task_client.h> in pt-start.c; however, we need a library
so we do not have to play with the CORBA stuff.
** Root server and Task server
*** Getting the tids.
pt-start.c has a wonderfully evil hack that will never work well.
** Paging
We set the pager to the root server. Evil. Fix this in pt-start.c.