1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
|
# "Supporting Larger ext2 File Systems in the Hurd"
# Written by Ognyan Kulev for presentation at FOSDEM 2005.
# Content of this file is in public domain.
%include "default.mgp"
%page
%nodefault
%center, font "thick", size 5
Supporting Larger ext2 File Systems in the Hurd
%font "standard", size 4
Ognyan Kulev
%size 3
<ogi@fmi.uni-sofia.bg>
%size 4
FOSDEM 2005
%page
Need for supporting larger file systems
Active development during 1995-1997
Hurd 0.2 was released in 1997 and it was very buggy
Many bugs are fixed since then
The 2G limit for ext2 file systems becomes more and more annoying
%page
Timeline
2002: Time for graduating, fixing the 2G limit in Hurd's ext2fs and implementing ext3fs were chosen for MSc thesis
2003: First alfa quality patch
2004: Graduation, ext2fs patch in Debian, but ext3fs is unstable
%page
User pager in GNU Mach
Address space
memory_object_data_supply
memory_object_data_return
Memory object (Mach concept)
pager_read_page
pager_write_page
User-supplied backstore (libpager concept)
%page
Current ext2fs
Memory mapping of the whole store
Applies only for metadata!
bptr (block -> data pointer)
= image pointer + block * block_size
Inode and group descriptor tables are used as if they are continous in memory
%page
Patched ext2fs, part one
Address space region
mapping
Array of buffers
association
Store
Association of buffers changes (reassocation)
It's important reassociation to occur on buffers that are not in core
%page
Patched ext2fs, part two
Always use buffer guarded by
disk_cache_block_ref (block -> buffer)
disk_cache_block_deref (release buffer)
Buffer = data + reference count + flags (e.g. INCORE)
Calling some functions implies releasing buffer:
pokel_add (pokels are list of dirty buffers)
record_global_poke (use pokel_add)
sync_global_ptr (sync immediately)
record_indir_poke (use pokel_add)
Use ihash for mapping block to buffer
%page
When unassociated block is requested
%font "typewriter", size 4, cont
retry:
i = hint;
while (buffers[i] is referenced or in core) {
i = (i + 1) % nbuffers;
if (i == hint) {
return_unreferenced_buffers ();
goto retry;
}
}
hint = i + 1;
deassociate (buffers[i]);
associate (buffers[i], block);
return buffers[i];
%page
Notification for evicted pages
Notification is essential for optimal reassociation
Precious pages in Mach
Slight change to API and ABI of libpager is required
Mach sometimes doesn't notify!
%page
Pager optimization
1. Mach returns page to pager without leaving it in core
2. Pager becomes unlocked because of calling callback pager_write_page
3. User task touches the page
4. Mach requests the same page from pager
5. XXX Pager supplies the page that was returned by Mach, instead of calling callback pager_read_page
%page
Future directions
Committing in the Hurd :-)
Block sizes of 1K and 2K
Run-time option for buffer array size (?)
Compile-time option for memory-mapping the whole store
Upgrade of UFS
Extended attributes (EAs) and Access control lists (ACLs)
# Local Variables:
# mgp-options: "-g 640x480"
# End:
|