summaryrefslogtreecommitdiff
path: root/hurd/translator/tmpfs/notes_various.mdwn
blob: d16210ca0b2a7b803d32919df9a1f3bcdfcafe0f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
[[!meta copyright="Copyright © 2005, 2006, 2007, 2008, 2009, 2011 Free Software
Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_hurd]]

    <antrik> hde: what's the status on tmpfs?
    <hde> Broke
    <hde> k0ro traced the errors like the assert show above to a pager problem.
      See the pager cannot handle request from multiple ports and tmpfs sends
      request using two differ ports, so to fix it the pager needs to be hacked
      to support multiple requests.
    <hde> You can enable debugging in the pager by changing a line from dprintf
      to ddprintf I can tell you how if you want.
    <antrik> and changing tmpfs to use a single port isn't possible?...
    <hde> antrik, I am not sure.
    <hde> IIRC k0ro was saying it cannot be changed and I cannot recall his
      reasons why.
    <sdschulze> antrik: Doing it the quick&dirty way, I'd just use an N-ary
      tree for representing the directory structure and mmap one new page (or
      more) for each file.
    <hde> sdschulze, What are you talking about?
    <sdschulze> hde: about how I would implement tmpfs
    <hde> O
    <azeem> sdschulze: you don't need to reimplement it, just fix it :)
    <sdschulze> azeem: Well, it seems a bit more difficult than I considered.
    <sdschulze> I had assumed it was implemented the way I described.
    <hde> O and the assert above gets triggered if you don't have a
      default-pager setup on /servers/default-pager
    <hde> the dir.c:62 assert that is.
    <azeem> hde: you sure?  I think I have one
    <hde> I am almost sure.
    <azeem> mbanck@beethoven:~$ showtrans /servers/default-pager
    <azeem> /hurd/proxy-defpager
    <azeem> isn't that enough?
    <hde> It is suppose to be.
    <hde> Try it as root
    <hde> I was experiecing alot of bugs as a normal user, but according to
      marcus it is suppose to work as root, but I was getting alot of hangs.
    <azeem> hde: same issue, sudo doesn't work
    <hde> sucky, well then there are alot of bugs. =)
    <azeem> eh, no
    <azeem> I still get the dir.c assert
    <sdschulze> me too
    <sdschulze> Without it, I already get an error message trying to set tmpfs
      as an active translator.

---

    <hde> I think I found the colprit.
    <hde> default_pager_object_set_size --> This is were tmpfs is hanging.
    <hde> mmm Hangs on the message to the default-pager.

---

    <hde> Well it looks like tmpfs is sending a message to the default-pager,
      the default-pager then receives the message and, checks the seqno.  I
      checked the mig gen code and noticed that the seqno is the reply port, it
      this does not check out then the default pager is put into a what it
      seems infinte condition_wait hoping to get the correct seqno.
    <hde> Now I am figuring out how to fix it, and debugging some more.

---

    <marco_g> hde: Still working on tmpfs?
    <hde> Yea
    <marco_g> Did you fix a lot already?
    <hde> No, just trying to narrow down the reason why we cannot write file
      greater then 4.5K.
    <marco_g> ahh
    <marco_g> What did you figure out so far?
    <hde> I used the quick marcus fix for the reading assert.
    <marco_g> reading assert?
    <hde> Yea you know ls asserted.
    <marco_g> oh? :)
    <hde> Because, the offsets changed in sturct dirent in libc.
    <hde> They added 64 bit checks.
    <hde> So marcus suggested a while ago on bug-hurd to just add some padding
      arrays to the struct tmpfs_dirent.
    <hde> And low and behold it works.
    <marco_g> Oh, that fix.
    <hde> Yup
    <hde> marco_g, I have figured out that tmpfs sends a message to the
      default-pager, the default-pager does receive the message, but then
      checks the seqno(The reply port) and if it is not the same as the
      default-pagers structure->seqno then she waits hoping to get the correct
      one.  Unfortantly it puts the pager into a infinite lock and never come
      out of it.
    <marco_g> hde: That sucks...
    <marco_g> But at least you know what the problem is.
    <hde> marco_g, Yea, now I am figuring out how to fix it.
    <hde> Which requires more debugging lol.
    <hde> There is also another bug, default_pager_object_set_size in
    <hde> mach-defpager does never return when called and makes tmpfs hang. I
    <hde> will have a closer look at this later this week.

---

    <hde> Cool, now that I have two pagers running, hopefully I will have less
      system crashes.
    <marcus> running more than one pager sounds like trouble to me, but maybe
      hde means something different than I think
    <hde> Well the other pager is only for tmpfs to use.
    <hde> So I can debug the pager without messing with the entire system.
    <hde> marcus, I am trying ti figure out why diskfs_object_set_size waits
      forever.  This way when the pager becomes locked forever I can turn it
      off and restart it.  When I was doing this with only one mach-defpager
      running the system would crash.
    <marcus> hde: how were you able to start two default pagers??
    <hde> Well you most likely will not think my way of doing it was correct,
      and I am also not sure if it is lol.  I made my hacked version not stop
      working if one is alreay started.

---

    <hde> See, the default-pager has a function called
      default_pager_object_set_size this sets the size for a memory object,
      well it checks the seqno for each object if it is wrong it goes into a
      condition_wait, and waits for another thread to give it a correct seqno,
      well this never happens.
    <hde> Thus, you get a hung tmpfs and default-pager.
    <hde> pager_memcpy (pager=0x0, memobj=33, offset=4096, other=0x20740,
      size=0x129df54, prot=3) at pager-memcpy.c:43
    <hde> bddebian, See the problem?
    <bddebian>  pager=0x0?
    <hde> Yup
    <hde> Now wtf is the deal, I must debug.
    <hde>  -- Function: struct pager * diskfs_get_filemap_pager_struct
    <hde>           (struct node *NP)
    <hde>      Return a `struct pager *' that refers to the pager returned by
    <hde>      diskfs_get_filemap for locked node NP, suitable for use as an
    <hde>      argument to `pager_memcpy'.
    <hde> That is failing.
    <hde> If it is not one thing it is another.
    <bddebian> All of Mach fails ;-)
    <hde> It is alot of work to make a test program that uses libdiskfs.

---

    <bing> to run tmpfs as a regular user, /servers/default-pager must be
      executable by that user.  by default it seems to be set to read/write.
    <bing> $ sudo chmod ugo+x /servers/default-pager
    <bing> you can see the O_EXEC in tmpfs.c
    <bing> maybe this is just a debian packaging problem
    <bing> it's probably a fix to native-install i'd guess

---

    <bing> tmpfs is failing on default_pager_object_create with -308, which
      means server died
    <bing> i'm running it as a regular user, so it gets it's pager from
      /servers/default-pager
    <bing> and showtrans /servers/default-pager shows /hurd/proxy-defpager
    <bing> so i'm guessing that's the server that died

---

    <bing> this is about /hurd/tmpfs
    <bing> a filesystem in memory
    <bing> such that each file is it's own memory object
    <andar> what does that mean exactly?  it differs from a "ramdisk"?
    <bing> instead of the whole fs being a memory object
    <andar> it only allocates memory as needed?
    <bing> each file is it's own
    <bing> andar: yeah
    <bing> it's not ext2 or anything
    <andar> yea
    <bing> it's tmpfs :-)
    <bing> first off, echo "this" > that
    <bing> fails
    <bing> with a hang
    <bing> on default_pager_object_create
    <andar> so writing to the memory object fails
    <bing> well, it's on the create
    <andar> ah
    <bing> and it returns -308
    <bing> which is server died
    <bing> in mig-speak
    <bing> but if i run it as root
    <bing> things behave differently
    <bing> it gets passed the create
    <bing> but then i don't know what
    <bing> i want to make it work for the regular user
    <bing> it doesn't work as root either, it hangs elsewhere
    <andar> but it at least creates the memory object
    <bing> that's the braindump
    <bing> but it's great for symlinks!
    <andar> do you know if it creates it?
    <bing> i could do stowfs in it

---

    <antrik> bing: k0ro (I think) analized the tmpfs problem some two years ago
      or so, remember?...
    <antrik> it turns out that it broke due to some change in other stuff
      (glibc I think)
    <antrik> problem was something like getting RPCs to same port from two
      different sources or so
    <antrik> and the fix to that is non-trivial
    <antrik> I don't remember in what situations it broke exactly, maybe when
      writing larger files?
    <bing> antrik: yeah i never understood the explanation
    <bing> antrik: right now it doesn't write any files
    <bing> the change in glibc was to struct dirent
    <antrik> seems something more broke in the meantime :-(
    <antrik> ah, right... but I the main problem was some other change
    <antrik> (or maybe it never really worked, not sure anymore)