[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_glibc open_issue_porting]]


# IRC, freenode, #hurd, 2012-12-05

    <braunr> rbraun   18813 R        2hrs ln -sf ../af_ZA/LC_NUMERIC
      debian/locales-all/usr/lib/locale/en_BW/LC_NUMERIC
    <braunr> when building glibc
    <braunr> is this a known issue ?
    <tschwinge> braunr: No.  Can you get a backtrace?
    <braunr> tschwinge: with gdb you mean ?
    <tschwinge> Yes.  If you have any debugging symbols (glibc?).
    <braunr> or the build log leading to that ?
    <braunr> ok, i will next time i have it
    <tschwinge> OK.
    <braunr> (i regularly had it when working on the pthreads port)
    <braunr> tschwinge:
      http://www.sceen.net/~rbraun/hurd_glibc_build_deadlock_trace
    <braunr> youpi: ^
    <youpi> Mmm, there's not so much we can do about this one
    <braunr> youpi: what do you mean ?
    <youpi> the problem is that it's really a reentrancy issue of the libc
      locale
    <youpi> it would happen just the same on linux
    <braunr> sure
    <braunr> but that doesn't mean we can't report and/or fix it :)
    <youpi> (the _nl_state_lock)
    <braunr> do you have any workaround in mind ?
    <youpi> no
    <youpi> actually that's what I meant by "there's not so much we can do
      about this"
    <braunr> ok
    <youpi> because it's a bad interaction between libfakeroot and glibc
    <youpi> glibc believes fxstat64 would never call locale functions
    <youpi> but with libfakeroot it does
    <braunr> i see
    <youpi> only because we get an EAGAIN here
    <braunr> but hm, doesn't it happen on linux ?
    <youpi> EAGAIN doesn't happen on linux for fxstat64, no :)
    <braunr> why does it happen on the hurd ?
    <youpi> I mean for fakeroot stuff
    <youpi> probably because fakeroot uses socket functions
    <youpi> for which we probably don't properly handleEAGAIN
    <youpi> I've already seen such kind of issue
    <youpi> in buildd failures
    <braunr> ok
    <youpi> (so the actual bug here is EAGAIN
    <youpi> )
    <braunr> yes, so we can do something about it
    <braunr> worth a look
    <pinotree> (implement sysv semaphores)
    <youpi> pinotree: if we could also solve all these buildd EAGAIN issues
      that'd be nice :)
    <braunr> that EAGAIN error might also be what makes exim behave badly and
      loop forever
    <youpi> possibly
    <braunr> i've updated the trace with debugging symbols
    <braunr> it fails on connect
    <pinotree> like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563342 ?
    <braunr> it's EAGAIN, not ECONNREFUSED
    <pinotree> ah ok
    <braunr> might be an error in tcp_v4_get_port
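
The failure mode is roughly the following.  This is a hypothetical
sketch, not libfakeroot's actual code: the `forward_to_faked` name and
the daemon port are invented, but the shape matches the discussion
above: the interposed `__fxstat64` talks TCP to faked-tcp, and its
error path re-enters glibc's locale machinery.

    /* Hypothetical simplification of libfakeroot's fxstat64 wrapper.  */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int
    forward_to_faked (void)
    {
      struct sockaddr_in addr;
      int s = socket (AF_INET, SOCK_STREAM, 0);

      if (s < 0)
        return -1;

      memset (&addr, 0, sizeof addr);
      addr.sin_family = AF_INET;
      addr.sin_port = htons (9000);  /* hypothetical faked-tcp port */
      addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

      if (connect (s, (struct sockaddr *) &addr, sizeof addr) < 0)
        {
          /* On the Hurd this connect() can fail with EAGAIN once pfinet
             runs out of local ports (see below).  The error path then
             calls a locale-aware function; if the intercepted fxstat64
             call came from glibc's own locale loader, this thread
             already holds _nl_state_lock, and re-entering the locale
             code deadlocks -- which is what the glibc build trace
             shows.  */
          if (errno == EAGAIN)
            perror ("libfakeroot: connect to faked-tcp");
          close (s);
          return -1;
        }

      /* ... forward the fxstat64 request and read the faked result ... */
      close (s);
      return 0;
    }

    int
    main (void)
    {
      return forward_to_faked () == 0 ? 0 : 1;
    }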


## IRC, freenode, #hurd, 2012-12-06

    <braunr> hmm, tcp_v4_get_port sometimes fails indeed
    <gnu_srs> braunr: may I ask how you found out, adding print statements in
      pfinet, or?
    <braunr> yes
    <gnu_srs> OK, so that's the only (easy) way to debug.
    <braunr> that's the last resort
    <braunr> gdb is easy too
    <braunr> i could have added a breakpoint too
    <braunr> but i didn't want to block pfinet while i was away
    <braunr> is it possible to force the use of fakeroot-tcp on linux ?
    <braunr> the problem seems to be that fakeroot doesn't close the sockets
      that it connected to faked-tcp
    <braunr> which, at some point, exhausts the port space
    <pinotree> braunr: sure
    <pinotree> change the fakeroot dpkg alternative
    <braunr> ok
    <pinotree> calling it explicitly `fakeroot-tcp command` or
      `dpkg-buildpackage -rfakeroot-tcp ...` should work too
    <braunr> fakeroot-tcp looks really evil :p
    <braunr> hum, i don't see any faked-tcp process on linux :/
    <pinotree> not even with `fakeroot-tcp bash -c "sleep 10"`?
    <braunr> pinotree: now yes
    <braunr> but, does it mean faked-tcp is started for *each* process loading
      fakeroot-tcp ?
    <braunr> (the lib i mean)
    <pinotree> i think so
    <braunr> well the hurd doesn't seem to do that at all
    <braunr> or maybe it does and i don't see it
    <braunr> the stale faked-tcp processes could be those that failed something
      only
    <pinotree> yes, there's also that issue: sometimes there are stale
      faked-tcp processes
    <braunr> hum no, i see one faked-tcp that consumes cpu when building glibc
    <braunr> it's the same process for all commands
    <pinotree> <braunr> but, does it mean faked-tcp is started for *each*
      process loading fakeroot-tcp ?
    <pinotree> → every time you start fakeroot, there's a new faked-xxx for it
    <braunr> it doesn't look that way
    <braunr> again, on the hurd, i see one faked-tcp, consuming cpu while
      building so i assume it services libfakeroot-tcp requests
    <pinotree> yes
    <braunr> which means i probably won't reproduce the problem on linux
    <pinotree> it serves the fakeroot under which the binary(-arch) target is
      run
    <braunr> or perhaps it's the normal fakeroot-tcp behaviour on sid
    <braunr> pinotree: a faked-tcp that is started for each command invocation
      will implicitly make the network stack close all its sockets when
      exiting
    <braunr> pinotree: as our fakeroot-tcp uses the same instance of faked-tcp,
      it's a lot more likely to exhaust the port space
    <pinotree> i see
    <braunr> i'll try on sid and see how it behaves
    <braunr> pinotree: on the other hand, forking so many processes at each
      command invocation may make exec leak a lot :p
    <braunr> or rather, a lot more
    <braunr> (or maybe not, since it leaks only in some cases)

[[exec_memory_leaks]].
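
The exhaustion hypothesis is easy to test with a sketch like this (a
simplification: real fakeroot leaks one connection per wrapped command
across many client processes, while here a single process leaks them
all, so the descriptor limit must be raised first, e.g. with `ulimit -n
65536`, or EMFILE is hit before the port range runs out):

    /* Connect repeatedly to a loopback listener without ever closing
       anything, mimicking the never-closed connections to the
       long-lived faked-tcp.  With pfinet's 1024-4999 default, connect()
       should start failing with EAGAIN after roughly 4000 connections;
       current Linux uses 32768-61000 and reports EADDRNOTAVAIL
       instead.  */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    int
    main (void)
    {
      struct sockaddr_in addr;
      socklen_t len = sizeof addr;
      int lsock = socket (AF_INET, SOCK_STREAM, 0);
      unsigned long n = 0;

      memset (&addr, 0, sizeof addr);
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

      /* Listen on an ephemeral loopback port.  */
      if (lsock < 0
          || bind (lsock, (struct sockaddr *) &addr, sizeof addr) < 0
          || listen (lsock, SOMAXCONN) < 0
          || getsockname (lsock, (struct sockaddr *) &addr, &len) < 0)
        {
          perror ("setup");
          return 1;
        }

      for (;;)
        {
          int s = socket (AF_INET, SOCK_STREAM, 0);
          if (s < 0 || connect (s, (struct sockaddr *) &addr, sizeof addr) < 0)
            {
              printf ("failure after %lu connections: %s\n", n,
                      strerror (errno));
              return 0;
            }
          /* Keep the server side too, so the accept queue never fills.  */
          (void) accept (lsock, NULL, NULL);
          n++;
        }
    }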

    <braunr> pinotree: actually, the behaviour under linux is the same with the
      alternative correctly set, whereas faked-tcp is restarted (if used at
      all) with -rfakeroot-tcp
    <braunr> hm no, even that isn't true
    <braunr> grr
    <braunr> pinotree: i think i found a handy workaround for fakeroot
    <braunr> pinotree: the range of local ports in our networking stack is a
      lot more limited than what is configured in current systems
    <braunr> by extending it, i can now build glibc \o/
    <pinotree> braunr: what are the current ours and the usual one?
    <braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c
    <braunr> the modern ones are the ones suggested in the comment
    <braunr> sysctl_local_port_range is the symbol storing the range
    <pinotree> i see
    <pinotree> what's the current range on linux?
    <braunr> 20:44 < braunr> the modern ones are the ones suggested in the
      comment
    <pinotree> i see
    <braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
    <braunr> 32768   61000
    <braunr> so, i'm not sure why we have the problem, since even on linux,
      netstat doesn't show open bound ports, but it does help
    <braunr> the fact faked-tcp can remain after its use is more problematic
    <pinotree> (maybe pfinet could grow a (startup-only?) option to change it,
      similar to that sysctl)
    <braunr> but it can also stem from the same issue gnu_srs found about
      closed sockets that haven't been shut down
    <braunr> perhaps
    <braunr> but i don't see the point actually
    <braunr> we could simply change the values in the code
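
For reference, a sketch of that change in
pfinet/linux-src/net/ipv4/tcp_ipv4.c (the old values, and the comment
already suggesting the larger range, are inherited from the Linux 2.x
sources):

    /* This array holds the first and last local port number; the
     * Linux 2.x default was { 1024, 4999 }, with a comment suggesting
     * 32768-61000 for high-usage systems -- which is also what current
     * Linux defaults to, so simply switch to that.  */
    int sysctl_local_port_range[2] = { 32768, 61000 };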

    <braunr> youpi: first, in pfinet, i increased the range of local ports to
      reduce the likeliness of port space exhaustion
    <braunr> so we should get a lot less EAGAIN after that
    <braunr> (i've not committed any of those changes)
    <youpi> range of local ports?
    <braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c, tcp_v4_get_port function
      and sysctl_local_port_range array
    <youpi> oh
    <braunr> EAGAIN is caused by tcp_v4_get_port failing at
    <braunr>                 /* Exhausted local port range during search? */
    <braunr>                 if (remaining <= 0)
    <braunr>                         goto fail;
    <youpi> interesting
    <youpi> so it's not a hurd bug after all
    <youpi> just a problem in fakeroot eating a lot of ports
    <braunr> maybe because of the same issue gnu_srs worked on (bad socket
      close when no clean shutdown)
    <braunr> maybe, maybe not
    <braunr> but increasing the range is effective
    <braunr> and i compared with what linux does today, which is exactly what
      is in the comment above sysctl_local_port_range
    <braunr> so it looks safe
    <youpi> so that means that the pfinet just uses ports 1024-4999 for
      auto-allocated ports?
    <braunr> i guess so
    <youpi> the linux pfinet I meant
    <braunr> i haven't checked the whole code but it looks that way
    <youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_min[] = { 1, 1
      };
    <youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_max[] = { 65535,
      65535 };
    <youpi> looks like they have increased it since then :)
    <braunr> hum :)
    <braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
    <braunr> 32768   61000
    <youpi> yep, same here
    <youpi> ./inet_connection_sock.c:	.range = { 32768, 61000 },
    <youpi> so there are two things apparently
    <youpi> but linux now defaults to 32k-61k
    <youpi> braunr: please just push the port range upgrade to 32Ki-61K
    <braunr> ok, will do
    <youpi> there's no reason not to do it
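
To see why such a small range empties so quickly, here is a paraphrase
of the auto-allocation path of tcp_v4_get_port() (simplified from the
Linux 2.x code carried in pfinet; `port_in_use` stands in for the real
lookup in the TCP bind hash table):

    #include <stdio.h>

    static int sysctl_local_port_range[2] = { 1024, 4999 }; /* pre-patch */
    static int port_rover;

    /* Stand-in for the real walk over tcp_bhash.  */
    static int
    port_in_use (int port)
    {
      (void) port;
      return 0;
    }

    static int
    get_local_port (void)
    {
      int low = sysctl_local_port_range[0];
      int high = sysctl_local_port_range[1];
      int remaining = (high - low) + 1;
      int rover = port_rover;

      /* Linearly probe the configured range for a free port.  Every
         connection that fakeroot leaves open keeps one port busy, so
         with fewer than 4000 candidates `remaining' reaches zero
         quickly.  */
      do
        {
          if (++rover < low || rover > high)
            rover = low;
          if (!port_in_use (rover))
            break;
        }
      while (--remaining > 0);

      /* Exhausted local port range during search?  The caller,
         tcp_v4_connect(), turns this failure into the EAGAIN that
         fakeroot ends up seeing.  */
      if (remaining <= 0)
        return -1;

      port_rover = rover;
      return rover;
    }

    int
    main (void)
    {
      printf ("allocated local port %d\n", get_local_port ());
      return 0;
    }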


## IRC, freenode, #hurd, 2012-12-11

    <braunr> youpi: at least, i haven't had any failure building eglibc since
      the port range patch
    <youpi> good :)