[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_glibc open_issue_porting]]

# IRC, freenode, #hurd, 2012-12-05

    rbraun 18813 R 2hrs ln -sf ../af_ZA/LC_NUMERIC debian/locales-all/usr/lib/locale/en_BW/LC_NUMERIC
    when building glibc
    is this a known issue ?
    braunr: No. Can you get a backtrace?
    tschwinge: with gdb you mean ?
    Yes. If you have any debugging symbols (glibc?).
    or the build log leading to that ?
    ok, i will next time i have it
    OK.
    (i regularly had it when working on the pthreads port)
    tschwinge: http://www.sceen.net/~rbraun/hurd_glibc_build_deadlock_trace
    youpi: ^
    Mmm, there's not so much we can do about this one
    youpi: what do you mean ?
    the problem is that it's really a reentrancy issue of the libc locale
    it would happen just the same on linux
    sure
    but that doesn't mean we can't report and/or fix it :)
    (the _nl_state_lock)
    do you have any workaround in mind ?
    no
    actually that's what I meant by "there's not so much we can do about this"
    ok
    because it's a bad interaction between libfakeroot and glibc
    glibc believes fxstat64 would never call locale functions
    but with libfakeroot it does
    i see
    only because we get an EAGAIN here
    but hm, doesn't it happen on linux ?
    EAGAIN doesn't happen on linux for fxstat64, no :)
    why does it happen on the hurd ?
    I mean for fakeroot stuff
    probably because fakeroot uses socket functions
    for which we probably don't properly handle EAGAIN
    I've already seen such kind of issue
    in buildd failures
    ok
    (so the actual bug here is EAGAIN)
    yes, so we can do something about it
    worth a look
    (implement sysv semaphores)
    pinotree: if we could also solve all these buildd EAGAIN issues that'd be nice :)
    that EAGAIN error might also be what makes exim behave badly and loop forever
    possibly
    i've updated the trace with debugging symbols
    it fails on connect
    like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563342 ?
    it's EAGAIN, not ECONNREFUSED
    ah ok
    might be an error in tcp_v4_get_port

## IRC, freenode, #hurd, 2012-12-06

    hmm, tcp_v4_get_port sometimes fails indeed
    braunr: may I ask how you found out, adding print statements in pfinet, or?
    yes
    OK, so that's the only (easy) way to debug.
    that's the last resort
    gdb is easy too
    i could have added a breakpoint too
    but i didn't want to block pfinet while i was away
    is it possible to force the use of fakeroot-tcp on linux ?
    the problem seems to be that fakeroot doesn't close the sockets that it connected to faked-tcp
    which, at some point, exhausts the port space
    braunr: sure
    change the fakeroot dpkg alternative
    ok
    calling it explicitly `fakeroot-tcp command` or `dpkg-buildpackage -rfakeroot-tcp ...` should work too
    fakeroot-tcp looks really evil :p
    hum, i don't see any faked-tcp process on linux :/
    not even with `fakeroot-tcp bash -c "sleep 10"`?
    pinotree: now yes
    but, does it mean faked-tcp is started for *each* process loading fakeroot-tcp ?
    (the lib i mean)
    i think so
    well the hurd doesn't seem to do that at all
    or maybe it does and i don't see it
    the stale faked-tcp processes could be those that failed something only
    yes, there's also that issue: sometimes there are stake faked-tcp processes
    hum no, i see one faked-tcp that consumes cpu when building glibc
    *stale
    it's the same process for all commands
    but, does it mean faked-tcp is started for *each* process loading fakeroot-tcp ? → everytime you start fakeroot, there's a new faked-xxx for it
    it doesn't look that way
    again, on the hurd, i see one faked-tcp, consuming cpu while building so i assume it services libfakeroot-tcp requests
    yes
    which means i probably won't reproduce the problem on linux
    it serves that fakeroot under which the binary(-arch) target is run
    or perhaps it's the normal fakeroot-tcp behaviour on sid
    pinotree: a faked-tcp that is started for each command invocation will implicitly make the network stack close all its sockets when exiting
    pinotree: as our fakeroot-tcp uses the same instance of faked-tcp, it's a lot more likely to exhaust the port space
    i see
    i'll try on sid and see how it behaves
    pinotree: on the other hand, forking so many processes at each command invocation may make exec leak a lot :p
    or rather, a lot more
    (or maybe not, since it leaks only in some cases)

[[exec_leak]].

    pinotree: actually, the behaviour under linux is the same with the alternative correctly set, whereas faked-tcp is restarted (if used at all) with -rfakeroot-tcp
    hm no, even that isn't true
    grr
    pinotree: i think i found a handy workaround for fakeroot
    pinotree: the range of local ports in our networking stack is a lot more limited than what is configured in current systems
    by extending it, i can now build glibc \o/
    braunr: what are the current ours and the usual one?
    see pfinet/linux-src/net/ipv4/tcp_ipv4.c
    the modern ones are the ones suggested in the comment
    sysctl_local_port_range is the symbol storing the range
    i see
    what's the current range on linux?
    20:44 < braunr> the modern ones are the ones suggested in the comment
    i see
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768 61000
    so, i'm not sure why we have the problem, since even on linux, netstat doesn't show open bound ports, but it does help
    the fact faked-tcp can remain after its use is more problematic
    (maybe pfinet could grow a (startup-only?) option to change it, similar to that sysctl)
    but it can also stem from the same issue gnu_srs found about closed sockets that haven't been shut down
    perhaps
    but i don't see the point actually
    we could simply change the values in the code
    youpi: first, in pfinet, i increased the range of local ports to reduce the likeliness of port space exhaustion
    so we should get a lot less EAGAIN after that
    (i've not committed any of those changes)
    range of local ports?
    see pfinet/linux-src/net/ipv4/tcp_ipv4.c, tcp_v4_get_port function and sysctl_local_port_range array
    oh
    EAGAIN is caused by tcp_v4_get_port failing at
    /* Exhausted local port range during search? */
    if (remaining <= 0)
            goto fail;
    interesting
    so it's not a hurd bug after all
    just a problem in fakeroot eating a lot of ports
    maybe because of the same issue gnu_srs worked on (bad socket close when no clean shutdown)
    maybe, maybe not
    but increasing the range is effective
    and i compared with what linux does today, which is exactly what is in the comment above sysctl_local_port_range
    so it looks safe
    so that means that the pfinet just uses ports 1024-4999 for auto-allocated ports?
    i guess so
    the linux pfinet I meant
    i haven't checked the whole code but it looks that way
    ./sysctl_net_ipv4.c:static int ip_local_port_range_min[] = { 1, 1 };
    ./sysctl_net_ipv4.c:static int ip_local_port_range_max[] = { 65535, 65535 };
    looks like they have increased it since then :)
    hum :)
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768 61000
    yep, same here
    ./inet_connection_sock.c: .range = { 32768, 61000 },
    so there are two things apparently
    but linux now defaults to 32k-61k
    braunr: please just push the port range upgrade to 32Ki-61K
    ok, will do
    there's no reason not to do it

## IRC, freenode, #hurd, 2012-12-11

    youpi: at least, i haven't had any failure building eglibc since the port range patch
    good :)
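
The port-space exhaustion discussed in the 2012-12-06 log can be illustrated with a toy model. The sketch below is not pfinet code: `PortAllocator` and `connections_until_eagain` are hypothetical names invented for this illustration, and the search loop only loosely mirrors the `remaining <= 0` check quoted from `tcp_v4_get_port`. It assumes the 1024-4999 range mentioned in the log as the old default and the 32768-61000 range shown by `ip_local_port_range`, and it assumes the client never releases its ports, which is the behaviour suspected of fakeroot.

```python
import errno


class PortAllocator:
    """Toy model of a local (ephemeral) port allocator. Hypothetical, not pfinet."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.in_use = set()
        self.rover = low - 1  # last port handed out; the search resumes after it

    def allocate(self):
        # Scan at most one full pass over the range, starting after the rover.
        remaining = self.high - self.low + 1
        port = self.rover
        while remaining > 0:
            port += 1
            if port > self.high:
                port = self.low
            if port not in self.in_use:
                self.in_use.add(port)
                self.rover = port
                return port
            remaining -= 1
        # Exhausted local port range during search: report EAGAIN.
        raise OSError(errno.EAGAIN, "local port range exhausted")

    def release(self, port):
        self.in_use.discard(port)


def connections_until_eagain(low, high):
    """Count how many ports a client that never releases any can obtain."""
    allocator = PortAllocator(low, high)
    count = 0
    try:
        while True:
            allocator.allocate()
            count += 1
    except OSError as exc:
        if exc.errno != errno.EAGAIN:
            raise
    return count


if __name__ == "__main__":
    print("1024-4999:", connections_until_eagain(1024, 4999))
    print("32768-61000:", connections_until_eagain(32768, 61000))
```

In this model a wider range only postpones the failure: a client that leaks every port still hits EAGAIN eventually, which fits the remark in the log that a long-lived faked-tcp is the more problematic part, while the larger range was enough in practice for the glibc build to complete.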