IRC, freenode, #hurd, 2012-12-05

<braunr> rbraun   18813 R        2hrs ln -sf ../af_ZA/LC_NUMERIC
  debian/locales-all/usr/lib/locale/en_BW/LC_NUMERIC
<braunr> when building glibc
<braunr> is this a known issue ?
<tschwinge> braunr: No.  Can you get a backtrace?
<braunr> tschwinge: with gdb you mean ?
<tschwinge> Yes.  If you have any debugging symbols (glibc?).
<braunr> or the build log leading to that ?
<braunr> ok, i will next time i have it
<tschwinge> OK.
<braunr> (i regularly had it when working on the pthreads port)
<braunr> tschwinge:
  http://www.sceen.net/~rbraun/hurd_glibc_build_deadlock_trace
<braunr> youpi: ^
<youpi> Mmm, there's not so much we can do about this one
<braunr> youpi: what do you mean ?
<youpi> the problem is that it's really a reentrency issue of the libc
  locale
<youpi> it would happen just the same on linux
<braunr> sure
<braunr> but hat doesn't mean we can't report and/or fix it :)
<youpi> (the _nl_state_lock)
<braunr> do you have any workaround in mind ?
<youpi> no
<youpi> actually that's what I meant by "there's not so much we can do
  about this"
<braunr> ok
<youpi> because it's a bad interaction between libfakeroot and glibc
<youpi> glibc believe fxtstat64 would never call locale functions
<youpi> but with libfakeroot it does
<braunr> i see
<youpi> only because we get an EAGAIN here
<braunr> but hm, doesn't it happen on linux ?
<youpi> EAGAIN doesn't happen on linux for fxstat64, no :)
<braunr> why does it happen on the hurd ?
<youpi> I mean for fakeroot stuff
<youpi> probably because fakeroot uses socket functions
<youpi> for which we probably don't properly handleEAGAIN
<youpi> I've already seen such kind of issue
<youpi> in buildd failures
<braunr> ok
<youpi> (so the actual bug here is EAGAIN
<youpi> )
<braunr> yes, so we can do something about it
<braunr> worth a look
<pinotree> (implement sysv semaphores)
<youpi> pinotree: if we could also solve all these buildd EAGAIN issues
  that'd be nice :)
<braunr> that EAGAIN error might also be what makes exim behave badly and
  loop forever
<youpi> possibly
<braunr> i've updated the trace with debugging symbols
<braunr> it fails on connect
<pinotree> like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563342 ?
<braunr> it's EAGAIN, not ECONNREFUSED
<pinotree> ah ok
<braunr> might be an error in tcp_v4_get_port

IRC, freenode, #hurd, 2012-12-06

<braunr> hmm, tcp_v4_get_port sometimes fails indeed
<gnu_srs> braunr: may I ask how you found out, adding print statements in
  pfinet, or?
<braunr> yes
<gnu_srs> OK, so that's the only (easy) way to debug.
<braunr> that's the last resort
<braunr> gdb is easy too
<braunr> i could have added a breakpoint too
<braunr> but i didn't want to block pfinet while i was away
<braunr> is it possible to force the use of fakeroot-tcp on linux ?
<braunr> the problem seems to be that fakeroot doesn't close the sockets
  that it connected to faked-tcp
<braunr> which, at some point, exhauts the port space
<pinotree> braunr: sure
<pinotree> change the fakeroot dpkg alternative
<braunr> ok
<pinotree> calling it explicitly `fakeroot-tcp command` or
  `dpkg-buildpackage -rfakeroot-tcp ...` should work too
<braunr> fakeroot-tcp looks really evil :p
<braunr> hum, i don't see any faked-tcp process on linux :/
<pinotree> not even with `fakeroot-tcp bash -c "sleep 10"`?
<braunr> pinotree: now yes
<braunr> but, does it mean faked-tcp is started for *each* process loading
  fakeroot-tcp ?
<braunr> (the lib i mean)
<pinotree> i think so
<braunr> well the hurd doesn't seem to do that at all
<braunr> or maybe it does and i don't see it
<braunr> the stale faked-tcp processes could be those that failed something
  only
<pinotree> yes, there's also that issue: sometimes there are stake
  faked-tcp processes
<braunr> hum no, i see one faked-tcp that consumes cpu when building glibc
<pinotree> *stale
<braunr> it's the same process for all commands
<pinotree> <braunr> but, does it mean faked-tcp is started for *each*
  process loading fakeroot-tcp ?
<pinotree> → everytime you start fakeroot, there's a new faked-xxx for it
<braunr> it doesn't look that way
<braunr> again, on the hurd, i see one faked-tcp, consuming cpu while
  building so i assume it services libfakeroot-tcp requests
<pinotree> yes
<braunr> which means i probably won't reproduce the problem on linux
<pinotree> it serves that fakeroot under which the binary(-arch) target is
  run
<braunr> or perhaps it's the normal fakeroot-tcp behaviour on sid
<braunr> pinotree: a faked-tcp that is started for each command invocation
  will implicitely make the network stack close all its sockets when
  exiting
<braunr> pinotree: as our fakeroot-tcp uses the same instance of faked-tcp,
  it's a lot more likely to exhaust the port space
<pinotree> i see
<braunr> i'll try on sid and see how it behaves
<braunr> pinotree: on the other hand, forking so many processes at each
  command invocation may make exec leak a lot :p
<braunr> or rather, a lot more
<braunr> (or maybe not, since it leaks only in some cases)

?exec memory leaks.

<braunr> pinotree: actually, the behaviour under linux is the same with the
  alternative correctly set, whereas faked-tcp is restarted (if used at
  all) with -rfakeroot-tcp
<braunr> hm no, even that isn't true
<braunr> grr
<braunr> pinotree: i think i found a handy workaround for fakeroot
<braunr> pinotree: the range of local ports in our networking stack is a
  lot more limited than what is configured in current systems
<braunr> by extending it, i can now build glibc \o/
<pinotree> braunr: what are the current ours and the usual one?
<braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c
<braunr> the modern ones are the ones suggested in the comment
<braunr> sysctl_local_port_range is the symbol storing the range
<pinotree> i see
<pinotree> what's the current range on linux?
<braunr> 20:44 < braunr> the modern ones are the ones suggested in the
  comment
<pinotree> i see
<braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
<braunr> 32768   61000
<braunr> so, i'm not sure why we have the problem, since even on linux,
  netstat doesn't show open bound ports, but it does help
<braunr> the fact faked-tcp can remain after its use is more problematic
<pinotree> (maybe pfinet could grow a (startup-only?) option to change it,
  similar to that sysctl)
<braunr> but it can also stems from the same issue gnu_srs found about
  closed sockets that haven't been shut down
<braunr> perhaps
<braunr> but i don't see the point actually
<braunr> we could simply change the values in the code

<braunr> youpi: first, in pfinet, i increased the range of local ports to
  reduce the likeliness of port space exhaustion
<braunr> so we should get a lot less EAGAIN after that
<braunr> (i've not committed any of those changes)
<youpi> range of local ports?
<braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c, tcp_v4_get_port function
  and sysctl_local_port_range array
<youpi> oh
<braunr> EAGAIN is caused by tcp_v4_get_port failing at
<braunr>                 /* Exhausted local port range during search? */
<braunr>                 if (remaining <= 0)
<braunr>                         goto fail;
<youpi> interesting
<youpi> so it's not a hurd bug after all
<youpi> just a problem in fakeroot eating a lot of ports
<braunr> maybe because of the same issue gnu_srs worked on (bad socket
  close when no clean shutdown)
<braunr> maybe, maybe not
<braunr> but increasing the range is effective
<braunr> and i compared with what linux does today, which is exactly what
  is in the comment above sysctl_local_port_range
<braunr> so it looks safe
<youpi> so that means that the pfinet just uses ports 1024- 4999 for
  auto-allocated ports?
<braunr> i guess so
<youpi> the linux pfinet I meant
<braunr> i haven't checked the whole code but it looks that way
<youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_min[] = { 1, 1
  };
<youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_max[] = { 65535,
  65535 };
<youpi> looks like they have increased it since then :)
<braunr> hum :)
<braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
<braunr> 32768   61000
<youpi> yep, same here
<youpi> ./inet_connection_sock.c:   .range = { 32768, 61000 },
<youpi> so there are two things apparently
<youpi> but linux now defaults to 32k-61k
<youpi> braunr: please just push the port range upgrade to 32Ki-61K
<braunr> ok, will do
<youpi> there's not reason not to do it

IRC, freenode, #hurd, 2012-12-11

<braunr> youpi: at least, i haven't had any failure building eglibc since
  the port range patch
<youpi> good :)