[[!meta copyright="Copyright © 2013, 2014 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_glibc]]

Issues with the current 2.17 version of glibc/EGLIBC in Debian experimental (now in unstable).


# IRC, OFTC, #debian-hurd, 2013-03-14

I have a strange tcp via localhost question: The other side closes the connection, but I haven't read all data yet. I should still be able to read the pending data, no? At least it seems to work that way on Linux, but not on Hurd. Got a simple repro with nc, if you're interested...
markus_wanner: yes, we're interested
youpi: okay, here we go:
session 1: nc -l -p 7777 localhost
session 2: nc 127.0.0.1 7777
session 2: a b c
session 1: [ pause with Ctrl-Z ]
session 2: [ send more data ] d e f
session 2: [ quit with Ctrl-C ]
session 1: [ resume with 'fg' ]
The server on session 1 doesn't get the data sent after it paused and before the client closed the connection. I'm not sure if that's a valid TCP thing. However, on Linux, the server still gets the data. On Hurd it doesn't.
I'm working on a C-code test case, ATM.
markus_wanner: on which box are you seeing this behavior?
exodar does not have it
i.e. I do get the d e f
a private VM (I'm not a DD)
..updated to latest experimental stuff.
GNU lematur 0.3 GNU-Mach 1.3.99-486/Hurd-0.3 i686-AT386 GNU
ok, I can't reproduce it on my vm either
maybe the C program will help
Hm.. cannot currently reproduce that in C. (Netcat still shows the issue, though.)
I'll try to strace netcat...
..Meh. strace not available on Hurd?
no, but there is rpctrace to show the various RPCs
Cool, looks helpful. Thx
Uh.. that introduces another error: rpctrace: ../../utils/rpctrace.c:1287: trace_and_forward: Assertion `reply_type == 18' failed.
[[hurd/debugging/rpctrace]].
I'm checking on a box without ipv6 configuration
maybe that's the difference between you and me
I guess your /etc/alternatives/nc is /bin/nc.traditional ?
Yup, nc.traditional.
Looks like that box only has IPv4 configured.
Something very strange is going on here. No matter how hard I try, I cannot reproduce this with netcat anymore.
not even after a reboot?
Woo.. here, it happened again! This is driving me crazy!
Now, nc seemingly connects, but is unable to send data between the two. Netcat would somehow complain if it failed to connect, no? No, it worked.
So this seems to be an intermittent issue. So far, I could only ever repro it as a normal user, not as root. May be coincidental, though.
Now, 'a' and 'b' made it through, but not the 'c' sent manually just after that.
Something with that TCP/IP stack is definitely fishy. Anything I can try to investigate? Or shall I simply restart and see if the problem persists?
maybe restart, yes
did you restart since the upgrade ?
Yes, I restarted after that.
Hm.. okay, restarted. The same problem persists.
I currently have two netcat processes connected; the listening one got the first two messages and seems stuck now. With the client, I tried to send more data, but the server doesn't get it anymore.
Any idea on what I can do to analyze the situation?
for the netcat issue, I haven't experienced this
are you running in kvm or virtualbox or something else?
I'm currently puzzled about what "experimental" actually ships.
On kvm.
My libc0.3 used to be 2.13-39+hurd.3. But packages.d.o already shows 2.17.0experimental2.
experimental ships experimental versions, which you aren't supposed to use unless you know what you are doing
iirc 2.17 is known to be quite broken for now
Okay.
So I guess I'll try to "downgrade" to unstable, then.
Phew, okay, successfully downgraded to unstable. Hopefully monotone's test suite runs through fine now.
Yup, WORKING! Looks like some experimental packages caused the problem. The netcat test as well as that one failing monotone test work fine now.


## IRC, OFTC, #debian-hurd, 2013-03-19

pinotree, youpi: Is there anything from that markus_wanner discussion about pfinet/netcat/signals that needs to be filed? I guess we don't know what exactly he changed so that everything worked fine eventually? (Some experimental package(s), but which?)
that was the libc0.3 packages, which are indeed known to break the network


# IRC, freenode, #hurd, 2013-06-18

root@darnassus:~# dpkg-reconfigure locales
Generating locales (this might take a while)...
en_US.UTF-8...Segmentation fault
is it known ?
uh, no


## IRC, OFTC, #debian-hurd, 2013-06-19

btw i saw the segmentation fault too when generating locales


## IRC, freenode, #hurd, 2014-02-04

hello
I just updated
Setting up locales (2.17-98~0) ...
Generating locales (this might take a while)...
en_US.UTF-8...Segmentation fault
done
bu^: That's known, it still seems to work, though. If you have the time please debug.
I've tried but not found the solution yet :-(
ok, just wanted to notify


## IRC, freenode, #hurd, 2014-02-19

for info, the localedef segfault has been fixed upstream
or rather, upstream has been rewritten in a way that won't trigger the segfault
it is caused by the locale archive code that maps the locale archive file into the address space, enlarging the mapping as needed, but unmaps the complete reserved size of 512M on close
munmap is implemented through vm_deallocate, but it looks like the latter doesn't allow deallocating unmapped regions of the address space (to be confirmed)
upstream code tracks the mapping size so vm_deallocate won't whine
i expect we'll have that in eglibc 2.18
hm actually, posix says munmap must refer to memory obtained with mmap :)
(or actually, that the behaviour is undefined, which most unix systems allow anyway, but not us)
also, before i leave, i have partially traced the localedef segfault
ah, cool
localedef maps the locale archive, and enlarges the mapping as needed
but munmaps the complete 512m reserved area
and i strongly suspect it unmaps something it shouldn't on the hurd
since linux mmap has different boundaries depending on the mapping use
while our glibc will happily map stacks below text
the good news is that it looks fixed upstream
ah :)
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=17db6e8d6b12f55e312fcab46faf5d332c806fb6
see the change about close_archive
i haven't tested it though


## IRC, freenode, #hurd, 2014-02-21

just upgraded to 2.18, locales still segfaults
ok


## IRC, freenode, #hurd, 2014-02-23

ok, as expected, the localedef bug is because of some mmap issue
[[glibc/mmap]].
looks like our mmap doesn't like mapping files with PROT_NONE
shouldn't be too hard to fix
gg0: i should have a fix ready soon for localedef
youpi: i have a patch for glibc about the localedef segfault
is that the backport we talked about, or something else?
something else
in short
mmap() PROT_NONE on files returns 0
ok
seems fixable indeed
nothing is mapped, and the localedef code doesn't consider this an error
my current fix is to handle PROT_NONE like PROT_READ
doesn't vm_protect allow mapping something without giving read rights?
it probably does
the problem is in glibc
ok
when i say like PROT_READ, i mean a memory object gets a reference on the read port returned by io_map
since it's not accessible anyway, it shouldn't make a difference
but i preferred to have the memory object referenced anyway to match what i expect is done by other systems


## IRC, freenode, #hurd, 2014-02-24

braunr: ah ok
ok
that mmap fix looks fine, i'll add comments and commit it soon


# IRC, OFTC, #debian-hurd, 2013-06-20

damn
hang at ext2fs boot
static linking issue, clearly


## IRC, freenode, #hurd, 2013-06-30

Mmm
__access ("/etc/ld.so.nohwcap", F_OK) at startup of ext2fs
doomed to fail....
when does that happen?
at hwcap initialization
at least that's where ext2fs.static linked against libc 2.17 hangs at startup
and this is indeed a very good culprit :)
ah, a debian patch
does anybody know a quick way to know whether one is the / ext2fs ? :)
isn't the root fs given a special port?
I was thinking about something like this, yes
ok, boots
I'll build a 8~0 that includes the fix so people can easily build the hurd package
Mmm, no, the bootstrap port is also NULL for normally-started processes :/
I don't understand why
ah, only translators get a bootstrap port :/
perhaps CRDIR then (which makes a lot of sense)


## IRC, freenode, #hurd, 2013-07-01

youpi: what is local-no-bootstrap-fs-access.diff supposed to fix ?
ext2fs.static linked against debian glibc 2.17
well, as long as you don't build & use ext2fs.static with it...
that's the thing, i want to :)
I'd warmly welcome a way to detect whether one is the / translator process
btw it seems far from trivial


# glibc 2.18 vs. GCC 4.8

## IRC, freenode, #hurd, 2013-11-25

grmbl, installing a glibc 2.18 rebuilt with gcc-4.8 brings an unbootable system


## IRC, freenode, #hurd, 2013-11-29

so, what do I do? rebuild the glibc 2.18 package with gcc4.8 and see what breaks ?
when I boot a system with that libc that is ?
I wish youpi had been more specific, I've never built the libc before...
debian/rules build in the debian package
ctrl-c when you see gcc invocations
cd buildir; make lib others
although hm
what breaks is at boot time right ?
yes
heh ..
then dpkg-buildpackage
DEB_BUILD_OPTIONS=nocheck speeds things up
just answer on the mailing list and ask him
he usually answers quickly


## IRC, freenode, #hurd, 2013-12-18

teythoon: k!, any luck with eglibc-2.18?
tbh i didn't look into this after two unsuccessful attempts at building the libc package
there was a post over at the libc-alpha list that sounded familiar
http://www.cygwin.com/ml/libc-alpha/2013-12/msg00281.html
wow ?
this looks tricky
and why ia64 only
indeed
it's rare to see aurel32 ask such questions


## IRC, freenode, #hurd, 2014-01-22

btw, did anybody investigate the glibc-built-with-gcc-4.8 issue?
oddly enough, a subhurd boots completely fine with it
i didn't
no, sorry
I was wondering whether the bogus deallocation at boot might have something to do with it
which one ?
ah yes maybe
quoted earlier here
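Returning to the localedef segfault from the 2014-02 logs: the upstream change described there comes down to a bookkeeping rule. Unmap exactly the region that `mmap` actually returned, not a fixed 512M "reserved" size. The following is a minimal C sketch of that rule under POSIX/Linux semantics. It is not the glibc `locarchive.c` code or the upstream `close_archive` change, and `struct mapping`, `map_archive`, `unmap_archive` and `make_temp_file` are hypothetical names. As an assumption of this sketch, the reservation is an anonymous `PROT_NONE` mapping, with the file mapped over its start using `MAP_FIXED`. When no reservation is made, only the file is mapped, and the recorded length shrinks accordingly.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and mkstemp on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the exact range a successful mmap() returned. */
struct mapping {
    void *base; /* start address returned by mmap() */
    size_t len; /* length that is actually mapped at base */
};

/* Map file_len bytes of fd read-only.  If the file is smaller than
   `reserve`, first reserve that much address space with an anonymous
   PROT_NONE mapping (so a larger mapping could later grow in place),
   then map the file over its start with MAP_FIXED.
   Returns 0 on success, -1 on failure. */
int map_archive(int fd, size_t file_len, size_t reserve, struct mapping *m)
{
    if (file_len == 0)
        return -1; /* mmap() rejects zero-length mappings */

    if (file_len < reserve) {
        void *p = mmap(NULL, reserve, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            if (mmap(p, file_len, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0)
                == MAP_FAILED) {
                munmap(p, reserve); /* drop the whole reservation */
                return -1;
            }
            m->base = p;
            m->len = reserve; /* the reservation is what must be unmapped */
            return 0;
        }
        /* Reservation failed: fall through and map just the file. */
    }

    void *p = mmap(NULL, file_len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    m->base = p;
    m->len = file_len; /* no reservation, so only the file mapping exists */
    return 0;
}

/* Unmap exactly the recorded range, never a fixed "reserved" size. */
int unmap_archive(struct mapping *m)
{
    assert(m->base != NULL);
    int rc = munmap(m->base, m->len);
    m->base = NULL;
    m->len = 0;
    return rc;
}

/* Helper for trying the sketch: an unlinked temporary file holding data. */
int make_temp_file(const char *data, size_t len)
{
    char path[] = "/tmp/locarchive-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file persists until fd is closed */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, `map_archive(fd, len, (size_t)1 << 20, &m)` followed by `unmap_archive(&m)` releases exactly the 1 MiB reservation. If the reservation is not smaller than the file, `m.len` is just the file length, so close never hands the kernel a range it did not map.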