path: root/open_issues/libpthread.mdwn
author Thomas Schwinge, 2012-11-29
commit 5bd36fdff16871eb7d06fc26cac07e7f2703432b
    <braunr> ouch
    <bddebian> braunr: Do you have debugging enabled in that custom kernel you
      installed? Apparently it is sitting at the debug prompt.

## IRC, freenode, #hurd, 2012-08-12

    <braunr> hmm, it seems the hurd notion of cancellation is actually not the
      pthread one at all
    <braunr> pthread_cancel merely marks a thread as being cancelled, while
      hurd_thread_cancel interrupts it
    <braunr> ok, i have a pthread_hurd_cond_wait_np function in glibc

## IRC, freenode, #hurd, 2012-08-13

    <braunr> nice, i got ext2fs to work with pthreads
    <braunr> there are issues with the stack size strongly limiting the number
      of concurrent threads, but that's easy to fix
    <braunr> one problem with the hurd side is the condition implications
    <braunr> i think it should be dealt with separately, and before doing
      anything with pthreads
    <braunr> but that's minor, the most complex part is, again, the term server
    <braunr> other than that, it was pretty easy to do
    <braunr> but, i shouldn't speak too soon, who knows what tricky bootstrap
      issue i'm gonna face ;p
    <braunr> tschwinge: i'd like to know how i should proceed if i want a
      symbol in a library overridden by that of a main executable
    <braunr> e.g. have libpthread define a default stack size, and let
      executables define their own if they want to change it
    <braunr> tschwinge: i suppose i should create a weak alias in the library
      and a normal variable in the executable, right ?
    <braunr> hm i'm making this too complicated
    <braunr> don't mind that stupid question
    <tschwinge> braunr: A simple variable definition would do, too, I think?
    <tschwinge> braunr: Anyway, I'd first like to know why we can't reduce the
      size of libpthread threads from 2 MiB to 64 KiB as libthreads had. Is
      that a requirement of the pthread specification?
    <braunr> tschwinge: it's a requirement yes
    <braunr> the main reason i see is that hurd threadvars (which are still
      present) rely on common stack sizes and alignment to work
    <tschwinge> Mhm, I see.
    <braunr> so for now, i'm using this approach as a hack only
    <tschwinge> I'm working on phasing out threadvars, but we're not there yet.
    <tschwinge> Yes, that's fine for the moment.
    <braunr> tschwinge: a simple definition wouldn't work
    <braunr> tschwinge: i resorted to a weak symbol, and see how it goes
    <braunr> tschwinge: i suppose i need to export my symbol as a global one,
      otherwise making it weak makes no sense, right ?
    <braunr> tschwinge: also, i'm not actually sure what you meant is a
      requirement about the stack size, i shouldn't have answered right away
    <braunr> no there is actually no requirement
    <braunr> i misunderstood your question
    <braunr> hm when adding this weak variable, starting a program segfaults :(
    <braunr> apparently on ___pthread_self, a tls variable
    <braunr> fighting black magic begins
    <braunr> arg, i can't manage to use that weak symbol to reduce stack sizes
      :(
    <braunr> ah yes, finally
    <braunr> git clone /path/to/glibc.git on a pthread-powered ext2fs server :>
    <braunr> tschwinge: seems i have problems using __thread in hurd code
    <braunr> tschwinge: they produce undefined symbols
    <braunr> tschwinge: forget that, another mistake on my part
    <braunr> so, current state: i just need to create another patch, for the
      code that is included in the debian hurd package but not in the upstream
      hurd repository (e.g. procfs, netdde), and i should be able to create
      hurd packages that completely use pthreads

## IRC, freenode, #hurd, 2012-08-14

    <braunr> tschwinge: i have weird bootstrap issues, as expected
    <braunr> tschwinge: can you point me to important files involved during
      bootstrap ?
    <braunr> my ext2fs.static server refuses to start as a rootfs, whereas it
      seems to work fine otherwise
    <braunr> hm, it looks like it's related to global signal dispositions

## IRC, freenode, #hurd, 2012-08-15

    <braunr> ahah, a subhurd running pthreads-powered hurd servers only
    <LarstiQ> braunr: \o/
    <braunr> i can even log on ssh
    <braunr> pinotree: for reference, i uploaded my debian-specific changes
      there :
    <braunr>
    <braunr> darnassus is now running a pthreads-enabled hurd system :)

## IRC, freenode, #hurd, 2012-08-16

    <braunr> my pthreads-enabled hurd systems can quickly die under load
    <braunr> youpi: with hurd servers using pthreads, i occasionally see thread
      storms apparently due to a deadlock
    <braunr> youpi: it makes me think of the problem you sometimes have (and
      had often with the page cache patch)
    <braunr> in cthreads, mutex and condition operations are macros, and they
      check the mutex/condition queue without holding the internal
      mutex/condition lock
    <braunr> i'm not sure where this can lead to, but it doesn't seem right
    <pinotree> isn't that a bit dangerous?
    <braunr> i believe it is
    <braunr> i mean
    <braunr> it looks dangerous
    <braunr> but it may be perfectly safe
    <pinotree> could it be?
    <braunr> aiui, it's an optimization, e.g. "dont take the internal lock if
      there are no threads to wake"
    <braunr> but if there is a thread enqueuing itself at the same time, it
      might not be woken
    <pinotree> yeah
    <braunr> pthreads don't have this issue
    <braunr> and what i see looks like a deadlock
    <pinotree> anything can happen between the unlocked checking and the
      following instruction
    <braunr> so i'm not sure how a situation working around a faulty
      implementation would result in a deadlock with a correct one
    <braunr> on the other hand, the error youpi reported
      ( seems
      to indicate something is deeply wrong with libports
    <pinotree> it could also be that the current code does not really "work
      around" that, but simply implicitly relies on the so-generated behaviour
    <braunr> luckily not often
    <braunr> maybe
    <braunr> i think we have to find and fix these issues before moving to
      pthreads entirely
    <braunr> (ofc, using pthreads to trigger those bugs is a good procedure)
    <pinotree> indeed
    <braunr> i wonder if tweaking the error checking mode of pthreads to abort
      on EDEADLK is a good approach to detecting this problem
    <braunr> let's try !
    <braunr> youpi: eh, i think i've spotted the libports ref mistake
    <youpi> ooo!
    <youpi> .oOo.!!
    <gnu_srs> Same problem but different patches
    <braunr> look at libports/bucket-iterate.c
    <braunr> in the HURD_IHASH_ITERATE loop, pi->refcnt is incremented without
      a lock
    <youpi> Mmm, the incrementation itself would probably be compiled into an
      INC, which is safe in UP
    <youpi> it's an add currently actually
    <youpi> 0x00004343 <+163>: addl $0x1,0x4(%edi)
    <braunr> 40c4: 83 47 04 01 addl $0x1,0x4(%edi)
    <youpi> that makes it SMP unsafe, but not UP unsafe
    <braunr> right
    <braunr> too bad
    <youpi> that still deserves fixing :)
    <braunr> the good side is my mind is already wired for smp
    <youpi> well, it's actually not UP safe either
    <youpi> in general
    <youpi> when the processor is not able to do the add in one instruction
    <braunr> sure
    <braunr> youpi: looks like i'm wrong, refcnt is protected by the global
      libports lock
    <youpi> braunr: but aren't there pieces of code which manipulate the refcnt
      while taking another lock than the global libports lock
    <youpi> it'd not be scalable to use the global libports lock to protect
      refcnt
    <braunr> youpi: imo, the scalability issues are present because global
      locks are taken all the time, indeed
    <youpi> urgl
    <braunr> yes ..
    <braunr> when enabling mutex checks in libpthread, pfinet dies :/
    <braunr> grmbl, when trying to start "ls" using my deadlock-detection
      libpthread, the terminal gets unresponsive, and i can't even use ps .. :(
    <pinotree> braunr: one could say your deadlock detection works too
      well... :P
    <braunr> pinotree: no, i made a mistake :p
    <braunr> it works now :)
    <braunr> well, "works" is a bit fast
    <braunr> i can't attach gdb now :(
    <braunr> *sigh*
    <braunr> i guess i'd better revert to a cthreads hurd and debug from there
    <braunr> eh, with my deadlock-detection changes, recursive mutexes are now
      failing on _pthread_self(), which for some obscure reason generates this
    <braunr> => 0x0107223b <+283>: jmp 0x107223b
      <__pthread_mutex_timedlock_internal+283>
    <braunr> *sigh*

## IRC, freenode, #hurd, 2012-08-17

    <braunr> aw, the thread storm i see isn't a deadlock
    <braunr> seems to be mere contention ....
    <braunr> youpi: what do you think of the way
      ports_manage_port_operations_multithread determines it needs to spawn a
      new thread ?
    <braunr> it grabs a lock protecting the number of threads to determine if
      it needs a new thread
    <braunr> then releases it, to retake it right after if a new thread must be
      created
    <braunr> aiui, it could lead to a situation where many threads could
      determine they need to create threads
    <youpi> braunr: there's no reason to release the spinlock before re-taking
      it
    <youpi> that can indeed lead to too many thread creations
    <braunr> youpi: a harder question
    <braunr> youpi: what if thread creation fails ? :/
    <braunr> if i'm right, hurd servers simply never expect thread creation to
      fail
    <youpi> indeed
    <braunr> and as some patterns have threads blocking until another produces
      an event
    <braunr> i'm not sure there is any point handling the failure at all :/
    <youpi> well, at least produce some output
    <braunr> i added a perror
    <youpi> so we know that happened
    <braunr> async messaging is quite evil actually
    <braunr> the bug i sometimes have with pfinet is usually triggered by
      fakeroot
    <braunr> it seems to use select a lot
    <braunr> and select often destroys ports when it has something to return to
      the caller
    <braunr> which creates dead name notifications
    <braunr> and if done often enough, a lot of them
    <youpi> uh
    <braunr> and as pfinet is creating threads to service new messages, already
      existing threads are starved and can't continue
    <braunr> which leads to pfinet exhausting its address space with thread
      stacks (at about 30k threads)
    <braunr> i initially thought it was a deadlock, but my modified libpthread
      didn't detect one, and indeed, after i killed fakeroot (the whole
      dpkg-buildpackage process hierarchy), pfinet just "cooled down"
    <braunr> with almost all 30k threads simply waiting for requests to
      service, and the few expected select calls blocking (a few ssh sessions,
      exim probably, possibly others)
    <braunr> i wonder why this doesn't happen with cthreads
    <youpi> there's a 4k guard between stacks, otherwise I don't see anything
      obvious
    <braunr> i'll test my pthreads package with the fixed
      ports_manage_port_operations_multithread
    <braunr> but even if this "fix" should reduce thread creation, it doesn't
      prevent the starvation i observed
    <braunr> evil concurrency :p
    <braunr> youpi: hm i've just spotted an important difference actually
    <braunr> youpi: glibc sched_yield is __swtch(), cthreads is
    <braunr> i'll change the glibc implementation, see how it affects the whole
      system
    <braunr> youpi: do you think boosting the priority of cancellation
      requests is an acceptable workaround ?
    <youpi> workaround for what?
    <braunr> youpi: the starvation i described earlier
    <youpi> well, I guess I'm not into the thing enough to understand
    <youpi> you meant the dead port notifications, right?
    <braunr> yes
    <braunr> they are the cancellation triggers
    <youpi> cancelling what?
    <braunr> a blocking select for example
    <braunr> ports_do_mach_notify_dead_name -> ports_dead_name ->
      ports_interrupt_notified_rpcs -> hurd_thread_cancel
    <braunr> so it's important they are processed quickly, to allow blocking
      threads to unblock, reply, and be recycled
    <youpi> you mean the threads in pfinet?
    <braunr> the issue applies to all servers, but yes
    <youpi> k
    <youpi> well, it can not not be useful :)
    <braunr> whatever the choice, it seems there will be a security issue
      (a denial of service of some kind)
    <youpi> well, it's not only in that case
    <youpi> you can always queue a lot of requests to a server
    <braunr> sure, i'm just focusing on this particular problem
    <braunr> hm
    <braunr> i'd say POLICY_TIMESHARE just in case
    <braunr> (and i'm not sure mach handles fixed priority threads first
      actually :/)
    <braunr> hm my current hack which consists of calling swtch_pri(0) from a
      freshly created thread seems to do the job eh
    <braunr> (it may be what cthreads unintentionally does by acquiring a spin
      lock from the entry function)
    <braunr> not a single issue any more with this hack
    <bddebian> Nice
    <braunr> bddebian: well it's a hack :p
    <braunr> and the problem is that, in order to boost a thread's priority,
      one would need to implement that in libpthread
    <bddebian> there isn't thread priority in libpthread?
    <braunr> it's not implemented
    <bddebian> Interesting
    <braunr> if you want to do it, be my guest :p
    <braunr> mach should provide the basic stuff for a partial implementation
    <braunr> but for now, i'll fall back on the hack, because that's what
      cthreads "does", and it's "reliable enough"
    <antrik> braunr: I don't think the locking approach in
      ports_manage_port_operations_multithread() could cause issues. the worst
      that can happen is that some other thread becomes idle between the check
      and creating a new thread -- and I can't think of a situation where this
      could have any impact...
    <braunr> antrik: hm ?
    <braunr> the worst case is that many threads will evaluate spawn to 1 and
      create threads, whereas only one of them should have
    <antrik> braunr: I'm not sure perror() is a good way to handle the
      situation where thread creation failed. this would usually happen because
      of resource shortage, right? in that case, it should work in non-debug
      builds too
    <braunr> perror isn't specific to debug builds
    <braunr> i'm building glibc packages with a pthreads-enabled hurd :>
    <braunr> (which at one point run the test allocating and filling 2 GiB of
      memory, which passed)
    <braunr> (with a kernel using a 3/1 split of course, swap usage reached
      something like 1.6 GiB)
    <antrik> braunr: BTW, I think the observation that thread storms tend to
      happen on destroying stuff more than on creating stuff has been made
      before...
    <braunr> ok
    <antrik> braunr: you are right about perror() of course. brain fart -- was
      thinking about assert_perror()
    <antrik> (which is misused in some places in existing Hurd code...)
    <antrik> braunr: I still don't see the issue with the "spawn"
      locking... the only situation where this code can be executed
      concurrently is when multiple threads are idle and handling incoming
      requests -- but in that case spawning does *not* happen anyways...
    <antrik> unless you are talking about something else than what I'm thinking
      of...
    <braunr> well imagine you have idle threads, yes
    <braunr> let's say a lot, like a thousand
    <braunr> and the server gets a thousand requests
    <braunr> and one more :p
    <braunr> normally only one thread should be created to handle it
    <braunr> but here, the worst case is that all threads run internal_demuxer
      roughly at the same time
    <braunr> and they all determine they need to spawn a thread
    <braunr> leading to another thousand
    <braunr> (that's extreme and very unlikely in practice of course)
    <antrik> oh, I see... you mean all the idle threads decide that no spawning
      is necessary; but before they proceed, finally one comes in and decides
      that it needs to spawn; and when the other ones are scheduled again they
      all spawn unnecessarily?
    <braunr> no, spawn is a local variable
    <braunr> it's rather, all idle threads become busy, and right before
      servicing their request, they all decide they must spawn a thread
    <antrik> I don't think that's how it works. changing the status to busy (by
      decrementing the idle counter) and checking that there are no idle
      threads is atomic, isn't it?
    <braunr> no
    <antrik> oh
    <antrik> I guess I should actually look at that code (again) before
      commenting ;-)
    <braunr> let me check
    <braunr> no sorry you're right
    <braunr> so right, it can't lead to that situation
    <braunr> i don't even understand how i couldn't see that :/
    <braunr> let's say it's the heat :p

## IRC, freenode, #hurd, 2012-08-18

    <braunr> one more attempt at fixing netdde, hope i get it right this time
    <braunr> some parts assume a ddekit thread is a cthread, because they share
      the same address
    <braunr> it's not as easy when using pthread_self :/
    <braunr> good, i got netdde to work with pthreads
    <braunr> youpi: for reference, there are now glibc, hurd and netdde
      packages on my repository
    <braunr> youpi: the debian specific patches can be found at my git
      repository ( and
    <braunr> except a freeze during boot (between exec and init) which happens
      rarely, and the starvation which still exists to some extent (fakeroot
      can cause many threads to be created in pfinet and pflocal), the
      glibc/hurd packages have been working fine for a few days now
    <braunr> the threading issue in pfinet/pflocal is directly related to
      select, which the io_select_timeout patches should fix once merged
    <braunr> well, considerably reduce at least
    <braunr> and maybe fix completely, i'm not sure

## IRC, freenode, #hurd, 2012-08-27

    <pinotree> braunr: wrt a78a95d in your pthread branch of hurd.git,
      shouldn't that job theoretically be done using the pthread api (of
      course after implementing it)?
    <braunr> pinotree: sure, it could be done through pthreads
    <braunr> pinotree: i simply restricted myself to moving the hurd to
      pthreads, not augmenting libpthread
    <braunr> (you need to remember that i work on hurd with pthreads because it
      became a dependency of my work on fixing select :p)
    <braunr> and even if it wasn't the reason, it is best to do these tasks
      (replace cthreads and implement the pthread scheduling api) separately
    <pinotree> braunr: hm ok
    <pinotree> implementing the pthread priority bits could be done
      independently though
    <braunr> youpi: there are more than 9000 threads for /hurd/streamio kmsg on
      ironforge oO
    <youpi> kmsg ?!
    <youpi> it's only /dev/klog right?
    <braunr> not sure but it seems so
    <pinotree> which syslog daemon is running?
    <youpi> inetutils
    <youpi> I've restarted the klog translator, to see whether it grows
      again
    <braunr> 6 hours and 21 minutes to build glibc on darnassus
    <braunr> pfinet still runs only 24 threads
    <braunr> the ext2 instance used for the build runs 2k threads, but that's
      because of the pageouts
    <braunr> so indeed, the priority patch helps a lot
    <braunr> (pfinet used to have several hundreds, sometimes more than a
      thousand threads after a glibc build, and potentially increasing with
      each use of fakeroot)
    <braunr> exec weighs 164M eww, we definitely have to fix that leak
    <braunr> the leaks are probably due to wrong mmap/munmap usage

### IRC, freenode, #hurd, 2012-08-29

    <braunr> youpi: btw, after my glibc build, there were as few as between
      20 and 30 threads for pflocal and pfinet
    <braunr> with the priority patch
    <braunr> ext2fs still had around 2k because of pageouts, but that's
      expected
    <youpi> ok
    <braunr> overall the results seem very good and allow the switch to
      pthreads
    <youpi> yep, so it seems
    <braunr> youpi: i think my first integration branch will include only a few
      changes, such as this priority tuning, and the replacement of
      condition_implies
    <youpi> sure
    <braunr> so we can push the move to pthreads after all its small
      dependencies
    <youpi> yep, that's the most readable way

## IRC, freenode, #hurd, 2012-09-03

    <gnu_srs> braunr: Compiling yodl-3.00.0-7:
    <gnu_srs> pthreads: real 13m42.460s, user 0m0.000s, sys 0m0.030s
    <gnu_srs> cthreads: real 9m 6.950s, user 0m0.000s, sys 0m0.020s
    <braunr> thanks
    <braunr> i'm not exactly certain about what causes the problem though
    <braunr> it could be due to libpthread using doubly-linked lists, but i
      don't think the overhead would be so much heavier because of that alone
    <braunr> there is so much contention sometimes that it could
    <braunr> the hurd would have been better off with single threaded servers
      :/
    <braunr> we should probably replace spin locks with mutexes everywhere
    <braunr> on the other hand, i don't have any more starvation problems with
      the current code

### IRC, freenode, #hurd, 2012-09-06

    <gnu_srs> braunr: Yes you are right, the new pthread-based Hurd is _much_
      slower.
    <gnu_srs> One annoying example is when compiling, the standard output is
      written in bursts with _long_ periods of no output in between :-(
    <braunr> that's more probably because of the priority boost, not the
      overhead
    <braunr> that's one of the big issues with our mach-based model
    <braunr> we either give high priorities to our servers, or we can suffer
      from message floods
    <braunr> that's in fact more a hurd problem than a mach one
    <gnu_srs> braunr: any immediate ideas how to speed up the responsiveness of
      the pthread-hurd? It is annoyingly slow (slow-witted)
    <braunr> gnu_srs: i already answered that
    <braunr> it doesn't look that much slower on my machines though
    <gnu_srs> you said you had some ideas, but not which ones. except for
      mcsim's work.
    <braunr> i have ideas about what makes it slower
    <braunr> it doesn't mean i have solutions for that
    <braunr> if i had, don't you think i'd have applied them ? :)
    <gnu_srs> ok, how to make it more responsive on the console? and print
      stdout more regularly, now several pages are stored and then flushed.
    <braunr> give more details please
    <gnu_srs> it behaves like a loaded linux desktop, with little memory
      left...
    <braunr> details about what you're doing
    <gnu_srs> apt-get source any big package and: fakeroot debian/rules binary
      2>&1 | tee ../binary.logg
    <braunr> i see
    <braunr> well no, we can't improve responsiveness
    <braunr> without reintroducing the starvation problem
    <braunr> they are linked
    <braunr> and what you're doing involves a few buffers, so the laggy feel is
      expected
    <braunr> if we can fix that simply, we'll do so after it is merged upstream

### IRC, freenode, #hurd, 2012-09-07

    <braunr> gnu_srs: i really don't feel the sluggishness you described with
      hurd+pthreads on my machines
    <braunr> gnu_srs: what's your hardware ?
    <braunr> and your VM configuration ?
    <gnu_srs> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    <gnu_srs> kvm -m 1024 -net nic,model=rtl8139 -net
      user,hostfwd=tcp::5562-:22 -drive
      cache=writeback,index=0,media=disk,file=hurd-experimental.img -vnc :6
      -cdrom isos/netinst_2012-07-15.iso -no-kvm-irqchip
    <braunr> what is the file system type where your disk image is stored ?
    <gnu_srs> ext3
    <braunr> and how much physical memory on the host ?
    <braunr> (paste meminfo somewhere please)
    <gnu_srs> 4G, and it's on the limit, 2 kvm instances+gnome, etc
    <gnu_srs> 80% in use by programs, 14% in cache.
    <braunr> ok, that's probably the reason then
    <braunr> the writeback option doesn't help a lot if you don't have much
      cache
    <gnu_srs> well the other instance is cthreads based, and not so sluggish.
    <braunr> we know hurd+pthreads is slower
    <braunr> i just wondered why i didn't feel it that much
    <gnu_srs> try to fire up more kvm instances, and do a heavy compile...
    <braunr> i don't do that :)
    <braunr> that's why i never had the problem
    <braunr> most of the time i have like 2-3 GiB of cache
    <braunr> and of course more on shattrath
    <braunr> (the host of the hurdboxes, which has 16 GiB of ram)

### IRC, freenode, #hurd, 2012-09-11

    <gnu_srs> Monitoring the cthreads and the pthreads load under Linux shows:
    <gnu_srs> cthread version: load can jump very high, less cpu usage than
      pthread version
    <gnu_srs> pthread version: less memory usage, background cpu usage higher
      than for cthread version
    <braunr> that's the expected behaviour
    <braunr> gnu_srs: are you using the lifothreads gnumach kernel ?
    <gnu_srs> for experimental, yes.
    <gnu_srs> i.e. pthreads
    <braunr> i mean, you're measuring on it right now, right ?
    <gnu_srs> yes, one instance running cthreads, and one pthreads (with lifo
      gnumach)
    <braunr> ok
    <gnu_srs> no swap used in either instance, will try a heavy compile later
      on.
    <braunr> what for ?
    <gnu_srs> E.g. for memory when linking. I have swap available, but no swap
      is used currently.
    <braunr> yes but, what do you intend to measure ?
    <gnu_srs> don't know, just to see if swap is used at all. it seems to be
      used not very much.
    <braunr> depends
    <braunr> be warned that using the swap means there is pageout, which is one
      of the triggers for global system freeze :p
    <braunr> anonymous memory pageout
    <gnu_srs> for linux swap is used constructively, why not on hurd?
    <braunr> because of hard to squash bugs
    <gnu_srs> aha, so it is bugs hindering swap usage :-/
    <braunr> yup :/
    <gnu_srs> Let's find them then O:-), piece of cake
    <braunr> remember my page cache branch in gnumach ? :)
    <gnu_srs> not much
    <braunr> i started it before fixing non blocking select
    <braunr> anyway, as a side effect, it should solve this stability issue
      too, but it'll probably take time
    <gnu_srs> is that branch integrated? I only remember slab and the lifo
      stuff.
    <gnu_srs> and mcsims work
    <braunr> no it's not
    <braunr> it's unfinished
    <gnu_srs> k!
    <braunr> it correctly extends the page cache to all available physical
      memory, but since the hurd doesn't scale well, it slows the system down

## IRC, freenode, #hurd, 2012-09-14

    <braunr> arg
    <braunr> darnassus seems to eat 100% cpu and make top freeze after some
      time
    <braunr> seems like there is an important leak in the pthreads version
    <braunr> could be the lifothreads patch :/
    <cjbirk> there's a memory leak?
    <cjbirk> in pthreads?
    <braunr> i don't think so, and it's not a memory leak
    <braunr> it's a port leak
    <braunr> probably in the kernel

### IRC, freenode, #hurd, 2012-09-17

    <braunr> nice, the port leak is actually caused by the exim4 loop bug

### IRC, freenode, #hurd, 2012-09-23

    <braunr> the port leak i observed a few days ago is because of exim4 (the
      infamous loop eating the cpu we've been seeing regularly)
    <youpi> oh
    <braunr> next time it happens, and if i have the occasion, i'll examine the
      problem
    <braunr> tip: when you can't use top or ps -e, you can use ps -e -o
      pid=,args=
    <youpi> or -M ?
    <braunr> haven't tested

## IRC, freenode, #hurd, 2012-09-23

    <braunr> tschwinge: i committed the last hurd pthread change,
    <braunr> tschwinge: please tell me if you consider it ok for merging

### IRC, freenode, #hurd, 2012-11-27

    <youpi> braunr: btw, I forgot to forward here, with the glibc patch it does
      boot fine, I'll push all that and build some almost-official packages for
      people to try out what will come when eglibc gets the change in unstable
    <braunr> youpi: great :)
    <youpi> thanks for managing the final bits of this
    <youpi> (and thanks to everybody involved)
    <braunr> sorry again for the non obvious parts
    <braunr> if you need the debian specific parts refined (e.g. nice commits
      for procfs & others), i can do that
    <youpi> I'll do that, no pb
    <braunr> ok
    <braunr> after that (well, during also), we should focus more on bug
      hunting

## IRC, freenode, #hurd, 2012-10-26

    <mcsim1> hello. What does the following error message mean? "unable to
      adjust libports thread priority: Operation not permitted" It appears
      when I set translators.
    <mcsim1> Seems to have something to do with libpthread. Also the following
      appeared when I tried to remove a translator: "pthread_create: Resource
      temporarily unavailable"
    <mcsim1> Oh, the first message appears very often, when I use a translator
      I set.
    <braunr> mcsim1: it's related to a recent patch i sent
    <braunr> mcsim1: hurd servers attempt to increase their priority on startup
      (when a thread is created actually)
    <braunr> to reduce message floods and thread storms (such sweet names :))
    <braunr> but if you start them as an unprivileged user, it fails, which is
      ok, it's just a warning
    <braunr> the second one is weird
    <braunr> it normally happens when you're out of available virtual space,
      not when shutting a translator down
    <mcsim1> braunr: you mean this patch: libports: reduce thread starvation on
      message floods?
    <braunr> yes
    <braunr> remember you're running on darnassus
    <braunr> with a heavily modified hurd/glibc
    <braunr> you can go back to the cthreads version if you wish
    <mcsim1> it's better to check translators' privileges before attempting to
      increase their priority, I think.
    <braunr> no
    <mcsim1> it's just a bit annoying
    <braunr> privileges can be changed during execution
    <braunr> well, remove it
    <mcsim1> But the warning should not appear.
    <braunr> what could be done is to limit the warning to one occurrence
    <braunr> mcsim1: i prefer that it appears
    <mcsim1> ok
    <braunr> it's always better to be explicit and verbose
    <braunr> well not always, but very often
    <braunr> one of the reasons the hurd is so difficult to debug is the lack
      of a "message server" à la dmesg