Diffstat (limited to 'community/gsoc')
-rw-r--r-- community/gsoc/2013/hacklu.mdwn | 2099
-rw-r--r-- community/gsoc/2013/nlightnfotis.mdwn | 3037
-rw-r--r-- community/gsoc/project_ideas/download_backends.mdwn | 11
-rw-r--r-- community/gsoc/project_ideas/mtab/discussion.mdwn | 2072
-rw-r--r-- community/gsoc/project_ideas/object_lookups.mdwn | 29
-rw-r--r-- community/gsoc/project_ideas/sound/discussion.mdwn | 47
6 files changed, 7290 insertions, 5 deletions
diff --git a/community/gsoc/2013/hacklu.mdwn b/community/gsoc/2013/hacklu.mdwn
new file mode 100644
index 00000000..b7de141b
--- /dev/null
+++ b/community/gsoc/2013/hacklu.mdwn
@@ -0,0 +1,2099 @@
+[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!toc]]
+
+
+# IRC, freenode, #hurd, 2013-06-23
+
+ <hacklu> braunr: sorry for the late reply. To be honest, schoolwork
+ has taken most of my time these days. I haven't made any
+ significant progress yet. I am trying to write a little debugger demo on
+ Hurd.
+ <hacklu> braunr: things are harder than I thought; there are some
+ differences between ptrace() on Hurd and Linux. I am trying to solve
+ this.
+
+
+# IRC, freenode, #hurd, 2013-06-24
+
+ <hacklu> this is my weekly report
+ http://hacklu.com/blog/gsoc-weekly-report1-117/.
+ <hacklu> and I have two main questions when I read the gdb source code.
+ <hacklu> 1/What is the S_exception_raise_request()? 2/what is the role of
+ ptrace in gdb port on Hurd?
+ <youpi> hacklu: where did you see S_exception_raise_request?
+ <hacklu> in gdb/gnu-nat.c
+ <youpi> ah, in gdb
+ <hacklu> yeah. and I have read <The Hurd Hacking Guide>. it says the S_
+ prefix means server stub.
+ <youpi> yes
+ <youpi> what happens is that gnu_wait keeps calling mach_msg
+ <youpi> to get a message
+ <youpi> then it passes that message to the various stubs servers
+ <youpi> see just below, it calls exc_server, among others
+ <youpi> and that's exc_server which ends up calling
+ S_exception_raise_request, if the message is an exception_raise request
+ <youpi> exc_server is a mere multiplexer, actually
+ <tschwinge> S_exception_raise_request is the implementation of the request
+ part (so one half of a typical RPC) of the Mach exception interface.
+ <tschwinge> See gdb/exc_request.defs in GDB and include/mach/exc.defs in
+ Mach.
+ <hacklu> youpi: how gnu_wait pass one message to exc_server? in which
+ function?
+ <youpi> in gnu_wait()
+ <youpi> && !exc_server (&msg.hdr, &reply.hdr)
+ <hacklu> oh, I see this.
+ <hacklu> at first I thought it was simply a type check.
+ <youpi> see the comment: "handle what we got"
+ <tschwinge> The Hurd's proc server also is involved in the exception
+ passing protocol (see its source code).
+ <hacklu> tschwinge: I will check the source code later. does the exception
+ take place this way: 1. the inferior calls ptrace(TRACE_ME). 2. gdb
+ calls task_set_exception_port. 3. mach sends a notification to the
+ exception port set before. 4. gdb takes some action.
+ <tschwinge> hacklu: Yes, that's it, roughly. The idea is that GDB replaces
+ a process' standard exception port, and replaces it "with itself", so
+ that when the process that is debugged receives an exception (by Mach
+ sending an exception_raise RPC), GDB can then catch that and act
+ accordingly.
+ <tschwinge> hacklu: As for your other questions, about ptrace: As you can
+ see in [glibc]/sysdeps/mach/hurd/ptrace.c, ptrace on Hurd is simply a
+ wrapper around vm_read/write and more interfaces.
+ <tschwinge> hacklu: As the GDB port for Hurd is specific to Hurd by
+ definition, you can also directly use these calls in GDB for Hurd.
+ <tschwinge> ..., as it is currently done.
+ <hacklu> and in detail, part 3 (mach sends a notification to the
+ exception port) goes like this: gnu_wait gets the message in mach_msg, and
+ then passes it to exc_server(), then exc_server calls
+ S_exception_raise_request()?
+ <hacklu> tschwinge: yeah, I have seen ptrace.c. I was wondering why
+ nobody uses ptrace on Hurd except for TRACEME...
+ <tschwinge> hacklu: Right about »and in detail, [...]«.
+ <tschwinge> hacklu: It would be very good (and required for your
+ understanding anyway), if you could write up a list of things that
+ happens when a process (both under the control of GDB as well as without
+ GDB) is sent an exception (due to a breakpoint instruction, for example).
+ <tschwinge> Let me look something up.
+ <hacklu> tschwinge: what's the purpose of exc_server, if I can already get the
+ notification in mach_msg()?
+ <youpi> to multiplex the message
+ <youpi> i.e. decoding it, etc. up to calling the S_ function with the
+ proper parameters
+ <youpi> exc_server being automatically generated, that saves a lot of code
+ <tschwinge> That is generated by MIG from the gdb/exc_request.defs file.
+ <tschwinge> You'll find the generated file in the GDB build directory.
+ <hacklu> I have written down the filenames. I will check them after this.
+ <tschwinge> hacklu: I suggest you also have a look at the Mach 3 Kernel
+ Principles book,
+ <http://www.gnu.org/software/hurd/microkernel/mach/documentation.html>.
+ <tschwinge> This also has some explanation of the thread/task's exception
+ mechanism.
+ <tschwinge> And of course, explains the RPC mechanism, which the exception
+ mechanism is built upon.
+ <tschwinge> And then, really make a step-by-step list of what happens; this
+ should help to better visualize what's going on.
+ <hacklu> ok. later I will update this list on my blog.
+ <tschwinge> hacklu: I cannot tell off-hand why GDB on Hurd is using
+ ptrace(PTRACE_TRACEME) instead of doing these calls manually. I will
+ have to look that up, too.
+ <hacklu> tschwinge: thanks.
+ <tschwinge> hacklu: Anyway -- you're asking sensible questions, so it seems
+ you're making progress/are on track. :-)
+ <hacklu> tschwinge: this is harder than I had thought; I haven't
+ made any meaningful progress. sorry for this.
+ <tschwinge> hacklu: That is fine, and was expected. :-) (Also, you're
+ still busy with university.)
+ <hacklu> I will put more time and enthusiasm into this.
+ <tschwinge> hacklu: Oh, and one thing that may be confusing: as you may
+ have noticed, the names of the same RPC functions are sometimes slightly
+ different in different *.defs files. What is important is the subsystem
+ number, such as 2400 in [GDB]/gdb/exc_request.defs (and then incremented
+ per each routine/simpleroutine/skip directive).
+ <tschwinge> hacklu: Just for completeness, [hurd]/hurd/subsystems has a
+ list of RPC subsystems we're using.
+ <tschwinge> And the name given to routine 2400, for example, is just a
+ "friendly name" that is then used locally in the code where the *.defs
+ file has been processed by MIG.
+ <tschwinge> What a clumsy explanation of mine. But you'll get the idea, I
+ think. ;-)
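tschwinge's point about subsystem numbers can be made concrete with a toy demultiplexer in C. This sketch makes several simplifications. A hypothetical `struct toy_msg` stands in for `mach_msg_header_t` and the marshalled arguments, and the routine handlers are made up; only the base number 2400 comes from gdb/exc_request.defs. The sketch shows two conventions. First, routine ids count up from the subsystem base, and a `skip` still uses up an id. Second, the generated server maps `msgh_id` back to a handler. The `+ 100` reply id follows MIG's usual convention for replies. This illustrates the idea; it is not MIG's generated code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified message; real MIG code works on a
   mach_msg_header_t followed by typed, marshalled arguments. */
struct toy_msg {
    int msgh_id;  /* subsystem base + position of the routine */
    int arg;      /* stand-in for the marshalled arguments */
};

typedef int (*toy_routine)(int arg);

/* Hypothetical server-side implementations (the S_ functions). */
static int S_toy_routine_0(int arg) { return arg + 1; }
static int S_toy_routine_2(int arg) { return arg * 2; }

#define TOY_EXC_BASE 2400  /* as in "subsystem exc 2400" */

/* Table indexed by msgh_id - base.  The NULL entry models a "skip"
   directive, which uses up an id without defining a routine. */
static const toy_routine toy_exc_routines[] = {
    S_toy_routine_0,  /* id 2400 */
    NULL,             /* id 2401: skip */
    S_toy_routine_2,  /* id 2402 */
};

#define TOY_EXC_COUNT \
    ((int)(sizeof toy_exc_routines / sizeof toy_exc_routines[0]))

/* What a generated exc_server does, in miniature: map the id to a
   routine, call it, and fill in the reply.  Returns false when the id
   is not ours, so the caller can try the next demultiplexer. */
bool toy_exc_server(const struct toy_msg *in, struct toy_msg *out)
{
    assert(in != NULL && out != NULL);
    int index = in->msgh_id - TOY_EXC_BASE;
    if (index < 0 || index >= TOY_EXC_COUNT || toy_exc_routines[index] == NULL)
        return false;
    out->msgh_id = in->msgh_id + 100;  /* MIG's reply-id convention */
    out->arg = toy_exc_routines[index](in->arg);
    return true;
}
```

A request with `msgh_id` 2400 goes to the first handler and gets a reply with id 2500. Id 2401, the skipped slot, is rejected.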
+ <tschwinge> hacklu: And don't worry about your progress -- you're making a
+ lot of progress already (even if it doesn't look like it, because you're
+ not writing code), but the time spent on understanding these complex
+ issues (such as the RPC mechanism) definitely counts as progress, too.
+ <hacklu> tschwinge: I don't fully get it, as I am not familiar with
+ MIG's grammar. But as I understand it, exc is routine 2400's alias name?
+ <tschwinge> hacklu: I'd like to have you spend enough time to understand
+ these fundamental concepts now, and then switch to "hacking mode" (write
+ code) later, instead of now writing code but not understanding the
+ concepts behind it.
+ <hacklu> I have written a bit of code to validate my understanding while reading
+ the source code. But the code doesn't run. http://pastebin.com/r3wC5hUp
+ <tschwinge> The subsystem directive [...]. As well, let me just point you
+ to the documentation:
+ <http://www.gnu.org/software/hurd/microkernel/mach/mig/documentation.html>,
+ MIG - THE MACH INTERFACE GENERATOR, chapter 2.1 Subsystem identification.
+ <tschwinge> hacklu: Yes, writing such code for testing also is a good
+ approach. I will have to look at that in more detail, too.
+ * tschwinge guesses hacklu is probably laughing when seeing the years these
+ documents were written in (1989, etc.). ;-)
+ <hacklu> mach_msg has no effect in my code, and the process just hangs. kill
+ -9 can't stop it either.
+ <braunr> hacklu: do you understand why kill -KILL might not work now ?
+ <hacklu> braunr: no, but I found I can use gdb to attach to that process,
+ then quit in gdb, the process quit too.
+ <hacklu> maybe that process was waiting for a resume.
+ <braunr> something like that yes
+ <braunr> iirc it's related to a small design choice in the proc server
+ <braunr> something that put processes in an uninterruptible state when
+ being debugged
+ <hacklu> iirc ?
+ <braunr> if i recall cl=orrectly
+ <braunr> correctly*
+ <hacklu> like D status in linux?
+ <braunr> or T
+ <braunr> there has been a lot of improvements regarding signal handling in
+ linux over time so it's not really comparable now
+ <braunr> but that's the idea
+ <hacklu> in ps, i see the process STAT is THumx
+ <braunr> did you see that every process on the hurd has at least two
+ threads ?
+ <hacklu> no, but I have seen that on hurd, the exception handler can't live
+ in the same context as the victim. so there must be at least two
+ threads, I think
+ <braunr> hacklu: yes
+ <braunr> that thread also handles regular signals
+ <braunr> in addition to mach exceptions
+ <braunr> (there are two levels of multiplexing in servers, first locating
+ the subsystem, then the server function)
+ <braunr> hacklu: if what i wrote is confusing, don't hesitate to ask for
+ clarifications (i really don't intend to make things confusing)
+ <hacklu> braunr: I don't understand what you mean by "multiplexing in
+ servers". For instance, does it mean how the message is passed from mach_msg to
+ exc_server in gnu_wait()?
+ <braunr> hacklu: i said that the "message thread" handles both mach
+ exceptions and unix signals
+ <braunr> hacklu: these are two different interfaces (and in mach terms,
+ subsystems)
+ <braunr> hacklu: see hurd/msg.defs for the msg interface (which handles
+ unix signals)
+ <braunr> hacklu: to handle multiple interfaces in the same thread, servers
+ need to first find the right subsystem
+ <braunr> this is done by subsequently calling all demux functions until one
+ returns true
+ <braunr> (finding the right server function is done by these demux
+ functions)
+ <braunr> hacklu: see hurd/msgportdemux.c in glibc to see how it's done
+ there
+ <braunr> it's short actually, i'll paste it here:
+ <braunr> return (_S_exc_server (inp, outp) ||
+ <braunr> _S_msg_server (inp, outp));
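The demultiplexing chain braunr pastes can be modelled with plain functions. In this sketch, two hypothetical demuxers stand in for `_S_exc_server` and `_S_msg_server`. Only the exc base 2400 comes from the log; the id range of the second, signal-like subsystem is invented for illustration. The sketch shows the point braunr stresses: nothing is handed to another thread. The receiving thread simply calls each demux function in turn until one claims the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified message: only the id matters for routing. */
struct toy_msg {
    int msgh_id;
    int handled_by;  /* records which demuxer accepted the message */
};

/* Accepts ids in the exc subsystem's range (base 2400). */
static bool toy_exc_demux(const struct toy_msg *in, struct toy_msg *out)
{
    if (in->msgh_id < 2400 || in->msgh_id >= 2500)
        return false;
    out->msgh_id = in->msgh_id + 100;
    out->handled_by = 1;
    return true;
}

/* Accepts ids of a second, signal-like subsystem; the range is made up. */
static bool toy_msg_demux(const struct toy_msg *in, struct toy_msg *out)
{
    if (in->msgh_id < 9000 || in->msgh_id >= 9100)
        return false;
    out->msgh_id = in->msgh_id + 100;
    out->handled_by = 2;
    return true;
}

/* Same shape as the line quoted from hurd/msgportdemux.c: each demuxer
   is an ordinary function call in the receiving thread, tried in order;
   || stops at the first one that returns true. */
bool toy_msgport_demux(const struct toy_msg *in, struct toy_msg *out)
{
    assert(in != NULL && out != NULL);
    return toy_exc_demux(in, out) || toy_msg_demux(in, out);
}
```

Short-circuit `||` gives the "first one that returns true wins" behaviour. A message no demuxer recognises makes the whole chain return false, and the caller can then treat it as unknown.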
+ <braunr> hacklu: did that help ?
+ <hacklu> braunr: a bit more confusing. one "message thread" handles
+ exceptions and signals, meaning the message thread needs to receive messages
+ from two ports, then pass each message to the right server which handles the
+ message. the server also has to pick the right subsystem from a lot of
+ subsystems to handle the msg. is that right?
+ <braunr> the message thread is a server thread
+ <braunr> (which means every normal process is actually also a server,
+ receiving exceptions and signals)
+ <braunr> there may be only two ports, or more, it doesn't matter much, the
+ port set abstraction takes care of that
+ <hacklu> so the message thread directly pass the msg to the right
+ subsystem?
+ <braunr> not directly as you can see
+ <braunr> it tries them all until one is able to handle the incoming message
+ <braunr> i'm not sure it will help you with gdb, but it's important to
+ understand for a better general understanding of the system
+ <braunr> ugly sentence
+ <hacklu> ah, I see. like this in gnu-nat.c if(!notify_server(&msg.hdr,
+ &reply.hdr) && !exc_server(&msg.hdr...)
+ <braunr> yes
+ <hacklu> the thread just asks one by one.
+ <braunr> be careful about the wording
+ <braunr> the thread doesn't "send requests"
+ <braunr> it runs functions
+ <braunr> (one might be tempted to think there are other worker threads
+ waiting for a "main thread" to handle demultiplexing messages)
+ <hacklu> I got it.
+ <hacklu> the notify_server function is just run in the same context in
+ "message thread",and there is no RPC here.
+ <braunr> yes
+ <hacklu> and the notify_server code is generated automatically by mig.
+ <braunr> yes
+
+
+# IRC, freenode, #hurd, 2013-06-29
+
+[[!tag open_issue_documentation]]
+
+ <hacklu> I just failed to build the demo at
+ http://walfield.org/pub/people/neal/papers/hurd-misc/ipc-hello.c
+ <hacklu> or, example in machsys.doc called simp_ipc.c
+ <pinotree> we don't use cthreads anymore, but pthreads
+ <hacklu> pinotree: em.. and I also failed to find the <servers/env_mgr.h>
+ in example of <A programmer's guide to MACH system call>
+ <pinotree> that i don't know
+ <hacklu> maybe the code in that book is out of date
+ <teythoon> hacklu: mig and mach ipc documentation is quite dated
+ unfortunately, and so are many examples floating around the net
+
+ <hacklu> btw, I have one more question. when I read <Mach 3 kernel
+ interface>, I found this statement: When an exception occurs in a thread, the
+ thread sends an exception message to
+ <hacklu> its exception port, blocking in the kernel waiting for the receipt
+ of a reply. It is
+ <hacklu> assumed that some task is listening to this
+ <hacklu> port, using the exc_server function to decode the messages and
+ then call the
+ <hacklu> linked in catch_exception_raise. It is the job of
+ catch_exception_raise to handle the exception and decide the course of
+ action for thread.
+ <hacklu> that says it assumes another task receives the msg sent to a
+ thread's exception port. why another task?
+ <hacklu> I remember there are at least two threads in one task, one of which
+ handles the exception stuff.
+ <braunr> there are various reasons
+ <braunr> first is, the thread causing the exception is usually not waiting
+ for a message
+ <braunr> next, it probably doesn't have all the info needed to resolve the
+ exception
+ <braunr> (depending on the system design)
+ <braunr> and yes, the second thread in every hurd process is the msg
+ thread, handling both mach exceptions and hurd signals
+ <hacklu> but in this statement, I can't find anything about the so-called msg
+ thread
+ <braunr> ?
+ <hacklu> if a task exists to do the work, why do we need this thread?
+ <braunr> this thread is the "task"
+ <hacklu> ?
+ <braunr> the msg thread is the thread handling exceptions for the other
+ threads in one task
+ <braunr> wording is important here
+ <braunr> a task is a collection of resources
+ <braunr> so i'm only talking about threads really
+ <braunr> 14:11 < hacklu> assumed that some task is listening to this
+ <braunr> this is wrong
+ <braunr> a task can't listen
+ <braunr> only a thread can
+ <hacklu> in your words, the two threads are in the same task?
+ <braunr> yes
+ <braunr> 14:32 < braunr> and yes, the second thread in every hurd process
+ is the msg thread, handling both mach exceptions and hurd signals
+ <braunr> process == task here
+ <hacklu> yeah, I always thought the two threads stay in one task. but I found
+ that state in <mach 3 kernel interface>. so I confuzed
+ <hacklu> s/confuzed/confused
+ <braunr> statement you mean
+ <hacklu> if two threads stay in the same task and the main thread throws an
+ exception, does the other thread handle it?
+ <braunr> depends on how it's configured
+ <braunr> the thread receiving the exceptions might not be in the same task
+ at all
+ <braunr> on the hurd, only the second thread of a task receives exception
+ <braunr> s
+ <hacklu> I just wonder how the second thread can catch the exception from
+ its containing task
+ <braunr> forget about tasks
+ <braunr> tasks are resource containers
+ <braunr> they don't generate or catch exceptions
+ <braunr> only threads do
+ <braunr> for each thread, there is an exception port
+ <braunr> that is, one receive right, and potentially many send rights
+ <braunr> the kernel uses a send right to send exceptions
+ <braunr> the msg thread waits for messages on the receive right
+ <braunr> that's all
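braunr describes the exception port as follows: the msg thread holds one receive right, the kernel uses a send right, and the faulting thread blocks until it gets a reply. That control flow can be imitated in user space. The sketch below is only an analogy. A POSIX mutex and condition variable stand in for the Mach port, and `toy_exc_port`, `toy_raise_exception` and `toy_handler_thread` are hypothetical names, not Mach or glibc interfaces. The faulting thread posts and waits; the handler thread receives, decides, and replies.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a thread's exception port: a one-slot mailbox guarded
   by a mutex.  The faulting thread plays the sender (the kernel's send
   right); the handler plays the msg thread waiting on the receive right. */
struct toy_exc_port {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool has_request;
    bool has_reply;
    int exception_code;  /* set by the faulting thread */
    int reply_code;      /* set by the handler */
};

static void toy_port_init(struct toy_exc_port *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->changed, NULL);
    p->has_request = false;
    p->has_reply = false;
    p->exception_code = 0;
    p->reply_code = 0;
}

/* Faulting side: post the exception, then block until a reply arrives. */
static int toy_raise_exception(struct toy_exc_port *p, int code)
{
    pthread_mutex_lock(&p->lock);
    p->exception_code = code;
    p->has_request = true;
    pthread_cond_broadcast(&p->changed);
    while (!p->has_reply)
        pthread_cond_wait(&p->changed, &p->lock);
    p->has_reply = false;
    int reply = p->reply_code;
    pthread_mutex_unlock(&p->lock);
    return reply;
}

/* Handler side: wait for one exception message, decide, reply. */
static void *toy_handler_thread(void *arg)
{
    struct toy_exc_port *p = arg;
    pthread_mutex_lock(&p->lock);
    while (!p->has_request)
        pthread_cond_wait(&p->changed, &p->lock);
    p->has_request = false;
    p->reply_code = p->exception_code * 10;  /* arbitrary "decision" */
    p->has_reply = true;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* One round trip: start the handler, raise from this thread, and return
   what the handler replied (-1 if the handler thread cannot be created). */
int toy_exception_round_trip(int code)
{
    struct toy_exc_port port;
    pthread_t handler;
    int reply = -1;
    toy_port_init(&port);
    if (pthread_create(&handler, NULL, toy_handler_thread, &port) == 0) {
        reply = toy_raise_exception(&port, code);
        pthread_join(handler, NULL);
        /* Both the request and the reply have been consumed. */
        assert(!port.has_request && !port.has_reply);
    }
    pthread_cond_destroy(&port.changed);
    pthread_mutex_destroy(&port.lock);
    return reply;
}
```

The handler could equally live in another process. Nothing in the pattern needs it to share a task with the faulting thread, which matches braunr's remark that the receiver "might not be in the same task at all".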
+ <hacklu> ok. if I divide by zero in the main thread, the kernel will send a msg to
+ the main thread's exception port. and then, the second thread (in the same
+ task) is waiting on that port, so it gets the msg. is that right?
+ <braunr> don't focus on main versus msg thread
+ <braunr> it applies to all other threads
+ <braunr> as well
+ <braunr> otherwise, you're right
+ <hacklu> ok, just s/main/first
+ <braunr> no
+ <braunr> main *and* all others except msg
+ <hacklu> main *and* all others except msg ?
+ <braunr> the msg thread gets exception messages for all other threads in
+ its task
+ <braunr> (at least, that's how the hurd configures things)
+ <hacklu> got it.
+ <hacklu> if the msg thread throws an exception too, who serves it?
+ <braunr> i'm not sure but i guess it's simply forbidden
+ <hacklu> i used gdb to attach to a little program which just contains a divide
+ by zero. and I only found the msg thread is in glibc.
+ <braunr> yes
+ <hacklu> where is the msg thread located?
+ <braunr> it's created by glibc
+ <hacklu> is it glibc/hurd/catch-exc.c?
+ <braunr> that's the exception handling code, yes
+ <hacklu> there are some differences between the code and the state in <mach
+ 3 system interface>.
+ <braunr> state or statement ?
+ <hacklu> staement
+ <braunr> which one ?
+ <hacklu> http://pastebin.com/ZTBrUAsV
+ When an exception occurs in a thread, the thread sends an exception
+ message to
+ its exception port, blocking in the kernel waiting for the receipt of a
+ reply. It is
+ assumed that some task is listening (most likely with mach_msg_server)
+ to this
+ port, using the exc_server function to decode the messages and then
+ call the
+ linked in catch_exception_raise. It is the job of
+ catch_exception_raise to handle the exception and decide the course of
+ action for thread. The state of the
+ blocked thread can be examined with thread_get_state.
+ <braunr> what difference ?
+ <hacklu> in the code, I can't find things like exc_server,mach_msg_server
+ <braunr> uh
+ <braunr> ok it's a little tangled
+ <braunr> but not that much
+ <braunr> you found the exception handling code, and now you're looking for
+ what calls it
+ <braunr> simple
+ <braunr> see _hurdsig_fault_init
+ <hacklu> from that statement I thought there was another _task_ doing the
+ exception work for all of the system's threads, before you told me
+ the task means the msg thread.
+ <braunr> again
+ <braunr> 14:47 < braunr> forget about tasks
+ <braunr> 14:47 < braunr> tasks are resource containers
+ <braunr> 14:47 < braunr> they don't generate or catch exceptions
+ <braunr> 14:47 < braunr> only threads do
+ <hacklu> yeah, I think that document needs an update.
+ <braunr> no
+ <braunr> it's a common misnomer
+ <braunr> once you're used to mach concepts, the statement is obvious
+ <hacklu> braunr: so I need to read more :)
+ <hacklu> _hurdsig_fault_init sends exceptions for the signal thread to the
+ proc server?
+ <hacklu> why does the _proc_ server come into it?
+ <braunr> no it gives the proc server a send right for signals
+ <braunr> exceptions are a mach thing, signals are a hurd thing
+ <braunr> the important part is
+ <braunr> err = __thread_set_special_port (_hurd_msgport_thread,
+ <braunr> THREAD_EXCEPTION_PORT, sigexc);
+ <hacklu> this one set the exception port?
+ <braunr> yes
+ <braunr> hm wait
+ <braunr> actually no, wrong part :)
+ <braunr> this sets the exception port for the msg thread (which i will call
+ the signal thread as mentioned in glibc)
+ <hacklu> but what does the comment above this line, "Direct signal thread exceptions
+ to the proc server", mean?
+ <braunr> that the proc server handles exceptions on the signal thread
+ <hacklu> the term signal thread equals the term msg thread?
+ <braunr> yes
+ <hacklu> so, the proc server handles the exceptions thrown by the msg
+ thread?
+ <braunr> looks that way
+ <hacklu> feels a little strange.
+ <braunr> why ?
+ <braunr> this thread isn't supposed to cause exceptions
+ <braunr> if it does, something is deeply wrong, and something must clean
+ that task up
+ <braunr> and the proc server seems to be the most appropriate place from
+ where to do it
+ <hacklu> why do we need a special server just for the msg thread? I don't
+ think that thread will throw exceptions frequently
+ <braunr> what does frequency have to do with anything here ?
+ <braunr> ok the appropriate code is _hurdsig_init
+ <braunr> the port for receiving exceptions is _hurd_msgport
+ <braunr> the body of the signal thread is _hurd_msgport_receive
+ <hacklu> aha, in the _hurd_msgport_receive I have finally found the
+ while(1) loop mach_msg_server().
+ <hacklu> so the code conforms to the documentation.
+ <hacklu> braunr: [21:18] <braunr> what does frequency have to do with
+ anything here ? yes, I have totally understood your words now. thank you
+ very much.
+ <braunr> :)
+
+
+# IRC, freenode, #hurd, 2013-07-01
+
+ <hacklu> hi. this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report2-124/ any comments are
+ welcome
+ <hacklu> teythoon: I only got clear about the rpc stuff. seems I'm a lot behind
+ my plan
+ <youpi> good progress :)
+ <hacklu> I have written up the details of the exception handling that tschwing_
+ asked about last week. Am I right in my post?
+ <youpi> hacklu: as far as I understand signals, yes :)
+ <hacklu> youpi: thank god, I am finally on the right track... :)
+ <hacklu> the mig book says simpleroutine is the one used to implement async
+ RPCs which don't expect a reply. But I have found a place that passes a
+ reply port to an RPC interface which has been declared as simpleroutine
+ <youpi> hacklu: probably the simpleroutine hardcodes a reply port?
+
+ <youpi> hacklu: about _hurd_internal_post_signal, this is the hairiest part
+ of GNU/Hurd, signal handling
+ <youpi> simply because it's the hairiest part of POSIX :)
+ <youpi> you probably want to just understand that it implements the
+ POSIXity of signal delivering
+ <youpi> i.e. deliver/kill/suspend the process as appropriate
+ <youpi> I don't think you'll need to dive more
+ <hacklu> aha.
+ <hacklu> it will save a lot of time.
+ <hacklu> it seems like wait_for_inferior() in gdb, which also has too
+ many lines and too many gotos
+ <youpi> hacklu: btw, which simpleroutine were you talking about ?
+ <hacklu> I forget where it is, I am finding it now.
+ <youpi> which version of gdb are you looking the source of?
+ <youpi> (in mine, wait_for_inferior is only 45 lines long)
+ <hacklu> I don't know how to pick the version, I just use the git
+ version. maybe I gave the wrong name.
+ <youpi> ok
+ <hacklu> youpi: I remember now, my impression comes from here
+ http://www.aosabook.org/en/gdb.html. (All of this activity is managed by
+ wait_for_inferior. Originally this was a simple loop, waiting for the
+ target to stop and then deciding what to do about it, but as ports to
+ various systems needed special handling, it grew to a thousand lines,
+ with goto statements criss-crossing it for poorly understood
+ <hacklu> reasons.)
+ <hacklu> youpi: the simpleroutine is gdb/gdb/exc_request.defs
+ <youpi> so there is indeed an explicit reply port
+ <hacklu> but simpleroutine is for no-reply use. why use a reply port here?
+ <youpi> AIUI, it's simply a way to make the request asynchronous, but still
+ permit an answer
+ <hacklu> ok, I will read the mig book carefully.
+ <braunr> hacklu: as youpi says
+ <braunr> a routine can be broken into two simpleroutines
+ <braunr> that's why some interfaces have interface.defs,
+ interface_request.defs and interface_reply.defs files
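braunr notes that a routine can be split into two simpleroutines. That idea can be sketched without any Mach calls. Here a hypothetical one-slot `toy_port` stands in for a Mach port. How the request itself reaches the server is left out: `toy_serve` is simply called directly. The sketch keeps the shape youpi describes. The request names a reply port, and the client is free to continue after sending. The answer arrives later as an independent one-way message to that port.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-slot "port" that can hold one queued one-way message. */
struct toy_port {
    bool full;
    int value;
};

/* One-way send, like a simpleroutine: queue the message and return at
   once, without waiting for any answer. */
static void toy_send(struct toy_port *port, int value)
{
    assert(!port->full);  /* the toy port has room for one message only */
    port->full = true;
    port->value = value;
}

/* Non-blocking receive: returns false if nothing has arrived yet. */
static bool toy_try_receive(struct toy_port *port, int *value)
{
    if (!port->full)
        return false;
    port->full = false;
    *value = port->value;
    return true;
}

/* The "_request" half carries an explicit reply port, like the
   simpleroutine hacklu found in gdb/exc_request.defs. */
struct toy_request {
    int arg;
    struct toy_port *reply_port;
};

/* Server: handle the request, then answer with a separate one-way
   message (the "_reply" half) sent to the port the client named. */
void toy_serve(const struct toy_request *req)
{
    toy_send(req->reply_port, req->arg * req->arg);
}
```

Before `toy_serve` runs, polling the reply port finds nothing. Afterwards, the client picks up the answer with `toy_try_receive`. That gap is what makes the request/reply pair asynchronous, while still permitting an answer.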
+ <braunr> nlightnfotis: in mach terminology, a right *is* a capability
+ <braunr> the only thing mach doesn't easily provide is a way to revoke them
+ individually
+ <nlightnfotis> braunr: Right. And ports are associated with the process
+ server and the kernel right? I mean, from what I have understood, if a
+ process wants to send a signal to another one, it has to do so via the
+ ports to that process held by the process server
+ <nlightnfotis> and it has to establish its identity before doing so, so
+ that it can be checked if it has the right to send to that port.
+ <braunr> yes
+ <nlightnfotis> do process own any ports? or are all their ports associated
+ with the process server?
+ <nlightnfotis> *processes
+ <braunr> mach ports were intended for a lot of different uses
+ <braunr> but in the hurd, they mostly act as object references
+ <braunr> the process owning the receive right (one at most per port)
+ implements the object
+ <braunr> processes owning send rights invoke methods on the object
+ <braunr> use portinfo to find out about the rights in a task
+ <braunr> (process is the unix terminology, task is the mach terminology)
+ <braunr> i use them almost interchangeably
+ <nlightnfotis> ahh yes, I remember about the last bit. And mach tasks have
+ a 1 to 1 association with user level processes (the ones associated with
+ the process server)
+ <braunr> the proc server is a bit special because it has to know about all
+ processes
+ <braunr> yes
+
+In context of [[open_issues/libpthread/t/fix_have_kernel_resources]]:
+
+ <braunr> hacklu: if you ever find out about either glibc or the proc server
+ creating one receive right for each thread, please let me know
+
+
+# IRC, freenode, #hurd, 2013-07-07
+
+ <hacklu> how does fork() work?
+ <pinotree> see sysdeps/mach/hurd/fork.c in glibc' sources
+ <hacklu> when the parent has two threads (the main thread and the signal thread),
+ and the parent calls fork, then the child immediately calls exev() to
+ change the executable, how many threads does the child have?
+ <hacklu> For instance, if the new executable also has two threads.
+ <hacklu> will exev() destroy the two threads and then create two new ones?
+ <hacklu> s/exev()/excv()
+ <hacklu> s/exev()/exec() :)
+
+ <hacklu> what does libhurduser-2.13.so do?
+ <hacklu> where can I find this source?
+ <pinotree> contains all the client stubs for hurd-specific RPCs
+ <pinotree> it is generated and built automatically within the glibc build
+ process
+
+ <hacklu> and what is the "proc" server?
+ <pinotree> what handles processes in user space
+ <hacklu> so if I call proc_wait_request(), I will go into the
+ S_proc_wait_reply?
+ <hacklu> thanks, I have found that.
+
+
+# IRC, freenode, #hurd, 2013-07-08
+
+ <hacklu> hi, this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report3-137/
+ <hacklu> this week I have met a lot of obstacles. And I am quite eager to
+ participate in this meeting.
+ <tschwinge> hacklu: So from your report, the short version is: you've been
+ able to figure out how the things work that you were looking at (good!),
+ and now there are some new open questions that you're working on now.
+ <tschwinge> hacklu: That sounds good. We can of course try to help with
+ your open questions, if you're stuck figuring them out on your own.
+ <hacklu> tschwinge: the main question is: what is the proc server? why do I need
+ to call proc_get_request() before mach_msg()?
+ <hacklu> and is there any specific running order between parent
+ and child task after fork()? And I found the inferior always calls
+ trace_me() at the same time (the trace me printf is always on the same line
+ of the output log), which I have posted in my report.
+ <tschwinge> hacklu: The fork man-page can provide a high-level answer to
+ your Q3: »The child process is created with a single thread—the one that
+ called fork(). The entire virtual address space of the parent is
+ replicated in the child, including the states of mutexes, condition
+ variables, and other pthreads objects [...]«
+ <tschwinge> hacklu: What happens in GNU Hurd is that the signal thread is
+ also "cloned" (additionally to the thread which called fork), but then it
+ (the signal thread) is re-started from the beginning. (So this is very
+ much equivalent to creating a new signal thread.)
+ <tschwinge> hacklu: Then, upon exec, a new memory image is created/loaded,
+ replacing the previous one. [glibc]/sysdeps/mach/hurd/execve.c. What
+ actually happens with the existing thread (in particular, the signal
+ thread) I don't know off-hand. The answer is probably found in
+ [glibc]/hurd/hurdexec.c -- and perhaps some code of the exec server
+ ([hurd]/exec/).
+ <hacklu> I have checked the status of my registered mail to the FSF. it says it
+ has arrived in the USA.
+ <tschwinge> hacklu: OK, good.
+ <tschwinge> hacklu: This is some basic information about the observer_*
+ functions in GDB:
+ http://sourceware.org/gdb/current/onlinedocs/gdbint/Algorithms.html#index-notifications-about-changes-in-internals-57
+ »3.10 Observing changes in gdb internals«.
+ <hacklu> tschwinge: not too clear. I will think about this later. and what is
+ the proc server?
+ <teythoon> hacklu: /hurd/proc, maps unix processes to mach threads afaiui
+ <hacklu> teythoon: question is, the mach_msg() will never return unless I
+ called proc_wait_request() first.
+ <teythoon> hacklu: sorry, I've no idea ;)
+ <hacklu> teythoon: :)
+ <tschwinge> hacklu: I will have to look into that myself, too; don't know
+ the answer off-hand.
+ <tschwinge> hacklu: In your blog you write proc_get_request -- but such a
+ function doesn't seem to exist?
+ <hacklu> tschwinge: s/proc_get_request/proc_wait_request called in
+ gnu_wait() [gnu-nat.c]
+ <tschwinge> hacklu: Perhaps the wait man-page's description of WUNTRACED
+ gives a clue: »also return if a child has stopped [...]«. But it is not
+ yet clear to me how this relates to the mach_msg call, and how the
+ proc server exactly is involved in it.
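For reference, the WUNTRACED behaviour tschwinge quotes from the wait man page can be seen with plain POSIX calls. This sketch is a generic POSIX program; as far as the POSIX interface goes, it should behave the same on GNU/Linux and GNU/Hurd. It does not show what gnu-nat.c does with proc_wait_request, or how the proc server implements stop notification, which is exactly the open question here. The function name `observe_child_stop` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself with SIGSTOP and report the stop
   signal the parent sees through waitpid(..., WUNTRACED).
   Returns -1 if fork fails or no stop is reported. */
int observe_child_stop(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);  /* child: stop, as a traced inferior might */
        _exit(0);
    }

    int status = 0;
    int result = -1;
    /* WUNTRACED makes waitpid also return for a stopped child; without
       it, the call would keep waiting for the child to terminate. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        result = WSTOPSIG(status);

    /* Clean up: SIGKILL also ends a stopped process; then reap it. */
    kill(pid, SIGKILL);
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    (void)reaped;
    return result;
}
```

Dropping `WUNTRACED` from the first `waitpid` would make this function hang: the child stays stopped and never terminates, so the parent never returns from the wait.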
+ <tschwinge> I'm reading various source code files.
+ <tschwinge> At least, I don't understand why it is required for an exception
+ to be forwarded.
+ <hacklu> do I need to read the proc server source code?
+ <tschwinge> I can see how it becomes relevant for the case that GDB has
+ to be informed that the debugee has exited normally.
+ <tschwinge> hacklu: Yeah, probably you should spend some time with that, as
+ it will likely help to get a clearer picture of the situation, and is
+ relevant for other interactions in GDB, too.
+ <tschwinge> hacklu: By the way, if you find that pieces of the GDB source
+ code (especially the Hurd files of it) are insufficiently documented,
+ it's a very good idea, once you have figured out something, to add more
+ source code comments to the existing code. Or write these down
+ separately, if that is easier.
+ <hacklu> which is the proc server? hurd/exec ?
+ <hacklu> that's ok, I already comment on things in my notes.
+ <tschwinge> hacklu: [Hurd]/proc/
+ <tschwinge> hacklu: And [Hurd]/hurd/process*.defs
+ <hacklu> got it
+ <tschwinge> hacklu: I'll have to experiment a bit with your HDebugger
+ example, but I'm out of time right now, sorry. Will continue later.
+ <hacklu> tschwinge: yep, the HDebugger has a problem, if you put the
+ sleep() after the printf in just_print(), things will hang.
+ <hacklu> tschwinge: and I am a little curious about how you found my
+ code? I don't remember mentioning it :)
+ <hacklu> tschwinge: I posted my github link in last week's report, I
+ found that.
+ <tschwinge> hacklu: That's how I found it, yes.
+ <hacklu> tschwinge: :)
+
+
+# IRC, freenode, #hurd, 2013-07-14
+
+ <hacklu> hi. what is a process's msgport?
+ <hacklu> And where can I find the msg_sig_post_untraced_request()?
+ <hacklu> (msg_sig_post* in [hurd]/hurd/msg.defs)
+ <hacklu> this is my debugger demo code
+ https://github.com/hacklu/HDebugger.git use make test to run the demo. I
+ put a breakpoint before the second printf in hello_world(inferior
+ program). but I can't resume execution from that.
+ <hacklu> could somebody give me some suggestions? thanks so much.
+ <teythoon> hacklu: % make test
+ <teythoon> make: *** No rule to make target `exc_request_S.c', needed by
+ `all'. Stop.
+ <hacklu_> teythoon: updated, I forgot to git add that file.
+ <teythoon> hacklu_: cool, seems to work now
+ <teythoon> will look into this tomorrow :)
+ <hacklu_> exit
+ <hacklu_> teythoon: it doesn't work. the code can't resume from a breakpoint
+
+
+# IRC, freenode, #hurd, 2013-07-15
+
+ <hacklu> hi, this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report4-148/
+ <hacklu> sadly, the question of resuming from a breakpoint is still unsolved.
+ <teythoon> hacklu: have you tried to figure out what gdb does to resume a
+ process?
+ <hacklu> teythoon: hi. em, I have tried, but haven't found the magic in gdb
+ yet.
+ <teythoon> have you tried rpctrace'ing gdb?
+ <hacklu> no, rpctrace produces too much noise. I turned on debugging in gdb.
+ <hacklu> I don't want rpctrace to start gdb as its child task; it would be
+ better if it could attach at some point instead of at the start
+ <teythoon> hacklu: you don't need to use gdb interactively, you could pipe
+ some commands to it
+ <hacklu> teythoon: that sounds like a possible way. I'll try it, thank you
+ <hacklu> youpi: gdb can't work correctly with rpctrace even in batch
+ mode.
+ <hacklu> get something like this "rpctrace: get an unknown send right from
+ process 2151"
+ <youpi> hacklu: well, ideally, fix rpctrace );
+ <youpi> ;)
+ <youpi> hacklu: but you can also ask on the list, perhaps somebody knows
+ what you need
+ <hacklu> ok.
+ <hacklu> or I should debug gdb more deeply.
+ <youpi> do both
+ <youpi> so either of them may win first
+
+ <hacklu> braunr: I have found that, if there is no exception appears, the
+ signal thread will not be created. Then there is only one thread in the
+ task.
+
+
+# IRC, freenode, #hurd, 2013-07-17
+
+ <hacklu__> braunr: ping
+ <braunr> hacklu__: yes ?
+ <hacklu__> I have reply your email
+ <braunr> i don't understand
+ <braunr> "I used this (&_info)->suspend_count to get the sc value."
+ <braunr> before the thread_info call ?
+ <hacklu__> no, after the call
+ <braunr> but you have a null pointer
+ <braunr> the info should be returned in info, not _info
+ <hacklu__> strange thing is the info is a null pointer. but _info not
+ <braunr> _info isn't a pointer, that's why
+ <braunr> the kernel will use it if the data fits, which is usually the case
+ <hacklu__> in the begin , the info=&_info.
+ <braunr> and it will dynamically allocate memory if it doesn't
+ <braunr> yes
+ <braunr> info should still have that value after the call
+ <hacklu__> but the call has changed it. this is what I can't understand.
+ <braunr> are you completely sure err is 0 on return ?
+ <hacklu__> since the parameter is a pointer to pointer, the thread_info can
+ change it, but I don't think it is a good idea to set it to null
+ pointer without any err .
+ <hacklu__> yes. i am sure
+ <braunr> info_len is wrong
+ <braunr> it should be the number of integers in _info
+ <braunr> i.e. sizeof(_info) / sizeof(unsigned int)
+ <braunr> i don't think that's the problem though
+ <braunr> yes, THREAD_BASIC_INFO_COUNT is already exactly that
+ <braunr> hm not exactly
+ <braunr> yes, exactly in fact
+ <hacklu__> I try to set it by hand, not use the macro.
+ <braunr> the macro is already defined as #define THREAD_BASIC_INFO_COUNT
+ (sizeof(thread_basic_info_data_t) / sizeof(natural_t))
+ <hacklu__> the info_len is 13. I checked.
+ <braunr> so, i said something wrong
+ <braunr> the call doesn't reallocate thread_info
+ <braunr> it uses the provided storage, nothing else
+ <braunr> yes, your call is wrong
+ <braunr> use thread_info (thread->port, THREAD_BASIC_INFO, (int *) info,
+ &info_len);
+ <hacklu__> em. thread_info (thread->port, THREAD_BASIC_INFO, (int *) &info,
+ &info_len);
+ <braunr> &info would make the kernel erase the memory where info (the
+ pointer) was stored
+ <braunr> info, not &info
+ <braunr> or &_info directly
+ <braunr> i don't see the need for an intermediate pointer here
+ <braunr> ideally, avoid the cast
+ <hacklu__> but in gnu-nat.c line 3338, it use &info.
+ <braunr> use a union with both thread_info_data_t and
+ thread_basic_info_data_t
+ <braunr> well, try it my way
+ <braunr> i think they're wrong
+ <hacklu__> ok, you are right, use info it is ok. the value is the same as
+ &_info after the call.
+ <hacklu__> but the suspend_count is zero again.
+ <braunr> check the rest of the result to see if it's consistent
+ <hacklu__> I think this line need a patch.
+ <hacklu__> what you mean the rest of the result?
+ <braunr> the thread info
+ <braunr> run_state, sleep_time, creation_time
+ <braunr> see if they make sense
+ <hacklu__> ok, I try to dump it
+ <braunr> bbl
+ <hacklu__> braunr: thread [118] suspend_count=0
+ <hacklu__> run_state=3, flags=1, sleep_time=0,
+ creation_time.second=1374079641
+ <hacklu__> something like this, seems no problems.
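The rule braunr arrives at above is that thread_info writes into storage the caller already owns (the `_info` buffer), so the pointer to pass is `info` or `&_info`, never `&info`. Mach headers only exist on GNU/Hurd, so the stand-alone sketch below models that out-parameter convention with a hypothetical `fake_thread_info` stub (not a real Mach call) and a local copy of the thread_basic_info layout, which is 13 integers long, matching the info_len hacklu observed. `unsigned int` stands in for `mach_msg_type_number_t`. On a Hurd system the equivalent real call would be `thread_info (thread, THREAD_BASIC_INFO, (thread_info_t) &info, &info_len)`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the Mach definitions that <mach.h> provides on
   GNU/Hurd; the field layout mirrors thread_basic_info (13 integers).  */
typedef int integer_t;
typedef struct { integer_t seconds, microseconds; } time_value_t;
typedef struct {
  time_value_t user_time, system_time;
  integer_t cpu_usage, base_priority, cur_priority;
  integer_t run_state, flags, suspend_count, sleep_time;
  time_value_t creation_time;
} thread_basic_info_data_t;
#define THREAD_BASIC_INFO_COUNT \
  (sizeof (thread_basic_info_data_t) / sizeof (integer_t))

/* Hypothetical stub with thread_info()'s out-parameter convention: it writes
   into the caller's buffer in place and never allocates a new one.  */
static int fake_thread_info (integer_t *out, unsigned int *count)
{
  thread_basic_info_data_t data;

  if (*count < THREAD_BASIC_INFO_COUNT)
    return -1;                          /* caller's buffer is too small */
  memset (&data, 0, sizeof data);
  data.run_state = 3;                   /* sample values only */
  data.suspend_count = 1;
  memcpy (out, &data, sizeof data);
  *count = THREAD_BASIC_INFO_COUNT;
  return 0;
}

/* Pass the address of the storage itself (like &_info).  Passing the
   address of a pointer variable (like &info) would make the callee
   overwrite that pointer and the stack memory next to it.  */
static int query_suspend_count (void)
{
  thread_basic_info_data_t info;
  unsigned int info_len = THREAD_BASIC_INFO_COUNT;

  if (fake_thread_info ((integer_t *) &info, &info_len) != 0)
    return -1;
  return info.suspend_count;
}
```

Declaring the struct directly, as here, also avoids the intermediate pointer braunr questions; his other suggestion, a union of thread_info_data_t and thread_basic_info_data_t, serves the same purpose when several flavors share one buffer.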
+
+
+# IRC, freenode, #hurd, 2013-07-18
+
+ <hacklu__> how to get the thread state from TH_STATE_WAITING to
+ TH_STATE_RUNNING
+ <braunr> hacklu__:
+ http://www.gnu.org/software/hurd/gnumach-doc/Thread-Execution.html#Thread-Execution
+ <braunr> hacklu__: ah waiting
+ <braunr> hacklu__: this means the thread is waiting for an event
+ <braunr> so probably waiting for a message
+ <braunr> or an internal kernel event
+ <hacklu__> braunr: so I need to send it a message. I think maybe I forgot
+ to send some reply message.
+ <braunr> hacklu__: i'm really not sure about those low level details
+ <braunr> confirm before doing anything
+ <hacklu__> gdb has called msg_sig_post_untraced_request(); I'm not
+ clear about this function, I just call it the same way, maybe I am wrong.
+ <hacklu__> what happens if I send a CONT to the stopped process? maybe I should
+ try this.
+ <hacklu__> when the inferior is in waiting
+ status(TH_STATE_WAITING,suspend_count=0), I use kill to send a CONT. then
+ it becomes (TH_STATE_STOP,suspend_count=1). when I thought I was near the
+ success, I called thread_resume(), and the inferior turned out to be (TH_STATE_WAITING,
+ suspend_count=0).
+ <braunr> so yes, probably waiting for a message
+ <hacklu__> braunr: after send a CONT to the inferior, then send a -9 to the
+ debugger, the inferior continue!!!
+ <braunr> probably because it was notified there wasn't any sender any more
+ <hacklu__> that's funny, I will look deep into thread_resume and kill
+ <braunr> (gdb being the sender here)
+ <hacklu__> in hurd, when gdb attaches to an inferior and a signal is sent
+ to the inferior, who gets the signal first? gdb or the inferior?
+ <hacklu__> quite different from linux. it seems the inferior gets it first
+ <braunr> do you mean gdb catches its own signal through ptrace on linux ?
+ <hacklu__> kkk
+ <braunr> ?
+
+
+# IRC, freenode, #hurd, 2013-07-20
+
+ <hacklu> braunr: yeah, on Linux the gdb catch the signal from inferior
+ before the signal handler. And that day my network was broken, I can't
+ say goodbye to you. sorry for that.
+
+
+# IRC, freenode, #hurd, 2013-07-22
+
+ <hacklu> hi all, this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report5-152/
+ <teythoon> good to hear that you got the resume issue figured out
+ <hacklu> teythoon: thanks :)
+ <teythoon> hacklu: so your next step is to port gdbserver to hurd?
+ <hacklu> yep, I have already begun.
+ <hacklu> before the mid-term evaluation, I must submit something. I am far behind
+ my personal expectations
+ <tschwinge> hacklu: You've made great progress! Sorry, for not being able
+ to help you very much: currently very busy with work. :-|
+ <tschwinge> hacklu: Working on gdbserver now is fine. I understand you
+ have been working on HDebugger to get an understanding of how everything
+ works, outside of the huge GDB codebase. It's of course fine to continue
+ working on HDebugger to test things, etc., and that also counts very much
+ for the mid-term evaluation, so nothing to worry about. :-)
+ <hacklu> but I am far behind the plan in my GSoC application. I haven't
+ submitted any patches. is it ok?
+ <tschwinge> hacklu: Don't worry. Before doing the actual work, things
+ always look much simpler than they will be. So I was expecting/planning
+ for that.
+ <tschwinge> The Hurd system is complex, with non-trivial and sometimes
+ asynchronous communication between the different components, and so it
+ takes some time to get an understanding of all that.
+ <hacklu> yes, I haven't got it all clear about the signal post. that's too
+ mazy.
+ <tschwinge> hacklu: It surely is, yes.
+ <hacklu> tschwinge: may you help me to understand the msg_sig_post(). I
+ don't want to understand all details now, but I want to get the _right_
+ general understanding.
+ <hacklu> as I have mentioned on my weekly report, gdb is listening on the
+ inferior's exception port, then gdb post a signal to that port. That
+ says: gdb post a message to herself, and handle it. is this right?
+ <hacklu> tschwinge: [gdb]/gdb/gnu-nat.c (line 1371), and
+ [glibc]/hurd/hurdsig.c(line 1390)
+ <tschwinge> hacklu: My current understanding is that this is a "real"
+ signal that is sent to the debugged process' signal thread (msgport), and
+ when that process is resumed, it will process that signal.
+ <tschwinge> hacklu: This is different from the Mach kernel sending an
+ exception signal to a thread's exception port, which GDB is listening to.
+ <tschwinge> Or am I confused?
+ <hacklu> is the msgport equal the exception port?
+ <hacklu> in my experience, when the thread hasn't caused an exception, the
+ signal thread is not created. after the exception occurs, the
+ signal thread comes out. so somebody creates it, who does? the mach
+ kernel?
+ <tschwinge> hacklu: My understanding is that the signal thread would always
+ be present, because it is set up early in a process' startup.
+ <hacklu> but when I call task_threads() before the exception appears, only
+ one thread is returned.
+ <tschwinge> "Interesting" -- another thing to look into.
+ <tschwinge> hacklu: Well, you must be right: GDB must also be listening to
+ the debugged process' msgport, because otherwise it wouldn't be able to
+ catch any signals the process receives. Gah, this is all too complex.
+ <hacklu> tschwinge: that's maybe not. gdb listening on the task's exception
+ port, and the signal maybe handle by the signal thread if it could
+ handle. otherwise the signal thread pass the exception to the task's
+ exception port where gdb catched.
+ <tschwinge> hacklu: Ah, I think I now get it. But let me first verify...
+ ;-)
+
+ <hacklu> something strange. I have written a program to check whether signal
+ threads are created at the beginning, and they are all created!
+ <hacklu> tschwinge: this is my test code and
+ result. http://pastebin.com/xtM6DUnG
+ cat test.c
+ #define _GNU_SOURCE 1
+ #include <stdlib.h>
+ #include <stdio.h>
+ #include <errno.h>
+ #include <mach.h>
+ #include <mach_error.h>
+ int main(int argc,char** argv)
+ {
+ mach_port_t task_port;
+ thread_array_t threads[5];
+ mach_msg_type_number_t num_threads[5];
+ error_t err;
+ task_port = mach_task_self();
+ int i;
+ int j;
+ for(i=0;i<5;i++)
+ if(task_port){
+ err = task_threads(task_port,&threads[i],&num_threads[i]);
+ if(err)
+ printf("err\n");
+ }
+ for(i=0;i<5;i++){
+ printf("===============\n");
+ printf("has %d threads now\n",num_threads[i]);
+ for(j=0;j<num_threads[i];j++)
+ printf("thread[%d]=%d\n",j,threads[i][j]);
+ }
+ return 0;
+ }
+
+
+ and the output
+ ./a.out
+ ===============
+ has 2 threads now
+ thread[0]=87
+ thread[1]=97
+ ===============
+ has 2 threads now
+ thread[0]=87
+ thread[1]=97
+ ===============
+ has 2 threads now
+ thread[0]=87
+ thread[1]=97
+ ===============
+ has 2 threads now
+ thread[0]=87
+ thread[1]=97
+ ===============
+ has 2 threads now
+ thread[0]=87
+ thread[1]=97
+ <hacklu> tschwinge: the result is different from the HDebugger case.
+
+ <tschwinge> hacklu: It is my understanding that the two sig_post_untraced
+ RPC calls in inf_signal indeed are invoked on the real msgport (signal
+ thread) of the debugged process.
+ <tschwinge> That port is retrieved via the
+ INF_MSGPORT_RPC/INF_RESUME_MSGPORT_RPC macro, which invokes
+ proc_getmsgport on the proc server, and that will return (unless
+ overridden by proc_setmsgport, but that isn't done in GDB) the msgport as
+ set by [glibc]/hurd/hurdinit.c:_hurd_new_proc_init or _hurd_setproc.
+ <tschwinge> inf_signal is called from gnu_resume, which is via
+ [target_ops]->to_resume is called from target.c:target_resume, which is
+ called several places, for example infrun.c:resume which is used to a)
+ just resume the debugged process, or b) resume it and have it handle a
+ Unix signal (such as SIGALRM, or so), when using the GDB command »signal
+ SIGALRM«, for example.
+ <tschwinge> So such a signal would then not be intercepted by GDB itself.
+ <tschwinge> By the way, this is all just from reading the code -- I hope I
+ got it all right.
+
+ <tschwinge> Another thing: In Mach 3 Kernel Principles, the standard
+ sequence described on pages 22, 23 is thread_suspend, thread_abort,
+ thread_set_state, thread_resume, so you should probably do that in
+ HDebugger too, and not call thread_set_state before.
+ <tschwinge> I would hope the GDB code also follows the standard sequence?
+ Can you please check that?
+
+ <tschwinge> The one thing I'm now confused about is where/how GDB
+ intercepts the standard setup (probably in glibc's signaling mess?) so
+ that it receives any signals raised in the debugged process.
+ <tschwinge> But I'll have to continue later.
+
+ <hacklu___> tschwinge: thanks for your detailed answers. I didn't realize that
+ gnu_resume resumes in order to handle a signal, many thanks for pointing
+ this out :)
+ <hacklu___> tschwinge: I don't exactly comply with <Mach 3 kernel
+ principles> when I call thread_set_state, but I have called
+ task_suspend before. I think it's not too bad :)
+ <tschwinge> hacklu___: Yes, but be aware that gnu_resume is only relevant
+ if a signal is to be forwarded to the debugged process (to be handled
+ there), but not for the case where GDB intercepts the signal (such as
+ SIGSEGV), and handles it itself without then forwarding it to the
+ application. See the »info signals« GDB command.
+ <hacklu___> I am also confused about when the signal thread is started. I will
+ do more experiments.
+ <hacklu___> I have found this: when the inferior is stopped at a breakpoint and I
+ use kill to send a CONT to it, HDebugger, which is listening on the
+ exception port, gets this message.
+
+
+# IRC, freenode, #hurd, 2013-07-28
+
+ <hacklu_> how to understand the rpctrace output?
+ <hacklu_> like this. 142<--143(pid15921)->proc_mark_stop_request (19 0)
+ 125<--127(pid-1)->msg_sig_post_request (20 5 task108(pid15919));
+ <hacklu_> what is the (pid-1)? the kernel?
+ <teythoon> 1 is /hurd/init
+ <hacklu_> doesn't pid-1 mean minus 1?
+ <teythoon> ah, funny, you're right... I dunno then
+ <teythoon> 2 is the kernel though
+ <hacklu_> the 142<--143 is port name?
+ <teythoon> could very well be, but I'm not sure, sorry
+ <hacklu_> the number must be the port name.
+ <teythoon> anyone knows why /hurd/init does not get dead name notifications
+ for /hurd/exec like it does for any other essential server?
+ <teythoon> as far as I can see it successfully asks for them
+ <teythoon> about rpctrace, it poses as the kernel for its children, parses
+ and relays any messages sent over the childrens message port, right?
+
+
+# IRC, freenode, #hurd, 2013-07-29
+
+ <hacklu_> hi. this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report6-156/
+ <teythoon> hacklu_: the inferior voluntarily stops itself if it gets a
+ signal and notifies its tracer?
+ <hacklu_> yes
+ <teythoon> what if it chose not to do so? undebugable program?
+ <hacklu_> the debugged program will have a flag set, the so-called
+ hurdsig_traced. a normal program handles the signal by itself.
+ <hacklu_> in my env, I found that when GDB attach a running program, gdb
+ will not catch the signal sent to the program. Can you help me try it?
+ <teythoon> it doesn't? I'll check...
+ <teythoon> hacklu_: yes, you're right
+ <hacklu_> you can just gdb a loop program, and kill -CONT to it. If I do
+ this I will get "Can't wait for pid 12332:NO child processes" warning.
+ <teythoon> yes, I noticed that too
+ <teythoon> does gdb reparent the tracee?
+ <hacklu_> I don't think this is a good behavior. gdb should get inferior's
+ signal
+ <teythoon> absolutely
+ <hacklu_> In linux it does, not sure about hurd. but I think it should.
+ <teythoon> definitively. there is proc_child in process.defs, but that may
+ only be used once to set the parent of a process
+ <hacklu_> gdb doesn't set the inferior as its child process if attached a
+ running process in HURD.
+
+ <tschwinge> hacklu_: So you figured out this tracing/signal stuff. Great!
+ <hacklu_> tschwinge: Hi. not exactly.
+ <hacklu_> as I have mentioned, gdb can't get signal when attach to a
+ running process.
+ <hacklu_> I also want to know how to build glibc in hurd. I have got this "
+ relocation error: ./libc.so: symbol _dl_find_dso_for_object, version
+ GLIBC_PRIVATE not defined in file ld.so.1 with link time reference" when
+ use LD_PRELOAD=./my_build_glibc/libc.so
+ <tschwinge> hacklu: You can't just preload the new libc.so, but you'll also
+ need to use the new ld.so. Have a look at [glibc-build]/testrun.sh for
+ how to invoke these properly. Or, link with
+ »-Wl,-dynamic-linker=[glibc-build]/elf/ld.so,-rpath,[glibc-build]:[glibc-build]/elf
+ -L [glibc-build] -L [glibc-build]/elf«. If using the latter, I suggest
+ to also add »-Wl,-t« to verify that you're linking against the correct
+ libraries, and »ldd [executable]« to verify that [executable] will load
+ the correct libraries when invoked.
+ <hacklu> I will try that, and I can't find this call:
+ pthread_cond_broadcast(), which is called in proc_mark_stop
+ <tschwinge> hacklu: Oh, right, you'll also need to add libpthread (I think
+ that's the directory name?) to the rpath and -L commands.
+ <hacklu> is libpthread a part of glibc or hurd?
+ <pinotree> glibc
+ <NlightNFotis> hacklu: it is a different repository available here
+ http://git.savannah.gnu.org/cgit/hurd/libpthread.git/
+ <hacklu> tschwinge: thanks for that, but I don't think I need help about
+ the compiler error now, it just says some C file is missing. I will look into
+ the Makefile to verify.
+ <NlightNFotis> but I think it's a part of glibc as a whole
+ <tschwinge> hacklu: OK.
+ <tschwinge> libpthread is/was a stand-alone package and library, but in Debian
+ GNU/Hurd it is nowadays integrated into glibc's build process.
+ <hacklu> NlightNFotis: thanks. I only added hurd, glibc, gdb, mach code to my
+ cscope file. it seems I need to add libpthread.
+ <tschwinge> hacklu: If you use the Debian glibc package, our libpthread
+ will be in the libpthread subdirectory.
+ <tschwinge> Ignore nptl, which is used for the Linux kernel.
+ <hacklu> tschwinge:BTW, I have found that, to continue the inferior from a
+ breakpoint, doesn't need to call msg_sig_post_untraced. just call
+ thread_abort and thread_resume is already ok.
+ <hacklu> I got glibc from http://git.savannah.gnu.org/cgit/hurd.
+ <tschwinge> hacklu: That sounds about right, because you want the inferior
+ to continue normally, instead of explicitly sending a (Unix) signal to
+ it.
+ <tschwinge> hacklu: I suggest you use: »apt-get source eglibc« on your Hurd
+ system.
+ <tschwinge> hacklu: The Savannah repository does not yet have libpthread
+ integrated. I have this on my TODO list...
+ <hacklu> tschwinge: no, apt-get source doesn't work on my Hurd. I got all the
+ code via git clone ***
+ <pinotree> you most probably lack the deb-src entry in your sources.list
+ <tschwinge> hacklu: Do you have deb-src lines in /etc/apt/sources.list? Or
+ how does it fail?
+ <hacklu> tschwinge: I have deb-src lines. and apt-get complains: E:
+ Unable to find a source package for eglibc or E: Unable to find a source
+ package for glibc
+ <youpi> hacklu: which deb-src lines do you have?
+ <hacklu> and part of my sources.list:
+ deb http://ftp.debian-ports.org/debian unreleased main
+ deb-src http://ftp.debian-ports.org/debian unreleased main
+ <youpi> you also need a deb-src line with the main archive
+ <youpi> deb-src http://cdn.debian.net/debian unstable main
+ <tschwinge> hacklu: Oh, hmm. And you did run »apt-get update« before?
+ That aside, there also is <http://snapshot.debian.org/package/eglibc/>
+ that you can use. You'll need the *.dsc and *.debian.tar.xz files
+ corresponding to your version of glibc, and the *.orig.tar.xz file. And
+ then run »dpkg-source -x *.dsc«.
+ <tschwinge> The Debian snapshot is often very helpful if you need source
+ packages that are no longer in the main Debian repository.
+ <youpi> or simply running dget on the dsc url
+ <tschwinge> Oh. Good to know.
+ <youpi> e.g. dget
+ http://cdn.debian.net/debian/pool/main/e/eglibc/eglibc_2.17-7.dsc
+ <hacklu> the network is slow, and I am running apt-get update.
+ <youpi> I will be away from this evening until sunday, too
+ <hacklu> what's the main difference between the source sites?
+ <hacklu> does dget mean wget?
+ <pinotree> no
+ <hacklu> doesn't it exist on linux?
+ <pinotree> it does, in devscripts
+ <pinotree> it's a debian tool
+ <hacklu> oh, yes, I have installed devscripts.
+ <hacklu> I have got the libpthread code, thanks.
+
+ <braunr> teythoon: the simple fact that this msg thread exists to receive
+ requests and that these requests are sent by ps and procfs is a potential
+ DoS
+ <teythoon> braunr: but does that mean that on Hurd a process can prevent a
+ debugger from intercepting signals?
+ <braunr> teythoon: yes
+ <braunr> that's not a problem for interactive programs
+ <braunr> it's part of the hurd design that programs have limited trust in
+ each other
+ <braunr> a user can interrupt his debugger if he sees no activity
+ <braunr> that's more of a problem for non interactive system stuff like
+ init scripts
+ <braunr> or procfs
+ <hacklu> why can't gdb get the inferior's signal if it attaches to a running process?
+ <braunr> hacklu: try to guess
+ <hacklu> braunr: it is not a reasonable thing. I always think it should
+ catch the signal.
+ <braunr> hacklu: signals are a unix thing built on top of mach
+ <braunr> hacklu: think in terms of ports
+ <braunr> all communication on the hurd goes through ports
+ <hacklu> but when gdb is used to start a process and debug it, this way, gdb
+ can catch the signal
+ <braunr> hacklu: my guess is :
+ <braunr> when starting a process, gdb can act as a proxy, much like
+ rpctrace
+ <braunr> when attaching, it can't
+ <hacklu> braunr: ah, my question should be: why can't gdb set
+ the inferior as its child process when attaching to it? or can it not?
+ <braunr> hacklu: i'm not sure, the proc server is one of the parts i know
+ the least
+ <braunr> but again, i guess there is no facility to update the msg port of
+ a process in the proc server
+ <braunr> check that before taking it as granted
+ <hacklu> braunr: aha, I always think you know everything:)
+ <tschwinge> braunr: There is: setmsgport or similar.
+ <braunr> if there is one, gdb doesn't use it
+ <tschwinge> hacklu: That is a good question -- I can't answer it off-hand,
+ but it might be possible (by setting the tracing flag, and such things).
+ Perhaps it's just a GDB bug, which omits to do that. Perhaps just a
+ one-line code change, perhaps not. That's a new bug (?) report that we
+ may want to have a look at later on.
+ <tschwinge> hacklu: But also note, this new problem is not really related
+ to your gdbserver work -- but of course you're fine to have a look at it
+ if you'd like to.
+ <hacklu> I just want to ask whether this is normal behavior. this is
+ related to my gdbserver work, as gdbserver also needs to attach to a running
+ process...
+ <braunr> gdbserver can start a process just like gdb does
+ <braunr> you may want to focus on that first
+ <tschwinge> Yes.
+ <tschwinge> Attaching to processes that are already running is, I think,
+ always more complicated compared to the case where GDB/gdbserver has
+ complete control over the inferior right from the beginning.
+ <hacklu> yes, I am only focusing on starting one. I haven't researched the
+ attach way yet.
+ <tschwinge> hacklu: That's totally fine. You can just say that attaching
+ to processes is not supported yet.
+ <hacklu> that sounds good :)
+ <tschwinge> There will likely be more things in gdbserver that you won't be
+ able to easily support, so it's fine to do it step-by-step.
+ <tschwinge> And then later add more features incrementally.
+ <tschwinge> That's also easier for reviewing the patches.
+
+ <hacklu> and one more question I asked yesterday: what does the rpctrace
+ output (pid-1) mean?
+ <tschwinge> hacklu: Another thing I can't tell off-hand. I'll try to look
+ it up.
+ <teythoon> hacklu, tschwinge: my theory is that it is in fact an error
+ message, maybe the proc server did not know a pid for the task
+ <braunr> hacklu: utsl
+ <hacklu> tschwinge: to save your time, I will look at the code myself, I
+ don't think this is a really hard question that needs you to help me by reading
+ the source code.
+ <tschwinge> teythoon, hacklu: Yes, from a quick inspection it looks like
+ task2pid returning a -1 PID -- but I can't tell yet what that is supposed
+ to mean, if it's an actual bug, or just means there is no data
+ available, or similar.
+ <hacklu> braunr: utsl??
+ <tschwinge> hacklu: http://www.catb.org/~esr/jargon/html/U/UTSL.html
+ <hacklu> tschwinge: thank you. braunr likes to use abbreviations which I can't
+ google.
+ <tschwinge> hacklu: Again, if this affects your work, it is fine to have a
+ look at that presumed rpctrace problem, if not, it is fine to have a look
+ at it if you'd like to, and otherwise, we'll file it as a possible bug to
+ be looked at later.
+ <tschwinge> hacklu: Now you learned that one. :-)
+ <hacklu> tschwinge: ok , this doesn't affect me now. If I have time I will
+ figure it out.
+
+ <teythoon> btw, what about the copyright assignment process?
+ <tschwinge> teythoon, hacklu: You still haven't heard from the FSF about
+ your copyright assignments? What's the latest you have heard?
+ <hacklu> tschwinge: I have written an email to ask about that, but got no reply.
+ <teythoon> tschwinge: last and only response I got was on July 1st, the
+ last ping with explicit request for confirmation was on July the 12th
+ <tschwinge> hacklu: When did you send this email?
+ <hacklu> tschwinge: last week.
+ <tschwinge> teythoon: I suggest you send another inquiry, and please put me
+ in CC. And if there's no answer within a couple days (well, I'm away
+ until Monday...), I'll follow up.
+ <tschwinge> hacklu: Likewise for you; depending on when exactly ;-) you
+ sent the last email. (Always allow for a few days until you expect an
+ answer, but if nothing happened within a week for such rather simple
+ administrative tasks, better ask again, unfortunately.)
+ <hacklu> tschwinge:ok , I will email more
+
+ <hacklu> how should I understand async RPC?
+ <braunr> hacklu: hm ?
+ <hacklu> for instance, in [hurd]/proc/main.c the proc server loops listening
+ for messages and handles them with message_demuxer.
+ <hacklu> but when I send a request like proc_wait_request() to it, will it
+ block in the message_demuxer?
+ <hacklu> and where is the function of
+ ports_manage_port_operations_multithread()?
+ <braunr> this one is in libports
+ <braunr> it's the last thing a server calls after bootstrapping itself
+ <braunr> message_demuxer normally blocks, yes
+ <braunr> but it's not "async"
+ <hacklu> the name suggests the proc server is listening for messages with many
+ threads?
+ <braunr> every server in the hurd does
+ <braunr> threads are created by ports_manage_port_operations_multithread
+ when incoming messages can't be processed quick enough by the set of
+ already existing threads
+ <hacklu> if too many task send request to the server, will it ddos?
+ <braunr> yes
+ <teythoon> every server but /hurd/init
+ <braunr> (and /hurd/hello)
+ <braunr> hacklu: that's, in my opinion, a major design defect
+ <hacklu> yes, that is reasonable.
+ <braunr> that's what causes what i like to call thread storms on message
+ floods ... :)
+ <braunr> my hurd clone is intended to address such major issues
+ <teythoon> couldn't that be mitigated by some kind of heuristic?
+ <braunr> it already is ..
+ <hacklu> I didn't imagine that ports_manage_port_operations_multithread
+ would dynamically create threads. I thought the server would hang if all
+ worker threads were in use.
+ <braunr> that would also be a major defect
+ <braunr> creating as many threads as necessary is a good thing
+ <braunr> the problem is the dos
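braunr's description of ports_manage_port_operations_multithread (more threads are created when the existing ones cannot keep up with incoming messages) can be modelled in a few lines of plain pthreads. The sketch below is a toy under that assumption only: `struct pool`, `pool_submit`, `slow_handler` and the other names are hypothetical, there is no Mach IPC involved, and details of the real libports code (for example how idle threads eventually go away) are left out. With handlers that block, every queued message forces a new thread, which is the "thread storm" behaviour discussed here.

```c
#include <assert.h>
#include <pthread.h>

#define QMAX 64

/* Toy model of an on-demand server thread pool (hypothetical names, not
   libports' implementation): a worker is spawned whenever no idle worker
   is available for a newly queued message.  */
struct pool {
  pthread_mutex_t lock;
  pthread_cond_t has_msg;
  int queue[QMAX];
  int head, tail;             /* pending messages: queue[head..tail) mod QMAX */
  int idle;                   /* workers blocked waiting for a message */
  int nthreads;               /* workers created so far */
  void (*handle) (int msg);
};

static void *worker (void *arg)
{
  struct pool *p = arg;
  for (;;)
    {
      pthread_mutex_lock (&p->lock);
      while (p->head == p->tail)
        {
          p->idle++;
          pthread_cond_wait (&p->has_msg, &p->lock);
          p->idle--;
        }
      int msg = p->queue[p->head++ % QMAX];
      pthread_mutex_unlock (&p->lock);
      p->handle (msg);        /* while this runs, the worker is busy */
    }
}

static void pool_init (struct pool *p, void (*handle) (int))
{
  pthread_mutex_init (&p->lock, NULL);
  pthread_cond_init (&p->has_msg, NULL);
  p->head = p->tail = p->idle = p->nthreads = 0;
  p->handle = handle;
}

static void pool_submit (struct pool *p, int msg)
{
  pthread_t t;
  pthread_mutex_lock (&p->lock);
  assert (p->tail - p->head < QMAX);    /* toy queue: no overflow handling */
  p->queue[p->tail++ % QMAX] = msg;
  if (p->idle < p->tail - p->head)
    {
      /* More pending messages than idle workers: grow the pool.  */
      if (pthread_create (&t, NULL, worker, p) == 0)
        {
          pthread_detach (t);
          p->nthreads++;
        }
    }
  else
    pthread_cond_signal (&p->has_msg);
  pthread_mutex_unlock (&p->lock);
}

/* Demo handler that blocks until released, like requests stuck behind a
   slow resource; every new message then needs a new thread.  */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int released, handled;

static void slow_handler (int msg)
{
  (void) msg;
  pthread_mutex_lock (&gate_lock);
  while (!released)
    pthread_cond_wait (&gate_cond, &gate_lock);
  handled++;
  pthread_cond_broadcast (&gate_cond);  /* wakes release_and_wait () */
  pthread_mutex_unlock (&gate_lock);
}

/* Queue n messages while all handlers are stuck; return the thread count.  */
static int flood (struct pool *p, int n)
{
  for (int i = 0; i < n; i++)
    pool_submit (p, i);
  pthread_mutex_lock (&p->lock);
  int created = p->nthreads;
  pthread_mutex_unlock (&p->lock);
  return created;
}

/* Let the handlers finish and wait until n messages have been handled.  */
static int release_and_wait (int n)
{
  pthread_mutex_lock (&gate_lock);
  released = 1;
  pthread_cond_broadcast (&gate_cond);
  while (handled < n)
    pthread_cond_wait (&gate_cond, &gate_lock);
  int done = handled;
  pthread_mutex_unlock (&gate_lock);
  return done;
}
```

For example, flooding a fresh pool with 8 messages while every handler is blocked leaves 8 worker threads. Capping the pool instead would make the server hang once all workers are busy, which is the trade-off hacklu and braunr are weighing: unbounded growth avoids the hang but exposes the server to floods.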
+ <braunr> hacklu: btw, ddos is "distributed" dos, and it doesn't really
+ apply to what can happen on the hurd
+ <hacklu> why not? as far as I know, the message transport is
+ transparent. hurd has the chance to be DDOSed
+ <braunr> we don't care about the distributed property of the dos
+ <hacklu> oh, I know what you mean.
+ <braunr> it simply doesn't matter
+ <braunr> on thread calling select in an event loop with a low timeout (high
+ frequency) on a bunch of file descriptors is already enough to generate
+ many dead-name notifications
+ <tschwinge> Oh! Based on what I've read in GDB source code, I thought the
+ proc server was single-threaded. However, it no longer is, after 1996's
+ Hurd commit fac6d9a6d59a83e96314103b3181f6f692537014.
+ <braunr> those notifications cause message flooding at servers (usually
+ pflocal/pfinet), which spawn a lot of threads to handle those messages
+ <braunr> one* thread
+ <hacklu> tschwinge: ah, the comment in gnu-nat.c is out of date!
+ <braunr> hacklu: and please, please, clean the hello_world processes you're
+ creating on darnassus
+ <braunr> i had to do it myself again :/
+ <hacklu> braunr: [hacklu@darnassus ~]$ ps ps: No applicable processes
+ <braunr> ps -eflw
+ <braunr> htop
+ <tschwinge> hacklu: Probably the proc_wait_pid and proc_waits_pending stuff
+ could be simplified then? (Not an urgent issue, of course, will file as
+ an improvement for later.)
+ <hacklu> braunr: ps -eflw |grep hacklu
+ <hacklu> 1038 12360 10746 26 26 2 87 22 148M 1.06M 97:21001 S
+ p1 0:00.00 grep --color=auto hacklu
+ <braunr> 15:08 < braunr> i had to do it myself again :/
+ <teythoon> braunr: so as a very common special case, a lot of dead name
+ notifications cause problems for pf*?
+ <braunr> and use your numeric uid
+ <braunr> teythoon: yes
+ <hacklu> braunr: I am so sorry. I only used ps to check. forgive me
+ <braunr> teythoon: simply put, a lot of messages cause problems
+ <braunr> select is one special use case
+ <teythoon> braunr: blocking other requests?
+ <braunr> the other is page cache writeback
+ <braunr> creating lots of threads
+ <braunr> potentially deadlocking on failure
+ <braunr> and in the case of writebacks, simply starving
+ <teythoon> braunr: but dead name notifications should mostly trigger
+ cleanup actions, couldn't those be handled by a different thread(pool)
+ than the rest?
+ <braunr> that's why you can bring down a hurd system with a simple cp
+ bigfile somewhere, bigfile being a few hundred MiB
+ <braunr> teythoon: it doesn't change the problem
+ <braunr> threads are per task
+ <braunr> and the contention would remain the same
+ <teythoon> hm
+ <braunr> since dead-name notifications are meant to release resources
+ created by what would then be "regular" threads
+ <braunr> don't worry, there is a solution
+ <braunr> it's simple
+ <braunr> it's well known
+ <braunr> it's just hard to directly apply to the hurd
+ <braunr> and impossible to enforce on mach
+ <hacklu> tschwinge: I am confused after I have looked into S_proc_wait()
+ [hurd/proc/wait.c], it has relate pthread_hurd_cond_wait_np. I can't find
+ out when it will return. And the signal is report to the debugger by
+ S_proc_wait.
+ <teythoon> braunr: a pointer please ;)
+ <braunr> teythoon: basically, synchronous ipc
+ <braunr> then, enforcing one server thread per client thread
+ <braunr> and replace mach-generated notifications with messages sent from
+ client threads
+ <braunr> the only kind of notification required by the hurd are no-senders
+ notifications
+ <braunr> this happens when a client releases all references it has to a
+ resource
+ <braunr> so it's easy to make that synchronous as well
+ <braunr> trying to design RPCs as closely as system calls on monolithic
+ kernels helps in viewing how this works
+ <braunr> the only real additions are address space crossing, and capability
+ invocation
+ <teythoon> sounds reasonable, why is it hard to apply to the hurd? most
+ rpcs are synchronous, no?
+ <braunr> mach ipc isn't
+ <hacklu> braunr: When client C send a request to server S, but doesn't wait
+ for the reply message right now, for a while, C call mach_msg to recieve
+ reply. Can I think this is a synchronous RPC?
+ <braunr> a malicious client can still overflow message queues
+ <braunr> hacklu: no
+ <teythoon> yes, I can see how this is impossible to enforce, but still we
+ could all try to play nice :)
+ <braunr> teythoon: no
+ <braunr> :)
+ <braunr> async ipc is heavy, error-prone, less performant than sync ipc
+ <braunr> some async ipc is necessary to handle asynchronous events, but
+ something like unix signals is actually a lot more appropriate
+ <braunr> we're diverging from the gsoc though
+ <braunr> don't waste too much time on that
+ <teythoon> 15:13 < braunr> it's just hard to directly apply to the hurd
+ <teythoon> I won't
+ <teythoon> why is it hard
+ <braunr> almost everything is synchronous on the hurd
+ <braunr> except a few critical bits
+ <braunr> signals :)
+ <braunr> and select
+ <braunr> and pagecache writebacks
+ <braunr> fixing those parts require some work
+ <braunr> which isn't trivial
+ <braunr> for example, select should be rewritten not to use dead-name
+ notifications
+ <teythoon> adding a light weight signalling mechanism to mach and using
+ that instead of async ipc?
+ <braunr> instead of destroying ports once an event has been received, it
+ should (synchronously) remove the requests installed at remote servers
+ <braunr> uh no
+ <braunr> well maybe but that would be even harder
+ <tschwinge> hacklu: This (proc/wait.c) is related to POSIX thread
+ cancellation -- I don't think you need to be concerned about that. That
+ function's "real" exit points are earlier above.
+ <braunr> teythoon: do you understand what i mean about select ?
+ <teythoon> ^^ is that a no go area?
+ <braunr> for now it is
+ <braunr> we don't want to change the mach interface too much
+ <teythoon> yes, I get the point about select, but I haven't looked at its
+ implementation yet
+ <hacklu> tschwinge: when I want to know the child task's state, I call
+ proc_wait_request(), unless the child's state not change. the
+ S_proc_wait() will not return?
+ <braunr> it creates ports, puts them in a port set, gives servers send
+ rights so they can notify about events
+ <teythoon> y not? it's not that hurd is portable to another mach, or is it?
+ and is there another that we want to be compatible with?
+ <braunr> when an event occurs, all ports are scanned
+ <braunr> then destroyed
+ <braunr> on destruction, servers are notified by mach
+ <braunr> the problem is that the client is free to continue and make more
+ requests while existing select requests are still being cancelled
+ <teythoon> uh, yeah, that sounds like a costly way of notifying someone
+ <braunr> the cost isn't the issue
+ <braunr> select must do something like that on a multiserver system, you
+ can't do much about it
+ <braunr> but it should be synchronous, so a client can't make more requests
+ to a server until the current select call is complete
+ <braunr> and it shouldn't use a server approach at the client side
+ <braunr> client -> server should be synchronous, and server -> client
+ should be asynchronous (e.g. using a specific SIGSELECT signal like qnx
+ does)
+ <braunr> this is a very clean way to avoid deadlocks and denials of service
+ <teythoon> yes, I see
+ <braunr> qnx actually provides excellent documentation about these issues
+ <braunr> and their ipc interface is extremely simple and benefits from
+ decades of experience on the subject
+ <tschwinge> hacklu: This function implements the POSIX wait call, and per
+ »man 2 wait«: »The wait() system call suspends execution of the calling
+ process until one of its children terminates.«
+ <tschwinge> hacklu: This is implemented in glibc in sysdeps/posix/wait.c,
+ sysdeps/unix/bsd/bsd4.4/waitpid.c, sysdeps/mach/hurd/wait4.c, by invoking
+ this RPC synchronously.
+ <tschwinge> hacklu: GDB on the other hand, uses this infrastructure (as I
+ understand it) to detect (that is, to be informed) when a debuggee exits
+ (that is, when the inferior process terminates).
+ <tschwinge> hacklu: Ah, so maybe I misspoke earlier: the
+ pthread_hurd_cond_wait_np implements the blocking. And depending on its
+ return value the operation will be canceled or restarted (»start_over«).
+ <tschwinge> s%maybe%%
+ <tschwinge> hacklu: Does this information help?
+ <hacklu> tschwinge: proc_wait_request is not only to detect the inferior
+ exit. it also detect the child's state change
+ <braunr> as tschwinge said, it's wait(2)
+ <hacklu> tschwinge: and I have see this, when kill a signal to inferior,
+ the gdb will get the message id=24120 which come from S_proc_wait
+ <hacklu> braunr: man 2 wait says: wait, waitpid, waitid - wait for process
+ to change state. (in linux, in hurd there is no man wait)
+ <braunr> uh
+ <braunr> there is, it's the linux man page :)
+ <braunr> make sure you have manpages-dev installed
+ <hacklu> I always think we are talk about linux's manpage :/
+ <hacklu> but regardless the manpage, gdb really call proc_wait_request() to
+ detect whether inferior's changed states
+ <braunr> in any case, keep in mind the hurd is intended to be a posix
+ system
+ <braunr> which means you can always refer to what wait is expected to do
+ from the posix spec
+ <braunr> see
+ http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
+ <hacklu> braunr: even in the manpages under hurd, man 2 wait also says: wait
+ for process to change state.
+ <braunr> yes
+ <braunr> that's what it's for
+ <braunr> what's the problem ?
+ <hacklu> the problem is what tschwinge has said I don't understand. like
+ and per »man 2 wait«: »The wait() system call suspends execution of the
+ calling process until one of its children terminates.«
+ <braunr> terminating is a form of state change
+ <braunr> historically, wait was intended to monitor process termination
+ only
+ <hacklu> so the thread become stopped wait also return
+ <braunr> afterwards, process tracing was added too
+ <braunr> what ?
+ <hacklu> so when the child state become stopped, the wait() call will
+ return?
+ <braunr> yes
+ <hacklu> and I don't know this pthread_hurd_cond_wait_np.
+ <braunr> wait *blocks* until the process it references changes state
+ <braunr> pthread_hurd_cond_wait_np is the main blocking function in hurd
+ servers
+ <braunr> well, pthread_hurd_cond_timedwait_np actually
+ <braunr> all blocking functions end up there
+ <braunr> (or in mach_msg)
+ <braunr> (well pthread_hurd_cond_timedwait_np calls mach_msg too)
+ <hacklu> since I use proc_wait_request to get the state change, so the
+ thread in proc_server will be blocked, not me. is that right?
+ <braunr> no
+ <braunr> both
+ <hacklu> this is just a request, why should block me?
+ <braunr> because you're waiting for the reply afterwards
+ <braunr> or at least, you should be
+ <braunr> again, i'm not familiar with those parts
+ <hacklu> after call proc_wait_request(), gdb does a lot stuffs, and then
+ call mach_msg to receive reply.
+ <braunr> ok
+ <hacklu> I think it will be blocked only in mach_msg() if need.
+ <braunr> usually, xxx_request are the async send-only versions of RPCs
+ <tschwinge> Yes, that'S my understanding too.
+ <braunr> and xxx_reply the async receive-only
+ <braunr> so that makes sense
+ <hacklu> so I have ask you is it a asyn RPC.
+ <braunr> yes
+ <braunr> 15:18 < hacklu> braunr: When client C send a request to server S,
+ but doesn't wait for the reply message right now, for a while, C call
+ mach_msg to recieve reply. Can I think this is a synchronous RPC?
+ <braunr> 15:19 < braunr> hacklu: no
+ <braunr> if it's not synchronous, it's asynchronous
+ <hacklu> sorry, I spell wrong. missing a 'a' :/
+ <tschwinge> S_proc_wait_reply will then be invoked once the procserver
+ actually answers the "blocking" proc_wait call.
+ <tschwinge> Putting "blocking" in quotes, because (due to the asynchronous
+ RPC invocation), GDB has not actually blocked on this.
+ <braunr> well, it doesn't call proc_wait
+ <hacklu> tschwinge: yes, the S_proc_wait_reply is called by
+ process_reply_server().
+ <hacklu> tschwinge: so the "blocked" one is the thread in proc_server .
+ <tschwinge> braunr: Right. »It requests the proc_wait service.«
+ <braunr> gdb will also block on mach_msg
+ <braunr> 16:05 < braunr> both
+ <hacklu> braunr: yes, if gdb doesn't call mach_msg to receive reply it will
+ not be blocked.
+ <braunr> i expect it will always call mach_msg
+ <braunr> right ?
+ <hacklu> braunr: yes, but before it call mach_msg, it does a lot other
+ things. but finally will call mach_msg
+ <braunr> that's ok
+ <braunr> that's the kind of things asynchronous IPC allows
+ <hacklu> tschwinge: I have made a mistake in my week report. The signal
+ received by the inferior is notified by the proc_server, not the
+ send_signal. Because the send_signal sends a SIGCHLD to gdb's msgport, not
+ gdb itself. That makes sense.
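
Here is a minimal POSIX sketch of the wait(2) semantics discussed above. With the WUNTRACED flag, waitpid reports a child that has stopped, and a later call reports its exit. Stopping and terminating are therefore both state changes the caller can observe. This is plain C against the POSIX API, not GDB or Hurd code. The helper name stop_then_exit_demo and the exit code 7 are illustrative choices only.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stops itself, observe the stop
   with waitpid (WUNTRACED), resume the child, then observe its exit.
   Returns the child's exit status, or -1 on any unexpected result.  */
int
stop_then_exit_demo (void)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      /* Child: stop ourselves, then exit with a recognizable code.  */
      raise (SIGSTOP);
      _exit (7);
    }

  /* Parent: with WUNTRACED, the first state change reported is the stop.  */
  if (waitpid (pid, &status, WUNTRACED) != pid
      || !WIFSTOPPED (status) || WSTOPSIG (status) != SIGSTOP)
    goto fail;

  /* Resume the child; without WCONTINUED, the next report is its exit.  */
  if (kill (pid, SIGCONT) != 0)
    goto fail;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);

 fail:
  /* Do not leave a stopped child behind.  */
  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return -1;
}
```

Calling stop_then_exit_demo () should return 7. Per the discussion above, GDB on the Hurd does not block in waitpid like this. It sends proc_wait_request and handles the answer later, in S_proc_wait_reply.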
+
+
+# IRC, freenode, #hurd, 2013-07-30
+
+ <hacklu> braunr: before I go to sleep last night, this question pop into my
+ mind. How do you find my hello_world is still alive on darnassus? The
+ process is not a CPU-heavy or IO-heavy guy. You will not feel any
+ performance penalization. I am so curious :)
+ <teythoon> hacklu: have you looked into patching the proc server to allow
+ reparenting of processes?
+ <hacklu> teythoon:not yet
+ <teythoon> hacklu: i've familiarized myself with proc in the last week,
+ this should get you started nicely: http://paste.debian.net/19985/
+ diff --git a/proc/mgt.c b/proc/mgt.c
+ index 7af9c1a..a11b406 100644
+ --- a/proc/mgt.c
+ +++ b/proc/mgt.c
+ @@ -159,9 +159,12 @@ S_proc_child (struct proc *parentp,
+ if (!childp)
+ return ESRCH;
+
+ + /* XXX */
+ if (childp->p_parentset)
+ return EBUSY;
+
+ + /* XXX if we are reparenting, check permissions. */
+ +
+ mach_port_deallocate (mach_task_self (), childt);
+
+ /* Process identification.
+ @@ -176,6 +179,7 @@ S_proc_child (struct proc *parentp,
+ childp->p_owner = parentp->p_owner;
+ childp->p_noowner = parentp->p_noowner;
+
+ + /* XXX maybe need to fix refcounts if we are reparenting, not sure */
+ ids_rele (childp->p_id);
+ ids_ref (parentp->p_id);
+ childp->p_id = parentp->p_id;
+ @@ -183,11 +187,14 @@ S_proc_child (struct proc *parentp,
+ /* Process hierarchy. Remove from our current location
+ and place us under our new parent. Sanity check to make sure
+ parent is currently init. */
+ - assert (childp->p_parent == startup_proc);
+ + assert (childp->p_parent == startup_proc); /* XXX */
+ if (childp->p_sib)
+ childp->p_sib->p_prevsib = childp->p_prevsib;
+ *childp->p_prevsib = childp->p_sib;
+
+ + /* XXX we probably want to keep a reference to the old
+ + childp->p_parent around so that if the debugger dies or detaches,
+ + we can reparent the process to the old parent again */
+ childp->p_parent = parentp;
+ childp->p_sib = parentp->p_ochild;
+ childp->p_prevsib = &parentp->p_ochild;
+ <teythoon> the code doing the reparenting is already there, but for now it
+ is only allowed to happen once at process creation time
+ <hacklu> teythoon: good job. This is in my todo list, when I implement
+ attach feature to gdbserver I will need this
+ <braunr> hacklu: i use htop
+ <teythoon> braunr: why is that process so disruptive?
+ <braunr> the big problem with those stale processes is that they're in a
+ state that prevents one important script to complete
+ <braunr> there is a bug on the hurd with regard to terminals
+ <braunr> when you log out of an ssh session, the terminal remains open for
+ some reason (bad reference counting somewhere, but it's quite tricky to
+ identify)
+ <braunr> to work around the issue, i have a cron job that calls a script to
+ kill unused terminals
+ <braunr> this works by listing processes
+ <braunr> your hello_world processes block that listing
+ <teythoon> uh, how so?
+ <hacklu> braunr: ok. I know.
+ <braunr> teythoon: probably the denial of service we were talking about
+ yesterday
+ <teythoon> select flooding a server?
+ <braunr> no, a program refusing to answer on its msg port
+ <braunr> ps has an option -M :
+ <braunr> -M, --no-msg-port Don't show info that uses a process's
+ msg port
+ <braunr> the problem is that my script requires those info
+ <teythoon> ah, I see, right
+ <braunr> hacklu being working on gdb, it's not surprising he's messing with
+ that
+ <teythoon> yes indeed. couldn't ps use a timeout to detect that?
+ <hacklu> braunr: yes, once I have found ps will hang when I has run
+ hello_world in a breakpoint state.
+ <teythoon> braunr: thanks for explaining the issue, i always wondered why
+ that process is such big a deal ;)
+ <braunr> teythoon: how do you tell between processes being slow to answer
+ and intentionally refusing to answer ?
+ <braunr> a timeout is almost never the right solution
+ <braunr> sometimes it's the only solution though, like for networking
+ <braunr> but on a system running on a local machine, there is usually
+ another way
+ <teythoon> braunr: I don't of course
+ <braunr> ?
+ <braunr> ah ok
+ <braunr> it was rhetorical :)
+ <teythoon> yes I know, and I was implying that I wasn't expecting a timeout
+ to be the clean solution
+ <teythoon> and the current behaviour is hardly acceptable
+ <braunr> i agree
+ <braunr> it's ok for interactive cases
+ <braunr> you can use Ctrl-C, which uses a 3 seconds delay to interrupt the
+ client RPC if nothing happens
+ <teythoon> braunr: btw, what about *_reply.defs? Should I add a
+ corresponding reply simpleroutine if I add a routine?
+ <braunr> normally yes
+ <braunr> right, forgot about that
+ <teythoon> so that the procedure ids are kept in sync in case one wants to
+ do this async at some point in the future?
+ <braunr> yes
+ <braunr> this happened with select
+ <braunr> i had to fix the io interface
+ <teythoon> ok, noted
+
+
+# IRC, freenode, #hurd, 2013-07-31
+
+ <hacklu> Do we need write any other report for the mid-evaluation? I have
+ only submit a question-answer to google.
+
+
+# IRC, freenode, #hurd, 2013-08-05
+
+ <hacklu> hi, this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report7build-gdbserver-on-gnuhurd-164/
+ <hacklu> youpi: can you show me some suggestions about how to design the
+ interface and structure of gdbserver?
+ <youpi> hacklu: well, I've read your blog entry, I was wondering about
+ tschwinge's opinion, that's why I asked whether he was here
+ <youpi> I would tend to start from an existing gdbserver, but as I haven't
+ seen the code at all, I don't know how much that can help
+ <hacklu> so you mean I should get a worked gdbserver then to improve it?
+ <youpi> I'd say so, but again it's not a very strong opinion
+ <youpi> I'd rather let tschwinge comment on this
+ <hacklu> youpi: ok :)
+
+ <youpi> how about the copyright assignments? did hacklu or teythoon receive
+ any answer?
+ <teythoon> youpi: I did, the copyright clerk told me that he finally got my
+ papers and that everything is in order now
+ <youpi> few!
+ <youpi> s/f/ph
+ <youpi> teythoon: you mean all steps are supposed to be done now, or is he
+ doing the last steps? I don't see your name in the copyright folder yet
+ <teythoon> youpi: well, he said that he had the papers and they are about
+ to be signed
+ <youpi> teythoon: ok, so it's not finished, that's why your name is not on
+ the list yet
+ <youpi> this paper stuff is really a pain
+ <hacklu> youpi: I haven't got any answer from FSF now.
+ <youpi> did you ping them recently?
+ <hacklu> I have pinged 2 week ago.
+ <hacklu> what you mean of ping? I just write an email to him. Is it enough?
+ <youpi> yes
+
+
+# IRC, freenode, #hurd, 2013-08-12
+
+ <hacklu> hi, this is my weekly report
+ http://hacklu.com/blog/gsoc-weekly-report8-168/ . sorry for so late.
+
+ <youpi> hacklu: it seems we misunderstood ourselves last week, I meant to
+ start from the existing gdbserver implementation
+ <youpi> but never mind :)
+ <youpi> starting from the lynxos version was a good idea
+ <hacklu> youpi: em... yeah, the lynxos port is so clean and simple.
+
+ <hacklu> youpi: aha, the "Remote connection closed" problem has been fixed
+ after I add a init_registers_i386() and set the structure target_desc.
+ <hacklu> but I don't get understand with the structure target_desc. I only
+ know it is auto-generated which configured by the configure.srv.
+ <tschwinge> Hi!
+ <tschwinge> hacklu: In gdbserver, you should definitely re-use existing
+ infrastructure, especially anything that deals with the
+ protocol/communication with GDB (that is, server.c and its support
+ files).
+ <tschwinge> hacklu: Then, for the x86 GNU Hurd port, it should be
+ implemented in the same way as an existing port. The Linux port is the
+ obvious choice, of course, but it is also fine to begin with something
+ simpler (like the LynxOS port you've chosen), and then we can still add
+ more features later on. That is a very good approach actually.
+ <tschwinge> hacklu: The x86 GNU Hurd support will basically consist of
+ three pieces -- exactly as with GDB's native x86 GNU Hurd port: x86
+ processor specific (tge existing gdbserver/i386-low.c etc. -- shouldn't
+ need any modifications (hopefully)), GNU Hurd specific
+ (gdbserver/gnu-hurd-low.c (or similar)), and x86 GNU Hurd specific
+ (gdbserver/gnu-hurd-x86-low.c (or similar)).
+ <tschwinge> s%tge%the
+ <hacklu> tschwinge: now I have only add a file named gnu-low.c, I should
+ move some part to the file gnu-i386-low.c I think.
+ <tschwinge> hacklu: That's fine for the moment. We can move the parts
+ later (everything with 86 in its name, probably).
+ <hacklu> that's ok.
+ <hacklu> tschwinge: Can I copy code from gnu-nat.c to
+ gdbserver/gnu-hurd-low.c? I think the two file will have many same code.
+ <tschwinge> hacklu: That's correct. Ideally, the code should be shared
+ (for example, in a file in common/), but that's an ongoing discussion in
+ GDB, for other duplicated code. So, for the moment, it is fine to copy
+ the parts you need.
+ <tschwinge> hacklu: Oh, but it may be a good idea to add a comment to the
+ source code, where it is copied from.
+ <hacklu> maybe I can do a common-part just for hurd gdb port.
+ <tschwinge> That should make it easier later on, to consolidate the
+ duplicated code into one place.
+ <tschwinge> Or you can do that, of course. If it's not too difficult to
+ do?
+ <hacklu> I think at the beginning it is not difficult. But when the
+ gdbserver code grow, the difference with gdb is growing either. That will
+ be too many #if else.
+ <tschwinge> I think we should check with the GDB maintainers, what they
+ suggest.
+ <tschwinge> hacklu: Please send an email To: <gdb@sourceware.org> Cc:
+ <lgustavo@codesourcery.com>, <thomas@codesourcery.com>, and ask about
+ this: you need to duplicate code that already exists in gnu-nat.c for new
+ gdbserver port -- how to share code?
+ <hacklu> tschwinge: ok, I will send the email right now.
+ <hacklu> tschwinge: need I cc to hurd mail-list?
+ <tschwinge> hacklu: Not really for that questions, because that is a
+ question only relevant to the GDB source code itself.
+ <hacklu> tschwinge: got it.
+
+[[!message-id
+"CAB8fV=jzv_rPHP3-HQVBA-pCNZNat6PNbh+OJEU7tZgQdKX3+w@mail.gmail.com"]].
+
+
+# IRC, freenode, #hurd, 2013-08-19
+
+<http://hacklu.com/blog/gsoc-weekly-report9-172/>.
+
+ <hacklu__> when and where is the best time and place to get the register
+ value in gdb?
+ <youpi> well, I'm not sure to understand the question
+ <youpi> you mean in the gdb source code, right?
+ <youpi> isn't it already done in gdb?
+ <youpi> probably similarly to i386?
+ <youpi> (linux i386 I mean)
+ <hacklu__> I don't find the fetch_register or relate function implement in
+ gnu-nat.c
+ <hacklu__> so I can't make decision how to implement this in gdbserver.
+ <youpi> it's in i386gnu-nat.c, isn't it?
+ <hacklu__> yeah.
+ <youpi> does that answer your issue?
+ <hacklu__> thank you. I am so stupid
+
+
+# IRC, freenode, #hurd, 2013-08-26
+
+ < hacklu> hello everyone, this is my week
+ report. http://hacklu.com/blog/gsoc-weekly-report10-174/
+
+ < hacklu> btw, my FSF copyright assignment has been accepted. The guy
+ said, they have received my mail for a while but forgot to handle it.
+
+ < hacklu> but now I face a new problem, when I typed the first continue
+ command, gdb will continue all the breakpoint, and the inferior will run
+ until normally exit.
+
+
+# IRC, freenode, #hurd, 2013-08-30
+
+ <hacklu> tschwinge: hi, does gdb's attach feature work correctly on Hurd?
+ <hacklu> on my hurd-box, the gdb can't attach to a running process, after a
+ attaching, when I continue, gdb complained "can't find pid 12345"
+ <teythoon> hacklu: attaching works, not sure why gdb is complaining
+ <hacklu> teythoon: yeah, it can attach, but can't continue the process.
+ <hacklu> in this case, the debugger is useless if it can't resume execution
+ <teythoon> hacklu: well, gdb on Linux reacts a little differently, but for
+ me attaching and then resuming works
+ <hacklu> teythoon: yes, gdb on linux works well.
+ <teythoon> % gdb --pid 21506 /bin/sleep
+ <teythoon> [...]
+ <teythoon> (gdb) c
+ <teythoon> Continuing.
+ <teythoon> warning: Can't wait for pid 21506: No child processes
+ <teythoon> # pkill -SIGILL sleep
+ <teythoon> warning: Pid 21506 died with unknown exit status, using SIGKILL.
+ <hacklu> yes. I used a sleep program to test too.
+ <teythoon> I believe that the warning and deficiencies with the signal
+ handling are b/c on Hurd the debuggee cannot be reparented to the
+ debugger
+ <hacklu> oh, I remembered, I have asked this before.
+ <tschwinge> Confirming that attaching to a process in __sleep -> __mach_msg
+ -> mach_msg_trap works fine, but then after »continue«, I see »warning:
+ Can't wait for pid 4038: No child processes« and three times »Can't fetch
+ registers from thread bogus thread id 1: No such thread« and the sleep
+ process exits (normally, I guess? -- interrupted "system call").
+ <tschwinge> If detaching (exit GDB) instead, I see »warning: Can't modify
+ tracing state for pid 4041: No such process« and the sleep process exits.
+ <tschwinge> Attaching to and then issuing »continue« in a process that is
+ not currently in a mach_msg_trap (tested a trivial »while (1);«) seems to
+ work.
+ <tschwinge> hacklu: ^
+ <hacklu> tschwinge: in my hurdbox, if I just attach a while(1), the system
+ is near down. nothing can happen, maybe my hardware is slow.
+ <hacklu> so I can only test on the sleep one.
+ <hacklu> my gdbserver doesn't support attach feature now. the other basic
+ features have been implemented. I am doing test and review the code now.
+ <tschwinge> Great! :-)
+ <tschwinge> It is fine if attaching does not work currently -- can be added
+ later.
+ <hacklu> btw, How can I submit my code? put the patch in email directly?
+ <tschwinge> Did you already run the GDB testsuite using your gdbserver?
+ <hacklu> no, haven't yet
+ <tschwinge> Either that, or a Git branch to pull from.
+ <hacklu> I think I should do more review and test than I submit patches.
+ <tschwinge> hacklu: See [GDB]/gdb/testsuite/boards/native-gdbserver.exp
+ (and similar files) for how to run the GDB testsuite with gdbserver.
+ <hacklu> ok.
+ <tschwinge> But don't be disappointed if there are still a lot of failures,
+ etc. It'll already be great if some basic stuff works.
+ <hacklu> now it can set and remove breakpoint. show register, access
+ variables.
+ <tschwinge> ... which already is enough for a lot of debugging sessions.
+ :-)
+ <hacklu> I will continue to make it more powerful.
+ <hacklu> :)
+ <tschwinge> Yes, but please first work on polishing the existing code, and
+ get it integrated upstream. That will be a great milestone.
+ <tschwinge> No doubt that GDB maintainers will have lots of comments about
+ proper formatting of the source code, and such things. Trivial, but will
+ take time to re-work and get right.
+ <hacklu> oh, I got it. I will give my patch before this weekend.
+ <tschwinge> Then once your basic gdbserver is included, you can continue to
+ implement additional features, piece by piece.
+ <tschwinge> And then we can run the GDB testsuite with gdbserver and
+ compare that there are no regressions, etc.
+ <tschwinge> Heh, »before the weekend« -- that's soon. ;-)
+ <hacklu> honestly to say, most of the code is copied from other files, I
+ haven't write too many code myself.
+ <tschwinge> Good -- this is what I hoped. Often, more time in software
+ development is spent on integrating existing things rather than writing
+ new code.
+ <hacklu> but I have spent a lot of time to get known the code and to debug
+ it to work.
+ <tschwinge> This is normal, and is good in fact: existing code has already
+ been tested and documented (in theory, at least...).
+ <tschwinge> Yes, that's expected too: when relying on/reusing existing
+ code, you first have to understand it, or at least its interfaces. Doing
+ that, you're sort of "mentally writing the existing code again".
+ <tschwinge> So, this sounds all fine. :-)
+ <hacklu> your words make me happy.
+ <hacklu> :)
+ <tschwinge> Well, I am, because this seems to be going well.
+ <hacklu> thank you. I am going to coding now~~
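
The warning "Can't wait for pid 21506: No child processes" in the session above is the message text for ECHILD. POSIX specifies that waitpid fails with ECHILD when the given process is not a child of the caller. That fits teythoon's explanation that on the Hurd the debuggee is not reparented to the debugger. Below is a minimal sketch that assumes nothing beyond POSIX. The helper name wait_on_non_child is hypothetical. It uses the caller's own parent because that process is guaranteed not to be its child.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask waitpid about a process that exists but is
   not our child (our own parent).  WNOHANG guarantees we never block.
   Returns the errno value set by waitpid, or 0 if it unexpectedly
   succeeded.  */
int
wait_on_non_child (void)
{
  int status;
  pid_t parent = getppid ();

  if (waitpid (parent, &status, WNOHANG) == -1)
    return errno;   /* Expected: ECHILD, "No child processes".  */
  return 0;
}
```

This is only a POSIX-level illustration, not a trace of what gnu-nat.c does. Per the discussion above, GDB on the Hurd learns about state changes through proc_wait_request rather than calling waitpid itself.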
+
+
+# IRC, freenode, #hurd, 2013-09-02
+
+ <hacklu> hi, this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report11-181/
+
+ <hacklu> please give me any advice on how to use mig to generate stub-files
+ in gdbserver?
+ <braunr> hacklu:
+ http://darnassus.sceen.net/gitweb/rbraun/slabinfo.git/blob/HEAD:/Makefile
+ <hacklu> braunr: shouldn't I work like this
+ https://github.com/hacklu/gdbserver/blob/gdbserver/gdb/config/i386/i386gnu.mh
+ ?
+ <braunr> hacklu: seems that you need server code
+ <braunr> other than that i don't see the difference
+ <hacklu> gdb use autoconf to generate the Makefile, and part from the *.mh
+ file, but in gdbserver, there is no .mh like files.
+ <braunr> hacklu: why can't you reuse /i386gnu.mh ?
+ <hacklu> braunr: question is that, there are something not need in
+ /i386gnu.mh.
+ <braunr> hacklu: like what ?
+ <hacklu> braunr: like fork-child.o msg_U.o core-regset.o
+ <braunr> hacklu: well, adjust the dependencies as you need
+ <braunr> hacklu: do you mean they become useless for gdbserver but are
+ useful for gdb ?
+ <hacklu> braunr: yes, so I need another one gnu.mh file.
+ <hacklu> braunr: but the gdbserver's configure doesn't have any *.mh file,
+ can I add the first one?
+ <braunr> or adjust the values of those variables depending on the building
+ mode
+ <braunr> maybe
+ <braunr> tschwinge is likely to better answer those questions
+ <hacklu> braunr: ok, I will wait for tschwinge's advice.
+ <luisgpm> hacklu, The gdb/config/ dir is for files related to the native
+ gdb builds, as opposed to a cross gdb that does not have any native bits
+ in it. In the latter, gdbserver will be used to touch the native layer,
+ and GDB will only guide gdbserver through the debugging session...
+ <luisgpm> hacklu, In case you haven't figured that out already.
+ <hacklu> luisgpm: I am not very clear with you. According to your words, I
+ shouldn't use gdb/config for gdbserver?
+ <luisgpm> hacklu, Correct. You should use configure.srv for gdbserver.
+ <luisgpm> hacklu, gdb/gdbserver/configure.srv that is.
+ <luisgpm> hacklu, gdb/configure.tgt for non-native gdb files...
+ <luisgpm> hacklu, and gdb/config for native gdb files.
+ <luisgpm> hacklu, The native/non-native separation for gdb is due to the
+ possibility of having a cross gdb.
+ <congzhang> what's srv file purpose?
+ <luisgpm> hacklu, gdbserver, on the other hand, is always native.
+ <luisgpm> Doing the target-to-object-files mapping.
+ <hacklu> how can I use configure.srv to config the MIG to generate
+ stub-files?
+ <luisgpm> What are stub-files in this context?
+ <hacklu> On Hurd, some rpc stub file are auto-gen by MIG with *.defs file
+ <braunr> luisgpm: c source code handling low level ipc stuff
+ <braunr> mig is the mach interface generator
+ <tschwinge> luisgpm, hacklu: If that is still helpful by now, in
+ <http://news.gmane.org/find-root.php?message_id=%3C87ppwqlgot.fsf%40kepler.schwinge.homeip.net%3E>
+ I described the MIG usage in GDB. (Which also states that ptrace is a
+ system call which it is not.)
+ <tschwinge> hacklu: For the moment, it is fine to indeed copy the rules
+ related to MIG/RPC stubs from gdb/config/i386/i386gnu.mh to a (possibly
+ new) file in gdbserver. Then, later, we should work out how to properly
+ share these, as with all the other code that is currently duplicated for
+ GDB proper and gdbserver.
+ <luisgpm> hacklu, tschwinge: If there is code gdbserver and native gdb can
+ use, feel free to put them inside gdb/common for now.
+ <tschwinge> hacklu, luisgpm: Right, that was the conclusion from
+ <http://news.gmane.org/find-root.php?message_id=%3CCAB8fV%3Djzv_rPHP3-HQVBA-pCNZNat6PNbh%2BOJEU7tZgQdKX3%2Bw%40mail.gmail.com%3E>.
+ <hacklu> tschwinge, luisgpm : ok, I got it.
+ <hacklu> tschwinge: sorry for haven't submit patches yet, I will try to
+ submit my patch tomorrow.
+
+[[!message-id "CAB8fV=iw783uGF8sWyqJNcWR0j_jaY5XO+FR3TyPatMGJ8Fdjw@mail.gmail.com"]].
+
+
+# IRC, freenode, #hurd, 2013-09-06
+
+ <hacklu> If I want compile a file which is not in the current directory,
+ how should I change the Makefile. I have tried that obj:../foo.c, but the
+ foo.o will be in ../, not in the current directory.
+ <hacklu> As say, When I build gdbserver, I want to use [gdb]/gdb/gnu-nat.c,
+ How can I get the gnu-nat.o under gdbserver's directory?
+ <hacklu> tschwinge: ^^
+ <tschwinge> Hi!
+ <tschwinge> hacklu: Heh, unexpected problem.
+ <tschwinge> hacklu: How is this handled for the files that are already in
+ gdb/common/? I think these would have the very same problem?
+ <hacklu> tschwinge: ah.
+ <hacklu> I got it
+ <tschwinge> I see, for example:
+ <tschwinge> ./gdb/Makefile.in:linux-btrace.o:
+ ${srcdir}/common/linux-btrace.c
+ <tschwinge> ./gdb/gdbserver/Makefile.in:linux-btrace.o:
+ ../common/linux-btrace.c $(linux_btrace_h) $(server_h)
+ <hacklu> If I have asked before, I won't use soft link to solve this.
+ <tschwinge> But isn't that what you've been trying?
+ <hacklu> when this, where the .o file go to?
+ <tschwinge> Yes, symlinks can't be used, because they're not available on
+ every (file) system GDB can be built on.
+ <tschwinge> I would assume the .o files to go into the current working
+ directory.
+ <tschwinge> Wonder why this didn't work for you.
+ <hacklu> in gdbserver/configure.srv, there is a srv_tgtobj="gnu_nat.c ..",
+ if I change the Makefile.in, it doesn't gdb's way.
+ <hacklu> So I can't use the variable srv_tgtobj?
+ <tschwinge> That should be srv_tgtobj="gnu_nat.o [...]"? (Not .c.)
+ <hacklu> I have try this, srv_tgtobj="../gnu_nat.c", then the gnu_nat.o is
+ generate in the parent directory.
+ <hacklu> s/.c/.o
+ <hacklu> (wrong input)
+ <hacklu> As I understand it now, I should set srv_tgtobj="", and then
+ set gnu_nat.o:../gnu_nat.c in the gdbserver/Makefile.in, right?
+ <tschwinge> Hmm, I thought you'd need both.
+ <tschwinge> Have you tried that?
+ <hacklu> no, haven't yet. I will try soon.
+ <hacklu> I have run into a strange thing. I have this in the Makefile,
+ i386gnu-nat.o:../i386gnu-nat.c $(CC) -c $(CPPFLAGS) $(INTERNAL_CFLAGS) $<
+ <hacklu> When I run make, it complains: no rule for target
+ i386gnu-nat.c
+ <hacklu> but I also have a line gnu-nat.o:../gnu-nat.c ../gnu-nat.h. this
+ works well.
+ <tschwinge> hacklu: Does it work if you use $(srcdir)/../i386gnu-nat.c
+ instead of ../i386gnu-nat.c?
+ <tschwinge> Or similar.
+ <hacklu> I have tried this: i386gnu-nat.c: echo "" ; then it works.
+ <hacklu> (try $(srcdir) ing..)
+ <hacklu> make: *** No rule to make target `.../i386gnu-nat.c', needed by
+ `i386gnu-nat.o'. Stop.
+ <hacklu> seems to be of no use.
+ <hacklu> tschwinge: I have found another thing, if I rename the
+ i386gnu-nat.o to other one, like i386gnu-nat2.o. It works!
+
+
+# IRC, freenode, #hurd, 2013-09-07
+
+ <hacklu> hi, I have found many '^L' in gnu-nat.c, should I fix them or keep
+ the original?
+ <LarstiQ> hacklu: fix in what sense?
+ <hacklu> remove the lines containing ^L
+ <LarstiQ> hacklu: see bottom of
+ http://www.gnu.org/prep/standards/standards.html#Formatting
+ <LarstiQ> hacklu: "Please use formfeed characters (control-L) to divide the
+ program into pages at logical places (but not within a function)."
+ <LarstiQ> hacklu: so unless a reason has come up to deviate from the gnu
+ coding standards, those ^L's are there by design
+ <hacklu> LarstiQ: Thank you! I always thought they were some formatting
+ error. I am stupid.
+ <LarstiQ> hacklu: not stupid, you just weren't aware
+ * LarstiQ thought the same when he first encountered them
+
+
+# IRC, freenode, #hurd, 2013-09-09
+
+ <youpi> hacklu_, hacklu__: I don't know what tschwinge thinks, but I guess
+ you should work with upstream on integration of your existing work, this
+ is part of the gsoc goal: submitting one's stuff to projects
+ <tschwinge> youpi: Which is what we're doing (see the patches recently
+ posted). :-)
+ <youpi> ok
+ <hacklu__> youpi: I am always doing what you suggest. :)
+ <hacklu> I have asked in my new mail, and I want to ask here again. Should
+ I change gdb to use the lwp field instead of the tid field? There are
+ <hacklu> too many functions use tid. Like
+ <hacklu> named tid in the structure proc also.
+ <hacklu> make_proc(),inf_tid_to_thread(),ptid_build(), and there is a field
+ <hacklu> (sorry for the bad \n )
+ <hacklu> and this is my weekly
+ report. http://hacklu.com/blog/gsoc-weekly-report12-186/
+ <hacklu> And in Pedro Alves's reply, he wants me to integrate only one
+ back-end for gdb and gdbserver, but struct target_ops is
+ declared differently in the two. How can I integrate this? Or have I
+ misunderstood?
+ <hacklu> tschwinge: ^^
+ <tschwinge> hacklu: I will take this to email, so that Pedro et al. can
+ comment, too.
+ <tschwinge> hacklu: I'm not sure about your struct target_ops question.
+ Can you reply to Pedro's email to ask about this?
+ <hacklu> tschwinge: ok.
+ <tschwinge> hacklu: I have sent an email about the LWP/TID question.
+ <hacklu> tschwinge: Thanks for your email, now I know how to fix the
+ LWP/TID for this moment.
+ <tschwinge> hacklu: Let's hope that Pedro also is fine with this. :-)
+ <hacklu> tschwinge: BTW, I have a question, if we just use a locally
+ auto-generated number to distinguish threads in a process, how can we do
+ that?
+ <hacklu> How can we know which thread threw the exception?
+ <hacklu> I haven't thought about this before.
+ <tschwinge> hacklu: make_proc sets up a mapping from Mach threads to GDB's
+ TIDs. And then, for example inf_tid_to_thread is used to look that up.
+ <hacklu> tschwinge: oh, yeah. that is.
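
As tschwinge explains above, make_proc records which Mach thread belongs to which GDB TID, and inf_tid_to_thread looks that mapping up. The sketch below illustrates the same idea. It is not gnu-nat.c's actual code. The type `thread_port_t` and all the `thread_table_*` names are hypothetical, and a plain unsigned integer stands in for `mach_port_t`. TIDs are handed out from a per-process counter. An exception message names the faulting thread by its port, so a port-to-TID lookup answers the question of which thread raised the exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Mach thread port name; on the Hurd this
   would be mach_port_t from <mach.h>.  */
typedef unsigned int thread_port_t;

#define MAX_THREADS 64

/* Pairs a Mach thread port with a small, locally generated TID.  */
struct thread_entry
{
  thread_port_t port;
  int tid;
};

struct thread_table
{
  struct thread_entry entries[MAX_THREADS];
  size_t count;
  int next_tid;   /* Next TID to hand out; TIDs are never reused.  */
};

static void
thread_table_init (struct thread_table *t)
{
  assert (t != NULL);
  t->count = 0;
  t->next_tid = 1;
}

/* Register a newly seen thread and return its TID, or -1 if the table
   is full (roughly the role make_proc plays).  */
static int
thread_table_add (struct thread_table *t, thread_port_t port)
{
  if (t->count == MAX_THREADS)
    return -1;
  struct thread_entry *e = &t->entries[t->count++];
  e->port = port;
  e->tid = t->next_tid++;
  return e->tid;
}

/* Map the thread port named in, e.g., an exception message back to
   its TID.  Returns -1 for an unknown port.  */
static int
thread_table_port_to_tid (const struct thread_table *t, thread_port_t port)
{
  for (size_t i = 0; i < t->count; i++)
    if (t->entries[i].port == port)
      return t->entries[i].tid;
  return -1;
}

/* Reverse lookup in the spirit of inf_tid_to_thread: store the port
   for TID in *PORT and return 0, or return -1 if TID is unknown.  */
static int
thread_table_tid_to_port (const struct thread_table *t, int tid,
                          thread_port_t *port)
{
  for (size_t i = 0; i < t->count; i++)
    if (t->entries[i].tid == tid)
      {
        *port = t->entries[i].port;
        return 0;
      }
  return -1;
}
```

A linear scan keeps the sketch short. A real debugger tracking many threads would probably use a hash table instead.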
+
+
+# IRC, freenode, #hurd, 2013-09-16
+
+ <tschwinge> hacklu: Even when waiting for Pedro (and me) to comment, I
+ guess you're not out of work, but can continue in parallel with other
+ things, or improve the patch?
+ <hacklu> tschwinge: to be honest, these days I am out of work T_T since
+ I updated the patch.
+ <hacklu> I am not sure how to improve the patch beyond your comment in the
+ email. I have just run some testcases and nothing else.
+ <tschwinge> hacklu: I have not yet seen any report on the GDB testsuite
+ results using your gdbserver port (see
+ gdb/testsuite/boards/native-gdbserver.exp). :-D
+ <hacklu> the question is, the result of that testsuite is just how many pass
+ and how many don't pass.
+ <hacklu> and I am not sure whether I need to give this information.
+ <tschwinge> Just as a native run of GDB's testsuite, this will create *.sum
+ and *.log files, and these you can diff to those of a native run of GDB's
+ testsuite.
+ <hacklu> https://paste.debian.net/41066/ this is my result
+ === gdb Summary ===
+
+ # of expected passes 15573
+ # of unexpected failures 609
+ # of unexpected successes 1
+ # of expected failures 31
+ # of known failures 57
+ # of unresolved testcases 6
+ # of untested testcases 47
+ # of unsupported tests 189
+ /home/hacklu/code/gdb/gdb/testsuite/../../gdb/gdb version 7.6.50.20130619-cvs -nw -nx -data-directory /home/hacklu/code/gdb/gdb/testsuite/../data-directory
+
+ make[3]: *** [check-single] Error 1
+ make[3]: Leaving directory `/home/hacklu/code/gdb/gdb/testsuite'
+ make[2]: *** [check] Error 2
+ make[2]: Leaving directory `/home/hacklu/code/gdb/gdb'
+ make[1]: *** [check-gdb] Error 2
+ make[1]: Leaving directory `/home/hacklu/code/gdb'
+ make: *** [do-check] Error 2
+ <hacklu> I got a make error so I don't get the *.sum and *.log file.
+ <tschwinge> Well, that should be fixed then?
+ <tschwinge> hacklu: When does university start again for you?
+ <hacklu> My university started a week ago.
+ <hacklu> but I will fix this,
+ <tschwinge> Oh, OK. So you won't have too much time anymore for GDB/Hurd
+ work?
+ <hacklu> it is my duty to finish my work.
+ <hacklu> time is not the main problem for me, I will schedule it myself.
+ <tschwinge> hacklu: Thanks! Of course, we'd be very happy if you stay with
+ us, and continue working on this project (or another one)! :-D
+ <hacklu> I also thank all of you who helped me and mentored me to improve
+ myself.
+ <hacklu> then, is the next thing I can do to fix the failing testcases?
+ <tschwinge> hacklu: It's been our pleasure!
+ <tschwinge> hacklu: A comparison of the GDB testsuite results for a native
+ and gdbserver run would be good to get an understanding of the current
+ status.
+ <hacklu> ok, I will give this comparison soon. BTW, should I compare the
+ native gdb result with the one before my patch?
+ <tschwinge> You mean compare the native run before and after your patch?
+ Yes, that also wouldn't hurt to do, to show that your patch doesn't
+ introduce any regressions to the native GDB port.
+ <hacklu> ok, besides this I should compare the native gdb with gdbserver?
+ <tschwinge> Yes.
+ <hacklu> besides this, what more can I do?
+ <tschwinge> No doubt, there will be differences between the native and
+ gdbserver test runs -- the goal is to reduce these. (This will probably
+ translate to: implement more stuff for the Hurd port of gdbserver.)
+ <hacklu> ok, got it. Starting now.
+ <tschwinge> As time permits. :-)
+ <hacklu> It's ok. :)
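
tschwinge suggests diffing the `*.sum` files from a native run and a gdbserver run (gdb/testsuite/boards/native-gdbserver.exp). The sketch below is a minimal way to compare just the `# of <label>` counters from the `=== gdb Summary ===` section, like the summary pasted above. The counter labels are taken from that paste. The function names are hypothetical. A plain `diff` of the two `.sum` files is still what shows *which* tests changed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counter labels as they appear in DejaGnu's "# of <label>" lines.  */
#define N_COUNTERS 8
static const char *const counter_labels[N_COUNTERS] = {
  "expected passes", "unexpected failures", "unexpected successes",
  "expected failures", "known failures", "unresolved testcases",
  "untested testcases", "unsupported tests"
};

struct dg_summary
{
  long count[N_COUNTERS];
};

/* Parse one line such as "# of expected passes\t\t15573".  Lines that
   are not summary counters are ignored.  */
static void
summary_parse_line (struct dg_summary *s, const char *line)
{
  char label[64];
  long value;
  if (sscanf (line, "# of %63[^0-9]%ld", label, &value) != 2)
    return;
  /* The label swallowed the blanks before the number; trim them.  */
  size_t n = strlen (label);
  while (n > 0 && (label[n - 1] == ' ' || label[n - 1] == '\t'))
    label[--n] = '\0';
  for (int i = 0; i < N_COUNTERS; i++)
    if (strcmp (label, counter_labels[i]) == 0)
      s->count[i] = value;
}

/* Parse a whole .sum text (or just its summary section).  */
static void
summary_parse_text (struct dg_summary *s, const char *text)
{
  memset (s, 0, sizeof *s);
  while (*text != '\0')
    {
      char line[256];
      size_t full = strcspn (text, "\n");
      size_t len = full < sizeof line - 1 ? full : sizeof line - 1;
      memcpy (line, text, len);
      line[len] = '\0';
      summary_parse_line (s, line);
      text += full;
      if (*text == '\n')
        text++;
    }
}

/* Print each counter that differs between the two runs; return how
   many differ.  */
static int
summary_compare (const struct dg_summary *native,
                 const struct dg_summary *server, FILE *out)
{
  assert (native != NULL && server != NULL && out != NULL);
  int differing = 0;
  for (int i = 0; i < N_COUNTERS; i++)
    if (native->count[i] != server->count[i])
      {
        fprintf (out, "%-22s native %6ld  gdbserver %6ld  (%+ld)\n",
                 counter_labels[i], native->count[i], server->count[i],
                 server->count[i] - native->count[i]);
        differing++;
      }
  return differing;
}
```

The same comparison works for a native run before and after a patch, which is the regression check discussed above.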
+
+
+# IRC, freenode, #hurd, 2013-09-23
+
+ <hacklu_> I have to go out in a few minutes, will be back at 8pm. I am
+ sorry to miss the meeting this week, I will finish my report soon.
+ <hacklu_> tschwinge, youpi ^^
diff --git a/community/gsoc/2013/nlightnfotis.mdwn b/community/gsoc/2013/nlightnfotis.mdwn
new file mode 100644
index 00000000..a9176f51
--- /dev/null
+++ b/community/gsoc/2013/nlightnfotis.mdwn
@@ -0,0 +1,3037 @@
+[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!toc]]
+
+
+# IRC, freenode, #hurd, 2013-06-29
+
+ <teythoon> so, how is your golang port going?
+ <nlightnfotis> I just started working on it. I had been reading
+ documentation so far. Maybe too much reading, as people told me when I
+ asked for their feedback
+ <nlightnfotis> but I will report on what I have done (technically tomorrow),
+ and post it in the mailing list too.
+
+ <nlightnfotis> Hey guys, what could possibly cause the following error
+ message when executing a program in the Hurd? "./dumper: Could not open
+ note: (system server) error with unknown subsystem"
+ <nlightnfotis> My program is one that opens a file and dumps it into stdout
+ <nlightnfotis> pinotree: the code I am using is the one present here
+ http://www.gnu.org/software/hurd/hacking-guide/hhg.html under paragraph
+ 6.1
+ <nlightnfotis> I investigated it a bit but can not find a lead. I seem to
+ have all the rights to open the file that I want to dump to stdout
+ <pinotree> what if you reset errno to 0 just after all the declarations in
+ main, before the instructions?
+ <nlightnfotis> will check this out and get back to you.
+ <pinotree> sure :)
+ <nlightnfotis> pinotree: Now it suggests that it can't get the number of
+ readable files, which the source suggests that is normal behavior.
+ Thanks for your assistance.
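
pinotree's suggestion points at a common pitfall. errno is only meaningful right after a call that has reported failure, so printing it without such a failure can produce an unrelated message like the one above. The sketch below is a general pattern, not the Hurd Hacking Guide's 6.1 code (which is not reproduced here). The function names are hypothetical. Check the return value first and save errno immediately. Reset errno to 0 only before calls such as strtol() that signal some errors through errno alone.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Open PATH read-only.  Returns the descriptor, or -1 with an error
   message in MSG.  errno is consulted only after open() has reported
   failure through its return value, and it is saved right away so
   that later calls cannot overwrite it.  */
static int
open_for_dump (const char *path, char *msg, size_t msglen)
{
  assert (msg != NULL && msglen > 0);
  msg[0] = '\0';
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    {
      int saved_errno = errno;
      snprintf (msg, msglen, "Could not open %s: %s", path,
                strerror (saved_errno));
    }
  return fd;
}

/* strtol() reports overflow only through errno, so this is the kind
   of call where errno must be reset to 0 beforehand.  */
static int
parse_long (const char *s, long *out)
{
  char *end;
  errno = 0;
  long v = strtol (s, &end, 10);
  if (errno != 0 || end == s || *end != '\0')
    return -1;
  *out = v;
  return 0;
}
```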
+
+
+# IRC, freenode, #hurd, 2013-07-01
+
+ <nlightnfotis> youpi: from my part I can report that I have started working
+ with the code, and doing as Thomas suggested. I was about to write my
+ report yesterday, but I am facing some build errors on the HURD, which I
+ would like to investigate further before I write my report.
+ <nlightnfotis> that's why I decided to write it later in the day.
+ <youpi> I don't think you have to wait
+ <youpi> you can simply write in your report that you are having build
+ errors
+ <nlightnfotis> ok. I will have it written and delivered later in the day.
+ <nlightnfotis> braunr: that's cool. I think my reading has paid for
+ itself. And you may be pleased to know that I have gotten my hands dirty
+ with the code. I was about to write my report yesterday, but some build
+ errors with the gcc (that I am investigating atm) are holding me
+ off. Will have that written later in the day.
+ <braunr> don't hesitate to ask help about build errors
+ <braunr> don't wait too much
+ <braunr> you need to progress on what matters, and not be blocked by
+ secondary problems
+ <nlightnfotis> I will see myself asking for help rather sooner than later,
+ but I would like to investigate it myself, and attempt to solve the
+ issues that occur to me before resorting to bugging you guys.
+ <braunr> sure
+ <braunr> just not too long
+ <braunr> too long being a day or so
+ <nlightnfotis> these were my build_results on the hurd
+ <nlightnfotis> they were linker errors
+ <nlightnfotis>
+ https://gist.github.com/NlightNFotis/5896188#file-build_results
+ <nlightnfotis> I am trying to build gcc on a linux 32 bit environment. It
+ also has some issues but not linker errors
+ <nlightnfotis> will resolve them to see if the linker errors are
+ reproducible on linux
+ <braunr> oh, lex stuff
+ <braunr> should be easy enough
+
+
+# IRC, freenode, #hurd, 2013-07-05
+
+ <nlightnfotis> I have not made much progress, but I see myself working with
+ it.
+ <nlightnfotis> I have managed to build gcc go on Linux
+ <nlightnfotis> but Hurd seems to have some issues
+ <nlightnfotis> it seems to randomly crash
+ <teythoon> the build process?
+ <nlightnfotis> not quite randomly it seems to be though
+ <nlightnfotis> yeah
+ <nlightnfotis> I have noticed that there is a pattern
+ <nlightnfotis> it does crash after some time
+ <teythoon> ^^
+ <nlightnfotis> but it doesn't crash at specific files
+ <braunr> define crash
+ <nlightnfotis> at some times it may crash during compiling insn-emit.c
+ <braunr> (hello guys)
+ <teythoon> hi braunr :)
+ <nlightnfotis> braunr: hey there! It does seem to keep on compiling this
+ file for a very long time (I have let it do so for 10, 20, 30 minutes)
+ but the result is the same
+ <nlightnfotis> and it does so for different files for different build
+ options
+ <braunr> ok so it doesn't crash
+ <braunr> it just doesn't complete
+ <braunr> is the virtual machine eating 100% cpu during that time ?
+ <nlightnfotis> I can still type at the terminal, but I can't send a term
+ signal
+ <nlightnfotis> I can report that QEMU does hold 100% of one core at that
+ time, (like it keeps processing) but there is no output on the terminal
+ <braunr> ok
+ <nlightnfotis> of course I can type at the terminal
+ <nlightnfotis> but nothing happens
+ <braunr> any idea of the size of the files involved ?
+ <nlightnfotis> I am checking it out right now
+ <nlightnfotis> before this goes any further, let me report on my
+ investigation
+ <braunr> i expect that to be our classic writeback thread storm issue
+ <nlightnfotis> initially, I thought it might be that it ran out of memory
+ <nlightnfotis> even though I know that compilation is not memory intensive,
+ rather, cpu intensive
+ <nlightnfotis> anyway I increased the size of ram available to the vm
+ <nlightnfotis> from 1024 mb to 1536
+ <nlightnfotis> that didn't seem to have any effect. The "crash" still
+ happens at the same time, at the same files
+ <braunr> use freeze
+ <braunr> not crash
+ <braunr> crash is very misleading here
+ <nlightnfotis> freeze it is then.
+ <nlightnfotis> anyway
+ <nlightnfotis> then it struck me that it might be that the hard disk size
+ (3gb) might be too small (considering the gcc git repo is 1gb+)
+ <nlightnfotis> so I resized the qemu image to 8gb of hdd size
+ <nlightnfotis> the new size is acknowledged by the vm
+ <pinotree> for gcc in debug mode? might still not be enough
+ <nlightnfotis> but still it has no effect - it seems to follow its freezing
+ patterns
+ <pinotree> giving your work, i'd have not less than 15-20
+ <braunr> i'd use 32
+ <pinotree> *given
+ <braunr> but that's because i like power of twos
+ <nlightnfotis> pinotree: thanks for the advice. Right now I was gonna
+ increase the swap size
+ <nlightnfotis> according to vmstat in the hurd
+ <nlightnfotis> swap size is 173 mb
+ <nlightnfotis> don't know if it does have an impact
+ <braunr> it may but before rushing
+ <braunr> if you need swap, you're doomed anyway
+ <braunr> consider swap highly unreliable on the hurd
+ <braunr> please show the output of df -h on the file system you're using to
+ build
+ <braunr> ideally, i'd recommend using separate / and /home file systems
+ <braunr> it really improves reliability
+ <nlightnfotis> I don't think it swaps to be honest; however that's
+ something that my mentor thomas had suggested (increasing swap size) so I
+ am gonna try it at some time.
+ <pinotree> or have a separate file system in a subdir and work on it
+ <braunr> yes, /home or whatever suits you
+ <braunr> just not /
+ <nlightnfotis> braunr: pinotree: thanks both for your advice. Will do now,
+ and report on the results.
+ <braunr> that's not all
+ <braunr> 11:17 < braunr> please show the output of df -h on the file system
+ you're using to build
+ <nlightnfotis> braunr: I am on it. Oh and btw, every time I am forced to
+ close the vm (due to the freezes) when I restart it ext2 reports that the
+ file system was not cleanly unmounted and does some repair to some
+ files. I am trying to find an explanation for that, but I can think of
+ many things
+ <braunr> well obviously
+ <pinotree> ext2 has no journaling
+ <braunr> the file system was not cleanly unmounted since you restarted it
+ with a cold reset
+ <nlightnfotis> braunr: df -h comes out with this: "df: cannot read table of
+ mounted file systems"
+ <pinotree> also, even if you manage to always shut down correctly, when
+ fsck runs because of the maximum mount count it'd find errors anyway (so
+ we have some bug)
+ <braunr> nlightnfotis: df -h /path/to/build/dir
+ <braunr> pinotree: not really bugs but it could be cleaned up
+ <nlightnfotis> filesystem: - Size 2.8G Used 2.8G Avail 0 Use% 100% Mounted
+ on /
+ <nlightnfotis> wow
+ <braunr> nlightnfotis: see
+ <nlightnfotis> that seems to explain many things
+ <teythoon> ^^
+ <nlightnfotis> thanks for that braunr!
+ <braunr> you resized the disk, but not the partition and the file system
+ <pinotree> braunr: well, if something in ext2 (or its libs) leaves issues
+ in the fs, i'd call that a bug :>
+ <nlightnfotis> yeah, that was utterly stupid of me
+ <braunr> pinotree: they're not issues
+ <braunr> nlightnfotis: be careful, mach needs a reboot every time you
+ change a partition table
+ <teythoon> nlightnfotis: important thing is that you found the issue :)
+ <braunr> then only, you can use resize2fs
+ <teythoon> braunr: weird, I thought mach nowadays can reload the partition
+ tables?
+ <teythoon> braunr: doesn't d-i need that?
+ <braunr> maybe a recent change i forgot
+ <braunr> or maybe fdisk still reports the error although it's fine
+ <braunr> in doubt, rebooting is still safe :p
+ <teythoon> or maybe youpi hacked it into d-i's gnumach
+ <braunr> i doubt it would be there for the installer only :)
+ <braunr> if it's there, it's there
+ <braunr> i just don't know it
+ <nlightnfotis> braunr: teythoon: and everyone else that helped me. Thank
+ you all, guys. This was something that was driving me crazy. Will do all
+ that you suggested and report back on my status
+
+
+# IRC, freenode, #hurd, 2013-07-08
+
+ <nlightnfotis> tschwinge, I have managed to overcome most of the obstacles
+ I had initially faced with my project
+ <nlightnfotis> but I still had some build errors, that's why I have not
+ reported yet. Wanna try to see if I can resolve them today, and write my
+ report in the afternoon.
+ <tschwinge> nlightnfotis: So, from a quick look into the IRC backlog, it
+ was a "simple" out of disk space problem? %-) That happens.
+ <tschwinge> nlightnfotis: And yes, GCC needs a lot of disk space.
+ <tschwinge> nlightnfotis: What kind of build errors are you seeing now?
+ <nlightnfotis> tschwinge, yeah I felt stupid at the time, but it didn't
+ actually strike me that the file system didn't see the extra space. Also
+ it took me some time to figure out that in order to mount the new
+ partition, I only had to edit /etc/fstab
+ <nlightnfotis> always tried to mount it with the ext2 translator
+ <nlightnfotis> and the translator kept dying
+ <nlightnfotis> but it's all figured out now
+ <nlightnfotis> the latest build errors I am seeing are these
+ <teythoon> nlightnfotis: o_O you used fstab and it worked?
+ <nlightnfotis> yeah
+ <teythoon> nlightnfotis: that's unexpected from my perspective...
+ <nlightnfotis> I only had to add the new partition into fstab
+ <nlightnfotis> teythoon: I can pastebin my fstab if you wanna take a look
+ at it
+ <nlightnfotis> tschwinge: these were my latest build errors
+ https://www.dropbox.com/s/b0pssdnfa22ajbp/build_results
+ <teythoon> nlightnfotis: I'm pretty sure that mount -a isn't done on hurd
+ w/o pino's runsystem.sysv
+ <teythoon> weird
+ <nlightnfotis> tschwinge: I have also tried to build gcc with "make -w"
+ which from what I know suppresses the errors that stopped compilation
+ <nlightnfotis> but the weird thing is that gcc nearly took forever to build
+ <teythoon> nlightnfotis: could you do a showtrans /your/mountpoint?
+ <nlightnfotis> teythoon: /hurd/ext2fs /dev/hd0s3
+ <teythoon> nlightnfotis: ok, so you've set a passive translator and an
+ active is started on demand
+ <nlightnfotis> it must be a passive translator
+ <teythoon> nlightnfotis: this is the hurd way of doing things, fstab is
+ unrelated
+ <nlightnfotis> it seems to persist during reboots
+ <teythoon> yes, exactly
+ <nlightnfotis> teythoon: my fstab if you wanna take a look
+ http://pastebin.com/ef94JPhG
+ <nlightnfotis> after I added /dev/hd0s3 to fstab along with its mountpoint,
+ and restarting the hurd, only then I did manage to use that partition
+ <nlightnfotis> before doing so I tried pretty much anything involving
+ mounting the partition and setting the ext2fs translator for it, but it
+ kept dying
+ <nlightnfotis> of course it was a ext2 filesystem
+ <youpi> err, perhaps adding to fstab simply triggered an fsck at reboot?
+ <teythoon> nlightnfotis: might have been that you needed to reboot mach so
+ that it picks up the new partition table
+ <teythoon> youpi: I thought this was fixed, the partition reloading I mean?
+ <youpi> that is needed, yes
+ <youpi> let me check
+ <nlightnfotis> youpi: it could be, though, to be honest, my hurd system
+ does an fsck all the time at boot
+ <teythoon> how do you manage to do that w/o rebooting for d-i?
+ <youpi> (I don't remember whether device busy is detected)
+ <youpi> teythoon: by making all translators go away, iirc
+ <teythoon> nlightnfotis: btw, you have ~/gcc_new as mountpoint in your
+ fstab, pretty sure that this cannot work, the path has to be absolute and
+ no ~ expansion is done
+ <nlightnfotis> tbh it does work, and it's weird
+ <teythoon> nlightnfotis: it works b/c of the passive translator you set,
+ not b/c of the fstab entry
+ <nlightnfotis> teythoon: should I change it?
+ <teythoon> probably, yes
+ <tschwinge> Well, that is probably not used anywhere.
+ <teythoon> tschwinge: not yet but soon ;)
+ <tschwinge> Isn't /etc/fstab only consulted for fsck.
+ <youpi> atm yes
+ <tschwinge> Anyway, it is definitely a very good idea to have a partition
+ separate from the rootfs for doing actual work.
+ <tschwinge> I think I described that in one of the first GSoC coordination
+ emails. In the long one.
+ <nlightnfotis> teythoon: Oh it struck me now! Is it because tilde expansion
+ is only happening in bash, but /etc/fstab is read before bash is
+ initialized?
+ <tschwinge> nlightnfotis: Instead of fumbling around with partitioning of
+ disk images, it may be easier in your KVM/QEMU setup to simply add a new
+ disk using -hdb [file] (or similar).
+ <tschwinge> nlightnfotis: Basically, yes.
+ <youpi> nlightnfotis: fstab is not related with bash in any way
+ <nlightnfotis> anyway, it shouldn't matter now, it seems to be working, and
+ I wouldn't like fiddling around with it and messing it up now. I will
+ continue with resolving the gcc issues.
+ <tschwinge> But /etc/fstab has its very own "language" (layout), so tilde
+ expansion will never be done there.
+ <tschwinge> nlightnfotis: df -h ~/gcc_new/
+ <nlightnfotis> tschwinge: size 24G Used: 4.2G Avail 18G
+ <tschwinge> OK, that's fine.
+ <tschwinge> As you can see on
+ <http://darnassus.sceen.net/~hurd-web/open_issues/gcc/#index4h1>, GCC
+ will easily need some GiB.
+ <nlightnfotis> tschwinge: I have some questions about GCC: out of curiosity
+ how much time does it take to compile it on your machine? Because
+ yesterday I tried a -w (suppress warnings) build and it seemed to take
+ forever
+ <nlightnfotis> mind you the vm has 1536 MB of RAM available (I have read
+ somewhere that it can utilise such an amount) and the vm is KVM enabled
+ <youpi> without disabling g++, it can easily take hours
+ <tschwinge> nlightnfotis: The build error is unexpected, because I had
+ addressed that issue in a recent patch. :-)
+ <tschwinge> nlightnfotis: This is wrong: »checking whether setcontext
+ clobbers TLS variables... [...] yes«. Please check your sources, that
+ they correspond to the current version of the upstream
+ tschwinge/t/hurd/go branch.
+ <tschwinge> nlightnfotis: Quoting from that wiki page: »This takes up
+ around 3.5 GiB, and needs roughly 3.5 h on kepler.SCHWINGE and 15 h on
+ coulomb.SCHWINGE.« The latter is my Hurd machine.
+ <tschwinge> That's however with Java and Ada enabled, and a full
+ three-stages bootstrap.
+ <youpi> ah, right, there's java & ada too
+ <nlightnfotis> tschwinge: git branch (in the repo): master,
+ *tschwinge/t/hurd/go
+ <youpi> in debian they are built separately
+ <tschwinge> What I asked you to do is configure »--disable-bootstrap
+ --enable-languages=go«.
+ <tschwinge> So that should be a lot quicker.
+ <nlightnfotis> tschwinge: oh yes, everytime I have tried to compile gcc I
+ have done with these configurations
+ <tschwinge> But still a few hours perhaps.
+ <nlightnfotis> that's what I did yesterday too.
+ <tschwinge> OK, good. :-)
+ <tschwinge> A bootstrap build is a good way to check the just-built GCC for
+ sanity, but we expect that it is fine, as we concentrate on the GCC Go
+ port.
+ <nlightnfotis> the only "extra" configuration yesterday was my "-w" flag to
+ make, because those errors were actually triggered by -Werror
+ <tschwinge> Let me read up what make -w does. ;-)
+ <nlightnfotis> ah, yes, d/w I have read and understood what the bootstrap
+ build is. Seems like we don't need it atm
+ <nlightnfotis> afaik it suppresses all warnings
+ <pinotree> youpi: gcj no more
+ <nlightnfotis> the way gcc builds, it does convert (some) warnings to
+ errors
+ <tschwinge> Hmm. -w, --print-directory Print a message containing the
+ working directory before and after other processing.
+ <pinotree> youpi: doko folded gcj and gdc into gcc-4.8 to "workaround"
+ Built-Using
+ <tschwinge> nlightnfotis: Ah, that's configure --enable-werror or something
+ like that.
+ <youpi> pinotree: right
+ <nlightnfotis> yep, and -w suppresses it
+ <nlightnfotis> (from what I have understood)
+ <tschwinge> nlightnfotis: Are you thinking about make -k?
+ <tschwinge> Yeah, I guess.
+ <nlightnfotis> let me see what -k does
+ <pinotree> youpi: (just to make builds even more lightweight, eh</irony>)
+ <nlightnfotis> yeah, -k should do too, I shall try it
+ <tschwinge> But: if gcc -Werror fails, even with make -k, the build will
+ not be able to come to a successful end, because that one compilation
+ artefact that failed will be missing.
+ <nlightnfotis> so I shall try again with -w (supressed warnings)
+ <tschwinge> Configuring with --disable-werror (or similar) will "help" if
+ -Werror is the default, and the build fails due to that.
+ <nlightnfotis> from what I have understood these "errors" are not something
+ critical: it's only that function prototypes for these functions are
+ missing
+ <nlightnfotis> I have seen the code there, and even "default" gcc generated
+ prototypes (from the first usage of the function) should do, so I can't
+ understand why it might be a serious problem if I tell gcc to skip that
+ point
+ <tschwinge> nlightnfotis: Ah, now I see. You don't mean make -w, but
+ rather gcc -w: »-w Inhibit all warning messages.«
+ <tschwinge> But really, there shouldn't be such warnings/errors that make
+ the build fail.
+ <nlightnfotis> yeah
+ <tschwinge> nlightnfotis: In your GCC sources directory, what does this
+ tell: git rev-parse HEAD
+ <tschwinge> And, is the checkout clean: git status
+ <tschwinge> The latter will take some time.
+ <nlightnfotis> git status takes an awful amount of time
+ <nlightnfotis> last I checked
+ <nlightnfotis> but git rev-parse HEAD
+ <nlightnfotis> produces this result:
+ <nlightnfotis> 91840dfb3942a8d241cc4f0e573e5a9956011532
+ <tschwinge> OK, that's correct. So probably some of the checked out files
+ are not in a pristine state?
+ <nlightnfotis> I shall run a git clean and see. If that doesn't work too,
+ maybe I shall reclone the repository?
+ <nlightnfotis> there's nothing foreign to the repo that I have added, only
+ lib gmp, lib mpc and lib mpfr (and they are in their own folders inside
+ my gcc working directory)
+ <tschwinge> nlightnfotis: You shouldn't need to do the latter if you
+ instead run: apt-get build-dep gcc-4.8
+ <nlightnfotis> I remember having done that inside the Hurd, but it always
+ resulted in an error from what I can recall
+ <nlightnfotis> let me check this out
+ <nlightnfotis> yes
+ <tschwinge> nlightnfotis: Whenever you use Git on Hurd, pass the --quiet
+ flag, to avoid the rare but possible corruption issue described on
+ <http://darnassus.sceen.net/~hurd-web/open_issues/git_duplicated_content/>
+ and <http://darnassus.sceen.net/~hurd-web/open_issues/git-core-2/>.
+ <nlightnfotis> tschwinge: Forgive me for that. I will set up an alias
+ immediately.
+ <tschwinge> nlightnfotis: I don't know if an alias is possible, because --
+ I think -- you'll need to do things like: git fetch --quiet
+ <tschwinge> So pass --quiet to subcommands.
+ <nlightnfotis> oh. ok.
+ <tschwinge> nlightnfotis: What you can also do, is shut down your Hurd VM,
+ and mount the disk image on GNU/Linux (mount with offset to get the right
+ partition), and then run a diff -ru against a Git clone done on
+ GNU/Linux, and see whether there are any unexpected differences outside
+ of the .git/ directory.
+ <nlightnfotis> sounds like a plan. I will check this out today then :)
+ <nlightnfotis> tschwinge: if all else fails, then recloning the repo with
+ --quiet passed should work, right?
+ <tschwinge> Yes, that's probably the most straight-forward check to do.
+ <tschwinge> Heh, yes to both these questions. :-)
+ <tschwinge> nlightnfotis: Oh, you don't even have to re-clone, but rather
+ re-check-out the branch.
+ <nlightnfotis> I was thinking of recloning just to bring the whole
+ repository to a pristine state
+ <tschwinge> So something like (inside the source directory): rm -rf ./*
+ (remove any files, but leave .* in place, in particular the .git/
+ directory), followed by git checkout -f HEAD --quiet
+ <tschwinge> nlightnfotis: But before doing that, please do the diff first,
+ so that we know (hopefully) where the erroneous build results were coming
+ from.
+ <nlightnfotis> considering the Copyright assignment files, I have sent them
+ from day 1 (that is the 20th of June). I have not heard anything about
+ those documents to date (sadly)
+ <nlightnfotis> what's worst is that although I have a reference number to
+ track those documents, their (greek postal office) tracking service sucks
+ so badly, that one day it's offline, the next it suggests it can't find
+ the object in their database, the next it says it is still in the local
+ post office
+ <nlightnfotis> let me check it out now
+ <nlightnfotis> still nothing from their online service
+ <nlightnfotis> let me call them
+ <nlightnfotis> tschwinge: I called the post office regarding the copyright
+ papers. They told me that the same day (the 20th of June) it left from
+ Herakleion, Crete to Athens and the same day it must have left the
+ country heading towards the US. They also told me it takes about 1 week
+ for it to arrive.
+ <tschwinge> nlightnfotis: OK, so probably waiting at the FSF office to be
+ processed. Let's allow for some more time. After all, this is not
+ critical for your progress.
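
youpi and tschwinge point out above that /etc/fstab has its own layout and that no tilde expansion is ever done there. The `~/gcc_new` entry only appeared to work because of the passive translator. As a sketch, assuming glibc's getmntent(3) (not necessarily the parser the Hurd's fsck or mount tools use), the example below shows that the mount point field comes back verbatim. The fstab line used in the check is illustrative only and is not the pastebin content. The helper name is hypothetical.

```c
#define _GNU_SOURCE  /* fmemopen (POSIX.1-2008); <mntent.h> is glibc.  */
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Parse the first entry of fstab-format TEXT and copy its mount point
   into DIR.  Returns 0 on success, -1 if there is no entry.  */
static int
first_mount_point (const char *text, char *dir, size_t dirlen)
{
  assert (text != NULL && dir != NULL && dirlen > 0);
  FILE *f = fmemopen ((void *) text, strlen (text), "r");
  if (f == NULL)
    return -1;
  /* getmntent skips blank and '#' comment lines.  */
  struct mntent *m = getmntent (f);
  int ret = -1;
  if (m != NULL)
    {
      /* mnt_dir is the second field, exactly as written: no "~"
         or variable expansion takes place.  */
      snprintf (dir, dirlen, "%s", m->mnt_dir);
      ret = 0;
    }
  fclose (f);
  return ret;
}
```

Since nothing expands `~`, an absolute path such as `/home/<user>/gcc_new` is what an fstab entry needs.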
+
+
+# IRC, freenode, #hurd, 2013-07-10
+
+ <nlightnfotis> tschwinge: I have run the diff of the GCC repo on the Hurd
+ against the one on my host linux os, and there was nothing relevant to
+ fixcontext and initcontext that are the ones that fail the
+ compilation. In any case I did recheck out the branch, and I have
+ attempted a build with it. It fails at the same point. Now I am
+ attempting a build with the -w (inhibit warnings) flag enabled
+ <tschwinge> nlightnfotis: Have there been any differences in the diff?
+ There should be none at all.
+ <nlightnfotis> tschwinge: there were some small changes due to the repo's
+ being checked out at different times. It was a large diff however. I
+ inspected it and didn't find anything that was of much use. Here it is in
+ case you might want to see it:
+ https://www.dropbox.com/s/ilgc3skmhst7lpv/diffs_in_git.txt
+ <tschwinge> nlightnfotis: Well, the idea of this exercise precisely was to
+ use the same Git revisions on both sides of the diff -- to show that
+ there are no spurious differences -- which can't be shown from your
+ 124486-line diff. (Even though there indeed is no difference in
+ libgo/configure that would explain the mismatch, who knows what else
+ might be relevant for that.)
+ <tschwinge> Would you please repeat that?
+ <nlightnfotis> tschwinge: I will do so. It was wrong of me not to diff
+ against the same revisions, but going through the diff results grepping
+ for the problematic code didn't yield any results, so I thought that
+ might not be the issue.
+ <nlightnfotis> I will perform the diff again tomorrow morning and report on
+ the results.
+ <tschwinge> nlightnfotis: Anyway, if you checked out again, the latest
+ revision, and it still fails in exactly the same way, there is something
+ wrong.
+ <tschwinge> nlightnfotis: And -w won't help, as there is a hard error
+ involved.
+ <tschwinge> nlightnfotis: Are you still working on GSoC things today?
+ <nlightnfotis> tschwinge: yeah I am here. I decided to do the diff today
+ instead of tomorrow.
+ <nlightnfotis> It finished now btw
+ <nlightnfotis> let me tell you
+ <nlightnfotis> ah and this time, the gits were checked out at the same time
+ <nlightnfotis> from the same source
+ <nlightnfotis> and are at the same branch
+ <tschwinge> nlightnfotis: Could you upload the
+ gccbuild/i686-unknown-gnu0.3/libgo/config.log of the build that failed?
+ <nlightnfotis> tschwinge: sure. give me a minute
+ <nlightnfotis> tschwinge: there is something strange going on. The two
+ repos are at the exact same state (or at least should be, and the logs
+ indicate them to be) but still the diff output is 4.4 mb
+ <nlightnfotis> but no presence of initcontext or fixcontext
+ <nlightnfotis> tschwinge: the config.log file -->
+ http://pastebin.com/bSCW1JfF
+ <nlightnfotis> wow! I can see several errors in the config.log file
+ <nlightnfotis> but I am not so sure about their fatality. Config returns 0
+ at the end of the log
+ <tschwinge> nlightnfotis: As the configure scripts probe for all kinds of
+ features on all kinds of strange systems, it's to be expected that some
+ of these fail on GNU/Hurd.
+ <tschwinge> What is not expected, however, is:
+ <tschwinge> configure:15046: checking whether setcontext clobbers TLS
+ variables
+ <tschwinge> [...]
+ <tschwinge> configure:15172: ./conftest
+ <tschwinge> /root/gcc_new/gcc/libgo/configure: line 1740: 1015 Aborted
+ ./conftest$ac_exeext
+ <tschwinge> Hmm. apt-cache policy libc0.3
+ <tschwinge> nlightnfotis: ^
+ <nlightnfotis> tschwinge: Installed 2.13-39+hurd.3
+ <nlightnfotis> Candidate: 2.1-6
+ <nlightnfotis> *2.17
+ <tschwinge> Bummer.
+ <tschwinge> nlightnfotis: As indicated in
+ <http://news.gmane.org/find-root.php?message_id=%3C87li6cvjnl.fsf%40kepler.schwinge.homeip.net%3E>
+ and thereabouts, you need 2.17-3+hurd.4 or later...
+ <tschwinge> Well.
+ <tschwinge> At least that now explains what is going on.
+ <nlightnfotis> tschwinge: i see. I am in the process of updating my hurd
+ vm. I saw that libc has also been updated to 2.17
+ <nlightnfotis> I will confirm when updating is done
+ <tschwinge> nlightnfotis: Anyway, is the diff between the two repositories
+ empty now or are there still differences?
+ <nlightnfotis> there are differences
+ <nlightnfotis> and they were checked out at the same time
+ <nlightnfotis> from the same source
+ <nlightnfotis> (the official git mirror)
+ <nlightnfotis> and they are both at the same branch
+ <nlightnfotis> and still diff output is 4.4 MB
+ <nlightnfotis> but quickly grepping through it, there is no mention of
+ initcontext or fixcontext
+ <tschwinge> That's... unexpected.
+ <nlightnfotis> may be a mistake I am making
+ <nlightnfotis> but considering that diff ran for some time before
+ completing
+ <tschwinge> In both Git repositories, »git rev-parse HEAD« shows the same
+ thing?
+ <tschwinge> Could you please upload the diff again?
+ <nlightnfotis> tschwinge: confirmed. libc is now version 2.17-1
+ <nlightnfotis> tschwinge: http://pastebin.com/bSCW1JfF
+ <nlightnfotis> for the rev-parse give me a second
+ <tschwinge> nlightnfotis: Where is libc0.3 2.17-1 coming from? You need
+ 2.17-3+hurd.4 or later.
+ <nlightnfotis> it is 2.17-7+hurd.1
+ <tschwinge> OK, good.
+ <tschwinge> The URL you just have is the config.log file, not the diff.
+ <tschwinge> s%have%gave
+ <nlightnfotis> oh my mistake
+ <nlightnfotis> wait a minute
+ <nlightnfotis> the two repos have different output to rev-parse
+ <tschwinge> Phew.
+ <tschwinge> That explains.
+ <tschwinge> So the Git branches are at different revisions.
+ <nlightnfotis> that confused me... when I run git pull -a the branches that
+ were changed were all updated to the same revision
+ <nlightnfotis> unless... there were some automatic merges in the *host* GCC
+ repo required during some pulls
+ <nlightnfotis> but that was some time ago
+ <nlightnfotis> would it have messed my local history that much?
+ <nlightnfotis> that's the only thing that may be different between the two
+ repos
+ <nlightnfotis> they checkout from the same source
+ <tschwinge> nlightnfotis: At which revisions are the two
+ repositories/branches?
+ <tschwinge> I have never used »git pull -a«. What does that do?
+ <nlightnfotis> tschwinge: from what I know it does an automatic git fetch
+ followed by git merge. The -a flag must signal to pull all branches (I
+ think it's possible to pull only one branch)
+ <tschwinge> That's the --all option. -a is something different (that I
+ don't understand off-hand).
+ <tschwinge> Well, --all means to pull all remotes.
+ <tschwinge> But you just want the GCC upstream, I guess.
+ <tschwinge> I always use git fetch and git merge manually.
+ <nlightnfotis> oh my god! You are write. -a is equivalent to --append
+ <nlightnfotis>
+ https://www.kernel.org/pub/software/scm/git/docs/git-pull.html
+ <nlightnfotis> git pull must be safe though
+ <nlightnfotis>
+ http://stackoverflow.com/questions/292357/whats-the-difference-between-git-pull-and-git-fetch
+ <nlightnfotis> without the -a
+ <nlightnfotis> *right
+ <nlightnfotis> why did I even write "right" as "write" above I don't
+ even...
+ <nlightnfotis> what did I write in the sentence above
+ <nlightnfotis> oh my god...
+ <nlightnfotis> tschwinge: they are indeed on different revisions: The host
+ repo's last commit was made by me apparently, to merge master into
+ tschwinge/t/hurd/go, whereas the last commit of the Hurd repo was by you
+ and it reverted commit 2eb51ea
+ <nlightnfotis> and that should also explain the large diff file
+ <nlightnfotis> with master merged into the tschwinge/t/hurd/go branch
+ <nlightnfotis> I will purge the debian repo and redownload it
+ <nlightnfotis> *reclone it
+ <nlightnfotis> that should bring it to a safe state I suppose.
+
+
+# IRC, freenode, #hurd, 2013-07-11
+
+ <teythoon> nlightnfotis: how's your build going?
+ <nlightnfotis> I tried one earlier and it seemed to build without any
+ issues, something that was...strange. I am repeating the build now, but I
+ am saving the compilation output this time to study it.
+ <teythoon> it was strange that the build succeeded? that sounds sad :/
+ <nlightnfotis> teythoon: considering that for 3 weeks I failed to build it
+ without errors, it sure seems weird that it builds without errors now :)
+ <braunr> what did you change ?
+ <nlightnfotis> braunr: not many things apparently. To be honest the change
+ that seemed to do the trick was (under thomas' guidance) update of libc
+ from 2.13 to 2.17
+ <braunr> well that can explain
+ <nlightnfotis> tschwinge: Big update! GCC-go not compiles without errors
+ under the Hurd. I have done 2 compilations so far, none of which had
+ issues. Time needed for full build (without bootstrap) is 45 minutes +- 1
+ minute. I also ran the test suite, and I can confirm your results
+ <pinotree> s/not/now/, perhaps?
+ <nlightnfotis> pinotree: yeah. I don't know how I came up with "not" there. I
+ meant now
+ <nlightnfotis> tschwinge: link for the go.sum is here -->
+ https://www.dropbox.com/s/7qze9znhv96t1wj/go.sum
+
+
+# IRC, freenode, #hurd, 2013-07-12
+
+ <tschwinge> nlightnfotis: Great! So you finally reproduced my results.
+ :-)
+ <nlightnfotis> tschwinge: Yep! I am now building a blog, so that I can move
+ my reports there, so that they are more detailed, to allow for greater
+ transparency of my actions
+ <tschwinge> nlightnfotis: Did you recently (in email, I think?) indicate
+ that there is another Go testsuite, for libgo?
+ <tschwinge> nlightnfotis: As you prefer.
+ <nlightnfotis> tschwinge: there seemed to be one, at least in linux. I
+ think I saw one in the Hurd too.
+ <tschwinge> Oh indeed there is a libgo testsuite, too.
+ <nlightnfotis> as a matter of fact, make check-go
+ <nlightnfotis> did check for the lib
+ <nlightnfotis> but lib was failing
+ <nlightnfotis> yeah
+ <tschwinge> So please have a look at that testsuite's results, too, and
+ compare to the GNU/Linux ones.
+ <nlightnfotis> sure. I can do that now.
+ <tschwinge> And for the go.sum you posted, please have a look at the tests
+ that do not pass (»grep -v ^PASS: < go.sum«), assuming they do pass on
+ GNU/Linux.
+ <tschwinge> I suggest you add a list of the differences between GNU/Linux
+ and GNU/Hurd testresults to the wiki page,
+ <http://darnassus.sceen.net/~hurd-web/open_issues/gccgo/>, at the end of
+ the Part I section.
+ <nlightnfotis> I'm on it.
+ <tschwinge> For now, please ignore any failing tests that have »select« in
+ their name -- that is, do file them, but do not spend a lot of time
+ figuring out what might be wrong there.
+ <tschwinge> The Hurd's select implementation is a bit of a beast, and I
+ don't want you -- at this time -- to spend a lot of time on that. We
+ already know there are some deficiencies, so we should postpone that to
+ later.
+ <nlightnfotis> tschwinge: noted.
+ <tschwinge> So what I would like at the moment is a list of the test result
+ differences to GNU/Linux, then from the go.log file any useful
+ information about the failing test (which perhaps already explains
+ what's going wrong), and then an analysis of the failure.
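
For the comparison tschwinge asks for, a small helper can list the tests that pass in one DejaGnu summary but not in the other. This refines »grep -v ^PASS: < go.sum« into a per-test diff. It is a minimal sketch in C, not a tool used in the log, and it rests on several assumptions. Both `.sum` files are taken to be already read into arrays of lines. Only lines beginning with `PASS: ` count as passes. The names `passed_test`, `passes_in` and `list_regressions` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DejaGnu summary lines start with a status such as "PASS: ", "FAIL: ",
   "XFAIL: " or "UNSUPPORTED: ", followed by the test name. */
#define PASS_PREFIX "PASS: "

/* Return the test name if LINE records a pass, otherwise NULL. */
static const char *passed_test(const char *line)
{
    size_t n = strlen(PASS_PREFIX);
    return strncmp(line, PASS_PREFIX, n) == 0 ? line + n : NULL;
}

/* Nonzero if RESULTS (NRESULTS lines) contains a PASS line for NAME. */
static int passes_in(const char *const results[], size_t nresults,
                     const char *name)
{
    for (size_t i = 0; i < nresults; i++) {
        const char *t = passed_test(results[i]);
        if (t != NULL && strcmp(t, name) == 0)
            return 1;
    }
    return 0;
}

/* Collect tests that pass in REFERENCE (say, the GNU/Linux go.sum) but
   not in CANDIDATE (say, the GNU/Hurd go.sum).  Up to MAX names are
   stored in OUT; the return value is the total number found. */
static size_t list_regressions(const char *const reference[], size_t nref,
                               const char *const candidate[], size_t ncand,
                               const char *out[], size_t max)
{
    size_t count = 0;

    assert(max == 0 || out != NULL);
    for (size_t i = 0; i < nref; i++) {
        const char *name = passed_test(reference[i]);
        if (name == NULL || passes_in(candidate, ncand, name))
            continue;
        if (count < max)
            out[count] = name;
        count++;
    }
    return count;
}
```

The nested loop is quadratic. That is acceptable for summaries of a few thousand lines. Reading the two files into arrays (for example with `fgets`) is left out.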
+ <tschwinge> nlightnfotis: I assume you must be really happy that you
+ finally got it to build fine, and reproduced my results. :-)
+ <nlightnfotis> tschwinge: yeah! I cannot hide from you the fact that
+ failing all those builds made me really nervous about missing my
+ schedule. Having finally built that and revisiting my application I can
+ see I am on schedule, but I have to intensify my work to compensate for
+ any potential unforeseen obstacles
+ <nlightnfotis> , in the futute
+ <nlightnfotis> *future
+
+
+# IRC, freenode, #hurd, 2013-07-15
+
+ <youpi> nlightnfotis: btw, do you have a weekly progress report?
+ <nlightnfotis> youpi: not yet. Will write it shortly and post it here. I
+ made a new blog to keep track of my progress.
+ <nlightnfotis> Will report much more frequently now via my blog
+ <youpi> did you add your blog url to the hurd wiki?
+ <nlightnfotis> currently I am running gcc tests on both gcc go and libgo to
+ see what the differences are with Linux
+ <nlightnfotis> I believe I have done so, let me see
+ <nlightnfotis> youpi: gccgo passes most of its tests (it fails a small
+ number, and I am looking into those tests) but libgo fails 130/131 tests
+ (on the Hurd that is)
+ <youpi> ok
+
+ <nlightnfotis> guys I wrote my report. This time I made it available on my
+ personal blog. You can find it here:
+ www.fotiskoutoulakis.com/blog/2013/07/15/gsoc-week-4-report/ As always,
+ open to (and encouraging) criticism, suggestions, anything that might
+ help me.
+ <nlightnfotis> I also have to mention that now that my personal website is
+ online, I will report much more frequently, to the scale of reporting day
+ by day, or every 2-3 days.
+ <youpi> nlightnfotis: without spending time on select, it'd be good to have
+ an idea of what is going wrong
+ <braunr> eh, go having trouble with select
+ <youpi> select is a beast, but we do have fixed things lately and we don't
+ currently know any issue still pending
+ <nlightnfotis> youpi: are you suggesting to not skip the select tests too?
+ <braunr> select is kind of critical ..
+ <braunr> as youpi said, if you can determine what's wrong, at the interface
+ level (not the implementation), it would be a good thing to do
+ <youpi> so we know what's wrong
+ <youpi> we're not asking to fix it, though
+ <nlightnfotis> braunr: youpi: noted. Thanks for the feedback. Is there
+ something else you might want me to improve? Something with the report
+ itself? Something you were expecting to see but I failed to provide?
+ <braunr> no it's ok
+ <braunr> it's short, readable, and readily answers the questions i might
+ have had so it's good
+ <braunr> as you say, now you have to work on the core of your task :)
+ <youpi> note: the "select" word in the testsuite is not strictly bound to
+ the C "select"
+ <youpi> so it is probably really worth digging a bit at least on the go
+ side
+ <braunr> but it's really worth doing in the end, as it will probably reveal
+ some nasty bugs on the way
+ <nlightnfotis> I appreciate your input. I will start working on it asap
+ (today) and will report on Wednesday perhaps (or Thursday at worst).
+
+
+# IRC, freenode, #hurd, 2013-07-18
+
+ <nlightnfotis> braunr: I found out what was causing the fails in the tests
+ <nlightnfotis> in both libgo and gccgo
+ <nlightnfotis> it's an assertion: mach_port_t ktid = __mach_thread_self ();
+ int ok = thread->kernel_thread == ktid; __mach_port_deallocate
+ ((__mach_task_self_ + 0), ktid); ok; })
+ <braunr> is all that the assertion ?
+ <nlightnfotis> yes
+ <braunr> please paste the code somewhere
+ <braunr> or is it in libpthread ?
+ <nlightnfotis> http://pastebin.com/G2w9d474
+ nonblock.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
+ FAIL: go.test/test/chan/nonblock.go execution, -O2 -g
+ <braunr> yes
+ <braunr> that's related to my current work on thread destruction
+
+[[open_issues/libpthread/t/fix_have_kernel_resources]].
+
+ <braunr> thread resources recycling is buggy
+ <braunr> i suggest you make your own thread pool if you can
+ <nlightnfotis> I will look into it further and let you know. Thanks for
+ that.
+
+
+# IRC, freenode, #hurd, 2013-07-22
+
+ <nlightnfotis> tschwinge, I have found what is failing both libgo and gccgo
+ tests, but for the life of me, I can not really find the offending code
+ on any repository.
+ <nlightnfotis> not even the eglibc-source debian package. it's driving me
+ insane.
+ <tschwinge> nlightnfotis: If this is driving you insane, we should quickly
+ have a look at that!
+ <nlightnfotis> thanks tschwinge: I have found that the offending code is an
+ assertion: { mach_port_t ktid = __mach_thread_self (); int ok =
+ thread->kernel_thread == ktid; __mach_port_deallocate
+ ((__mach_task_self_ + 0), ktid); ok; } in a file called pt-create.c under
+ libpthread, on line 167
+ <nlightnfotis> but for the life of me, I can not find that piece of code
+ anywhere. And when I mean anywhere, I mean anywhere. I have looked for it
+ on all of the branches of glibc, libpthread and the source code of
+ eglibc.
+ <nlightnfotis> that's why if you don't mind I would like to write my report
+ in a day or two, when (hopefully) I will have more progress to report on.
+ <youpi> nlightnfotis: isn't that libpthread/sysdeps/mach/pt-thread-start.c
+ ?
+ <youpi> or rather, ./sysdeps/mach/hurd/pt-sysdep.h
+ <nlightnfotis> youpi: let me check this out. If that's it I'm gonna cry.
+ <youpi> which unfortunately is inlined in a lot of places
+ <youpi> nlightnfotis: does the assertion not tell you the file & line?
+ <nlightnfotis> youpi: holy smokes! That's the code I was looking for! Oh
+ boy. Yeah the logs do tell me, but it was very misleading. So misleading,
+ that I was actually looking at the wrong place. All logs suggest that
+ this piece of code is at libpthread/pthread/pt-create.c in line 167
+ <youpi> what is that line in your tree?
+ <youpi> a call to _pthread_self(), isn't it?
+ <youpi> then it's not actually misleading, this is indeed where the
+ pt-sysdep.h definition gets inlined
+ <nlightnfotis> it seems so, yeah. it's err = __pthread_sigstate
+ (_pthread_self (), 0, 0, &sigset, 0);
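
youpi's point is that the reported location is correct even though the code lives in pt-sysdep.h. This follows from how the C preprocessor handles `__FILE__` and `__LINE__`. Inside a macro they expand at the point of use; inside a function they refer to the function's own definition. The sketch below illustrates this with hypothetical names. It only mimics the shape of `_pthread_self ()`, a macro around a GNU statement expression `({ ... })`, and is not the libpthread code. Statement expressions are a GCC/Clang extension.

```c
#include <assert.h>

/* A GNU statement expression wrapped in a macro, shaped like the
   _pthread_self () definition in pt-sysdep.h.  __LINE__ is expanded
   where the macro is *used*, so the value it yields (and the location
   an assert () inside it would print on failure) is the caller's line. */
#define CALLER_LINE() ({ int line_ = __LINE__; line_; })

/* A static inline function, in contrast, is written once: __LINE__ here
   is always the line of this return statement, whoever calls it. */
static inline int definition_line(void)
{
    return __LINE__;
}
```

`assert` also relies on `__FILE__`, `__LINE__` and the enclosing function's name. An assertion inside such a macro therefore reports the caller. That matches the log's message, which names `./pthread/pt-create.c:167` and `__pthread_create_internal` rather than pt-sysdep.h.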
+ <youpi> nlightnfotis: and what is the backtrace?
+ <nlightnfotis> youpi: _pthread_create_internal: Assertion failed.
+ <nlightnfotis> The assertion is the one above
+ <youpi> nlightnfotis: sure, but what is the backtrace?
+ <nlightnfotis> I don't have the full backtrace. These are the logs from the
+ compiler. All I can get is: reports like this: nonblock.x:
+ ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({
+ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread
+ == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid);
+ ok; })' failed.
+ <youpi> nlightnfotis: you should probably have a look at running the tests
+ by hand
+ <youpi> so you can run them in a debugger, and get backtraces etc.
+ <braunr> nlightnfotis: did i answer that ?
+ <nlightnfotis> braunr: which one?
+ <braunr> the problems you're seeing are the pthread resources leaks i've
+ been trying to fix lately
+ <braunr> they're not only leaks
+ <braunr> creation and destruction are buggy
+ <nlightnfotis> I have read so in
+ http://www.gnu.org/software/hurd/libpthread.html. I believe it's under
+ Thread's Death right?
+ <braunr> nlightnfotis: yes but it's buggy
+ <braunr> and the description doesn't describe the bugs
+ <nlightnfotis> so we will either have to find a temporary workaround, or
+ better yet work on a fix, right?
+ <braunr> nlightnfotis: i also told you the work around
+ <braunr> nlightnfotis: create a thread pool
+ <nlightnfotis> braunr: since thread creation is also buggy, wouldn't the
+ thread pool be buggy too?
+ <braunr> nlightnfotis: creation *and* destruction is buggy
+ <braunr> nlightnfotis: i.e. recycling is buggy
+ <braunr> nlightnfotis: the hurd servers aren't affected much because the
+ worker threads are actually never destroyed on debian (because of a
+ debian specific patch)
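
braunr's workaround is to create your own thread pool. That means starting a fixed set of threads once and handing them work, so threads are never destroyed and libpthread never has to recycle their resources. Below is a minimal sketch in C with POSIX threads. The pool size, queue size and all names are assumptions. This is not code from gccgo's runtime, and a pool inside the Go runtime would look different.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_WORKERS 4   /* fixed number of long-lived threads */
#define QUEUE_CAP 64     /* bounded queue of pending tasks */

typedef void (*task_fn)(void *arg);

struct task {
    task_fn fn;
    void *arg;
};

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t not_empty;  /* a task was queued */
    pthread_cond_t not_full;   /* a queue slot became free */
    pthread_cond_t idle;       /* queue empty and no task running */
    struct task queue[QUEUE_CAP];
    size_t head, count, busy;
    pthread_t workers[POOL_WORKERS];
};

/* Each worker loops forever.  It is never joined and never exits, so
   the thread library is never asked to destroy or recycle it. */
static void *worker_main(void *arg)
{
    struct pool *p = arg;

    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0)
            pthread_cond_wait(&p->not_empty, &p->lock);
        struct task t = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        p->busy++;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->lock);

        t.fn(t.arg);   /* run the task outside the lock */

        pthread_mutex_lock(&p->lock);
        p->busy--;
        if (p->count == 0 && p->busy == 0)
            pthread_cond_broadcast(&p->idle);
        pthread_mutex_unlock(&p->lock);
    }
    return NULL;  /* not reached */
}

/* Start the workers once, up front.  Returns 0 or a pthread error code
   (in this sketch, workers started before a failure are left running). */
static int pool_start(struct pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
    pthread_cond_init(&p->idle, NULL);
    p->head = p->count = p->busy = 0;
    for (int i = 0; i < POOL_WORKERS; i++) {
        int err = pthread_create(&p->workers[i], NULL, worker_main, p);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Queue a task, blocking while the queue is full. */
static void pool_submit(struct pool *p, task_fn fn, void *arg)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == QUEUE_CAP)
        pthread_cond_wait(&p->not_full, &p->lock);
    p->queue[(p->head + p->count) % QUEUE_CAP] = (struct task){ fn, arg };
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}

/* Block until every submitted task has finished. */
static void pool_wait_idle(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->count != 0 || p->busy != 0)
        pthread_cond_wait(&p->idle, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Example task: increment the int behind ARG under a shared lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_increment(void *arg)
{
    pthread_mutex_lock(&demo_lock);
    (*(int *)arg)++;
    pthread_mutex_unlock(&demo_lock);
}
```

Submitting 1000 `demo_increment` tasks and then calling `pool_wait_idle` should leave the counter at 1000. The bounded queue makes `pool_submit` block instead of growing memory. Shutdown is deliberately omitted, since the workers are meant to live for the whole process.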
+
+ <teythoon> youpi, nlightnfotis, hacklu_: btw, what about the copyright
+ assignment process
+ <tschwinge> nlightnfotis just got his on file, so there is progress.
+ <tschwinge> I have email from Donald R Robertson III
+ <copyright-clerk@fsf.org> about that -- but it is not yet present in the
+ FSF copyright.list file...
+ <tschwinge> I think I received that email because I was CCed on
+ nlightnfotis' submission.
+ <nlightnfotis> tschwinge: I have got the papers, and they were signed by
+ the FSF. They stated delivery date 11 of July, but the documents were
+ signed on the 10th of July :P
+ <tschwinge> Ah, no, I received it via hurd-maintainers@gnu.org -- and the
+ strange thing is that not all assignments that got processed got sent
+ there...
+ <tschwinge> At the recent GNU Tools Cauldron we also discussed this in the
+ GCC context; and their experience was the very same. Emails get lost,
+ and/or take ages to be processed, etc.
+ <tschwinge> It seems the FSF is undermanned.
+
+
+# IRC, freenode, #hurd, 2013-07-27
+
+ <nlightnfotis> I have one question about the Mach sources: I can see it
+ uses its own scheduler (more like, initializes) and also does the same
+ for the linux scheduler. Which one does it use?
+ <youpi> it doesn't use the linux scheduler
+ <youpi> the linux glue just glues linux scheduling concepts onto the mach
+ scheduler
+ <nlightnfotis> ohh I see now. Thanks for that youpi.
+
+
+# IRC, freenode, #hurd, 2013-07-28
+
+ <nlightnfotis> In the mach kernel source code, does the (void) before a
+ function call have a semantic meaning, or is it just remnants of the past
+ (or even documentation)
+ <pinotree> for example?
+ <nlightnfotis> pinotree: (void) thread_create (kernel_task,
+ &startup_thread);
+ <nlightnfotis> I read on stack overflow that there is only one case where
+ it has a semantic meaning, most of the times it doesn't
+ <nlightnfotis>
+ http://stackoverflow.com/questions/13954517/use-of-void-before-a-function-call
+ <pinotree> most probably thread_create has a non-void return value, and
+ this way you're explicitly suppressing its return value (usually because
+ you don't want/need to care about it)
+ <nlightnfotis> isn't the value discarded if the (void) is not there?
+ <pinotree> yes, but depending on extra attributes and/or compiler warning
+ flags the compiler might warn that the return value is not used while it
+ ought to
+ <pinotree> the cast to void should suppress that
+ <nlightnfotis> oh, okay, thanks for that pinotree
+ <nlightnfotis> and yes you are right that thread_create actually does
+ return something
+ <pinotree> even if there would be no compiler message about that, adding
+ the explicit cast could mean "yes, i know the function does return
+ something, but i don't care about it"
+ <pinotree> ... as hint to other code readers
+ <nlightnfotis> as a form of documentation then
+ <pinotree> also
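
A small illustration of pinotree's explanation follows. `create_worker` is a hypothetical stand-in for a call like `thread_create`, not a Mach API. The `(void)` cast changes nothing at run time; it only states that ignoring the result is deliberate. Whether it also silences a compiler warning depends on the compiler, its flags and how the function is declared.

```c
#include <assert.h>

/* Hypothetical stand-in for a call such as thread_create (): it does its
   work and reports success (0) or failure (nonzero) as its return value. */
static int create_worker(int *count)
{
    ++*count;
    return 0;
}

/* Starts two workers and returns how many were started, or -1 on error. */
static int start_two_workers(void)
{
    int count = 0;

    /* Result deliberately ignored.  The value would be discarded anyway;
       the cast tells readers (and, depending on the compiler, its warning
       machinery) that this is intentional, not a forgotten check. */
    (void) create_worker(&count);

    /* Where a failure matters, check the result instead of casting it away. */
    if (create_worker(&count) != 0)
        return -1;

    return count;
}
```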
+
+ <nlightnfotis> oh well, I am gonna ask and I hope someone will answer it:
+ In the Mach's dmesg (/var/log/dmesg) I can see that the version string
+ along with initial memory mapping information are printed twice, when in
+ fact they are supposed to be called only once. Is this a bug, or some
+ buffering error, or are they actually called twice for some reason?
+
+
+# IRC, freenode, #hurd, 2013-07-29
+
+ <nlightnfotis> guys is the evaluation today?
+ <hacklu_> yes
+ <teythoon> right
+ <nlightnfotis> where can we find the evaluation papers on melange?
+ <hacklu_> wait until 12pm UTC.
+ <nlightnfotis> yeah, I just noticed thanks hacklu_
+ <hacklu_> nlightnfotis:)
+
+ <NlightNFotis> tschwinge: I only have one question regarding my project. If
+ I make some changes to libpthread, what's the best way to test them in
+ the hurd? Rebuild glibc with the updated libpthread?
+ <tschwinge> NlightNFotis: Yes, you'll have to rebuild glibc. I have a
+ cheat sheet for that:
+ http://darnassus.sceen.net/~hurd-web/open_issues/glibc/debian/
+ <tschwinge> It may be that the »Run debian/rules patch to apply patches«
+ step is no longer necessary with the 2.17 glibc packages.
+ <NlightNFotis> thanks for that tschwinge. :)
+ <tschwinge> NlightNFotis: Sure. :-)
+
+ <tschwinge> NlightNFotis: Where's your weekly status?
+ <NlightNFotis> I will write it today at noon. I have written all the
+ other ones, and they are available at www.fotiskoutoulakis.com
+ <NlightNFotis> the next one will be available there as well, later in the
+ day
+ <tschwinge> Ack. But please try to finish your report before the meeting,
+ as discussed.
+ <NlightNFotis> oh, forgive me for that. I thought it was ok to write my
+ report a day or so later. Sorry.
+ <tschwinge> NlightNFotis: Please write your report as soon as possible --
+ otherwise there's no useful way for me to know what your status is.
+ <NlightNFotis> I will. This week I have been mostly going through the
+ various sources (the Hurd, Mach and libpthread, especially the last two)
+ in my attempt to get a better understanding for how libpthread
+ works. Since yesterday I have attempted some small changes on my
+ libpthread repo that I plan on testing and reporting on them. That's why
+ I still have not written my report.
+ <tschwinge> NlightNFotis: Things don't need to be finished before you
+ report about them. It's often more useful to discuss issues *before* you
+ spend time on implementing them.
+ <braunr> NlightNFotis: what kind of changes do you want to add to
+ libpthread ?
+ <tschwinge> Have a look at the assertion failure, I would hope. :-)
+ <braunr> well no
+ <braunr> again, i did that
+ <braunr> and it's not easy to fix
+ <NlightNFotis> braunr: I was looking into ways that I could create the
+ thread pool you suggested into libpthread
+ <braunr> no, don't
+ <braunr> create it in your application
+ <braunr> not in libpthread
+ <braunr> well, this may not be an acceptable solution either ..
+ <tschwinge> Before doing that we have to understand what exactly the Go
+ runtime is doing. It may just be a weird interaction with the setcontext
+ et al. functions that I failed to think about when implementing these?
+ <NlightNFotis> the other possibility is the go runtime libraries. But I
+ thought that libpthread might be a better idea, since you told me that
+ creation *and* destruction are buggy
+ <hacklu> braunr: you are right, the signal thread always exists. I had
+ a wrong understanding before.
+ <NlightNFotis> tschwinge: I can look into that, now. I will also include
+ that in my report.
+ <braunr> NlightNFotis: i don't see how this is a relevant argument ..
+ <braunr> tschwinge: i'd suggest he first try with a custom pool in the go
+ runtime, so we exclude what you're suspecting
+ <braunr> if this pool actually works around the issues NlightNFotis is
+ having, it will confirm the offending problem comes from libpthread
+ <tschwinge> So, as a very first step make any thread
+ destruction/deallocation a no-op.
+ <braunr> yes
+ <NlightNFotis> braunr: I originally understood that a thread pool might
+ skip the thread's destruction, so that we escape the buggy part with the
+ thread's destruction. Since that was a problem with libpthread, it sure
+ affects other threads (not only go's) too. So I assumed that building
+ the thread pool into libpthread might help eliminate bugs that may affect
+ other code too.
+ <braunr> no, it's not a proper fix
+ <braunr> it's a work around
+ <braunr> and i'm working on a proper fix in parallel
+ <braunr> (when i have the time, that is :/)
+ <NlightNFotis> oh, I see. So for the time, I had better not touch
+ libpthread, and take a look at the go run time aye?
+ <tschwinge> NlightNFotis: Remember: one thing after the other. First
+ identify what is wrong exactly. Then think and discuss how to solve the
+ very specific issue. Then implement it.
+ <braunr> as tschwinge said, make thread destruction a nop in go
+ <braunr> see if that helps
+ <tschwinge> NlightNFotis: For example, you surely have noticed (per your
+ last report), that basically all Go language test pass (aside from the
+ handful of those testing select, etc.) -- but all those of the libgo
+ runtime library fail, literally all of them.
+ <tschwinge> You noticed they basically all fail with the same assertion
+ failure. But why do all the Go language ones work fine?
+ <tschwinge> Don't they execute the program they built, for example?
+ <tschwinge> (I haven't looked.)
+ <NlightNFotis> they do execute the program. the language ones that fail
+ too, fail due to the assertion failure
+ <tschwinge> Or, what else is different for them? How are they built, which
+ flags, how are they invoked.
+ <braunr> how many goroutines ?
+ <braunr> :p
+ <tschwinge> Do you also get the assertion failure when you built a small Go
+ program yourself and run that one.
+ <tschwinge> Don't get the assertion failure? Then add some more complex
+ stuff that is likely to involve adding/re-using new threads, such as
+ goroutines.
+ <NlightNFotis> I didn't get the assertion failure on a small test program,
+ but now that you suggest it it might be a good idea to build a custom
+ test suite
+ <tschwinge> Etc. That way you'll eventually get an understanding what
+ triggers the assertion failure.
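
tschwinge suggests growing a small test program step by step until the assertion appears. An analogous first step directly against libpthread could be a loop that creates and joins many short-lived threads. That targets the creation, destruction and reuse of thread resources, which is where braunr locates the bug. This C sketch is hypothetical and not taken from the log. Whether it reproduces the failure depends on the glibc/libpthread version. On a system without that bug it should simply return `rounds * batch`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Trivial thread body: hands its argument back unchanged. */
static void *echo(void *arg)
{
    return arg;
}

/* Create and join ROUNDS batches of BATCH short-lived threads, so that
   thread creation, destruction and (in libraries that cache thread
   descriptors) reuse all happen many times.  Returns the number of
   threads that ran and returned the expected value, or -1 if a
   pthread_create call failed. */
static long churn_threads(int rounds, int batch)
{
    enum { MAX_BATCH = 64 };
    pthread_t tids[MAX_BATCH];
    long ok = 0;

    if (batch > MAX_BATCH)
        batch = MAX_BATCH;
    for (int r = 0; r < rounds; r++) {
        int created = 0;
        while (created < batch &&
               pthread_create(&tids[created], NULL, echo,
                              (void *)(intptr_t)created) == 0)
            created++;
        /* Always join whatever was created, even after a failure. */
        for (int i = 0; i < created; i++) {
            void *ret;
            if (pthread_join(tids[i], &ret) == 0 && (intptr_t)ret == i)
                ok++;
        }
        if (created < batch)
            return -1;
    }
    return ok;
}
```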
+ <tschwinge> And that exactly is the kind of analysis I'd like to read in
+ your weekly report.
+ <tschwinge> A list of what you have done, which assumptions you've
+ made, how that directed your further analysis, what results that gave,
+ etc.
+ <NlightNFotis> I will do it. I will try to rush to finish it today before
+ you leave, so that you can inspect it. God I feel like all that time I
+ spent this week studying the particular source code (libpthread, and the
+ Mach) was in vain...
+ <NlightNFotis> on second thoughts, it was not in vain. I got a pretty good
+ understanding of how these pieces of software work, but now I will have
+ to do something completely different.
+ <tschwinge> Studying code is never in vain.
+ <tschwinge> Exactly.
+ <tschwinge> You must have had some motivation to study the code, so that
+ was surely a valid thing to do.
+ <tschwinge> But we'd link to understand your reasoning, so that we can
+ support you and direct you accordingly.
+ <braunr> but it's better to focus on your goals and determine an
+ appropriate course of actions, usually starting with good analysis
+ <tschwinge> Yes.
+ <pinotree> s/link/like/?
+ <tschwinge> pinotree: Indeed, thanks.
+ <braunr> makes me remember when i implemented radix trees to replace splay
+ trees, only to realize splay trees were barely used ..
+ <tschwinge> braunr: Yes. It has happened to all of us. ;-P
+ <tschwinge> NlightNFotis: So, don't worry -- but learn from such things.
+ :-)
+ <NlightNFotis> anyway, I will start right away with the courses of action
+ you suggested, and will try to have finished them by noon. Thanks for
+ your help, it really means a lot.
+ <tschwinge> In software generally, it is never a good idea to let yourself be
+ distracted and lose sight of your goal, because there are always so
+ many different things that could be improved/learned/fixed/etc.
+ <NlightNFotis> tschwinge, I am only nervous about one thing: the fact that
+ I have not submitted yet any patch or some piece of code in general. Then
+ again, the summer of code for me so far has been 70-80% reading about
+ stuff I didn't know about and 30-20% doing the stuff I should know
+ about...
+ <tschwinge> NlightNFotis: That's why we're here, to teach you something.
+ Which we're happy to do, but we all need to cooperate for that (and I'm
+ well aware that this is difficult if one is not in the same rooms, and
+ I'm also aware that my time is pretty limited).
+ <tschwinge> NlightNFotis: We're also very aware that the Hurd system, as
+ any operating system project (if you're not just doing "superficial"
+ things) is difficult, and takes lots of time to learn, and have concepts
+ and things sink into your brain.
+ <braunr> i wouldn't worry too much
+ <tschwinge> We're also still learning every day.
+ <braunr> go doesn't require a lot from the underlying system, but what is
+ required is critical
+ <braunr> once you identify it, coding will be quick
+ <NlightNFotis> tschwinge: braunr: thanks. I shall begin working following
+ the directions you gave to me.
+ <tschwinge> NlightNFotis: So yes, because Google wants us to grade you
+ based on that, you'll eventually have to write some code, but for
+ example, a patch to disable thread destruction/deallocation in libgo
+ would definitely count as such code. And that seems like one of your
+ next steps.
+ <NlightNFotis> tschwinge: i need to deliver that instantly, right? seeing
+ as the evaluation is today.
+ <tschwinge> NlightNFotis: No. Deliver it when you have something to
+ deliver. :-)
+ <NlightNFotis> tschwinge: I am nervous about the evaluation today. I have
+ not submitted a single piece of code, only some reports. How negatively
+ does this influence my performance report?
+ <tschwinge> NlightNFotis: If I can say so, in the evaluation today, Google
+ basically asks us mentors whether we want to fail our students right now.
+ Which I don't plan to do, knowing about the complexity of the Hurd
+ system, and the learning required before you can do useful code changes.
+ <NlightNFotis> tschwinge: that really means a lot to me, and it took a
+ weight off my chest.
+ <braunr> uh ok, i have to be the rude guy again
+ <braunr> NlightNFotis: the gsoc is also a way for the student to prepare
+ for working in software development communities
+ <braunr> whether free software/open source and/or in companies
+ <braunr> people involved care a lot less about pathos than actual results
+ <pinotree> (or to prepare students to be hired by google, but that's
+ another story)
+ <braunr> NlightNFotis: in other words, stop apologizing that much, stop
+ focusing so much on that, and just work as you can
+
+
+# IRC, freenode, #hurd, 2013-07-31
+
+ <nlightnfotis> teythoon: both samuel and thomas would be missing for the
+ week right?
+ <teythoon> nlightnfotis: they do, why?
+ <teythoon> nlightnfotis: err, they do?? why?
+
+
+# IRC, freenode, #hurd, 2013-08-01
+
+ <nlightnfotis> braunr: I checked out what you (and Thomas) suggested and
+ did some research on go on the Hurd. I have found out that go works,
+ until you need to use anything that has to do with a goroutine. I am now
+ playing with the go runtime and checking to see if turning thread
+ destruction to noop will have any difference.
+
+
+# IRC, freenode, #hurd, 2013-08-05
+
+ <nlightnfotis> youpi: whenever you have time, I would like to report my
+ progress as well.
+ <youpi> nlightnfotis: sure, go ahead
+ <youpi> but again, you should report before the meeting
+ <youpi> so we can read it before coming to the discussion
+ <nlightnfotis> I have written my report
+ <youpi> ah
+ <hacklu> nlightnfotis: I have read your report, these days you have made
+ great progress.
+ <youpi> where is it?
+ <nlightnfotis> it was available since yesterday
+ <nlightnfotis>
+ http://www.fotiskoutoulakis.com/blog/2013/08/05/gsoc-partial-week-7-report/
+ <nlightnfotis> thanks hacklu. The particular piece of code I was studying
+ was very very interesting :)
+ <hacklu> nlightnfotis: I think you should show your link in here or email
+ next time. I spent some extra time finding that :)
+ <nlightnfotis> youpi: for a tldr, at the last time I was told to check
+ gccgo's runtime for clues regarding the go routine failures.
+ <nlightnfotis> hacklu: will keep that in mind, thanks.
+ <nlightnfotis> youpi: thing is, gccgo operates on two different thread
+ types: G's (the goroutines, lightweight threads that are managed by the
+ runtime) and M's (the "real" kernel threads)
+ <nlightnfotis> none of which are really "destroyed"
+ <youpi> ok, makes sense
+ <nlightnfotis> G's are put in a pool of available goroutines when their
+ status is changed to "Gdead" so that they can be reused
+ <nlightnfotis> M's also don't seem to go away. There is always at least one
+ M (the bootstrap one) and all other M's that get created are also stashed
+ in a pool of available working threads.
+ <youpi> you could put some debugging printfs in libpthread, to make sure
+ whether threads do die or not
+ <nlightnfotis> I am studying this further as we speak, but they both don't
+ seem to get "destroyed", so that we can be sure that bugs are triggered
+ by thread destruction
+ <nlightnfotis> I was beginning to believe that maybe I was looking in the
+ wrong direction
+ <nlightnfotis> but then I looked at my past findings, and I noticed
+ something else
+ <nlightnfotis> if you take a look at the first failed go routine, it failed
+ at the time.sleep function, which puts a goroutine to sleep for ns
+ nanoseconds. That made me think if it was something that had to do with
+ the context functions and not the goroutines' creation.
+ <youpi> nlightnfotis: that's possible
+ <youpi> nlightnfotis: I'd say you can focus on this very simple example: a
+ mere sleep
+ <youpi> that's one of the simplest things a thread scheduler has to do, but
+ it has to do it right
+ <youpi> fixing that should fix a lot of other issues
+ <nlightnfotis> if I have understood correctly, there is at least one G
+ (Goroutine) and at least one M (kernel thread) running. Sleep does put
+ that goroutine on hold, and restarting it might be an issue
+ <braunr> talking about thread scheduling ? :)
+ <youpi> nlightnfotis: go's runtime doesn't actually destroy kernel threads,
+ apparently
+ <nlightnfotis> youpi: yeah, that's what I have understood so far. And it
+ neither does destroy goroutines. If there was an issue with thread
+ creation, then I guess it should be triggered in the beginning of the
+ program too (seeing as both M's and G's are created there)
+ <nlightnfotis> the fact that it is triggered when a goroutine goes to sleep
+ makes me suspect the context functions
+ <youpi> yes
+ <nlightnfotis> again I am studying it the last days, in search of
+ clues. Will keep you all updated.
+ <nlightnfotis> braunr: I have written my report and it is available here
+ http://www.fotiskoutoulakis.com/blog/2013/08/05/gsoc-partial-week-7-report/
+ If you could read it, please tell me if you notice anything weird.
+ <braunr> nlightnfotis: ok
+ <braunr> nlightnfotis: quite busy here so don't worry if i suddenly
+ disappear
+ <braunr> nlightnfotis: hum, does go implement its own threads ??
+ <nlightnfotis> braunr: yeah. It has 2 kinds of threads. Runtime managed (the
+ goroutines) and "real" (kernel managed) ones.
+ <braunr> i mean, does it still use libpthread ?
+ <nlightnfotis> thing is none of them "disappear" so as to explain the bug
+ with "thread creation **and** destruction")
+ <nlightnfotis> it must use libpthread for kernel threads as far as creation
+ goes.
+ <braunr> ok, good
+ <braunr> then, it schedules its own threads inside one pthread, right ?
+ <braunr> using the pthread as a virtual cpu
+ <nlightnfotis> yes. It matches kernel threads and runtime threads and runs
+ the kernel threads in reality
+ <nlightnfotis> the scheduler decides which goroutine will run on each
+ kernel thread.
+ <braunr> ew
+ <braunr> this is pretty much non portable
+ <braunr> and you're right to suspect context switching functions
+ <nlightnfotis> yeah my thought for it was the following: thread creation,
+ if it was buggy, should be triggered as soon as a program starts, seeing
+ as at least one kernel thread and at least one go routine starts. My
+ sleep experiment crashes when the goroutine is put on hold
+ <braunr> did you find the code putting on hold ?
+ <nlightnfotis> I will give you the exact link, wait a moment
+ <nlightnfotis> braunr:
+ https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/time.goc?source=c#L59
+ <nlightnfotis> that is, the exact location is line 26, which calls the one I
+ pointed you at
+ <braunr> ahah, tsleep
+ <braunr> old ghost from the past
+ <braunr> nlightnfotis: the real location is probably runtime_park
+ <nlightnfotis> I will check this out.
+
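The reuse pattern nlightnfotis describes above (a G whose status becomes "Gdead" is put into a pool and handed out again; Ms are stashed in a pool the same way) can be pictured with the toy model below. It is only a sketch: the types `g` and `gPool`, the status names, and the LIFO free list are hypothetical illustrations, not libgo's real data structures, which live in `libgo/runtime/proc.c`.

```go
package main

import "fmt"

// gStatus is a toy status field; the names echo the "Gdead" state mentioned
// in the log but are NOT libgo's actual definitions.
type gStatus int

const (
	gRunnable gStatus = iota
	gRunning
	gDead
)

// g is a hypothetical, heavily simplified goroutine descriptor.
type g struct {
	id     int
	status gStatus
}

// gPool keeps dead descriptors on a free list instead of freeing them,
// so that later goroutines reuse them.
type gPool struct {
	free   []*g
	nextID int
}

// get hands out a recycled descriptor if there is one, else a new one.
func (p *gPool) get() *g {
	if n := len(p.free); n > 0 {
		gp := p.free[n-1]
		p.free = p.free[:n-1]
		gp.status = gRunnable
		return gp
	}
	p.nextID++
	return &g{id: p.nextID, status: gRunnable}
}

// put marks a descriptor dead and stashes it; nothing is ever destroyed.
func (p *gPool) put(gp *g) {
	gp.status = gDead
	p.free = append(p.free, gp)
}

func main() {
	var pool gPool
	first := pool.get()
	pool.put(first)
	second := pool.get()
	fmt.Println(first == second, second.id) // prints "true 1"
}
```

If descriptors and kernel threads are only ever recycled, a failure that looks like thread destruction needs another explanation, which is why the discussion turns to the sleep and context-switching path.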
+ <nlightnfotis> may I ask something non-technical but relevant to summer of
+ code?
+ <braunr> sure
+ <nlightnfotis> would it be okay if I took the day off tomorrow?
+ <braunr> nlightnfotis: ask tschwinge but i guess it's ok
+
+ <braunr> have you found runtime_park ?
+ <braunr> i'm downloading your repository from github but it's slow :/
+ <nlightnfotis> braunr: not yet. Grepping through the files didn't produce
+ any meaningful results and github's search is not working
+ <nlightnfotis> braunr: there is that strange thing with the gccgo sources,
+ where I can find a function's declaration but not its definition. Funny
+ thing is those functions are not really extern, so I am playing a hide
+ and seek game, in which I am not always successful.
+ <nlightnfotis> runtime_park is declared in runtime.h. I have looked nearly
+ everywhere for it. There is only one last place I have not looked at.
+ <nlightnfotis> braunr: I found runtime_park. It's here:
+ https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/proc.c?source=c#L1372
+
+ <tschwinge> nlightnfotis: Taking the day off is fine. Have fun!
+ <nlightnfotis> tschwinge: I am still here; Thanks for that tschwinge. I
+ will be for the next half hour or something if you would like to ask me
+ anything
+ <tschwinge> nlightnfotis: I have no immediate questions (first have to read
+ your report and discussion in here) -- so feel free to log out and enjoy
+ the sun outside. :-)
+
+ <teythoon> nlightnfotis, tschwinge: btw, have you seen
+ http://morsmachine.dk/go-scheduler ?
+ <nlightnfotis> teythoon: thanks for the link. It's really interesting.
+
+
+# IRC, freenode, #hurd, 2013-08-12
+
+ <nlightnfotis> teythoon: did you manage to build the Hurd successfully?
+ <teythoon> ah yes, the Hurd is relatively easy
+ <teythoon> the libc is hard
+ <nlightnfotis> debian glibc or hurd upstream libc?
+ <teythoon> but my build on darnassus was successful
+ <nlightnfotis> *debian eglibc
+ <teythoon> well, I rebuilt the debian package with two tweaks
+ <nlightnfotis> do you build on linux and rsync on hurd or ...?
+ <teythoon> I built it on Hurd, though I thought about setting up a cross
+ compiler
+ <nlightnfotis> I see. The process was build Mach, build Hurd, and then
+ build glibc and it's ready or it needed more?
+ <teythoon> no, I never built Mach
+ <teythoon> I must admit I'm not sure about the "proper" procedure
+ <teythoon> if I change one of Hurd's RPC definitions, I think the proper way
+ is to rebuild the libc against the new definitions and then the Hurd
+ <teythoon> but I found no way to do that, so everyone seems to build the
+ Hurd, install it, build the libc and then rebuild the Hurd again
+ <nlightnfotis> I see. Thanks for that :)
+
+ <nlightnfotis> tschwinge, I have also written my report! It's available
+ here
+ http://www.fotiskoutoulakis.com/blog/2013/08/12/gsoc-week-8-partial-report/
+ <nlightnfotis> I can sum it up if you want me to.
+ <tschwinge> nlightnfotis: I already read it! :-D
+ <tschwinge> Oh, I didn't. I read the week 7 one. Let me read week 8. ;-)
+ <nlightnfotis> ok. I am currently going through the assembly generated for
+ the sample program I have embedded in my report.
+ <nlightnfotis> the weird thing is that the assembly generated is pretty
+ much the same for the program with 1 and 2 goroutine functions (with the
+ obvious difference that the one with 2 goroutine functions has 1 more
+ goroutine in its assembly code)
+ <nlightnfotis> I can not understand why it is that when I have 1 goroutine,
+ an exception is triggered, but when I am having two (which are 99%
+ identical) it seems to be executed.
+ <nlightnfotis> and I do not understand why the exception is triggered when
+ I manually use a goroutine.
+ <nlightnfotis> To my understanding so far, there is at least 1 (kernel)
+ thread created at program startup to run main. The same thread gets
+ created to run a new goroutine (goroutines get associated with kernel
+ threads)
+ <nlightnfotis> and it's obvious from the assembly generated.
+ <nlightnfotis> go_init_main (the main function for go programs) starts with
+ a .cfi_startproc
+ <nlightnfotis> the same piece of code (.cfi_startproc) starts a new kernel
+ thread (on which a goroutine runs)
+ <tschwinge> nlightnfotis: Re your two-goroutines example: in that case I
+ assume, you're directly returning from the main function and the program
+ terminates normally. ;-)
+ <tschwinge> nlightnfotis: Studying the assembly code for this will be too
+ verbose, too low-level. What we need is a trace of steps that happen
+ until the error.
+ <nlightnfotis> tschwinge, that must be it, but it should trigger the bug,
+ since it still has at least one goroutine (and one is known to trigger
+ the bug)
+ <tschwinge> nlightnfotis: I guess the program exits before the first
+ goroutine would be scheduled for execution.
+ <nlightnfotis> the assembly for the goroutines is identical. You can't tell
+ one from the other. The only change is that it has 2 of these sections
+ instead of one
+ <nlightnfotis> actually it's the same for the first one
+ <tschwinge> nlightnfotis: I very much assume that the issue is not due to
+ the code generated by the Go compiler (which you're seeing in the
+ assembly code), but rather due to the runtime code in the libgo library.
+ <nlightnfotis> I didn't think of it this way.
+ <tschwinge> ... that improperly interacts with our libpthread.
+ <nlightnfotis> so my research should focus on the runtime from now on?
+ <tschwinge> Improperly may well imply that our libpthread is at fault, of
+ course, as we discussed.
+ <tschwinge> Back to the one-goroutine case (that shows the assertion
+ failure). Simple case: one goroutine, plus the "main" thread.
+ <tschwinge> We need to get an understanding of the steps that happen until
+ the error happens.
+ <tschwinge> As this is a parallel problem, and it is involving "advanced"
+ things (such as setcontext), I would not trust GDB too much when used on
+ this code.
+ <nlightnfotis> I will have to manually step through the source myself,
+ right?
+ <tschwinge> What I would do is add printf's (or similar) into the code at
+ critical points, to get an understanding of what's going on.
+ <tschwinge> Such critical points are: pthread_create, setcontext,
+ swapcontext.
+ <nlightnfotis> It sounds like a good idea. Anything else to note?
+ <tschwinge> That way, you can isolate the steps required to trigger the
+ assertion failure.
+ <tschwinge> For example, it could be something like: makecontext,
+ swapcontext, pthread_create, boom.
+ <nlightnfotis> pthread_create_internal is failing at an assertion. I wonder
+ what would happen if I remove that assertion.
+ <tschwinge> Not without understanding what the error is, and why it is
+ happening (which steps lead to it). We don't usually do »voodoo
+ computing and programming by coincidence«.
+ <nlightnfotis> tschwinge, I also figured out something. If it is a
+ libpthread issue, it should also get triggered when a simple C program
+ creates a thread (assuming _pthread_create is causing the issue)
+ <nlightnfotis> so maybe I should write a C program to test that
+ functionality and see if it provides any further clues?
+ <tschwinge> nlightnfotis: That's precisely what the goal of »isolate the
+ steps required to trigger the assertion failure« is about: reduce the big
+ libgo code to a few function calls required to reproduce the problem.
+ <tschwinge> nlightnfotis: A simple C program just doing pthread_create
+ evidently does not fail.
+ <tschwinge> nlightnfotis: I assume you have a Go program dynamically linked
+ to the libgo you build?
+ <nlightnfotis> yes. To the latest go build from the source (4.9)
+ <nlightnfotis> *gccgo build from source
+ <braunr> removing an assertion is usually extremely bad practice
+ <tschwinge> Then you can just do something like make target-libgo (IIRC)
+ (or instead: cd i686-pc-gnu/libgo/ && make) to rebuild your changed
+ libgo, and then re-run the Go program.
+ <braunr> the thought of randomly removing assertions shouldn't even reach
+ your mind !
+ <nlightnfotis> braunr: even if it is not permanent, but an experiment?
+ <braunr> yes
+ <nlightnfotis> can you explain to me why?
+ <tschwinge> nlightnfotis: <tschwinge> Not without understanding what the
+ error is, and why it is happening (which steps lead to it). We don't
+ usually do »voodoo computing and programming by coincidence«.
+ <braunr> an assertion exists to make sure something that should *never*
+ happen never happens
+ <braunr> removing it allows such events to silently occur
+ <teythoon> braunr: that's the theory, yes, to check invariants
+ <braunr> i don't know what you mean by using assertions for "an experiment"
+ <teythoon> unfortunately some people use assert for error handling :/
+ <braunr> that's wrong
+ <braunr> and i don't remember that being the case in libpthread
+ <braunr> nlightnfotis: can you point the faulting assertion again there
+ please ?
+ <nlightnfotis> braunr: sure: Assertion `({ mach_port_t ktid =
+ __mach_thread_self (); int ok = thread->kernel_thread == ktid;
+ <nlightnfotis> __mach_port_deallocate ((__mach_task_self + 0), ktid); ok;
+ })' failed.
+ <braunr> so basically, thread->kernel_thread != __mach_thread_self()
+ <braunr> this code is run only for num_threads == 1
+ <braunr> but has there been any thread destruction before ?
+ <nlightnfotis> no. To my understanding kernel threads in the go runtime
+ never get destroyed (comments seem to support that)
+ <braunr> IOW: is it certain the only thread left *is* the main thread ?
+ <braunr> hm
+ <braunr> intuitively, i'd say this is wrong
+ <braunr> i'd say go doesn't destroy threads in most cases, but something in
+ the go runtime must have done it already
+ <braunr> i'm not even sure the main thread still exists
+ <braunr> check that
+ <braunr> where is the go code you're working on ?
+ <nlightnfotis> there are 3 files of interest
+ <braunr> i'd like the whole sources please
+ <nlightnfotis> I will find it in a moment
+ <tschwinge> braunr: GCC Git clone, tschwinge/t/hurd/go branch.
+ <nlightnfotis> it is <gcc_root>/libgo/runtime/runtime.h
+ <nlightnfotis> it is <gcc_root>/libgo/runtime/proc.c
+ <braunr> tschwinge: thanks
+ <tschwinge> braunr: git://gcc.gnu.org/git/gcc.git
+ <nlightnfotis> I will provide links on github
+ <braunr> nlightnfotis: i said the whole sources, why do you insist on
+ giving me separate files ?
+ <nlightnfotis> for checking it out quickly
+ <nlightnfotis> oh I misunderstood that sorry
+ <nlightnfotis> thought you wanted to check out thread creation and
+ destruction and that you were interested only in those specific files
+ <braunr> tschwinge: is it completely contained there or are there external
+ libraries ?
+ <tschwinge> braunr: You mean libgo?
+ <braunr> tschwinge: possibly
+ <nlightnfotis> tschwinge, I just made sure that yeah programs are
+ dynamically linked against the compiler's libgo
+ <nlightnfotis> libgo.so.3
+ <braunr> does libgo come from gcc sources ?
+ <nlightnfotis> yeah
+ <braunr> ok
+ <nlightnfotis> go files on gcc sources are split under two directories: go,
+ which contains the frontend go, and libgo which contains the libraries
+ and the runtime code
+ <tschwinge> braunr: darnassus:~tschwinge/tmp/gcc/go.build/ is a recent
+ build, with sources in $PWD/../go/.
+ <tschwinge> braunr: libgo is in i686-unknown-gnu0.3/libgo/.libs/
+ <nlightnfotis> so tschwinge to roundup for this week I should print debug
+ around the "hotspots" and see if I can extract more information about
+ where the specific problem is triggered right?
+ <tschwinge> nlightnfotis: Yes, for a start.
+ <braunr> nlightnfotis: identify the main thread, make sure it doesn't exit
+ <nlightnfotis> noted.
+ <nlightnfotis> braunr: do you have an idea about the issue I described
+ earlier? The one with the 1 goroutine triggering the bug, but the 2
+ exiting successfully but with no output?
+ <braunr> nlightnfotis: i didn't read
+ <nlightnfotis> do you have 2 mins to read my report? I describe the issue
+ <braunr> something messed up in the context i suppose
+ <tschwinge> nlightnfotis: Uhm, I already explained that issue?
+ <braunr> you did ?
+ <nlightnfotis> tschwinge, I know, don't worry. I am trying to get all the
+ insight I can get.
+ <nlightnfotis> you mentioned that the scheduler might have an issue and
+ that the main thread returns before the goroutines execu
+ <nlightnfotis> *execute
+ <nlightnfotis> right?
+ <tschwinge> It is the normal thing for a process to terminate normally when
+ the main function returns. I would expect Go to behave the same way.
+ <braunr> "Now, if we change one of the say functions inside main to a
+ goroutine, this happens"
+ <braunr> how do you change it ?
+ <tschwinge> Or am I confused?
+ <braunr> tschwinge: i don't remember exactly
+ <nlightnfotis> braunr: from say("world") to go say("world")
+ <nlightnfotis> tschwinge, yeah I get that. What I still have not understood
+ is what it is specifically about the 2 goroutines that doesn't trigger
+ the issue when 1 goroutine does.
+ <nlightnfotis> You said that it might have something to do with the
+ scheduler; it does seem like a good explanation to me
+ <tschwinge> nlightnfotis: My understanding still is that the goroutines
+ don't get executed before the main thread exits.
+ <braunr> which scheduler ?
+ <nlightnfotis> braunr: the runtime (go) scheduler.
+ <nlightnfotis> tschwinge, Yeah, they don't. But still, with 1 goroutine:
+ you get into main, attempt to execute it, and bam! With two, it should be
+ the same, but strangely it seems to exit main without an issue
+ <nlightnfotis> (attempt to execute the goroutine)
+ <braunr> why should it be the same ?
+ <nlightnfotis> braunr: seeing as one goroutine has problems, I can't see
+ why two wouldn't. At least one of the two should result in an exception.
+ <braunr> nlightnfotis: why ?
+ <braunr> nlightnfotis: they do have the problem
+ <braunr> they don't run
+ <braunr> they just don't run into that assertion, probably because there is
+ more than one thread
+ <nlightnfotis> wait a minute. You imply that they fail silently? But still
+ end up in the same situation
+ <braunr> yes
+ <braunr> in which case it does look like a go scheduler problem
+ <nlightnfotis> if I understood it correctly, that assertion fails when it
+ is only 1 thread?
+ <braunr> yes
+ <braunr> and since the main thread is always correct, i expect the main
+ thread has exited
+ <braunr> and this happens because the one thread left is *not* the main
+ thread
+ <braunr> (which is a libpthread bug)
+ <braunr> but it's a bug we've not seen because we don't have applications
+ creating threads while exiting
+ <nlightnfotis> I think I got it now.
+ <braunr> try to put something like getchar() in your go program
+ <braunr> something that introduces a break
+ <braunr> so that the main thread doesn't exit
+ <nlightnfotis> oh right. Thanks for that. And sorry tschwinge I reread what
+ you said, it seems I had misinterpreted what you suggested.
+ <tschwinge> braunr: If you're interested: for a Go program triggering the
+ assertion, I don't see any thread exiting (see
+ darnassus:~tschwinge/tmp/gcc/a.go, run: cd ~tschwinge/tmp/gcc/go.build/
+ && ./a.out) -- but perhaps I've been looking for the wrong things in l_.
+ File l is without a goroutine. Have to leave now, sorry.
+ <tschwinge> braunr: If you want to rebuild: gcc/gccgo -B gcc -B
+ i686-unknown-gnu0.3/libgo ../a.go -Li686-unknown-gnu0.3/libgo/.libs
+ -Wl,-rpath,i686-unknown-gnu0.3/libgo/.libs
+ <braunr> tschwinge: no i won't touch anything
+ <braunr> but thanks
+
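tschwinge's point matches the Go language specification: when `main` returns, the program exits and does not wait for other goroutines to complete. Assuming the test program resembles the `say` example from the Go tour (the exact code from nlightnfotis's report is not reproduced in this log), the sketch below starts one goroutine per word and uses a `sync.WaitGroup` so the caller cannot return early; `runAll` is a hypothetical helper. Replacing the body of `main` with plain `go say("hello")` and `go say("world")` gives the "exits successfully, prints nothing" behaviour even on a correct system.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// say prints s a few times with a short sleep in between, in the spirit of
// the Go tour example discussed above (the real test program may differ).
func say(s string) {
	for i := 0; i < 3; i++ {
		time.Sleep(10 * time.Millisecond)
		fmt.Println(s)
	}
}

// runAll starts one goroutine per word and waits for all of them.
// Without wg.Wait(), the function (and, in main, the whole program) could
// return before any goroutine is ever scheduled, printing nothing.
func runAll(words ...string) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	finished := 0
	for _, w := range words {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			say(w)
			mu.Lock()
			finished++
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	return finished
}

func main() {
	n := runAll("hello", "world")
	fmt.Println(n, "goroutines finished")
}
```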
+
+# IRC, freenode, #hurd, 2013-08-19
+
+ <youpi> nlightnfotis: how are you going with gcc go?
+ <nlightnfotis> I was print debugging all the week.
+ <nlightnfotis> I can tell you I haven't noticed anything weird so far.
+ <nlightnfotis> But I feel I am close to the solution
+ <nlightnfotis> I have not written my report yet.
+ <nlightnfotis> I will write it by wednesday at the latest
+ <nlightnfotis> I hope I will have figured it all out by then
+ <pinotree> a report is not for writing solutions, but for the progress
+ <youpi> yes
+ <youpi> it's completely fine to be saying "I've been debugging, not found
+ anything yet"
+ <pinotree> results or not, always write your reports on time, so your
+ mentor(s) know what you are doing
+ <nlightnfotis> I see. Would you like me to write it right now, or is it
+ okay to write it a day or two later?
+ <hacklu__> nlightnfotis: FYI. this week my report is not finished. just
+ state some problem I face now.
+ <youpi> nlightnfotis: I'd say better write it now
+ <nlightnfotis> youpi: Ok I will write it and tell you when I am done with
+ it.
+ <nlightnfotis> youpi: here is my partial report describing what my course
+ of action looked like this
+ week. http://www.fotiskoutoulakis.com/blog/2013/08/19/gsoc-week-9-partial-report/
+ <nlightnfotis> of course, I will write in a day or two (hopefully having
+ figured out the whole situation) an exhaustive report describing
+ everything I did in detail
+ <nlightnfotis> youpi: I have written my (partial) report describing how I
+ went about this week
+ http://www.fotiskoutoulakis.com/blog/2013/08/19/gsoc-week-9-partial-report/
+ <youpi> nlightnfotis: good, thanks!
+ <nlightnfotis> youpi: please note that this is not an exhaustive list of my
+ findings or course of action, it merely acts as an example to demonstrate
+ the way I think and how I go about every day.
+ <nlightnfotis> I will write an exhaustive report of everything I did so
+ far, when I figure out what the issue is, and I feel I am close.
+ <youpi> well, you don't need to explain all bits in details
+ <youpi> this is fine to show an example of how you went
+ <youpi> but please also provide a summary of your other findings
+ <nlightnfotis> oh okay, I will keep this in mind. :)
+
+
+# IRC, freenode, #hurd, 2013-08-22
+
+ < nlightnfotis> if I want to rebuild libpthread, I have to embed it into
+ eglibc's source, then build?
+ < pinotree> or pick the debian sources, patch libpthread there and rebuild
+ < nlightnfotis> that's most likely what I am going to do. Thanks pinotree.
+ < pinotree> yw
+ < braunr> nlightnfotis: i usually add my patches on top of the debian glibc
+ ones, yes
+ < braunr> it requires some tweaking
+ < braunr> but it's probably the easiest way
+ < nlightnfotis> braunr: I was studying my issues with gcc, and everyday I
+ was getting more and more confident it must be a libpthread issue
+ < nlightnfotis> and I figured out, that I might wanna play with libpthread
+ this time
+ < braunr> it probably is but
+ < braunr> i'm not so sure you should dive there
+ < nlightnfotis> why not?
+ < braunr> because it can be worked around in go
+ < braunr> i had a test for you last time
+ < braunr> do you remember what it was ?
+ < nlightnfotis> nope :/ care to remind it?
+ < braunr> iirc, it was running the go test you did but with an additional
+ instruction in the main function, that pauses
+ < braunr> something like getchar() in c
+ < braunr> to make sure main doesn't exit while the goroutines are still
+ running
+ < braunr> i'm almost positive that the bug you're seeing is main returning
+ and libpthread believing it's acting on the main thread because there is
+ only one left
+ < nlightnfotis> oh that's easy, I can do it now. But it's probably what
+ thomas had suggested: go routines may not be running at all.
+ < braunr> they probably aren't
+ < braunr> and that's a context bug
+ < braunr> not a libpthread bug
+ < braunr> and that's what you should focus on
+ < braunr> the libpthread bug is minor
+ < nlightnfotis> which is strange, because I had studied the assembly code
+ and the code for the goroutine was there
+ < nlightnfotis> anyway I will proceed with what you suggested
+ < braunr> yes please
+ < braunr> that's becoming important
+ < nlightnfotis> would you mind me dumping some of my findings for you to
+ evaluate/post an opinion on?
+ < braunr> no
+ < braunr> please do so
+ < nlightnfotis> I have found that the go runtime starts with a total number
+ of threads == 1
+ < braunr> nlightnfotis: as all processes
+ < nlightnfotis> I would guess that's because of using fork ()
+ < nlightnfotis> oh so it's ok
+ < braunr> there always is a main thread
+ < braunr> even for non-threaded applications
+ < nlightnfotis> yeah, that I know. The runtime proceeds to create
+ immediately one more.
+ < braunr> then it's 2
+ < nlightnfotis> and that's ok, it doesn't have an issue with that
+ < nlightnfotis> yep
+ < nlightnfotis> the issue begins when it tries to create the 3rd one
+ < braunr> hum
+ < braunr> from what i remember
+ < nlightnfotis> it happily goes through the go runtime's kernel thread
+ allocation function (runtime_newm())
+ < braunr> you also had an issue with the first goroutine
+ < nlightnfotis> that's with 1 go routine
+ < braunr> ok
+ < braunr> so 1 goroutine == 3 threads
+ < nlightnfotis> it seems so yes.
+ < braunr> depending on how the go scheduler is able to assign goroutines to
+ kernel threads i suppose
+ < nlightnfotis> mind you, (disclaimer: I am not so sure about that) that go
+ must be using one extra thread for the runtime scheduler and garbage
+ collector
+ < braunr> that's ok
+ < nlightnfotis> so that's where the two come from
+ < braunr> and expected from a modern runtime
+ < nlightnfotis> the third must be the go routime
+ < nlightnfotis> routine
+ < braunr> hum have to go
+ < braunr> brb in a few minutes
+ < braunr> keep posting
+ < nlightnfotis> it's ok take your time
+ < nlightnfotis> I will be here
+ < braunr> but i may not ;p
+ < braunr> in fact i will not
+ < braunr> i have like 15 mins ;)
+ < braunr> nlightnfotis: ^
+ < nlightnfotis> I am trying what you told me to do with go
+ < nlightnfotis> it's ok if you have to go, I will continue investigating
+ and be back tomorrow
+ < braunr> ok
+ < nlightnfotis> braunr: I tried what you asked me to do, both with waiting to
+ read a string from stdin and with waiting to read an int from stdin
+ < nlightnfotis> it never waits, it still aborts with the assertion failure
+ < nlightnfotis> both with one and two go routines
+ < nlightnfotis> dumping it here just for the log, running the same code
+ without waiting for input results in two threads created (1 for main and
+ 1 for runtime, most likely) and "normal" execution.
+ < nlightnfotis> normal as in no assertion failure,
+ < nlightnfotis> it seems to skip the goroutines altogether
+
+
+# IRC, freenode, #hurd, 2013-08-23
+
+ < braunr> nlightnfotis: can i see your last go test code please ? the one
+ with the read at the end of main
+ < nlightnfotis> braunr sure
+ < nlightnfotis> sorry I had gone to the toilet, now I am back
+ < nlightnfotis> I will send it right now
+ < nlightnfotis> braunr: http://pastebin.com/DVg3FipE
+ < nlightnfotis> it crashes when it attempts to create the 3rd thread (the
+ 1st goroutine), with the assertion fail
+ < nlightnfotis> if you remove the Scanf it will not fail, return 0, but
+ only create 2 threads (skip the goroutines altogether)
+ < braunr> can you add a print right before main exits please ?
+ < braunr> so we know when it does
+ < nlightnfotis> doing it now
+ < nlightnfotis> braunr: If I enter a print statement right before main
+ exits, the assertion failure is triggered. If I remove it, it still runs
+ and creates only 2 threads.
+ < braunr> i don't understand
+ < braunr> 14:42 < nlightnfotis> it crashes when it attempts to create the
+ 3rd thread (the 1st goroutine), with the assertion fail
+ < braunr> why don't you get that ?
+ < nlightnfotis> This seems like having to do with the runtime. I mean, I
+ have seen the emitted assembly from the compiler, and the goroutines are
+ there. Something in the runtime must be skipping them
+ < braunr> context switching seems buggy
+ < nlightnfotis> if it's only goroutines in main
+ < nlightnfotis> if there's also something else in main, the assertion
+ failure is triggered.
+ < braunr> i want you to add a printf right before main exits, from the code
+ you pasted
+ < nlightnfotis> I did. It acts the same as before.
+ < braunr> do you see that last printf ?
+ < nlightnfotis> no. It aborts before that
+ < nlightnfotis> :q
+ < braunr> find a way to make sure the output buffer is flushed
+ < braunr> i don't know how it's done in go
+ < nlightnfotis> mistyped the :q, it was meant for vim
+ < nlightnfotis> braunr will do right away
+ < nlightnfotis> there is one thing I still can not understand: Why is it
+ that two threads are ok, but when the next is going to get created, the
+ assertion is triggered.
+ < braunr> nlightnfotis: the assertion is triggered because a thread is
+ being created while there is only one thread left, and this thread isn't
+ the main thread
+ < braunr> so basically, the main thread has exited, and another (the last
+ one) is trying to create one
+ < nlightnfotis> the other one might be the runtime I guess. Let me check
+ out quickly what you suggested
+ < braunr> the main thread shouldn't exit at all
+ < braunr> so something with context switching is wrong
+ < nlightnfotis> the thing is: it doesn't seem to exit when this happens. My
+ debug statements (in the runtime) suggest that there are at least 2
+ threads active, kernel threads don't get destroyed in gccgo
+ < braunr> 14:52 < braunr> so something with context switching is wrong
+ < braunr> how well have the context switching functions been tested ?
+ < nlightnfotis> to be honest I have not tested them; up until this point I
+ trusted they worked. Should I also take a look at them?
+ < braunr> how can you trust them ?
+ < braunr> they've never been used ..
+ < braunr> thomas added them recently if i'm right
+ < braunr> nothing has been using them except go
+ < braunr> piece of advice: don't trust anything
+ < nlightnfotis> I think they were in before, and thomas recently patched
+ them!
+ < braunr> they were in, but didn't work
+ < braunr> (if i'm right)
+ < braunr> nlightnfotis: you could patch libpthread to monitor the number of
+ threads
+ < braunr> or the go runtime, idk
+ < nlightnfotis> I have done so on the go runtime
+ < nlightnfotis> that's where I am getting the number of threads I
+ report. That's straight out from the scheduler's count.
+ < braunr> threads can exit by calling pthread_exit() or returning from the
+ thread routine
+ < braunr> make sure you catch both
+ < braunr> also check for pthread_cancel(), although i don't expect any in
+ go
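Catching both exit paths, plus cancellation, can be done with a POSIX cleanup handler. This is a minimal sketch, assuming thread creation can be routed through a wrapper. The names `traced_pthread_create`, `traced_trampoline` and `exited_threads` are hypothetical and do not come from libgo or libpthread. Cleanup handlers run on `pthread_exit()` and on cancellation, and `pthread_cleanup_pop(1)` also runs the handler on a normal return. Traces go to stderr, which is unbuffered, so they are not lost if the process aborts on the assertion.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of threads that have left their start routine, by any path. */
static atomic_int exited_threads;

/* Hypothetical record passed to the trampoline. */
struct traced_start {
    void *(*routine)(void *);
    void *arg;
};

/* Cleanup handler: runs on pthread_exit(), on cancellation, and (because of
   pthread_cleanup_pop(1) below) on a normal return as well. */
static void trace_exit(void *unused)
{
    (void)unused;
    atomic_fetch_add(&exited_threads, 1);
    fputs("traced thread exiting\n", stderr);
}

static void *traced_trampoline(void *p)
{
    struct traced_start start = *(struct traced_start *)p;
    void *result = NULL;

    free(p);
    pthread_cleanup_push(trace_exit, NULL);
    result = start.routine(start.arg);
    pthread_cleanup_pop(1);   /* 1: also run trace_exit on normal return */
    return result;
}

/* Hypothetical drop-in for pthread_create() that traces every exit path. */
static int traced_pthread_create(pthread_t *tid, void *(*routine)(void *),
                                 void *arg)
{
    struct traced_start *start = malloc(sizeof *start);
    int err;

    if (start == NULL)
        return ENOMEM;
    start->routine = routine;
    start->arg = arg;
    err = pthread_create(tid, NULL, traced_trampoline, start);
    if (err != 0)
        free(start);
    return err;
}

/* Demonstration routines, one per exit path. */
static void *returns_normally(void *arg) { return arg; }
static void *calls_pthread_exit(void *arg) { pthread_exit(arg); }
```

If the counter stays at zero while the assertion still fires, threads are not exiting. That points away from thread teardown and toward the context or TLS handling braunr suspects.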
+ < nlightnfotis> braunr: Should I really do that? I mean, from what I can
+ see in gccgo's comments, Kernel threads (m) never go away. They are added
+ to a pool of m's waiting for work if there is no goroutine running on
+ them
+ < nlightnfotis> I mean, I am not so sure they exit at all
+ < braunr> be sure
+ < braunr> point me the code please
+ < nlightnfotis>
+ https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/proc.c#L224
+ < nlightnfotis> this is where it gets stated that m's never go away
+ < nlightnfotis> and at line 257 you can see the pool
+ < nlightnfotis> and wait for me to find the code that actually releases an m
+ and places it into the pool
+ < nlightnfotis> yep found it
+ < nlightnfotis> line 817 mput
+ < nlightnfotis> puts a kernel thread given as parameter to the pool
+ < nlightnfotis> another proof of the theory is at line 1177. It states:
+ "This point is never reached, because scheduler does not release os
+ threads at the moment."
+ < braunr> fetching git repository, bit busy, i'll have a look in 5-10 mins
+ < nlightnfotis> oh it's ok, I had pointed you to the file directly on
+ github to check it out instantly, but never mind, the file is
+ <gccroot>/libgo/runtime/proc.c
+ < braunr> damn github is so slow ..
+ < braunr> nlightnfotis: i much prefer my own text interface :)
+ < nlightnfotis> braunr: just out of curiosity what's your setup? I use vim
+ mainly (not that I am a vim expert or anything, I only know the basics,
+ but I love it)
+ < braunr> same
+ < braunr> nlightnfotis: add a trace at that comment to make SURE threads do
+ not exit
+ < braunr> you *cannot* get the libpthread assertion with more than 1 thread
+ < braunr> grep for pthread_exit() too
+ < nlightnfotis> will do it now. It will take about an hour to compile
+ though.
+ < braunr> i don't understand the stack trick at the start of runtime_mstart
+ < braunr> ah splitstack ..
+ < nlightnfotis> I think I should try cross compiling gcc, and then move
+ files on the hurd. It would be so much faster I believe.
+ < braunr> than what ?
+ < nlightnfotis> building gcc on the hurd
+ < nlightnfotis> I remember it taking about 10 minutes with make -j4 on the
+ host
+ < nlightnfotis> it takes 45-50 minutes on the vm (kvm enabled)
+ < braunr> but you can merely rebuild the files you've changed
+ < nlightnfotis> I feel stupid now...
+ < braunr> nlightnfotis: have you tried setting GOMAXPROCS to 1 ?
+ < nlightnfotis> not really, but from what I know GOMAXPROCS defaults to 1
+ if not set
+ < braunr> again, check that
+ < braunr> take the habit of checking things
+ < nlightnfotis> braunr: yeah sorry for that. I have checked these things
+ out before; they don't come out of my head, I just don't remember exactly
+ where I had seen this
+ < braunr> what you can also do is use gdb to catch the assertion and check
+ the number of threads at that time, as well as the number of threads as
+ seen by libpthread
+ < nlightnfotis> braunr: line 492 file proc.c: runtime_gomaxprocs = 1;
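The quoted line only shows the default being assigned. Whether an environment setting later overrides it is exactly what braunr asks to verify. Below is a hypothetical model of that decision in C. It assumes the runtime starts from 1 and switches to a positive integer parsed from `GOMAXPROCS`, capped at some maximum. `model_gomaxprocs` and `MODEL_MAX_PROCS` are invented names. Check the actual parsing in proc.c rather than relying on this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cap; the real runtime defines its own limit. */
#define MODEL_MAX_PROCS 256

/* Model of the GOMAXPROCS decision: default to 1 and use the environment
   value only when it parses completely as a positive integer. */
static int model_gomaxprocs(const char *env)
{
    int procs = 1;                       /* default assigned at startup */

    if (env != NULL) {
        char *end;
        long n = strtol(env, &end, 10);
        if (end != env && *end == '\0' && n > 0)
            procs = n > MODEL_MAX_PROCS ? MODEL_MAX_PROCS : (int)n;
    }
    assert(procs >= 1);
    return procs;
}
```

Usage: `model_gomaxprocs(getenv("GOMAXPROCS"))`. Printing the runtime's own `runtime_gomaxprocs` after initialisation is the direct way to confirm the value.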
+ < braunr> also see runtime.LockOSThread
+ < braunr> to make sure the main thread is locked to its own pthread
+ < nlightnfotis> I can see in line 529 of the same file that the first
+ thread is getting locked
+ < nlightnfotis> the new threads that get initialised are non main threads
+ < braunr> if(!runtime_sched.lockmain) runtime_UnlockOSThread();
+ < braunr> i'm suggesting you set runtime_sched.lockmain
+ < braunr> so it remains true for the whole execution
+ < braunr> this code looks like a revamp of plan9 lol
+ < nlightnfotis> it is
+ < nlightnfotis> in the paper from Ian Lance Taylor describing gccgo he
+ states somewhere that the original go compilers (the 3gs) are a modified
+ version of plan9's C compiler, and that gccgo tries to follow them
+ < nlightnfotis> they differ in a lot of ways though
+ < nlightnfotis> the 3gs generate a lot of code during link time
+ < nlightnfotis> gccgo follows the standard gcc procedures
+ < braunr> eh :D
+ < nlightnfotis> go -> gogo -> generic -> gimple -> rtl -> object
+ < nlightnfotis> that's how it flows as far as I recall
+ < nlightnfotis> gogo is an internal representation of go's structure inside
+ the gccgo frontend
+ < nlightnfotis> that's why you see many functions with gogo in their name
+ < nlightnfotis> I just revisited the paper: gogo is there to make it easy
+ to implement whatever analysis might seem desirable. It mirrors however
+ the Go source code read from the input files
+ < braunr> nlightnfotis: what are you trying now ?
+ < nlightnfotis> I am basically studying the runtime's source code while
+ waiting for gccgo to compile on the Hurd
+ < nlightnfotis> yes I did the stupid whole recompilation again. :/
+ < braunr> nlightnfotis: compile for what ?
+ < braunr> what test ?
+ < nlightnfotis> to check out to see if M's really are added to the pool
+ instead of getting deleted
+ < braunr> nlightnfotis: but how ?
+ < nlightnfotis> braunr: I have added a statement in mput that reports, first,
+ whether we get there at all and, second, how many threads the runtime
+ scheduler knows are waiting (in the pool of m's waiting for work)
+ < braunr> ok
+ < braunr> when you can, i'd really like you to do this test :
+ < braunr> 15:55 < braunr> what you can also do is use gdb to catch the
+ assertion and check the number of threads at that time, as well as the
+ number of threads as seen by libpthread
+ < nlightnfotis> the number of threads required by libpthread is gonna need
+ me to recompile the whole eglibc right?
+ < braunr> no
+ < braunr> just print it with gdb
+ < nlightnfotis> oh, ok
+ < braunr> it's __pthread_num_threads
+ < nlightnfotis> is gdb reliable? I remember thomas telling me that I can't
+ trust gdb at this point in time
+ < braunr> and also __pthread_total
+ < braunr> really ?
+ < braunr> i don't see why not :/
+ < braunr> youpi: any idea about what nlightnfotis is speaking of ?
+ < nlightnfotis> I may have misunderstood it; don't take it by heart
+ < nlightnfotis> I don't wanna put words in other people's mouths because I
+ misunderstood something
+ < braunr> sure
+ < braunr> that's my habit to check things
+ < youpi> braunr: nope
+ < braunr> youpi: and am i right when i say we don't use context functions
+ on the hurd, and they're likely to be incomplete, even with the recent
+ changes from thomas ?
+ < braunr> (mcontext, ucontext)
+ < nlightnfotis> braunr: this is what had been said: 08:46:30< tschwinge> As
+ this is a parallel problem, and it is involving "advanced" things (such
+ as setcontext), I would not trust GDB too much when used on this code.
+ < pinotree> if thomas' changes were complete and polished, i guess he would
+ have sent them upstream already
+ < braunr> i see but
+ < braunr> you can normally trust gdb for global variables
+ < nlightnfotis> Didn't post it as an objection; I posted it because I felt
+ bad putting the wrong words on other people's mouths, as I said
+ before. So I posted his original comment which was more authoritative
+ than my interpretation of it
+ < braunr> i wonder if there is a tunable to strictly map one thread to one
+ goroutine
+ < braunr> nlightnfotis: more focus on the work, less on the rest please
+ < nlightnfotis> Did I do something wrong?
+ < braunr> you waste too much time apologizing
+ < braunr> for no reason
+ < braunr> nlightnfotis: i suppose you don't use splitstack, right ?
+ < nlightnfotis> no I didn't
+ < nlightnfotis> and here's something interesting: The code I just added, in
+ mput, to see if threads are added in the pool. It's not there, no matter
+ what I run
+ < nlightnfotis> So it seems that the runtime is not reaching mput.
+ < nlightnfotis> Could this be normal behavior? I mean, on process
+ termination just release the resources so mput is skipped?
+ < braunr> i don't know the code well enough to answer that
+ < braunr> check closer to the lower interface
+
+
+# IRC, freenode, #hurd, 2013-08-25
+
+ < nlightnfotis> braunr: what is initcontext supposed to be doing?
+ < braunr> nlightnfotis: didn't look
+ < braunr> i'll take a look later
+ < nlightnfotis> braunr: I am baffled by it. It seems to be doing nothing on
+ the Hurd branch and nothing in the Linux branch either. Why call a
+ function that does nothing? (it doesn't only seem to do nothing, I have
+ confirmed it)
+ < nlightnfotis> youpi: I was wondering if you could explain me
+ something. What is the initcontext function supposed to be doing?
+ < youpi> you mean initcontext ?
+ < nlightnfotis> yes
+ < youpi> ergl
+ < youpi> you mean makecontext?
+ < nlightnfotis> no initcontext. I am faced with this in the goruntime. It's
+ called in it, but it is doing nothing. Neither in the Hurd tree, nor in
+ the Linux one
+ < youpi> I don't know what initcontext is
+ < youpi> where do you read it?
+ < nlightnfotis> youpi: let me show you
+ < nlightnfotis>
+ https://github.com/NlightNFotis/gcc/blob/fotisk/goruntime_hurd/libgo/runtime/proc.c#L80
+ < nlightnfotis> and it is called in quite a few places
+ < youpi> it's not doing nothing, see other implementations
+ < pinotree> if SETCONTEXT_CLOBBERS_TLS is not defined, initcontext and
+ fixcontext do nothing
+ < pinotree> otherwise (presuming if setcontext clobbers tls) there are two
+ implementations for solaris/x86_64 and netbsd
+ < youpi> I don't think we have the tls clobber bug
+ < youpi> so these functions being empty is completely fine
+ < nlightnfotis> pinotree: oh, you mean it's used as a workaround for these
+ two systems only?
+ < youpi> yes
+ < pinotree> yes
+ < nlightnfotis> That makes sense. Thanks both of you for the help :)
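pinotree's explanation describes a compile-time switch, sketched below. The sketch assumes only what is said here: when `SETCONTEXT_CLOBBERS_TLS` is not defined, both hooks compile to empty functions, so calling them is harmless on systems without the bug. Per youpi, the Hurd is one of those systems. The signatures are simplified and the counter is a hypothetical placeholder. Neither reproduces libgo's Solaris/x86_64 or NetBSD workarounds.

```c
#include <assert.h>

/* Counts how often a workaround actually ran (illustration only). */
static int workaround_runs;

#ifdef SETCONTEXT_CLOBBERS_TLS
/* Placeholders: a real port would save and restore TLS state around
   context switches here. */
static void initcontext(void) { workaround_runs++; }
static void fixcontext(void) { workaround_runs++; }
#else
/* setcontext does not clobber TLS on this system: the hooks are
   intentionally empty, so call sites need no #ifdef of their own. */
static void initcontext(void) { }
static void fixcontext(void) { }
#endif
```

Keeping the calls unconditional and making the functions empty keeps the scheduler code identical across systems.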
+ < nlightnfotis> youpi: if this counts as some progress, I have traced the
+ exact bootstrapping sequence of a new go process. I know a good deal of
+ what is done from its spawn to its end. There are some things I wanna
+ sort out, and later tonight I will write my report for it to be ready for
+ tomorrow.
+ < youpi> good
+
+
+# IRC, freenode, #hurd, 2013-08-26
+
+ < nlightnfotis> Hi everyone, my report is here
+ http://www.fotiskoutoulakis.com/blog/2013/08/26/gsoc-week-10-report/
+ < youpi> nlightnfotis: you should clearly put printfs inside libpthread
+ < youpi> to check what is happening with the ktids
+ < nlightnfotis> youpi: yep, that's my next course of action. I just want to
+ spend some more time in the go runtime to make sure that I understand the
+ flow perfectly, and to make sure that it is not the runtime's fault
+ < braunr> nlightnfotis: did you try gdb to print the number of threads ?
+ < youpi> nlightnfotis: to build it, the easiest way is to start building
+ eglibc, and when you see it compiling C files (i.e. run i486-gnu-gcc-4.7
+ etc.)
+ < youpi> stop it
+ < youpi> and go into build/hurd-i386-libc, and run "make others" from there
+ < nlightnfotis> braunr: that was my plan for today or tomorrow :)
+ < braunr> start building *debian* glibc
+ < youpi> there's perhaps some way to only build libpthread, but I don't
+ remember
+ < braunr> nlightnfotis: ok
+ < braunr> youpi: i suggested he tried gdb first
+ < youpi> why not
+ < braunr> if you need quick glibc builds, you can use darnassus
+ < nlightnfotis> braunr: how much time on average should I expect it to
+ take?
+ < youpi> it highly depends on the machine
+ < youpi> it can be hours
+ < youpi> or a few minutes
+ < youpi> depending you already have a built tree, a fast disk, etc.
+ < braunr> make lib others on darnassus takes around 30 minutes
+ < braunr> a complete dpkg-buildpackage from fresh sources takes 5-6 hours
+ < braunr> make others from a built tree is very quick
+ < braunr> a few minutes at most
+ < braunr> nlightnfotis: i don't see any trace of thread exiting in your
+ report, is that normal ?
+ < nlightnfotis> yeah, I guess, since they don't exit prematurely, they are
+ released along with other resources at the process' exit
+ < braunr> i'll rephrase
+ < braunr> you said last time that you saw a function never got called
+ < braunr> i assumed it was because a thread exited prematurely
+ < nlightnfotis> oh I sorted it out with the help of youpi and pinotree
+ yesterday
+ < braunr> that's different
+ < braunr> i'm not talking about the function that does nothing
+ < braunr> i'm talking about the one never called
+ < nlightnfotis> oh, go on then,
+ < braunr> i don't remember its name
+ < braunr> anyway
+ < nlightnfotis> abort()?
+ < braunr> i hope abort doesn't get called :)
+ < nlightnfotis> it doesn't
+ < braunr> i thought it was the one right before
+ < braunr> what i mean is
+ < nlightnfotis> oh runtime_mstart, it does get called
+ < braunr> add traces at thread exit points
+ < nlightnfotis> I sorted it out too
+ < braunr> make *sure* threads don't exit
+ < nlightnfotis> it gets called to start the kernel thread created at
+ process spawn at the runtime_schedinit
+ < braunr> if they really don't, it's probably a context/tls issue
+ < nlightnfotis> I will do this right now.
+ < nlightnfotis> braunr: if it's a context/tls issue it's libpthread's
+ problem?
+
+
+# IRC, freenode, #hurd, 2013-09-02
+
+ <nlightnfotis> Hello! My report for this week is online:
+ http://www.fotiskoutoulakis.com/blog/2013/09/02/gsoc-week-11-report/
+ <braunr> nlightnfotis: there always is a signal thread in every hurd
+ program
+ <braunr> nlightnfotis: i also pointed out that there are two variables
+ involved in counting threads in libpthread, the other one being
+ __pthread_num_threads
+ <braunr> again, more attention to work and details, less showmanship
+ <braunr> i'm tired of repeating it
+ <youpi> nlightnfotis: doesn't backtrace work in gdb to tell you what
+ 0x01da48ec is?
+ <youpi> also, do you have libc0.3-dbg installed?
+ <nlightnfotis> braunr: __pthread_num_threads is 4.
+ <braunr> then why isn't it in your report ?
+ <braunr> it's acceptable that you overlook it
+ <nlightnfotis> and youpi: yeah I have got the backtrace, but 0x01da48ec is
+ ?? () from /lib/i386-gnu/libc.so.3
+ <braunr> it's NOT when someone else has previously mentioned it to you
+ <youpi> nlightnfotis: only that line, no other line?
+ <nlightnfotis> it has 8 more youpi, the one after ?? is mach_msg ()
+ from /lib/i386-gnu/libc.so.0.3
+ <braunr> yes mach_msg
+ <braunr> almost everything ends up in mach_msg
+ <youpi> you should probably pastebin somewhere the output of thread apply
+ all bt
+ <braunr> what's before that ?
+ <nlightnfotis> braunr: I don't know how I even missed it. I skimmed through
+ the code and only found __pthread_total and assumed that it was the total
+ number of threads
+ <braunr> nlightnfotis: i don't know either
+ <braunr> take notes
+ <nlightnfotis> before mach_msg is __pthread_timedblock () from
+ /lib/i386-gnu/libpthread.so.0.3
+ <nlightnfotis> I will add it to pastebin in a second
+ <braunr> i find it very disappointing that after several weeks blocking on
+ this, despite all the pointers you've been given, you still haven't made
+ enough progress to reach the context switching functions
+ <braunr> last week, most progress was made when we talked together
+ <braunr> then nothing
+ <braunr> it seems that you disappear, apparently searching on your own
+ <braunr> but for far too long
+ <nlightnfotis> braunr: I do search on my own, yes,
+ <braunr> almost like exploiting being blocked not to make progress on
+ purpose ...
+ <braunr> but too much
+ <nlightnfotis> braunr: I am not doing this on purpose, I believe you are
+ unfair to me. I am trying to make as much progress as I can alone, and
+ reach out only when I can't do much more alone
+ <braunr> then why is it only now that we get replies to questions such as
+ "how much is __pthread_num_threads" ?
+ <braunr> why do you stop discussions for almost a week, just to find
+ yourself blocked again ?
+ <nlightnfotis> I was working on gcc, going through the runtime making sure
+ about assumptions, and running various other programs, with and without
+ goroutines, through gdb
+ <braunr> that doesn't take a week
+ <braunr> clearly not
+ <braunr> last time we talked was
+ <braunr> 10:40 < nlightnfotis> braunr: if it's a context/tls issue it's
+ libpthread's problem?
+ <nlightnfotis> it did for me... honestly, what is it you believe I am doing
+ wrong? I too am frustrated by my lack of progress, but I am doing my best
+ <braunr> august 26
+ <nlightnfotis> yeah, I wanted to make sure about certain assumptions on the
+ gcc side. I don't want to start hacking on libpthread only to see that it
+ might have been something I missed on the gcc side
+ <braunr> i told you
+ <braunr> it's probably not a libpthread issue
+ <braunr> the assertion is
+ <braunr> but it's minor
+ <braunr> it's not the real problem, only a side effect
+ <braunr> i told you about __pthread_num_threads, why didn't you look at it
+ ?
+ <braunr> i told you about context switching functions, why nothing about it
+ ?
+ <braunr> doing a few printfs to check numbers and using gdb to check them
+ at break points should be quick
+ <braunr> when we talked, we had the results in a few minutes
+ <nlightnfotis> yeah, because I was guided, and that helped me target my
+ research. On my own things are quite different. I find out something
+ about gcc's behavior, then find out I need tons more information, and I
+ have a lot of things that I need to research to confirm any assumptions
+ from my side
+ <braunr> how did you miss the signal thread ?
+ <braunr> we even talked about it right here with hacklu
+ <braunr> i'll say it again
+ <braunr> if blocked more than one day, ask for help
+ <braunr> 2 days minimum each time is just too long
+ <nlightnfotis> I'm sorry. I will be online every day from now on and report
+ on my course of action every 10 minutes.
+ <nlightnfotis> I recognise that time is of the essence at this point in
+ time
+ <braunr> it's also NO
+ <braunr> NO
+ <braunr> *SIGH*
+ <hacklu> nlightnfotis: calm down. braunr just want to help you solve
+ problem quickly.
+ <braunr> 10 minutes is the other extreme
+ <hacklu> nlightnfotis: in my experience, if something blocks me, I will
+ keep asking him until I solve the problem.
+ <braunr> it's also very frustrating to see you answer questions quickly
+ when you're here, then wait days for unanswered questions that could have
+ taken little time if you kept being here
+ <braunr> this just gives the impression that you're doing something else in
+ parallel that keeps you busy
+ <braunr> and reinforce my belief that you're not being serious enough
+ about it
+ <nlightnfotis> yeah, I understand that it gives that impression. The only
+ thing I can tell you now, is that I am *not* doing something else in
+ parallel. I am only trying to demonstrate some progress alone, and when
+ working alone things for me take quite some more time than when I am
+ guided
+ <braunr> hacklu: i'm actually the nervous one here
+ <nlightnfotis> braunr: ok, I understand I have disappointed you. What would
+ you suggest me to do from now on?
+ <hacklu> braunr: :)
+ <braunr> manage your time correctly or you'll fail
+ <braunr> i'm not the main mentor of this project so it's not for me to
+ decide
+ <braunr> but if i were, and if i had to wait again for several days before
+ any notice of progress or blocking, i wouldn't even wait for the end of
+ the gsoc
+ <braunr> you're confronted with difficult issues
+ <braunr> tls, context switching, thread
+ <braunr> ing
+ <braunr> they're all complicated
+ <braunr> unless you're very experienced and/or gifted, don't assume you can
+ solve it on your own
+ <braunr> and the biggest concern for me is that it's not even the main
+ focus of your project
+ <braunr> you should be working on go
+ <braunr> on porting
+ <braunr> any side issues should be solved as quickly as possible
+ <braunr> and we're now in september ...
+ <nlightnfotis> go is working quite alright. It's goroutines that have
+ issues.
+ <braunr> nlightnfotis: same thing
+ <braunr> goroutines are part of go as far as i'm concerned
+ <braunr> and they're working too, something in the hurd isn't
+ <braunr> so it's a side issue
+ <braunr> you're very much entitled to ask as much help as you need for side
+ issues
+ <braunr> and i strongly feel you didn't
+ <nlightnfotis> yeah, you're right. I failed on that aspect, mainly because
+ of the way I work. I wanted to show some progress on my own, and not be
+ here and spam all day. I felt that spamming questions all day would
+ demonstrate incompetence from my side
+ <nlightnfotis> and I wanted to show that I am capable of solving my
+ problems on my own.
+ <braunr> well, in a sense it does, but that's not the skills we were
+ expecting from you so it's perfectly ok
+ <braunr> nlightnfotis: no development group, even in companies, in their
+ right mind, would expect you to grasp the low level dark details of an
+ operating system implementation in a few weeks ...
+ <nlightnfotis> braunr: ok, may I ask what you suggest to me that my next
+ course of action is?
+ <braunr> let me see
+ <braunr> nlightnfotis: your report mentions runtime_malg
+ <nlightnfotis> yes, runtime_malg always returns a new goroutine
+ <braunr> nlightnfotis: what's the problem ?
+ <nlightnfotis> a new m created is assigned a new goroutine via runtime_malg
+ <nlightnfotis> what happens to that goroutine? Is it destroyed? Because it
+ seems to be a bogus goroutine. Why isn't the kernel thread instantly
+ picking the one goroutine available at the global goroutine pool?
+ <braunr> let's see if it's that hard to figure out
+ <nlightnfotis> seeing as m's and g's have a 1:1 (in gccgo) relationship,
+ and a new kernel thread is created every time there is a new goroutine
+ there to run.
+ <braunr> are you sure about that 1:1 relationship ?
+ <braunr> i hardly doubt it
+ <braunr> highly*
+ <nlightnfotis> yeah, that's what I thought too, but then again, my research
+ so far shows that when a new goroutine is created, a new kernel thread
+ creation follows suit
+ <nlightnfotis> what I have mentioned of course, happens in runtime_newm
+ <braunr> nlightnfotis: that's when you create a new m, not a new g
+ <nlightnfotis> yes, a new m is created when you create a new g. My issue is
+ that during m's creation, a new (bogus) g is created and assigned to the
+ m. I am looking into what happens to that.
+ <braunr> nlightnfotis: "a new m is created when you create a new g", can
+ you point me to the code ?
+ <nlightnfotis> braunr: matchmg line 1280 or close to that. Creates new m's
+ to run new g's up to (mcpumax)
+ <braunr> "Kick off new m's as needed (up to mcpumax)."
+ <braunr> so basically you have at most mcpumax m
+ <nlightnfotis> yeah. but for a small number of goroutines (as for example
+ in my experiments), a new m is created in order to run a new g.
+ <braunr> runtime_newm is called only if mget(gp)) == nil
+ <braunr> be rigorous please
+ <braunr> when i ask
+ <braunr> 11:01 < braunr> are you sure about that 1:1 relationship ?
+ <braunr> this conclusively proves it's *false*
+ <braunr> so don't answer yes to that
+ <braunr> it's true for a small number of goroutines, ok
+ <braunr> and at startup
+ <braunr> because then, mget returns an existing m
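The reuse logic braunr points at in matchmg can be shown with a toy model. It is heavily simplified, and none of it is libgo code: the `model_*` names only mirror `mget`, `mput` and `runtime_newm`. In this model, `mcpumax` bounds the number of m's created, a simplification of the real runtime's bound.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_M 8

/* Hypothetical scheduler state: a pool of idle m's plus a creation count. */
struct model_sched {
    int idle[MODEL_MAX_M];   /* ids of idle m's (the pool) */
    int nidle;
    int mcount;              /* m's created so far */
    int mcpumax;             /* upper bound on m's in this model */
};

/* Take an idle m from the pool, or -1 if the pool is empty. */
static int model_mget(struct model_sched *s)
{
    return s->nidle > 0 ? s->idle[--s->nidle] : -1;
}

/* Return an m to the pool instead of destroying it. */
static void model_mput(struct model_sched *s, int m)
{
    assert(s->nidle < MODEL_MAX_M);
    s->idle[s->nidle++] = m;
}

/* Find an m for a runnable goroutine: reuse an idle one when possible and
   create a new one only if the pool is empty and the bound allows it.
   Returns the m id, or -1 when the goroutine must wait. */
static int model_matchmg(struct model_sched *s)
{
    int m = model_mget(s);

    if (m >= 0)
        return m;
    if (s->mcount >= s->mcpumax)
        return -1;
    return s->mcount++;      /* stands in for runtime_newm() */
}
```

With few goroutines and an empty pool, every request creates an m, which matches what nlightnfotis observed. Once m's are returned with `mput`, they are reused, so the mapping is not 1:1 in general.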
+ <braunr> nlightnfotis: this g0 goroutine is described in the struct as
+ <braunr> G runtime_g0; // idle goroutine for m0
+ <braunr> runtime_malg builds it with just a stack
+ <braunr> apparently, that's the goroutine an m runs when there are no g
+ left
+ <braunr> so yes, the idle one
+ <braunr> it's not bogus
+ <nlightnfotis> I thought m0 and g0 were the bootstrap m and g for the
+ scheduler.
+ <nlightnfotis> *correction: runtime_m0 and runtime_g0
+ <braunr> hm i got a bit fast
+ <braunr> G* g0; // goroutine with scheduling stack
+ <nlightnfotis> braunr: scheduling stack with stacksize = -1?
+ <nlightnfotis> unless it's not used as a parameter
+ <nlightnfotis> let me investigate that
+ <nlightnfotis> yeah now that I am seeing it, it might make sense, if it's
+ using a default stack size, #defined as StackMin
+ <braunr> g0 looks like a placeholder
+ <braunr> i think it's used to reuse switching code when there is only one
+ goroutine involved
+ <braunr> e.g. when starting
+ <braunr> anyway i don't think we should waste too much time with it
+ <braunr> nlightnfotis: try to make a real 1:1 mapping
+ <braunr> that's something else i suggested last time
+ <nlightnfotis> braunr: ok. Where do you suspect the problem lies?
+ <braunr> context switching
+ <nlightnfotis> inside the goruntime?
+ <braunr> in glibc
+ <braunr> try to use runtime.LockOSThread
+ <braunr> http://code.google.com/p/go-wiki/wiki/LockOSThread
+ <braunr> nlightnfotis: http://golang.org/pkg/runtime/ is probably better
+ <nlightnfotis> what exactly do you mean by `use runtime.LockOSThread`?
+ LockOSThread locks the very first m and goroutine as the main threads
+ during process initialisation
+ <nlightnfotis> in proc.c line 565 or something
+ <braunr> i'm not sure it will help, because the problem is likely to occur
+ before even switching to the goroutine that locks its m, but worth trying
+ <braunr> 11:28 < braunr> nlightnfotis: http://golang.org/pkg/runtime/ is
+ probably better
+ <braunr> the first example is specific to GUIs that have requirements on
+ the main thread
+ <braunr> whereas i want every goroutine to run in its own thread
+ <nlightnfotis> I have also noticed that some context switching happens in
+ the goruntime even with a low number of goroutines and kernel threads
+ <braunr> that's expected
+ <braunr> goroutines must be viewed as works, and ms as worker threads
+ <braunr> every time a goroutine sleeps, its m should be switching to useful
+ work
+ <braunr> nlightnfotis: i'd make prints (probably using mach_print) of
+ contexts when saved and restored
+ <braunr> and try to see if it makes any sense
+ <braunr> that's not simple to setup but not overly complicated either
+ <braunr> don't hesitate to ask for help
+ <nlightnfotis> from inside glibc, right?
+ <braunr> yes
+ <braunr> well
+ <braunr> no from go
+ <braunr> don't touch glibc from now
+ <braunr> put these prints near calls to makecontext/swapcontext
+ <braunr> and setcontext/getcontext
+ <braunr> wel
+ <braunr> you'll be using getcontext i think
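Below is a minimal sketch of the tracing braunr asks for. It logs every save and restore together with the context's address, so the sequence can be checked for sense. It assumes glibc's `<ucontext.h>` (`getcontext`, `makecontext`, `swapcontext`) and is not libgo's scheduler code. Output goes to stderr so it can be tested on GNU/Linux. On the Hurd, braunr suggests `mach_print` instead. `TRACE_CTX`, `worker` and `run_traced_switches` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <ucontext.h>

/* Log which function is switching and which context is involved. */
#define TRACE_CTX(what, ctx) \
    fprintf(stderr, "%s: %s ctx=%p\n", __func__, (what), (void *)(ctx))

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int worker_steps;

static void worker(void)
{
    worker_steps++;
    TRACE_CTX("save worker, restore main", &worker_ctx);
    swapcontext(&worker_ctx, &main_ctx);
    worker_steps++;
    /* Returning from here resumes uc_link, i.e. main_ctx. */
}

/* Runs worker in its own context, switching back and forth twice.
   Returns the number of worker steps executed, or -1 on error. */
static int run_traced_switches(void)
{
    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    TRACE_CTX("save main, start worker", &main_ctx);
    if (swapcontext(&main_ctx, &worker_ctx) != 0)
        return -1;
    TRACE_CTX("save main, resume worker", &main_ctx);
    if (swapcontext(&main_ctx, &worker_ctx) != 0)
        return -1;
    TRACE_CTX("main resumed after worker returned", &main_ctx);
    return worker_steps;
}
```

In the go runtime, the equivalent lines would sit next to each existing `getcontext`/`swapcontext` call. A restore of a context that was never saved, or that was saved by a different thread, is the kind of anomaly to look for.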
+ <nlightnfotis> noted it all. I also have the gdb output you asked me for
+ http://pastebin.com/LdnMQDh1
+ <braunr> i don't see main
+ <nlightnfotis> some notes first: The main thread is the one with id 4, and
+ the output on the top is its backtrace.
+ <braunr> and main.main is run in thread 6
+ <nlightnfotis> Remember that main when it comes to go is in the file
+ go-main.c
+ <braunr> so main becomes runtime_MHeap_Scavenger
+ <nlightnfotis> yeah, main.main is the code of the program, (the one the
+ user wrote, not the runtime)
+ <nlightnfotis> yeah, it becomes a gc thread
+ <nlightnfotis> seeing as runtime_starttheworld reports that there is
+ already one gc thread
+ <braunr> and how much are __pthread_total and __pthread_num_threads for
+ that trace ?
+ <nlightnfotis> they were: __pthread_total = 2, and __pthread_num_threads =
+ 4
+ <braunr> can you paste the assertion again please, just to make sure
+ <nlightnfotis> a.out: ./pthread/pt-create.c:167: __pthread_create_internal:
+ Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok =
+ thread->kernel_thread == ktid;
+ <nlightnfotis> __mach_port_deallocate ((__mach_task_self + 0), ktid); ok;
+ })' failed.
+ <braunr> btw, install the -dbg packages too
+ <nlightnfotis> dbg for which one? gccgo?
+ <braunr> libc0.3
+ <braunr> pthread/pt-create.c:167 is __pthread_sigstate (_pthread_self (),
+ 0, 0, &sigset, 0); here :/
+ <braunr> that assertion should be in __pthread_thread_start
+ <braunr> let's just say gdb is confused
+ <pinotree> braunr: apt-get source eglibc ; cd eglibc-* ; debian/rules patch
+ <braunr> pinotree: i have
+ <braunr> and that assertion can only trigger if __pthread_total is 1
+ <braunr> so let's say it just got to 2
+ <nlightnfotis> it does from very early on in process initialisation
+ <nlightnfotis> let me check this out again
+ <braunr> hm
+ <braunr> actually, both __pthread_total and __pthread_num_threads must be 1
+ <braunr> the context functions might be fine actually
+ <nlightnfotis> braunr: __pthread_num_threads = 2 right from the start of
+ the program
+ <nlightnfotis> 0x01da48ec is in mach_msg_trap
+ <braunr> something happened with libpthreads recently ..
+ <braunr> i can't even start iceweasel
+ <pinotree> braunr: what's the error?
+ <braunr> iceweasel: ./pthread/../sysdeps/generic/pt-mutex-timedlock.c:70:
+ __pthread_mutex_timedlock_internal: Assertion `__pthread_threads' failed.
+
+But not the [[open_issues/libpthread_dlopen]] issue?
+
+ <braunr> considering __pthread_threads is a global variable, this is tough
+ <braunr> i wonder if that's the issue with nlightnfotis's work
+ <braunr> wrong symbol resolution, leading libpthread to consider there is
+ only one thread running
+ <pinotree> try with LD_PRELOAD=/lib/i386-gnu/libpthread.so.0 iceweasel
+ <braunr> same
+ <braunr> maybe the switch to glibc 2.17
+ <braunr> this assertion is triggered by __pthread_self, assert
+ (__pthread_threads);
+ <braunr> __pthread_threads being the array of thread pointers
+ <braunr> so either corrupted (but we hardly changed anything ...) or wrong
+ resolution
+ <braunr> __pthread_num_threads includes the signal thread, __pthread_total
+ doesn't
+ <nlightnfotis> braunr: I recompiled with the libc debugging symbols and I
+ have new information
+ <nlightnfotis> the threads block at mach_msg_trap
+ <braunr> again, almost everything blocks there
+ <braunr> mach_msg is mach ipc, the way hurd system calls are implemented
+ <nlightnfotis> and the next calls (if it didn't block, from what I can see
+ from eip) are mach_reply_port and mach_thread_self
+ <braunr> please paste it
+ <nlightnfotis> yes give me 2 mins plz, brb
+ <braunr> pinotree: looks different for firefox
+ <braunr> it seems it calls pthread_key_create before pthread_create
+ <braunr> something our libpthread doesn't handle correctly
+ <nlightnfotis> braunr: http://pastebin.com/yNbT7nLn
+ <pinotree> braunr: what do you mean?
+ <braunr> pinotree: i mean libpthread needs to be fixed so thread-specific
+ data can be set even without a call to pthread_create
+ <braunr> nlightnfotis: hum, we already knew it was blocking in a semaphore
+ <braunr> nlightnfotis: ok forget the other things i told you to test
+ <braunr> nlightnfotis: track __pthread_total and __pthread_num_threads
+ <braunr> add prints (again, with mach_print) to see when (and why) they
+ change and go back to 1
+ <pinotree> braunr: i see that pthread_key_create uses a mutex which in
+ turns needs _pthread_self(), but shouldn't at least one pthread_create be
+ done (directly by libc for the main thread)?
+ <braunr> pinotree: no :)
+ <braunr> well
+ <braunr> it should have been for the signal thread indeed
+ <braunr> and the signal thread exists
+ <pinotree> and the main thread?
+ <braunr> not the main, no
+ <pinotree> how so?
+ <braunr> a simple test program shows it does indeed work ..
+ <braunr> so this is again another problem in firefox too
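Below is a minimal test program of the kind braunr mentions. It is a sketch, not his actual program. It creates a thread-specific data key, then sets and reads back a value from the initial thread without ever calling `pthread_create()`. The helper name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Checks that thread-specific data works in a process where only the
   initial thread exists. Returns 0 on success, or the failing step. */
static int tsd_without_pthread_create(void)
{
    static int value = 42;
    pthread_key_t key;

    if (pthread_key_create(&key, NULL) != 0)
        return 1;
    if (pthread_setspecific(key, &value) != 0)
        return 2;
    if (pthread_getspecific(key) != &value)
        return 3;
    return pthread_key_delete(key) == 0 ? 0 : 4;
}
```

If this passes but the application still hits the `__pthread_threads` assertion, the problem lies elsewhere, which is braunr's conclusion for firefox.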
+ <nlightnfotis> braunr: I don't think I understand this. I mean how can
+ __pthread_total and __pthread_num_threads turn to 1 when, right before and
+ right after the crash they have numbers between 2, 3, and 4?
+ <braunr> how did you get their values "right before" the crash ?
+ <nlightnfotis> I have set a breakpoint to a printing function right before
+ the go statement
+ <nlightnfotis> (right before in this context, in the application code, not
+ the runtime code, but then again, I don't really think they are too far
+ each other)
+ <braunr> well, that's the mystery
+ <nlightnfotis> I am not challenging what you said, I will of course do,
+ just asking to understand some things
+ <braunr> they may either turn to 1, or there is some mess with symbol
+ resolution leading threads to see a value of 1
+ <nlightnfotis> *do it
+ <braunr> there*
+ <nlightnfotis> braunr: ping
+ <teythoon> just ask ;)
+ <nlightnfotis> teythoon: have you used mach_print?
+ <teythoon> no
+ <nlightnfotis> I have some questions about it
+ <teythoon> ask them
+ <nlightnfotis> I was told to use them inside go's runtime, to print the
+ values of __pthread_total and __pthread_num_threads. The thing is, these
+ values (I believe) are unknown to the runtime, they are only known to the
+ executable (linking time and later)
+ <teythoon> so? if the requested information is bound to a symbol that is
+ resolved at link time, you can print it from within the runtime
+ <teythoon> the same way any function from the libc is not known to the
+ executable until linking against it, but you can still "use" it in your
+ executable
+ <nlightnfotis> yeah, ok I understand that, but these are references that
+ are resolved at link time. The values I want to print are totally unknown
+ to the runtime (0 references to them)
+ <teythoon> if the value you are interested in is bound to the symbol
+ __pthread_total at link time, then you've got a reference you can use
+ <teythoon> doesn't printing __pthread_total work? did you try that?
+ <nlightnfotis> no, whenever I printed these values I did it from gdb. I am
+ trying to do what you suggested atm
+ <braunr> nlightnfotis: im here
+ <braunr> printing those values from libgo will tell us what value libgo
+ actually sees
+ <nlightnfotis> I am trying to use mach_print. Could you give me some
+ pointers on its usage (inside the goruntime?) (I have already read your
+ document here
+ http://www.gnu.org/software/hurd/microkernel/mach/gnumach/interface/syscall/mach_print.html
+ and the example code)
+ <braunr> and symbol resolution may depend on where it's done from
+ <braunr> nlightnfotis: first, it only works with -dbg kernels
+ <braunr> so make sure you're running one
+ <braunr> actually, i'll write you a patch
+ <braunr> including a mach_printf function with argument parsing
+ <nlightnfotis> isn't it on by default? I read that on the document you are
+ discussing mach_printf
+ <nlightnfotis> ahh ok
+ <braunr> it's on by default on -dbg kernels
+ <braunr> i'll make a repository on darnassus too
+ <braunr> better store it there
+ <braunr> nlightnfotis:
+ http://darnassus.sceen.net/gitweb/rbraun/mach_print.git/
+ <braunr> nlightnfotis: i suggest you implement mach_print with inline asm
+ statement in a C file, so that you don't need to alter the build system
+ configuration
+ <braunr> i'll make an example of that too
+ <nlightnfotis> braunr: that wasn't a problem. My only real problem atm is
+ that __atomic_t isn't recognised as a type, and I can not find the header
+ file for it on Hurd
+ <nlightnfotis> it was pt-internal.h in libpthread
+ <braunr> ah
+ <braunr> nlightnfotis: just in case, i updated the repository with an
+ inline assembly version
+ <braunr> let's see about __atomic_t
+ <braunr> sysdeps/i386/bits/pt-atomic.h:typedef __volatile int __atomic_t;
+ <braunr> nlightnfotis: just redeclare it as this locally
+ <braunr> nlightnfotis: ok ?
+ <nlightnfotis> I am working on it, because I still haven't found what
+ __atomic_t is typedefed from. Thinking of typedefing an int to it and see
+ how it goes
+ <nlightnfotis> braunr: found it just now: __volatile int
+ <braunr> "just now" ?
+ <braunr> 14:19 < braunr> sysdeps/i386/bits/pt-atomic.h:typedef __volatile
+ int __atomic_t;
+ <nlightnfotis> I was using cscope all this time
+ <braunr> why use cscope at all when i tell you where it is ?
+ <nlightnfotis> because I didn't notice it: your discussion was between
+ pino's and srs' and I wasn't tagged and thought it had something to do
+ with their discussion
+ <pinotree> (sorry)
+ <nlightnfotis> no it was my bad
+ <braunr> ok
+ <braunr> pinotree: there is indeed a special call to
+ __pthread_create_internal for the main thread
+ <pinotree> yeah
+ <pinotree> braunr: if there wouldn't be that libc→pthread bridge, things
+ like pthread_self() or so wouldn't work for the main thread
+ <braunr> pinotree: right
+ <pinotree> braunr: weird thing is that the error you got is usually a sign
+ that pthread is not linked in explicitly
+ <braunr> pinotree: yes
+ <braunr> pinotree: with firefox, gdb can't locate pthread symbols before a
+ call to a pthread function
+ <braunr> so yes, libpthread is loaded after main is called
+ <braunr> nlightnfotis: can you give me a quick procedure to build gcc with
+ go support from your repository, and then test a go program please ?
+ <braunr> to i can have a better look at it myself
+ <braunr> so*
+ <nlightnfotis> braunr: sure you want access to my go repo? If you already
+ have gcc repo add my github repo as a remote and checkout
+ fotisk/goruntime_hurd
+ <braunr> i have your github repo
+ <nlightnfotis> git checkout fotisk/goruntime_hurd (You may need to revert a
+ commit or two, because of my latest endeavour with mach_print
+ <nlightnfotis> braunr: check it out now, I reverted some messy commits for
+ you to rebuild
+ <braunr> nlightnfotis: i won't work on it right now, i'm building glibc to
+ check some things in libpthread
+ <braunr> since it seems to be the source of your problems and many others
+ <nlightnfotis> oh ok then. btw, it compiles ok, but when I try to compile
+ another program with gccgo collect2 cries about undefined references to
+ __pthread_num_threads and __pthread_total
+ <braunr> Oo
+ <braunr> another program ?
+ <nlightnfotis> braunr: will I get the same result if I slowly go through it
+ with gdb
+ <nlightnfotis> yep
+ <braunr> i don't understand
+ <braunr> what compiles ok, what fails ?
+ <nlightnfotis> gccgo compiles without errors (which is strange) but when I
+ use it to compile goroutine.go it fails with the errors I reported
+ <pinotree> (missing linking to pthread?)
+ <braunr> since when ?
+ <nlightnfotis> pinotree: perhaps braunr: since I made the changes with
+ mach_print
+ <nlightnfotis> pinotree: but what could be missing the link? GCC compiled
+ programs are getting linked automatically to the shared objects of the
+ headers they include right?
+ <nlightnfotis> (assuming it's not a huge program, only a tiny 10 liner for
+ instance)
+ <braunr> uh
+ <braunr> did you declare them as extern
+ <braunr> ?
+ <nlightnfotis> yes
+ <braunr> do you see -lpthread on the link line ?
+ <nlightnfotis> during gcc's compilation? I will have to rerun it again and
+ see.
+ <braunr> log the compilation output somewhere once
+ <braunr> nlightnfotis: why did you remove volatile from the definition of
+ __atomic_t ??
+ <nlightnfotis> just for testing purposes, because I thought that the GNU
+ version is volatile with no __ in front of it and that might cause some
+ issues.
+ <braunr> i don't understand
+ <nlightnfotis> it was just an experiment gone wrong
+ <braunr> nlightnfotis: keep volatile there
+ <nlightnfotis> just did
+ <nlightnfotis> braunr: there is -lpthread on some lines. For instance when
+ libtool is invoked.
+ <youpi> braunr: the pthread assertion usually happens when libpthread gets
+ loaded from a plugin, I guess mozilla got rid of libpthread in the main
+ application recently, simply
+ <pinotree> youpi: he said that the LD_PRELOAD trick (which used to
+ workaround the issue in older iceweasel) does not work, though
+ <youpi> ah? it does work for me
+ <pinotree> dunno then...
+ <braunr> youpi: aouch, ok
+ <braunr> nlightnfotis: what about the specific gcc invocation that fails ?
+ <braunr> pinotree: /lib/i386-gnu/libpthread.so.0: ERROR: cannot open
+ `/lib/i386-gnu/libpthread.so.0' (No such file or directory)
+ <braunr> trying with a working path this time
+ <braunr> better
+ <pinotree> sorry, i typed it by hand :p
+ <braunr> Segmentation fault
+ <braunr> but no assertion
+ <nlightnfotis> braunr: gccgo hello.go
+ <braunr> nlightnfotis: ?
+ <pinotree> <braunr> nlightnfotis: what about the specific gcc invocation
+ that fails ?
+ <braunr> nlightnfotis: i'm asking if -lpthread is present when you have
+ these undefined reference errors
+ <nlightnfotis> it is. it seems so
+ <nlightnfotis> I wrote above that it is present when libtool is called
+ <nlightnfotis> I don't know what libtool is doing sadly
+ <braunr> you said some lines
+ <nlightnfotis> but I from what I've seen I believe it does some kind of
+ linking
+ <braunr> paste it somewhere please
+ <nlightnfotis> yeah it doesn't fail though
+ <braunr> that's far too vague ...
+ <braunr> it doesn't fail ?
+ <nlightnfotis> give me a second
+ <braunr> i thought it did
+ <nlightnfotis> no it doesn't
+ <braunr> 14:53 < nlightnfotis> gccgo compiles without errors (which is
+ strange) but when I use it to compile goroutine.go it fails with the
+ errors I reported
+ <nlightnfotis> yeah gccgo compiles.
+ <nlightnfotis> when I use the compiler, it fails
+ <braunr> so it fails running
+ <braunr> is gccgo built with -lpthread itself ?
+ <nlightnfotis> http://pastebin.com/1TkFrDcG
+ <nlightnfotis> check it out
+ <nlightnfotis> I think it does, but I would take an extra opinion
+ <nlightnfotis> line 782
+ <nlightnfotis> and 784
+ <braunr> (are you building as root ?)
+ <nlightnfotis> yes. for now
+ <pinotree> baaad :p
+ <nlightnfotis> I never had any particular problems...except that one time
+ that I rm -rf the source tree :P
+ <nlightnfotis> I know it's bad d/w
+ <nlightnfotis> braunr: I found something interesting (I don't know if it's
+ expected or not; probably not): If I set GOMAXPROCS to 2, and run the
+ goroutine program, it seems to be running for a while (with the
+ goroutines!) and then it segfaults. Will look more into it
+ <braunr> it's interesting, yes
+ <braunr> nlightnfotis: have you tried the preload trick too ?
+ <nlightnfotis> ldpreload? no. Could you tell me how to do it? export
+ LDPRELOAD and a path to libpthread?
+ <braunr> nlightnfotis: LD_PRELOAD=/lib/i386-gnu/libpthread.so.0.3 ...
+ <nlightnfotis> braunr: it also produces a very different backtrace. This
+ one heavily involves mig functions
+ <tschwinge> braunr, nlightnfotis: Thanks for working together, and sorry
+ for my lack of time.
+ <braunr> nlightnfotis: paste please
+ <nlightnfotis> tschwinge, Hello. It's ok, I am sorry for not showing good
+ amounts of progress from my part.
+ <nlightnfotis> braunr: http://pastebin.com/J4q2NN9p
+ <braunr> nlightnfotis: thread apply all bt full please
+ <nlightnfotis> braunr: http://pastebin.com/tbRkNzjw
+ <braunr> looks like an infinite loop of
+ __mach_port_mod_refs/__mig_dealloc_reply_port
+ <braunr> ...
+ <nlightnfotis> yes that's what I got from it too. Keep in mind these
+ results are with GOMAXPROCS=2 and they result in segmentation fault
+ <nlightnfotis> and I also can not understand the corrupted stack at the
+ beginning of the backtrace
+ <braunr> no please
+ <nlightnfotis> ?
+ <braunr> test LD_PRELOAD=/lib/i386-gnu/libpthread.so.0.3 without
+ GOMAXPROCS=2
+ <nlightnfotis> braunr: LD_PRELOAD without GOMAXPROCS results in the usual
+ assertion failure and abortion of execution after it
+ <braunr> nlightnfotis: ok
+ <braunr> nlightnfotis: im sorry, i thought you couldn't launch a test since
+ you added mach_print
+ <nlightnfotis> I am not using mach_print, I couldn't fix the issue with the
+ references and thought I was losing time, so I went back to debugging
+ with gdb until I can't get anything more out of it
+ <nlightnfotis> braunr: should I focuse on mach_print? Will it produce very
+ different results than gdb?
+ <nlightnfotis> *focus
+ <nlightnfotis> (btw I didn't delete mach print or anything, it's still
+ there, in another branch)
+ <nlightnfotis> braunr: Now I stepped through the program in gdb, and got
+ something really really weird. Some close to a full execution
+ <nlightnfotis> Number of gorountines and machine threads according to
+ runtime was 3, __pthread_num_threads was 4
+ <nlightnfotis> it did get SIGILL (illegal instruction some times though)
+ <nlightnfotis> and it exited with code 02
+ <braunr> uh
+ <braunr> nlightnfotis: try with mach_print yes, it will show the values
+ from the real execution context, and be as close as what we can get
+ <braunr> i'm not sure about how gdb finds the values
+ <nlightnfotis> braunr: ok, will spend the rest of the day to find a way to
+ make mach_print and the other values work. Did you see my last messages,
+ with the goroutines that worked under gdb?
+ <braunr> yes
+ <nlightnfotis> it seemed to run. Didn't get the expected output, but also
+ didn't get any errors other than illegal instruction either
+ <nlightnfotis> braunr: I still have not found an easy way to do what you
+ asked me to from go's runtime. Would it be ok if I do it from inside
+ libpthread?
+ <braunr> nlightnfotis: do what ?
+ <nlightnfotis> print the values of __pthread_total and
+ __pthread_num_threads with mach_print.
+ <braunr> how ?
+ <braunr> oh wait
+ <braunr> well yes ofc, they're not exported :/
+ <braunr> nlightnfotis: have you been able to use mach_print ?
+ <nlightnfotis> braunr: not really because of the problems I shared
+ earlier. I can try to use with in-gcc structures if you want me to, it's
+ nothing hard to do
+ <nlightnfotis> actually I will. Hang on
+ <braunr> proceed with debugging inside libpthread instead
+ <braunr> using mach_print to avoid deadlocks this time
+ <braunr> (mach_print was purposely built for debugging such low level code
+ parts)
+ <nlightnfotis> ok, I will patch this, but can I build it tomorrow?
+ <braunr> yes
+ <braunr> just keep us informed
+ <nlightnfotis> ok, thanks, and sorry for everything I have done. I want you
+ to know that I really appreciate that you are helping me.
+ <braunr> remember: the goal here is to understand why __pthread_total and
+ __pthread_num_threads have inconsistent values
+ <nlightnfotis> braunr: whenever you see it, mach_print works as expected
+ inside gcc.
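In the discussion, braunr proposes a `mach_printf` with argument parsing on top of `mach_print`, which needs a -dbg kernel. He later warns that malloc and free take locks and probably call `_pthread_self`, so a statically sized buffer on the stack should be used instead. The sketch below combines those points under several assumptions. On the Hurd, `mach_print()` is only declared here and is assumed to come from a separate stub (the inline-assembly version in braunr's repository is not reproduced). On other systems, a stderr fallback lets the code compile. The 256-byte buffer and the use of `vsnprintf` in place of braunr's own argument parsing are choices made for this example.

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef __GNU__
/* GNU/Hurd with a -dbg GNU Mach kernel.  Assumption: mach_print() is
   provided by a separate system-call stub; only its prototype is here.  */
extern void mach_print (const char *s);
#else
/* Fallback so the sketch builds and runs elsewhere: write to stderr.  */
static void
mach_print (const char *s)
{
  fputs (s, stderr);
}
#endif

/* Hypothetical mach_printf: format into a buffer on the stack (no malloc()
   or free(), which may take locks), then pass the string to mach_print().
   Output longer than the buffer is truncated.  Returns vsnprintf's result,
   i.e. the untruncated length.  */
static int
mach_printf (const char *fmt, ...)
{
  char buf[256];
  va_list ap;

  va_start (ap, fmt);
  int n = vsnprintf (buf, sizeof buf, fmt, ap);
  va_end (ap);

  mach_print (buf);
  return n;
}
```

Inside libpthread, a call such as `mach_printf ("total=%d num=%d\n", __pthread_total, __pthread_num_threads);` could then report the two counters. This avoids the linking problem of referencing those unexported symbols from the Go runtime.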
+
+
+# IRC, freenode, #hurd, 2013-09-03
+
+ <nlightnfotis> braunr: I have made the changes I want to glibc. After I
+ build it, how do I install it? make install or is it more involved?
+ <braunr> nlightnfotis: use LD_LIBRARY_PATH
+ <braunr> never install an experimental glibc unless you have backups or are
+ certain of what you're doing
+ <braunr> nlightnfotis: i didn't understand what you meant about mach_print
+ yesterday
+ <nlightnfotis> it works in gcc.
+ <braunr> what do you mean "in gcc" ?
+ <braunr> why would you put mach_print in gcc ?
+ <braunr> we want it in go programs ..
+ <nlightnfotis> yes, I understand it. gcc was the fastest way to test its
+ usage at that moment (for me) and I just wanted to confirm it works. I
+ only had to change its signature to const char * because gcc wouldn't
+ accept it otherwise
+ <braunr> doesn't my example include const ?
+ <braunr> nlightnfotis: why did you rebuild glibc ?
+ <nlightnfotis> braunr: I have not started yet, will do now, to apply the
+ changes to libpthread
+ <braunr> you mean add the print calls there ?
+ <nlightnfotis> yes
+ <braunr> ok
+ <braunr> use debian/rules build, interrupt when you see gcc invocations
+ <braunr> then switch to the build directory (hurd-libc-i386 iirc), and make
+ others
+ <braunr> nlightnfotis: did you send me the instructions to build and test
+ your work ?
+ <braunr> so i can reproduce these weird threading problems at my side
+ <nlightnfotis> braunr: sorry, I was in the toilet, where would you like me
+ to send the instructions?
+ <braunr> nlightnfotis: i should be fine i guess, let's check here
+ <braunr> nlightnfotis: i simply used configure
+ --enable-languages=c,c++,go,lto
+ <braunr> and i'll see how it goes
+ <nlightnfotis> I configure with --enable-languages=go (it automatically
+ builds c and c++ for that as go depends on them), --disable-bootstrap,
+ and use a custom prefix to install at a custom location
+ <braunr> yes
+ <braunr> ok
+ <braunr> nlightnfotis: how long does it take you ?
+ <nlightnfotis> complete non-bootstrap build about 45 minutes. With a build
+ tree ready and only simple changes, about 2-3 minutes
+ <nlightnfotis> braunr: In an hour I will go offline for 2-3 hours, I am
+ gonna move back to my other home in the other city. It won't take long,
+ the whole process will be about 4 hours, and I will compensate for the
+ time lost by staying up late up until 3 o clock in the morning
+ <braunr> i'd prefer you didn't "compensate"
+ <nlightnfotis> ?
+ <braunr> work if you want to
+ <braunr> no one is forcing you to work late at night for gsoc, unless you
+ want to
+ <nlightnfotis> no, I do it because I want to. I **really** really want to
+ succeed, and time is of the essence for me at this point
+ <braunr> then ok
+ <braunr> nlok i have a gccgo compiler
+ <pinotree> nlok?
+ <braunr> nl being nlightnfotis but he's gone
+ <pinotree> oh
+ * pinotree was trying to parse that as "now" or "look" or the like
+ <nlightnfotis> braunr: 08:19:56< braunr> use debian/rules build, interrupt
+ when you see gcc invocations: Are gcc invocations related to
+ i486-gnu-gcc-4.7?
+ <nlightnfotis> nvm I'm good now :)
+ <gnu_srs> of course not, that's only for compiling applications using the
+ newly built libc
+ <nlightnfotis> gnu_srs: I didn't exactly understand what you said? Care to
+ elaborate? which one is for compiling applications using the newly build
+ libc? -486-gnu-gcc-4.7?
+ <gnu_srs> when you see gcc ... -llibc.so you know libc.so is built, and
+ that is sufficient to use it.
+ <gnu_srs> with LD_PRELOAD or LD_LIBRARY_PATH (after cding and building
+ others)
+ <nlightnfotis> gnu_srs: thanks for the tip :)
+ <gnu_srs> :-D
+ <nlightnfotis> is anyone else getting glibc build problems? (from apt-get
+ source glibc, at cxa-finalize.c)?
+ <gnu_srs> apt-get source eglibc; apt-get build-dep eglibc (as root);
+ dpkg-buildpackage -b ...
+ <braunr> nlightnfotis: just debian/rules build
+ <braunr> to start the glibc build
+ <nlightnfotis> braunr: oh I have now, it's building without issues so far
+ <braunr> when you see gcc processes, it means the build process has
+ switched from configuring to making
+ <braunr> then interrupt (ctrl-c)
+ <braunr> cd build-tree/hurd-i386-libc
+ <braunr> make others
+ <braunr> or make lib others
+ <braunr> lib is glibc, others is some addons which include our libpthread
+ <nlightnfotis> thanks for the tip braunr.
+ <nlightnfotis> braunr: I have managed to get a working version of glibc and
+ libpthread with mach_print working. I have also run 2 test programs and
+ it works as expected. Will continue researching tomorrow if that's ok
+ with you, I am too tired to keep on now.
+ <nlightnfotis> for the record compilation of glibc right from the start was
+ about 1 hour and 20 - 30 minutes
+
+
+# IRC, freenode, #hurd, 2013-09-04
+
+ <braunr> i've taken a deeper look at this assertion failure
+ <braunr> and ...
+ <braunr> it has nothing to do with pthread_create
+ <braunr> i assumed it was the one in sysdeps/mach/pt-thread-start.c
+ <nlightnfotis> pthread_self ()?
+ <braunr> but it's actually from sysdeps/mach/hurd/pt-sysdep.h, in
+ _pthread_self()
+ <braunr> and looking there :
+ <braunr> thread = *(struct __pthread **)__hurd_threadvar_location
+ (_HURD_THREADVAR_THREAD);
+ <braunr> so simply put, context switching doesn't fix up thread specific
+ data ...
+ <braunr> it's that simple
+ <nlightnfotis> wow
+ <nlightnfotis> today I was running programs all day long with mach_print on
+ to print __pthread_total and __pthread_num_threads to see when both
+ become 1 and couldn't find anything
+ <nlightnfotis> I was nearly desperate. You just made my day! :)
+ <braunr> now the problem is
+ <braunr> thread specific data is highly dependent on the stack
+ <braunr> it's illegal to make a thread switch stack and expect it to keep
+ working on the hurd
+ <nlightnfotis> unless split stack is activated?
+ <nlightnfotis> no wait
+ <braunr> split stack is completely unsupported on the hurd
+ <teythoon> uh, why would that be?
+ <braunr> teythoon: about split stack ?
+ <teythoon> yes
+ <braunr> i'm not sure
+ <nlightnfotis> at least now we do know what the problem is and I can start
+ working on a solution.
+ <nlightnfotis> braunr: we should tell tschwinge and youpi about it.
+ <braunr> nlightnfotis: sure but
+ <braunr> nlightnfotis: you can also start looking at a workaround
+ <braunr> nlightnfotis: also, let's makre sure that's the reason first
+ <braunr> nlightnfotis: use mach_print to display the stack pointer when
+ switching
+ <braunr> nlightnfotis:
+ http://stackoverflow.com/questions/1880262/go-forcing-goroutines-into-the-same-thread
+ <braunr> " I believe runtime.LockOSThread() is necessary if you are
+ creating a library binding from C code which uses thread-local storage"
+ <braunr> oh, a paper about the go runtime scheduler
+ <braunr> let's have a look ..
+ <teythoon> braunr: have you seen the high level overview presented in that
+ blog post I once posted here?
+ <braunr> no
+ <nlightnfotis> braunr, just came back, and read the log. Which paper are
+ you reading? The one from columbia university?
+ <braunr> but i need to know about details here, specifically, if threads do
+ change stack
+ <braunr> nlightnfotis: yes
+ <teythoon> braunr: ok
+ <braunr> this could be caused either by true stack switching, or by "stack
+ segmentation" as implemented by go
+ <braunr> it is interesting that there are stack related members per
+ goroutine
+ <braunr> nlightnfotis: in particular, pthread_attr_setstacksize() doesn't
+ work on the hurd
+ <nlightnfotis> <braunr> it is interesting that there are stack related
+ members per goroutine -> I think that's go's policy. All goroutines run
+ on a shared address space (that is the kernel thread's address space)
+ <braunr> nlightnfotis: that's obvious
+ <braunr> and not the problem
+ <braunr> and yes, it's "stack segmentation"
+ <braunr> and on linux, and probably other archs, switching stack may be
+ perfectly legit
+ <braunr> on the hurd, we still have threadvars
+ <braunr> which are the hurd specific thread local storage mechanism
+ <braunr> it means 1/ all stacks in a process must have the same size
+ <braunr> 2/ stack size must be a power of two
+ <braunr> 3/ threads can't switch stack
+ <braunr> this hardly prevents goroutines from being run by just any thread
+ <braunr> i see there are already hurd specific changes about stack
+ handling
+ <nlightnfotis> so we should only make changes to the specific gccgo
+ scheduler as a workaround under the Hurd right?
+ <braunr> i don't know
+ <braunr> this might also push the switch to tls
+ <nlightnfotis> this sounds better as a long term fix
+ <nlightnfotis> but it must also involve a great amount of work, right?
+ <braunr> most of it has already been done
+ <braunr> by youpi and tschwinge
+ <nlightnfotis> with the changes to tls early in the summer?
+ <braunr> maybe
+ <braunr> 14:36 < braunr> nlightnfotis: also, let's makre sure that's the
+ reason first
+ <braunr> 14:36 < braunr> nlightnfotis: use mach_print to display the stack
+ pointer when switching
+ <braunr> check what goes wrong with the stack
+ <braunr> then we'll see
+ <braunr> as a very simple workaround, i expect locking g's on m's to be a
+ good first step
+ <nlightnfotis> braunr: noted everything. that's my work for tonight. I
+ expect myself to stay up late like yesterday and have this all figured
+ out by tomorrow.
+ <braunr> nlightnfotis: why not now ?
+ <nlightnfotis> I am starting from now, but I expect myself to stop about 6
+ o clock here (2 hours) because I have an appointment with a doctor.
+ <nlightnfotis> and keep on when I come back home
+ <braunr> well adding a few printfs to track the stack should be doable
+ before 2 hours
+ <nlightnfotis> braunr: I am doing it now. Will report as soon as I have
+ results :)
+ <nlightnfotis> braunr: have I messed up with the way I read esp's value?
+ https://github.com/NlightNFotis/glibc/commit/fdab1f5d45a43db5c5c288c4579b3d8251ee0f64#L1R67
+ <braunr> nlightnfotis: +unsigned
+ <braunr> nlightnfotis: using gdb :
+ <braunr> (gdb) info registers
+ <braunr> esp 0x203ff7c0 0x203ff7c0
+ <braunr> (gdb) print thread->stackaddr
+ <braunr> $2 = (void *) 0x2000000
+ <nlightnfotis> oh yes, I know about gdb, I thought you wanted me to use
+ mach_print
+ <braunr> nlightnfotis: yes
+ <braunr> this is just my own attempt
+ <braunr> and it does show the stack pointer is completely outside the
+ thread stack
+ <braunr> nlightnfotis: in your code, i suggest using
+ __builtin_frame_address()
+ <braunr> well __builtin_frame_address(0)
+ <braunr> see
+ http://gcc.gnu.org/onlinedocs/gcc-4.7.3/gcc/Return-Address.html#Return-Address
+ <braunr> it's not exactly the stack pointer but close enough, unless of
+ course the stack is changed in the middle of the function
+ <nlightnfotis> I see. I am gonna try one more time with esp the way I
+ worked it and if it fails to work, I am gonna use return address
+ <braunr> nlightnfotis: be very careful about signed/unsigned and type
+ widths
+ <braunr> not return address, frame address
+ <braunr> return address is code, frame address is data (stack)
+ <nlightnfotis> ah, I see, thanks for the correction.
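braunr advises reading the stack position with GCC's `__builtin_frame_address(0)` rather than the return address, because the frame address is data on the stack and the return address is code. He also warns about signedness and type widths when printing the result. Below is a minimal sketch, assuming GCC or Clang. The helper names are hypothetical. The value returned is the frame of `current_frame()` itself, which sits on the same stack as its caller.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Frame (data) address of this function's own frame.  noinline keeps it a
   real call, so the frame lies on the caller's stack.  Converting to
   uintptr_t keeps the value unsigned and pointer-sized.  */
static __attribute__ ((noinline)) uintptr_t
current_frame (void)
{
  return (uintptr_t) __builtin_frame_address (0);
}

/* Format "LABEL=0x..." using PRIxPTR, which prints the full pointer width
   as unsigned hex on both i386 and x86_64.  Returns snprintf's result.  */
static int
format_address (char *buf, size_t size, const char *label, uintptr_t addr)
{
  return snprintf (buf, size, "%s=%#" PRIxPTR, label, addr);
}
```

Using braunr's gdb values, formatting `0x2000000` (the thread's `stackaddr`) and `0x203ff7c0` (its `esp`) side by side makes it easy to see that the stack pointer lies outside the thread stack. Inside libpthread, the formatted string should go to `mach_print` rather than `printf`, for the locking reasons given in the discussion.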
+ <braunr> youpi: not sure you catched it earlier, the problem fotis has been
+ having with goroutines is about threadvars
+ <braunr> simply put, threads use setcontext functions to save/restore
+ goroutines state, which make them switch stack, rendering the location of
+ threadvars invalid, and making _pthread_self() choke
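braunr's summary is that threads use the setcontext family to save and restore goroutine state, and this moves the stack pointer onto a different stack. The following sketch demonstrates that effect with the standard ucontext calls `getcontext`, `makecontext` and `swapcontext`. It is not libgo's code, and all names are hypothetical. A function entered through `makecontext` sees stack addresses inside the buffer it was given, not inside the calling thread's original stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

static ucontext_t caller_ctx, coro_ctx;
static uintptr_t observed_sp;   /* stack address seen inside the coroutine */

/* Runs on the private stack; records the address of one of its locals.
   Returning resumes uc_link, i.e. caller_ctx.  */
static void
coroutine (void)
{
  int local = 0;
  observed_sp = (uintptr_t) &local;
}

/* Run coroutine() on STACK (SIZE bytes) and return the stack address it
   observed, or 0 on failure.  */
static uintptr_t
run_on_private_stack (void *stack, size_t size)
{
  observed_sp = 0;
  if (getcontext (&coro_ctx) == -1)
    return 0;
  coro_ctx.uc_stack.ss_sp = stack;
  coro_ctx.uc_stack.ss_size = size;
  coro_ctx.uc_link = &caller_ctx;
  makecontext (&coro_ctx, coroutine, 0);

  /* Save the current context and switch stacks; control comes back here
     when coroutine() returns.  */
  if (swapcontext (&caller_ctx, &coro_ctx) == -1)
    return 0;
  return observed_sp;
}
```

With Hurd threadvars, an address observed inside `coroutine()` would be masked to a base somewhere in the private buffer instead of the thread's real stack base, so `_pthread_self()` would read the wrong location.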
+
+
+# IRC, freenode, #hurd, 2013-09-05
+
+ <nlightnfotis> I am having very weird behavior with my code, something that
+ I can not explain and seems likely to be a bug, could someone else take a
+ look?
+ <nlightnfotis> pinotree are you available at the moment to take a look at
+ something?
+ <pinotree> nlightnfotis: dont ask to ask, just ask
+ <nlightnfotis> I have made some modifications to pthread_self as also
+ suggested by braunr to see if the stack pointer is within the bounds of
+ the frame address after context switching. I can get the values of both
+ esp and frame_address to be shown before the context switch, but I can
+ only get the value of esp to be shown after the context switch, and it
+ always results to the program getting killed
+ <nlightnfotis>
+ https://github.com/NlightNFotis/glibc/blob/7e72da09a42b1518865f6f4882d68689e681f25b/libpthread/sysdeps/mach/hurd/pt-sysdep.h#L97
+ <nlightnfotis> thing is a dummy print value I have right after the code
+ that was supposed to print the frame_address after the context switching
+ is executing without any issues.
+ <pinotree> oh assembler... cannot help, sorry :/
+ <nlightnfotis> oh no, I am not asking for assembler help, that part works
+ quite alright. I am asking why from the 4 identical pieces of code that
+ print debugging values the last one doesn't work. I am on it all day, and
+ still have not found an answer
+ <braunr> nlightnfotis: i can
+ <nlightnfotis> hello braunr,
+ <braunr> nlightnfotis: do you have a backtrace ?
+ <braunr> uh
+ <nlightnfotis> nope, it crashes right after I execute something. Let me
+ compile glibc once again and see if a fix I attempted works
+ <braunr> malloc and free use locks
+ <braunr> so they probably use _pthread_self
+ <braunr> don't use them
+ <braunr> for debugging, a simple statically allocated buffer on the stack
+ will do
+ <braunr> nlightnfotis: so ?
+ <nlightnfotis> I got past my original problem, but now I am trying to get
+ past the sigkills that kill the program at the beginning
+ <nlightnfotis> i remember not having this problem, so I am compiling my
+ master branch to see if it is reproducible. If it is, it means something
+ is very wrong. If it's not, it means I screwed up somewhere
+ <braunr> i don't understand, how do you know if you get past the problem if
+ you still have trouble reaching that code ?
+ <nlightnfotis> braunr: I fixed all my problems now. I can see that both esp
+ and the frame_address are the same after context switching though?
+ <braunr> always ?
+ <braunr> for all goroutines ?
+ <nlightnfotis> for all kernel threads, not go routines. We are in
+ libpthread
+ <braunr> if they're the same after a context switch, it usually means the
+ scheduler didn't switch
+ <braunr> well obviously
+ <braunr> but what i asked you was to trace calls to setcontext functions
+ <nlightnfotis> I will run some tests again. May I show you my code to see
+ if there is anything wrong with it?
+ <braunr> what address do you have ?
+ <braunr> not yet
+ <braunr> i'm not sure you understand what i want to check
+ <braunr> do you see how threadvars work basically ?
+ <nlightnfotis> I think so yes, they keep in the stack the local variables
+ of a thread right?
+ <nlightnfotis> and the globals
+ <nlightnfotis> or
+ <nlightnfotis> wait a minute...
+ <braunr> yes but do you see how the thread specific data are fetched ?
+ <nlightnfotis> with __hurd_threadvar_location_from_sp?
+ <braunr> yes but "basically", what does it do ?
+ <nlightnfotis> it gets a stack pointer as a parameter, and returns the
+ location of that specific data based on that stack pointer, right?
+ <braunr> and how ?
+ <nlightnfotis> I believe it must compare the base value of the stack and
+ the value of the end of the stack, and if the results are consistent, it
+ returns a pointer to the data?
+ <braunr> and how does it determine the start and end of the stack ?
+ <nlightnfotis> stack_pointer must be pointing at the base of the
+ stack. That + stack_size must be the stack limit I guess.
+ <braunr> so you're saying the caller of __hurd_threadvar_location_from_sp
+ knows the stack base ?
+ <nlightnfotis> I am not so sure I understand this question.
+ <braunr> i want to know if you understand how threadvars work
+ <braunr> apparently you don't
+ <braunr> the caller only has its current stack pointer
+ <braunr> which does *not* point to the stack base
+ <braunr> threadvars work by assuming a *fixed* stack size, power of two,
+ aligned (obviously)
+ <braunr> in our case, 2MiB (except in hurd servers where a kludge reduces
+ that to 64k)
+ <braunr> this is why stack size can't be changed
+ <braunr> this is also why the stack pointer can't ever point outside the
+ initial stack
+ <braunr> i want you to make sure go violates this last assumption
+ <braunr> so 1/ show the initial stack boundaries of your threads, then show
+ that, after loading a goroutine, the stack pointer is outside
+ <braunr> which is what, if i'm right, triggers the assertion
+ <braunr> ask if there is anything confusing
+ <braunr> this is important, it should already have been done
+ <nlightnfotis> ok, I noted it all, I am starting to work on it right now. I
+ only have one question. My results, the ones with the stack pointer and
+ the frame address, are expected or unexpected?
+ <braunr> i don't know
+ <braunr> show me the code again please
+ <braunr> and explain your intent
+ <nlightnfotis>
+ https://github.com/NlightNFotis/glibc/blob/7fe202317db4c3947f8ae1d1a4e52f7f0642e9ed/libpthread/sysdeps/mach/hurd/pt-sysdep.h
+ <nlightnfotis> At first I print the value of esp and the frame_address
+ before the context switching and after the context switching.
+ <nlightnfotis> The different variables were introduced as part of a test to
+ see if my results were consistent,
+ <braunr> what context switch ?
+ <nlightnfotis> in hurd_threadvar_location
+ <braunr> what makes you think this is a context switch ?
+ <nlightnfotis> in threadvar.h, it calls __hurd_threadvar_location_from_sp.
+ <nlightnfotis> the full path for it is glibc/hurd/hurd/threadvar.h
+ <braunr> i don't see how giving me the path will explain why it's a context
+ switch
+ <braunr> and i can tell you right away it's not
+ <braunr> hurd_threadvar_location is basically a lookup returning the
+ address of the thread specific data
+ <nlightnfotis> wait a minute...does this mean that
+ hurd_threadvar_location_from_sp is also a lookup function for the same
+ reason
+ <nlightnfotis> ?
+ <braunr> yes
+ <braunr> isn't the name meaningful enough ?
+ <braunr> "location of the threadvars from stack pointer"
+ <nlightnfotis> I guess I made wrong deductions from when you originally
+ shared your findings...
+ <nlightnfotis> <braunr> thread = *(struct __pthread
+ **)__hurd_threadvar_location (_HURD_THREADVAR_THREAD);
+ <nlightnfotis> <braunr> so simply put, context switching doesn't fix up
+ thread specific data ...
+ <nlightnfotis> I thought that hurd_threadvar_location was doing the context
+ switching
+ <braunr> nlightnfotis: by context switching, i mean setcontext functions
+ <nlightnfotis> braunr: You mean the one in sysdeps/mach/hurd/i386?
+ <braunr> yes
+ <braunr> but
+ <braunr> do you understand what i want you to check now ?
+ <nlightnfotis> I think I got it this time. Let me explain:
+ <nlightnfotis> You suggested that stack sizes are fixed. That is the main
+ reason that the stack pointer should not be able to point outside of it.
+ <braunr> no
+ <braunr> locating threadvars is done by applying a mask, computed from the
+ stack size, on the stack pointer, to determine its base
+ <nlightnfotis> yeah, what __hurd_threadvar_location_from_sp is doing
+ <braunr> if size is a power of two, size - 1 is a mask that, if
+ complemented, aligns the address
+ <braunr> yes
+ <braunr> so, threadvars expect the stack pointer to always point to the
+ initial stack
+ <nlightnfotis> and we wanna prove that go violates this rule right? That
+ the stack pointer is not pointing at the initial stack
+ <braunr> yes
diff --git a/community/gsoc/project_ideas/download_backends.mdwn b/community/gsoc/project_ideas/download_backends.mdwn
index f794e814..c0bdc5b2 100644
--- a/community/gsoc/project_ideas/download_backends.mdwn
+++ b/community/gsoc/project_ideas/download_backends.mdwn
@@ -1,12 +1,12 @@
-[[!meta copyright="Copyright © 2009 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2009, 2013 Free Software Foundation, Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
-is included in the section entitled
-[[GNU Free Documentation License|/fdl]]."]]"""]]
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
[[!meta title="Use Internet Protocol Translators (ftpfs etc.) as Backends for Other Programs"]]
@@ -19,8 +19,9 @@ Download protocols like FTP, HTTP, BitTorrent etc. are very good candidates for
this kind of modularization: a program could simply use the download
functionality by accessing FTP, HTTP etc. translators.
-There is already an ftpfs translator in the Hurd tree, as well as an [httpfs
-translator on hurdextras](http://www.nongnu.org/hurdextras/#httpfs); however,
+There is already an [[hurd/translator/ftpfs]] translator in the Hurd tree, as
+well as an [[hurd/translator/httpfs]] on
+[hurdextras](http://www.nongnu.org/hurdextras/); however,
these are only suitable for very simple use cases: they just provide the actual
file contents downloaded from the URL, but no additional status information
that are necessary for interactive use. (Progress indication, error codes, HTTP
diff --git a/community/gsoc/project_ideas/mtab/discussion.mdwn b/community/gsoc/project_ideas/mtab/discussion.mdwn
new file mode 100644
index 00000000..716fb492
--- /dev/null
+++ b/community/gsoc/project_ideas/mtab/discussion.mdwn
@@ -0,0 +1,2072 @@
+[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_hurd]]
+
+# IRC, freenode, #hurd, 2013-04-17
+
+ <kuldeepdhaka> thinking how to get the listing. traversing would be
+ inefficient, trying to come up with something better
+ <braunr> what listing ?
+ <braunr> and traversing what ?
+ <kuldeepdhaka> mtab
+ <braunr> well i assumed so
+ <braunr> be more precise please
+ <kuldeepdhaka> when the translator is done initializing, <translation
+ info> are written to /etc/mtab <translation info> will be provided
+ by the translator, and when some one want to read the info just read it
+ this way, if there are credentials like an ftp site's password or username, they can be
+ masked by the translator
+ <kuldeepdhaka> if some trans dont want to list them, no need to write to
+ file | while unmounting (sorry, i couldn't find the right word), it
+ will pass the mount node address | <translation info> will have special
+ structure to remove/add mounts example "a /mount-to /mount-from" = add
+ , "r /mount-to" = remove here "/mount-to" will be unique for every
+ mount
+ <kuldeepdhaka> this has a drawback, we would have to trust trans for the
+ listed data | also "/mount-to" + "/mount-from" could be used as a
+ combination to make sure that other trans are unable to remove other trans'
+ mount data
+ <kuldeepdhaka> sorry, but "also "/mount-to" + "/mount-from" could be used as a
+ combination to make sure that other trans are unable to remove other trans'
+ mount data" is a bad idea if we had to print the whole thing
+ <kuldeepdhaka> braunr, whats ur opinion?
+ <pinotree> you don't need a mtab to "unmount" things on hurd
+ <braunr> kuldeepdhaka: hum, have you read the project idea ?
+ <braunr>
+ http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/mtab/
+ <braunr> A more promising approach is to have mtab exported by a special
+ translator, which gathers the necessary information on demand. This could
+ work by traversing the tree of translators, asking each one for mount
+ points attached to it.
+ <kuldeepdhaka> pinotree, not to unmount, i mean is to remove the
+ <translation data>
+ <braunr> for a first implementation, i'd suggest a recursive traversal of
+ root-owned translators
+ <kuldeepdhaka> braunr, hum, but it did state it as inefficient
+ <braunr> where ?
+ <kuldeepdhaka> para 5 , line 3
+ <kuldeepdhaka> and line 6
+ <braunr> no
+ <braunr> traversing "all" nodes would be inefficient
+ <braunr> translators which host the nodes of other translators could
+ maintain a simple list of active translators
+ <braunr> ext2fs, e.g. (if that's not already the case) could keep the list
+ of the translators it started
+ <braunr> we can already see that list with pstree for example
+ <braunr> but this new list would only retain those relevant for mtab
+ <braunr> i.e. root-owned ones
+ <pinotree> i would not limit to those though
+ <braunr> and then filter on their type (e.g. file system ones)
+ <braunr> pinotree: why ?
+ <pinotree> this way you could have proper per-user /proc/$pid/mounts info
+ <braunr> we could also very easily have a denial of service
+ <kuldeepdhaka> but how will the mount point and source point will be
+ listed?
+ <braunr> they're returned by the translator
+ <kuldeepdhaka> k
+ <braunr> you ask /, it returns its store and its options, and asks its
+ children recursively
+ <braunr> a /home translator would return its store and its options
+ <braunr> etc..
+ <braunr> each translator would build the complete path before returning it
+ <braunr> sort of, it's very basic
+ <braunr> but that would be a very hurdish way to do it
+ <kuldeepdhaka> should /etc/mtab be made seekable, and what should be
+ the file size? content is generated on demand, so it could cause problems
+ (fsize:0, seekable:no); ur opinions?
+ <braunr> kuldeepdhaka: it should have all the properties of a regular file
+ <braunr> the filesize would be determined after it's generated
+ <braunr> being empty doesn't imply it's not seekable
+ <kuldeepdhaka> content is generated on demand, so it could cause problems with
+ seeking and the file size; shall i still implement it as a regular file?
+ <kuldeepdhaka> in two different reads, it could generate different content,
+ though same seek pos is used...
+ <braunr> what ?
+ <braunr> the content is generated on open
+ <kuldeepdhaka> ooh, ok
+
+
+# IRC, freenode, #hurd, 2013-06-04
+
+ <safinaskar> how to see list of all connected translators?
+ <braunr> you can't directly
+ <braunr> you can use ps to list processes and guess which are translators
+ <braunr> (e.g. everything starting with /hurd/)
+ <braunr> a recursive call to obtain such a list would be useful
+ <braunr> similar to what's needed to implement /proc/mounts
+
+
+# IRC, freenode, #hurd, 2013-06-25
+
+In context of [[open_issues/mig_portable_rpc_declarations]].
+
+ <teythoon> should I go for an iterator like interface instead?
+ <teythoon> btw, what's the expected roundtrip time?
+ <braunr> don't think that way
+ <braunr> consider the round trip delay as varying
+ <teythoon> y, is it that bad?
+ <braunr> no
+ <braunr> but the less there is the better
+ <braunr> we think the same with system calls even if they're faster
+ <braunr> the delay itself isn't the real issue
+ <braunr> look at how proc provides information
+ <braunr> (in procfs for example)
+
+
+## IRC, freenode, #hurd, 2013-06-26
+
+ <teythoon> so tell me about the more hurdish way of dealing with that issue
+ <teythoon> creating a specialized translator for this?
+ <braunr> 11:45 < pinotree> there's also
+ http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/mtab/
+ about that topic
+ <braunr> you need to avoid thinking with centralization in mind
+ <braunr> the hurd is a distributed system in practice
+ <braunr> i think proc is the only centralized component in there
+ <teythoon> braunr: would having an mtab translator and having fs
+ translators register to thae be acceptable?
+ <teythoon> that*
+ <braunr> teythoon: why do you want to centralize it ?
+ <braunr> translators already register themselves when they get attached to
+ a node
+ <braunr> we don't want an additional registration
+ <braunr> have you read the link we gave you ?
+ <teythoon> I did and I got the message, but isn't the concept of
+ /proc/mounts or a mtab file already a centralized one?
+ <braunr> that doesn't mean the implementation has to be
+ <braunr> and no, i don't think it's centralized actually
+ <braunr> it's just a file
+ <braunr> you can build a file from many sources
+ <teythoon> or if we do it your way, recursing on fs translators *but*
+ restricting this to root-owned translators, isn't that also suffering from
+ centralization wrt the root user? I mean the concept of all mounted
+ filesystems does not apply cleanly to the hurd
+ <braunr> i don't understand
+ <braunr> restricting to the root user doesn't mean it's centralized
+ <braunr> trust has nothing to do with being centralized
+ <teythoon> I guess I'm not used to thinking this way
+ <braunr> teythoon: i guess that's the main reason why so few developers
+ work on the hurd
+ <teythoon> also the way fs notification is done is also centralized, that
+ could also be done recursively
+ <braunr> what do you call fs notification ?
+ <teythoon> and the information I need could just be stuffed into the same
+ mechanism
+ <teythoon> fs translators being notified of system shutdown
+ <braunr> right
+ <braunr> that gets a bit complicated because the kernel is also a
+ centralized component
+ <braunr> it knows every memory object and their pagers
+ <braunr> it manages all virtual memory
+ <braunr> there are two different issues here
+ <braunr> syncing memory and shutting down file systems
+ <braunr> the latter could be done recursively, yes
+ <braunr> i wonder if the former could be delegated to external pagers as
+ well
+ <braunr> teythoon: but that's not the focus of your work aiui, it would
+ take much time
+ <teythoon> sure, but missing an mtab file or better yet /proc/mounts could
+ be an issue for me, at least a cosmetic one, if not a functional one
+ <braunr> i understand
+ <teythoon> and hacking up a quick solution for that seemed like a good
+ exercise
+ <braunr> i suggest you discuss it with your mentors
+ <braunr> they might agree to a temporary centralized solution
+ <braunr> although i don't think it's much simpler than the recursive one
+ <teythoon> braunr: would that be implemented in libdiskfs and friends?
+ <braunr> teythoon: i'm not sure, it might be a generic fs operation
+ <braunr> libnetfs etc.. are also mount points
+ <teythoon> so where would it go if it was generic?
+ <braunr> libfshelp perhaps
+ <teythoon> translator startup is handled in start-translator-long.c, so in
+ case a startup is successful, I'd add it to a list?
+ <braunr> i'd say so, yes
+ <teythoon> would that cover all cases, passive and active translators?
+ <braunr> that's another question
+ <braunr> do we consider passive translators as mounted ?
+ <teythoon> ah, that was not what i meant
+ <braunr> i know
+ <braunr> but it's related
+ <teythoon> start b/c of accessing a passive one vs. starting an active one
+ using settrans
+ <braunr> start_translator_xxx only spawn active translators
+ <braunr> it's the same
+ <teythoon> ok
+ <braunr> the definition of a passive translator is that it starts the
+ active translator on access
+ <teythoon> yeah I can see how that wouldn't be hard to implement
+ <braunr> i think we want to include passive translators in the mount table
+ <braunr> so registration must happen before starting the active one
+ <teythoon> so it's a) keeping a list of active translators and b) add an
+ interface to query fs translators for this list and c) an interface to
+ query mtab style information?
+ <braunr> keeping a list of all translators attached
+ <braunr> and yes
+ <braunr> well
+ <braunr> a is easy
+ <braunr> b is the real work
+ <braunr> c would be procfs using b
+ <teythoon> oh? I thought recursing on the translators and querying info
+ would be separate operations?
+ <braunr> why so ?
+ <braunr> the point is querying recursively :)
+ <braunr> and when i say recursively, it's only a logical view
+ <teythoon> ok, yes, it can be implemented this way, so we construct the
+ list while recursing on the translators
+ <braunr> i think it would be better to implement it the way looking up a
+ node is done
+ <teythoon> in a loop, using a stack?
+ <braunr> iteratively
+ <braunr> a translator would provide information about itself (if
+ supported), and references to translators locally registered to it
+ <teythoon> could you point me to the node lookup?
+ <teythoon> ah, yes
+ <braunr> eg., you ask /, it tells you it's on /dev/hd0, read-write, with
+ options, and send rights to /home, /proc, etc..
+ <braunr> well rights, references
+ <braunr> it could be the path itself
+ <teythoon> rights as in a port to the translators?
+ <braunr> i think the path would be better but i'm not sure
+ <braunr> it would also allow you to check the permissions of the node
+ before querying
+ <teythoon> path would be nicer in the presence of stacked translators
+ <braunr> and obviously you'd have the path right away, no need to provide
+ it in the reply
+ <teythoon> true
+
+ <teythoon> braunr: if we want to list passive translators (and I agree, we
+ should), it isn't sufficient to touch libfshelp, as setting a passive
+ translator is not handled there, only the startup
+ <braunr> teythoon: doesn't mean you can't add something there that other
+ libraries will use
+ <braunr> so yes, not sufficient
+
+
+## IRC, freenode, #hurd, 2013-06-29
+
+ <teythoon> braunr: diskfs_S_fsys_set_options uses diskfs_node_iterate to
+ recurse on active translators if do_children is given
+ <teythoon> braunr: I wonder how fast that is in practice
+ <teythoon> braunr: if it's fast enough, there might not even be a need for
+ a new function in fsys.defs
+ <teythoon> and no need to keep a list of translators for that reason
+ <teythoon> braunr: if it's not fast enough, then diskfs_S_fsys_set_options
+ could use the list to speed this up
+ <braunr> teythoon: on all nodes ?
+ <teythoon> braunr: i believe so, yes, see libdiskfs/fsys-options.c
+ <braunr> teythoon: well, if it really is all nodes, you clearly don't want
+ that
+
+
+## IRC, freenode, #hurd, 2013-07-01
+
+ <teythoon> I meant to ask, the shiny new fsys_get_translators interface,
+ should it return the options for the queried translator or not?
+ <braunr> i don't think it should
+ <teythoon> ok
+ <braunr> let's walk through why it shouldn't
+ <teythoon> may I assume that the last argument returned by fsys_get_options
+ is the "source"?
+ <braunr> how would you know these options ?
+ <braunr> the source ?
+ <teythoon> I wouldn't actually
+ <braunr> yes, you wouldn't
+ <braunr> you'd have to ask the translators for that
+ <braunr> so the only thing you can do is point to them
+ <teythoon> well, the device to include in the mtab file
+ <braunr> and the client asks
+ <braunr> i don't know fsys_get_options tbh
+ <teythoon> well, both tmpfs and ext2fs print an appropriate value for
+ "device" as last argument
+ <braunr> looks like a bad interface to me
+ <braunr> options should be options
+ <braunr> there should be a specific call for the device
+ <braunr> but if everyone agrees with the options order, you can do it that
+ way for now i guess
+ <teythoon> one that could be used to recreate the "mount" using either
+ mount or settrans
+ <braunr> just comment it where appropriate
+ <teythoon> I thought that'd be the point?
+ <braunr> ?
+ <teythoon> % fsysopts tmp
+ <teythoon> /hurd/tmpfs --writable --no-inherit-dir-group --no-sync 48K
+ <braunr> where is the device ?
+ <teythoon> % settrans -ca tmp $(fsysopts tmp)
+ <braunr> 15:56 < teythoon> well, both tmpfs and ext2fs print an appropriate
+ value for "device" as last argument
+ <teythoon> 48K
+ <braunr> i don't see it
+ <braunr> really ?
+ <teythoon> yes
+ <braunr> what about ext2fs ?
+ <braunr> hm ok i see
+ <teythoon> % fsysopts /
+ <teythoon> ext2fs --writable --no-inherit-dir-group --sync=10
+ --store-type=typed device:hd0s1
+ <braunr> i don't think you should consider that as devices
+ <braunr> but really translator specific options
+ <pinotree> agree
+ <teythoon> I don't ;)
+ <teythoon> b/c the translator calling convention is hardcoded in the mount
+ utility
+ <braunr> ?
+ <teythoon> I think it's reasonable to assume that this mapping can be
+ reversed
+ <pinotree> theorically you can write a translator that takes no arguments,
+ but just options
+ <braunr> the 48K string for tmpfs is completely meaningless
+ <braunr> in fstab, it should be none
+ <pinotree> "tmpfs"
+ <braunr> the linux equivalent is the size option
+ <braunr> no, none
+ <braunr> it's totally ignored
+ <braunr> and it's recommended to set none rather than the type to avoid
+ confusion
+ <teythoon> u sure?
+ <teythoon> % settrans -cga tmp /hurd/tmpfs --mode=666 6M
+ <teythoon> % settrans -cga tmp /hurd/tmpfs --mode=666 6M
+ <teythoon> % fsysopts tmp
+ <teythoon> /hurd/tmpfs --writable --no-inherit-dir-group --no-sync 6M
+ <braunr> i've not explained myself clearly
+ <braunr> it's not ignored by the translator
+ <braunr> but in fstab, it should be in the options field
+ <braunr> it's not the source
+ <braunr> clearly not
+ <teythoon> ah
+ <braunr> now i'm talking about fstab, but iirc the format is similar in
+ mtab/mounts
+ <pinotree> close, but not the same
+ <braunr> yes, close
+ <teythoon> ok, so I'll put a method into libfshelp so that translators can
+ explicitly set a device and patch all existing translators to do so?
+ <braunr> teythoon: what i meant is that, for virtual file systems (actually
+ file systems with no underlying devices), the device field is normally
+ ignored
+ <braunr> teythoon: why do you need that for exactly
+ <teythoon> right
+ <pinotree> do they even have a "device" field?
+ <braunr> (i can see why but i'd like more visibility)
+ <braunr> pinotree: not yet
+ <braunr> pinotree: that's what he wants to add
+ <braunr> but i'd like to see if there is another way to get the information
+ <braunr> 16:05 < braunr> teythoon: why do you need that for exactly
+ <teythoon> well if I'm constructing a mtab entry I need a value for the
+ device field
+ <braunr> do we actually need it to be valid ?
+ <teythoon> not necessarily I guess
+ <braunr> discuss it with your mentors then
+ <youpi> it has to be valid for e2fsck checks etc.
+ <braunr> doesn't e2fsck check fstab actually ?
+ <youpi> i.e. actually for the cases where it's trivial
+ <youpi> fstab doesn't tell it whether it's mounted
+ <youpi> I mean fsck checking whether it's mounted
+ <youpi> not fsck -a
+ <braunr> oh
+ <braunr> couldn't we ask the device instead ?
+ <braunr> looks twisted too
+ <youpi> that'd mean patching a lot of applications which do similar checks
+ <braunr> yes
+ <braunr> teythoon: propose an interface for that with your mentors then
+ <teythoon> yeah, but couldn't you lay it out a little, I mean would it be
+ one procedure or like three?
+ <braunr> 16:04 < teythoon> ok, so I'll put a method into libfshelp so that
+ translators can explicitly set a device and patch all existing
+ translators to do so?
+ <teythoon> ok
+ <braunr> why three ?
+ <teythoon> no, I mean when adding stuff to fsys.defs
+ <braunr> i understood that
+ <braunr> but why three ? :)
+ <teythoon> it'd be more generic
+ <braunr> really ?
+ <braunr> please show a quick example of what you have in mind
+ <teythoon> i honestly don't know, thus I'm asking ;)
+ <braunr> well first, this device thing bothers me
+ <braunr> when you look at how we set up our ext2fs translators, you can see
+ they use device:xxx
+ <braunr> and not /dev/xxx
+ <braunr> but ok, let's assume it's harmless
+ <teythoon> ok, but isn't the first way actually better?
+ <braunr> i think it ends up being the same
+ <braunr> ideally, that's what we want to use as device path
+ <teythoon> but you can recreate a storeio translator using the device:xxx
+ info, the node is useless for that
+ <braunr> so that we don't need to explicitly set it
+ <braunr> ?
+ <braunr> what do you mean ?
+ <teythoon> well, fsysopts / currently tells me device:hd0s1
+ <braunr> for /, there isn't much choice
+ <braunr> /dev isn't there yet
+ <teythoon> ah, got it
+ <teythoon> that's why it differs...
+ <braunr> differs ?
+ <braunr> from what ?
+ <braunr> other ext2fs translators are set the same way by the debian
+ installer for example
+ <teythoon> % fsysopts /media/scratch
+ <teythoon> /hurd/ext2fs --writable --no-inherit-dir-group /dev/hd1s1
+ <teythoon> here it uses the path to the node
+ <braunr> that's weird
+ <braunr> was that done by the debian installer ?
+ <teythoon> ah no, that was me
+ <braunr> :p
+ <braunr> $ fsysopts /home
+ <braunr> /hurd/ext2fs --writable --no-inherit-dir-group --store-type=device
+ hd0s6
+ <braunr> so as you can see, it's not that simple to infer the device path
+ <teythoon> oho, yet another way ;)
+ <teythoon> right then
+ <pinotree> isn't device:hd0s1 as shortcut for specifying the store type, as
+ done with --store-type=device hd0s1?
+ <braunr> but perhaps we don't need to
+ <braunr> yes it is
+ <pinotree> iirc it's something libstore does, per-store prefixes
+ <braunr> ah that sucks
+ <braunr> teythoon: you may need to normalize those strings
+ <braunr> so that they match what's in fstab
+ <braunr> i.e. unix /dev paths
+ <braunr> otherwise e2fsck still won't be able to find the translators
+ mounting the device
+ <braunr> well, if it's mounted actually
+ <braunr> it just needs to find the matching line in mtab aiui
+ <braunr> so perhaps a libfshelp function for that, yes
+ <teythoon> braunr: so you suggest adding a normalizing function to
+ libfshelp that creates a /dev/path?
+ <braunr> yes
+ <braunr> used by the call you intend to add, which returns that device
+ string as found in fstab
+ <teythoon> found in fstab? so this would only work for translators managed
+ by fstab?
+ <braunr> no
+ <teythoon> ah
+ <teythoon> a string like the ones found in fstab?
+ <braunr> yes
+ <braunr> so that fsck and friends are able to know whether a device is
+ mounted or not
+ <braunr> i don't see any other purpose for that string in mtab
+ <braunr> you'd take regular paths as they are, convert device:xxx to
+ /dev/xxx, and return "none" for the rest i suppose
+ <teythoon> ok
+ <braunr> i'm not even sure it's right
+ <braunr> youpi: are you sure it's required ?
+ <teythoon> well it's a start and I think it's not too much work
+ <braunr> aiui, e2fsck may simply find the mount point in fstab, and ask the
+ translator if it's mounted
+ <teythoon> we can refine this later on maybe?
+ <braunr> or rather, init scripts, using mountpoint, before starting e2fsck
+ <braunr> teythoon: sure
+ <teythoon> there's this mountpoint issue... I need to run fsysopts /
+ --update early in the boot process
+ <teythoon> otherwise the device ids returned by stat(2)ing / are wrong and
+ mountpoint misbehaves
+ <teythoon> i guess b/c it's the rootfs
+ <braunr> device ids ?
+ <teythoon> % stat / | grep Device
+ <teythoon> Device: 3h/3d Inode: 2 Links: 22
+ <braunr> do you mean the major/minor identifiers ?
+ <teythoon> I do. if I don't do the --update i get seemingly random values
+ <braunr> i guess that's expected
+ <braunr> we don't have major/minor values
+ <braunr> well, they're emulated
+ <teythoon> well, if that's fixable, that'd be really nice ;)
+ <braunr> we'll never have major/minor values
+ <teythoon> yeah, I understand that
+ <braunr> but they could be fixed by MAKEDEV when creating device nodes
+ <teythoon> but not having to call fsys_set_options on the rootfs to get the
+ emulation up to speed
+ <braunr> try doing it from grub
+ <braunr> not sure it's possible
+ <braunr> but worth checking
+ <teythoon> by means of an ext2fs flag?
+ <braunr> yes
+ <braunr> if there is one
+ <braunr> i don't know the --update flag, is it new from your work ?
+ <teythoon> braunr: no, it's been there before. -oremount gets mapped to
+ that
+ <braunr> it's documented by fsysopts, but not by the ext2fs translators
+ <teythoon> libdiskfs source says something about flushing buffers iirc
+ <braunr> -s
+ <braunr> what does it do ?
+ <braunr> teythoon: ok
+ <teythoon> braunr: so the plan is to automatically generate a device path
+ from the translators argz vector but to provide the functionality so
+ translators can set a more appropriate value? did I get the last part of
+ the discussion right?
+ <braunr> not set, return
+ <teythoon> yeah return from the procedure but settable using libfshelp?
+ <braunr> why settable ?
+ <braunr> you'd have a fsys call to obtain the dev string, and the server
+ side would call libfshelp on the fly to obtain a normalized value and
+ return it
+ <teythoon> ah, make a function overrideable that returns an appropriate
+ response?
+ <braunr> overrideable ?
+ <teythoon> like netfs_append_args
+ <braunr> you wouldn't change the command line, no
+ <teythoon> isn't that done using weak references or something?
+ <teythoon> no I know
+ <braunr> sorry i'm lost then
+ <teythoon> never mind, I'll propose a patch early to get your feedback
+ <youpi> braunr: am I sure that _what_ is required?
+ <youpi> the device?
+ <youpi> e2fsck surely needs it, yes
+ <braunr> a valid device path, yes
+ <youpi> it can't rely only on fstab
+ <braunr> yes
+ <youpi> since users may mount things by hand
+ <braunr> i've used strace on it and it does perform lookups there
+ <braunr> (although i also saw uuid magic that i guess wouldn't work yet on
+ the hurd)
+
+
+## IRC, freenode, #hurd, 2013-07-03
+
+ <teythoon> I added a procedure to fsys.defs, added a server stub to my
+ tmpfs translator and wrote a simple client, but something hasn't picked
+ up the new message yet
+ <teythoon> % ./mtab tmp
+ <teythoon> ./mtab: get_translator_info: (ipc/mig) bad request message ID
+ <teythoon> I guess it's libhurduser.so from glibc, not sure though...
+ <braunr> glibc would only have the client calls
+ <braunr> what is "% ./mtab tmp" ?
+ <teythoon> mtab is my mtab tool/soon to be a translator testing thing, tmp
+ is an active tmpfs with the appropriate server stub
+ <braunr> so mtab has the client call, right ?
+ <teythoon> yes
+ <braunr> then tmpfs doesn't
+ <teythoon> so what to do about it?
+ <teythoon> i set LD_LIBRARY_PATH to my hurd builds lib dir, is that
+ preserved by settrans -a?
+ <pinotree> not really
+ <braunr> not at all
+ <braunr> there is a wiki entry about that iirc
+ <pinotree> http://darnassus.sceen.net/~hurd-web/hurd/debugging/translator/
+ <teythoon> yeah, I read it too once
+ <teythoon> ah
+ <braunr> on the other hand, using export to set the environment should do
+ the work
+ <teythoon> yes, that did the trick, thanks :)
+ * teythoon got his EOPNOTSUPP... *nomnomnom
+ <braunr> ?
+ <braunr> same error ?
+ <teythoon> well I stubbed it out
+ <braunr> oh
+ <teythoon> no, that's what I've been expecting ;)
+ <pinotree> great
+ <braunr> :)
+ <braunr> yes that's better than "mig can't find it"
+ <teythoon> braunr: in that list of active and passive translators that will
+ have to be maintained, do you expect it should carry more information
+ other than the relative path to that translator?
+ <braunr> like what ?
+ <teythoon> dunno, maybe like a port to any active translator there
+ <teythoon> should we care if any active translator dies and remove the
+ entry if there's no passive translator that could restart it again?
+ <braunr> don't add anything until you see it's necessary or really useful
+ <braunr> yes
+ <braunr> think of something like sshfs
+ <braunr> when you kill it, it's not reported by mount any more
+ <teythoon> well, for a dynamically allocated list of strings I could use
+ the argz stuff, but if we'd ever add anything else, we'd need a linked
+ list or something, maybe a hash table
+ <teythoon> yes, I thought that'd be useful
+ <braunr> use libihash for no
+ <braunr> now
+ <teythoon> braunr: but what would I use as keys? the relative path should
+ be unique (unless translators are stacked... hmmm), but that's the value
+ I'd like to store and ihash keys are pointers
+ <teythoon> stacked translators are a kinda interesting case for mtab
+ anyways...
+ <braunr> why not store the string address ?
+ <braunr> i suppose that, for stacked translators, the code querying
+ information would only return the topmost translator
+ <braunr> since this is the one which matters for regular clients (if i'm
+ right)
+ <teythoon> wouldn't that map strings that are equal but stored at different
+ locations to different values?
+ <teythoon> that'd defeat the point
+ <teythoon> I suppose so, yes
+ <braunr> then add a layer that looks for existing strings before adding
+ <braunr> the list should normally be small so a linear lookup is fine
+ <teythoon> yeah sure, but then there's little advantage of using ihash in
+ the first place, isn't it?
+ <braunr> over what ?
+ <teythoon> over not using it at all
+ <braunr> how would you store the list then ?
+ <teythoon> it's either ll or ll+ihash
+ <braunr> uh no
+ <braunr> let me check
+ <braunr> there is ihash_iterate
+ <braunr> so you don't need a linked list
+ <teythoon> so how do I store my list of strings to deduplicate the keys?
+ <braunr> you store pointers
+ <braunr> and on addition, you iterate over all entries, making sure none
+ matches the new one
+ <braunr> and if it does, you replace it i guess
+ <braunr> depending on how you design the rest
+ <teythoon> in a dynamically allocated region of memory?
+ <braunr> i don't understand
+ <braunr> your strings should be dynamically allocated, yes
+ <teythoon> no the array of char *
+ <braunr> your data structure being managed by libihash, you don't care
+ about allocation
+ <braunr> what array ?
+ <teythoon> ah, got it...
+ <teythoon> right.
+ <braunr> there is only one structure here, an ihash of char *
+ <teythoon> yes, I got the picture ;)
+ <braunr> goo
+ <braunr> d
+ <braunr> actually, the lookup wouldn't be linear since usually, hash tables
+ have stale entries
+ <teythoon> heh... what forest?!?
+ <braunr> but that's ok
+ <braunr> teythoon: ?
+ <teythoon> the one I couldn't make out b/c of all the trees...
+ <braunr> ?
+ <teythoon> ah, it's not important. there is this saying over here, not sure
+ if there's an english equivalent
+ <braunr> ok got it
+ <braunr> we have the same in french
+ <teythoon> I ran into a problem with my prototype
+ <teythoon> if a translator is set in e.g. diskfs_S_file_set_translator,
+ how do I get the path to that node?
+ <teythoon> I believe there cannot be a way to do that, b/c the mapping is
+ not bijective
+ <braunr> it doesn't have to be
+ <teythoon> ok, so how do I get *a* path for this node?
+ <braunr> that's another question
+ <braunr> do you see how the node is obtained ?
+ <braunr> np = cred->po->np;
+ <teythoon> yes
+ <braunr> the translation occurred earlier
+ <braunr> you need to find where
+ <braunr> then perhaps, you'll need to carry the path along
+ <braunr> or if you're lucky, it will still be there somewhere
+ <teythoon> the translation from path to node?
+ <braunr> yes
+ <teythoon> doesn't that happen in the client? and the client hands a file_t
+ to the file_set_translator routine?
+ <braunr> the relative lookup can't happen in the client
+ <braunr> the server can (and often does) retain information between two
+ RPCs
+ <teythoon> uh, I can access information from a previous rpc? is that
+ considered safe?
+ <braunr> think of open() then read()
+ <braunr> a simple int doesn't carry enough information
+ <braunr> that's why it's a descriptor
+ <teythoon> ah, the server retains some state, sure
+ <braunr> what it refers to is the state retained between several calls
+ <braunr> the object being invoked by clients
+ <braunr> teythoon: what is the "passive" parameter passed to
+ diskfs_S_file_set_translator ?
+ <teythoon> braunr: argz vector of the passive translator
+ <braunr> so it is a name
+ <braunr> but we also want active translators
+ <braunr> and what is active ?
+ <teythoon> not the name of the node though
+ <teythoon> active is the port (?) to the active translator
+ <teythoon> I guess
+ <braunr> fsys_t, looks that way yes
+ <braunr> i suppose you could add the path to the peropen structure
+ <teythoon> ok
+ <braunr> see diskfs_make_peropen
+ <teythoon> braunr: but translation happens in dir_lookup
+ <teythoon> in all places I've seen diskfs_make_peropen used, the path is
+ not available
+ <teythoon> why did you point me to diskfs_make_peropen?
+ <teythoon> s/dir_lookup/diskfs_lookup/
+ <teythoon> diskfs_lookup operates on struct node, so the path would have to
+ be stored there, right?
+ <braunr> teythoon: dir_lookup should call diskfs_make_peropen
+ <braunr> at least diskfs_S_dir_lookup does
+ <braunr> and the path is present there
+ <teythoon> braunr: right
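
braunr's suggestion for the list of mount point paths is a single hash table of `char *`, with a linear scan over the existing entries before each addition, so that equal strings stored at different addresses are not added twice. The sketch below shows only that deduplication step. It assumes a plain growable array in place of libihash (whose API is not reproduced here), and `struct string_set`, `string_set_add` and `string_set_free` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "an ihash of char *": a growable array of
   heap-allocated strings.  Lookups are linear, which is acceptable for
   the small number of translators expected here. */
struct string_set
{
  char **items;
  size_t len;
  size_t cap;
};

/* Add a private copy of S unless an equal string is already present.
   Comparison is by content (strcmp), not by pointer, so equal paths
   stored at different addresses count as duplicates.
   Returns 0 on success or duplicate, -1 if memory is exhausted. */
static int
string_set_add (struct string_set *set, const char *s)
{
  for (size_t i = 0; i < set->len; i++)
    if (strcmp (set->items[i], s) == 0)
      return 0;  /* Already present: keep the existing entry. */

  if (set->len == set->cap)
    {
      size_t ncap = set->cap ? 2 * set->cap : 8;
      char **n = realloc (set->items, ncap * sizeof *n);
      if (n == NULL)
        return -1;
      set->items = n;
      set->cap = ncap;
    }

  size_t size = strlen (s) + 1;
  char *copy = malloc (size);
  if (copy == NULL)
    return -1;
  memcpy (copy, s, size);
  set->items[set->len++] = copy;
  return 0;
}

/* Release every stored string and the array itself. */
static void
string_set_free (struct string_set *set)
{
  for (size_t i = 0; i < set->len; i++)
    free (set->items[i]);
  free (set->items);
  set->items = NULL;
  set->len = set->cap = 0;
}
```

Keeping the first entry on a duplicate is a choice made for the sketch; as braunr says, replacing it instead depends on how the rest is designed.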
+
+ <teythoon> hrm... I added a path field to struct peropen and initialize it
+ properly in diskfs_make_peropen, but some bogus values keep creeping in
+ :/
+ <braunr> first of all, make it a dynamically allocated string
+ <teythoon> it is
+ <braunr> not a fixed sized embedded array
+ <braunr> good
+ <teythoon> yes
+ <braunr> if you really need help debugging what's happening, feel free to
+ post your current changes somewhere
+ <teythoon> there is a struct literal in fsys-getroot.c, but i fixed that as
+ well
+ <teythoon> % ./mtab tmp
+ <teythoon> none tmp ../tmpfs/tmpfs writable,no-inherit-dir-group,no-sync 0
+ 0
+ <teythoon> none tmp/bar ../tmpfs/tmpfs
+ writable,no-inherit-dir-group,no-sync 0 0
+ <teythoon> none tmp/foo ../tmpfs/tmpfs
+ writable,no-inherit-dir-group,no-sync 0 0
+ <teythoon> none tmp/foo/bar ../tmpfs/tmpfs
+ writable,no-inherit-dir-group,no-sync 0 0
+ <teythoon> :)
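
teythoon's prototype adds a path field to `struct peropen` and fills it in from `diskfs_make_peropen`. The real structure and function live in libdiskfs, with more fields and a different signature; the sketch below uses a hypothetical `struct peropen_sketch` only to illustrate the ownership rule braunr asks for: each peropen keeps its own heap copy of the path (not a fixed-size embedded array, and not a borrowed pointer) and frees it on release.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for libdiskfs's struct peropen.
   Only the added path field matters here. */
struct peropen_sketch
{
  void *np;    /* node this open refers to (opaque in the sketch) */
  char *path;  /* heap copy of the path used to open it, or NULL */
};

/* Duplicate S with malloc; returns NULL on allocation failure. */
static char *
copy_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *c = malloc (n);
  if (c != NULL)
    memcpy (c, s, n);
  return c;
}

/* Create a peropen for NP, recording PATH if the caller knows it.
   Callers without a path pass NULL. */
static struct peropen_sketch *
make_peropen_sketch (void *np, const char *path)
{
  struct peropen_sketch *po = malloc (sizeof *po);
  if (po == NULL)
    return NULL;
  po->np = np;
  po->path = NULL;
  if (path != NULL && (po->path = copy_string (path)) == NULL)
    {
      free (po);
      return NULL;
    }
  return po;
}

/* Free the peropen together with the path string it owns. */
static void
release_peropen_sketch (struct peropen_sketch *po)
{
  if (po == NULL)
    return;
  free (po->path);
  free (po);
}
```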
+
+
+## IRC, freenode, #hurd, 2013-07-10
+
+ <teythoon> btw, I read getcwd.c and got the idea
+ <teythoon> however this situation is different afaict
+ <teythoon> getcwd has a port to the current working directory, right?
+ <teythoon> so they can do open_dir with .. as relative path
+ <teythoon> but all I've got is a port referencing the node the translator
+ is being attached to
+ <teythoon> s/open_dir/dir_lookup/
+ <teythoon> and that is not necessarily a directory, so dir_lookup fails
+ with not a directory
+ <teythoon> as far as I can see it is not possible to get the directory a
+ node is in from a port referencing that node
+ <teythoon> dir_lookup has to be handled by all nodes, not just directories
+ <teythoon> but file nodes only support "looking up" the empty string
+ <teythoon> not empty, but null:
+ <teythoon> This call is required to be supported by all files (even
+ non-directories) if the filename is null, and should function in that
+ case as a re-open of the file. */
+ <braunr> why do you want the directory ?
+ <braunr> 10:40 < teythoon> as far as I can see it is not possible to get
+ the directory a node is in from a port referencing that node
+ <teythoon> to readdir(3) it and figure out the name of the node the
+ translator is bound to
+ <teythoon> similar to what getcwd does
+ <braunr> that's out of the question
+ <teythoon> wasn't that what youpi was suggesting?
+ <braunr> you may have a lot of nodes in there, such a lookup shouldn't be
+ done
+ <braunr> i didn't see that detail
+ <teythoon> "│ Concerning storing the path, it's a bit sad to have to do
+ that, and
+ <teythoon> │ it'll become wrong if one moves the mount points. Another
+ way would
+ <teythoon> │ be to make the client figure it out by itself from a port to
+ the mount
+ <teythoon> │ point, much like glibc/sysdeps/mach/hurd/getcwd.c. It'll be
+ slower, but
+ <teythoon> │ should be safer. The RPC would thus return an array of
+ ports to the
+ <teythoon> │ mount points instead of an array of strings.
+ <braunr> yes i remember that
+ <braunr> but i didn't understand well how getcwd work
+ <braunr> s
+ <braunr> another scalability issue
+ <braunr> not a big one though, we rarely have translators in directories
+ with thousands of nodes
+ <braunr> so why not
+ <braunr> teythoon: do it as youpi suggested
+ <braunr> well if you can
+ <braunr> eh
+ <braunr> if not, i don't know
+ <braunr> 10:47 < teythoon> │ it'll become wrong if one moves the mount
+ points. Another way would
+ <teythoon> yes, I know... :/
+ <teythoon> well, I'm not even sure it is possible to get the directory a
+ node is in from the port referencing the node
+ <teythoon> as in, I'm not sure if the information is even there
+ <teythoon> b/c a filesystem is a tree, directories are nodes and files are
+ leaves
+ <teythoon> all non-leaf nodes reference their parent to allow traversing
+ the tree starting from any directory
+ <teythoon> but why would a leaf reference its parent(s - in case of
+ hardlinks)?
+ <braunr> uh, for the same reason ?
+ <teythoon> sure, it would be nice to do that, but I don't think this is
+ possible on unixy systems
+ <braunr> ?
+ <teythoon> you cannot say fchdir(2) to a fd that references a file
+ <braunr> do you mean /path/to/file/../ ?
+ <teythoon> yes
+ <teythoon> only that /path/to/file is given as fd or port
+ <braunr> when i pasted
+ <braunr> 10:49 < braunr> 10:47 < teythoon> │ it'll become wrong if one
+ moves the mount points. Another way would
+ <braunr> i was actually wondering if it was true
+ <teythoon> ah
+ <braunr> why can't the path be updated at the same time ?
+ <braunr> it's a relative path anyway
+ <braunr> completely managed by the parent translator
+ <teythoon> ah
+ <teythoon> right
+ <teythoon> it's still kind of hacky, but I cannot see how to do this
+ properly
+ <braunr> hacky ?
+ <teythoon> but yes, updating the path should work I guess
+ <teythoon> or sad
+ <braunr> what i find hacky is to set translators in two passes
+ <braunr> otherwise we'd only keep the translator paths
+ <braunr> not all paths
+ <teythoon> true
+ <braunr> but then, it only concerns open nodes
+ <braunr> and again, there shouldn't be too many of them
+ <braunr> so actually it's ok
+ <teythoon> braunr: I understand the struct nodes are cached in libdiskfs,
+ so wouldn't it be easier to attach the path to that struct instead of
+ struct peropen so that all peropen objects reference the same node
+ object?
+ <teythoon> so that the path can be updated if anyone dir_renames it
+ <teythoon> *all peropen objects derived from the same file name that is
+ <braunr> teythoon: i'm not sure
+ <braunr> nodes could be real nodes (i.e. inodes)
+ <braunr> there can be several paths for the same inode
+ <teythoon> braunr: I'm aware of that, but didn't we agree the other day
+ that any path would do?
+ <braunr> i don't remember we did
+ <braunr> i don't know the details well, but i don't think setting a
+ translator on a hard link should set the translator at the inode level
+ <braunr> on the other hand, if a new inode is created to replace the
+ previous one (or stack over it), then storing the path there should be
+ fine
+ <teythoon> braunr: I don't think I can update the paths if they're stored
+ in the peropen struct
+ <teythoon> how would I get a reference to all those peropen objects?
+ <braunr> ?
+ <braunr> first, what's the context when you talk about updating paths ?
+ <teythoon> well, youpi was concerned about renaming a mount point
+ <teythoon> and you implied that this could be managed
+ <braunr> can we actually do that btw ?
+ <teythoon> what?
+ <braunr> renaming a mount point
+ <teythoon> yep, just tried
+ <braunr> i mean, on a regular unix system like linux
+ <braunr> $ mv test blah
+ <braunr> mv: cannot move `test' to `blah': Device or resource busy
+ <braunr> (using sshfs so YMMV)
+ <pinotree> do you have anything (shells, open files, etc) inside it?
+ <braunr> no
+ <braunr> i'll try with an empty loop-mounted ext4
+ <teythoon> I was testing on the Hurd, worked fine there even with a shell
+ inside
+ <braunr> same thing
+ <braunr> i consider it a bug
+ <braunr> we may want to check what posix says about it
+ <teythoon> o_O
+ <braunr> and decide not to support renaming
+ <teythoon> why?
+ <pinotree> start a discussion in ml, maybe roland can chime in
+ <braunr> it complicates things
+ <braunr> ah yes
+ <teythoon> sure, but I can move or rename a directory, why should it be
+ different with a mount point?
+ <braunr> because it's two of them
+ <braunr> they're stacked
+ <braunr> if we do want to support that, we must be very careful about
+ atomically updating all the stack
+ <teythoon> ok
+ <teythoon> braunr: I'm trying to detect dying translators to remove them
+ from the list of translators
+ <teythoon> what port can I use for that purpose?
+ <teythoon> if I use the bootstrap port, can I then use the same method as
+ init/init.c uses? just defining a do_mach_notify_dead_name function and
+ the multiplexer will call this?
+ <braunr> teythoon: possibly
+ <teythoon> braunr: we'll see shortly...
+ <teythoon> I get KERN_INVALID_CAPABILITY indicating that my bootstrap port
+ is invalid
+ <teythoon> when calling mach_port_request_notification to get the dead name
+ notification I mean
+ <braunr> is the translator already started when you do that ?
+ <teythoon> yes, at least I think so, I'm hooking into
+ diskfs_S_file_set_translator and that gets an active translators port
+ <teythoon> also the mach docs suggests that the notification port is
+ invalid, not the name port referencing the translator
+ <braunr> i guess it shouldn't
+ <braunr> oh
+ <braunr> please show the code
+ <braunr> but beware, if the translator is started, assume it could die
+ immediately
+ <teythoon> braunr: http://paste.debian.net/15371/ line 87
+ <braunr> teythoon: notify can't be bootstrap
+ <braunr> what do you have in mind when writing this ?
+ <braunr> i'm not sure i follow
+ <teythoon> I want to be notified if an active translator goes away to
+ remove it from the list of translators
+ <braunr> ok but then
+ <braunr> create a send-once right
+ <braunr> and wait on it
+ <braunr> also, why do you want to be notified ?
+ <braunr> isn't this already done ?
+ <braunr> or can't do it lazily on access attempt ?
+ <braunr> +you
+ <teythoon> in the client?
+ <braunr> in the parent server
+ <braunr> what happens currently when a translator dies
+ <braunr> is the parent notified ?
+ <braunr> or does it give an invalid right ?
+ <teythoon> ah, i think so
+ <braunr> then you don't need to do it again
+ <teythoon> right, I overlooked that
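
braunr's point that a stored mount point path can be kept correct on rename, because it is a relative path managed entirely by the parent translator, amounts to a prefix rewrite. A minimal sketch, assuming paths are stored relative to the translator's root with `/` separators; `rewrite_prefix` is a hypothetical helper, not an existing Hurd function, and a real implementation would also need proper locking and, as braunr warns for stacked translators, an atomic update of the whole stack.

```c
#include <stdlib.h>
#include <string.h>

/* If *PATH equals OLD_PREFIX or lies below it (OLD_PREFIX followed by
   '/'), replace that prefix with NEW_PREFIX.  Returns 1 if rewritten,
   0 if *PATH is unaffected, -1 on allocation failure (*PATH is kept). */
static int
rewrite_prefix (char **path, const char *old_prefix, const char *new_prefix)
{
  size_t old_len = strlen (old_prefix);
  const char *p = *path;

  /* Renaming "tmp/foo" must not touch "tmp/foobar". */
  if (strncmp (p, old_prefix, old_len) != 0
      || (p[old_len] != '\0' && p[old_len] != '/'))
    return 0;

  const char *rest = p + old_len;  /* either "" or "/..." */
  size_t new_len = strlen (new_prefix);
  size_t rest_len = strlen (rest);
  char *r = malloc (new_len + rest_len + 1);
  if (r == NULL)
    return -1;
  memcpy (r, new_prefix, new_len);
  memcpy (r + new_len, rest, rest_len + 1);  /* includes the NUL */

  free (*path);
  *path = r;
  return 1;
}
```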
+
+
+## IRC, freenode, #hurd, 2013-07-12
+
+ <teythoon> recursively traversing all translators from / turns out to be
+ more dangerous than I expected
+ <teythoon> ... if done by a translator bound somewhere below /...
+ <teythoon> my interpretation is that the mtab translator tries to talk to
+ itself and deadlocks
+ <teythoon> (and as a side effect the whole system kinda just stops...)
+
+
+## IRC, freenode, #hurd, 2013-07-15
+
+ <youpi> teythoon: did you discuss with braunr about returning port vs path
+ in fsys_get_children?
+ <teythoon> youpi: we did
+ <teythoon> as I wrote I looked at the getcwd source you pointed me at
+ <teythoon> and I started to code up something similar
+ <teythoon> but as far as I can see there's no way to tell from a port
+ referencing a file the directory this file is located in
+ <youpi> ah, right, there was a [0] mail
+ <youpi> teythoon: because it doesn't have a "..", right
+ <teythoon> about Neal's concerns, he's right about not covering passive
+ translators very well
+ <teythoon> but the solution he proposed was similar to what I tried to do
+ first
+ <youpi> I don't like half-covering passive translators at all, to be honest
+ :)
+ <youpi> either covering them completely, or not at all, would be fine
+ <teythoon> and then braunr convinced me that the "recursive" approach is
+ more elegant and hurdish, and I came to agree with him
+ <teythoon> youpi: one could scan the filesystem at translator startup and
+ populate the list
+ <youpi> by "Neal's solution", you mean an mtab registry?
+ <teythoon> yes
+ <braunr> so, let's see what linux does when renaming parent directories
+ <teythoon> mount points you mean?
+ <youpi> teythoon: browsing the whole filesystem just to find passive
+ translators is costly
+ <youpi> teythoon, braunr: and that won't prevent the user from unexpectedly
+ starting other translators at will
+ <braunr> scary
+ <teythoon> youpi: but that requires the privilege to open the device
+ <youpi> the fact that a passive translator is set is nothing more than a
+ user having the intent of starting a translator
+ <braunr> linux retains the original path in the mount table
+ <youpi> heh
+ <teythoon> youpi: any unprivileged user can trigger a translator startup
+ <youpi> sure, but root can do that too
+ <youpi> and expect the system to behave nicely
+ <teythoon> but if I'm root and want to fsck something, I won't start
+ translators accessing the device just before that
+ <teythoon> but if there's a passive translator targeting the device,
+ someone else might do that
+ <youpi> root does not always completely control what he's doing
+ <youpi> linux for instance does prevent from mounting a filesystem being
+ checked
+ <teythoon> but still, including passive translators in the list would at
+ least prevent anyone starting a translator by accident, isn't that worth
+ doing then?
+ <youpi> if there's a way to prevent root too, that's better than having a
+ half-support for something which we don't necessarily really want
+ <youpi> (i.e. an exclusive lock on the underlying device)
+ <teythoon> right, that would also do the trick
+ <teythoon> btw, some programs or scripts seem to hardcode /proc/mounts and
+ procfs and I cannot bind a translator to /proc/mounts since it is
+ read-only and the node does not exist
+ <kilobug> IMHO automatically starting translators is a generic feature, and
+ passive translator is just a specific instance of it; but we could very
+ well have, like an "autofs" that automatically start translators in tar
+ archives and iso images, allowing to cd into any tar/iso on the system;
+ implementing such things is part of the Hurd flexibility, the "core
+ system" shouldn't be too aware on how translators are started
+ <youpi> so in the end, storing where the active translator was started
+ first seems okayish according to what linux has been exposing for decades
+ <youpi> kilobug: indeed
+ <teythoon> it could serve a mounts with a passive translator by default, or
+ a link to /run/mtab, or a simple file so we could bind a translator to
+ that node
+ <youpi> I'd tend to think that /proc/mounts should be a passive translator
+ and /run/mtab / /etc/mtab a symlink to it
+ <youpi> not being able to choose the translator is a concern however
+ <teythoon> ok, I'll look into that
+ <youpi> it could be an empty file, and people be able to set a translator
+ on it
+ <teythoon> if it had a passive translator, people still could bind their
+ own translator to it later on, right?
+ <teythoon> afaics the issue currently is mostly, that there is no mounts
+ node and it is not possible to create one
+ <youpi> right
+ <teythoon> cool
+ <youpi> so with the actual path, you can even check for caller's permission
+ to read the path
+ <youpi> i.e. not provide any more information than the user would be able
+ to get from browsing by hand
+ <teythoon> sure, that concern of Neal's is easy to address
+ <youpi> I'm not so much concerned by stale paths being shown in mtab
+ <youpi> the worst that can happen is a user not being able to umount the
+ path
+ <youpi> but he can settrans -g it
+ <youpi> (which he can't on linux ;) )
+ <teythoon> yes, and the device information is still valid
+ <youpi> yes
+ <braunr> despite the parent dir being renamed, linux is still able to
+ umount the new path
+ <teythoon> and so is our current umount
+ <braunr> good
+ <teythoon> (if one uses the mount point as argument)
+ <braunr> what's the current plan concerning /proc/mounts ?
+ <teythoon> serving a node with a passive translator record
+ <braunr> ?
+ <teythoon> so that /hurd/mtab / is started on access
+ <braunr> i mean, still planning on using the recursive approach instead of
+ a registry ?
+ <teythoon> ah
+ <teythoon> I do not feel confident enough to decide this, but I agree with
+ you, it feels elegant
+ <teythoon> and it works :)
+ <teythoon> modulo the translator deadlocking if it talks to itself, any
+ thoughts on that?
+ <youpi> it is a non-threaded translator I guess?
+ <teythoon> currently yes
+ <youpi> making it threaded should fix the issue
+ <teythoon> I tried to make the mtab translator multithreaded but that
+ didn't help
+ <youpi> that's odd
+ <teythoon> maybe I did it wrong
+ <braunr> i don't find it surprising
+ <braunr> well, not that surprising :p
+ <braunr> on what lock does it block ?
+ <teythoon> as far as i can see the only difference between hello and hello-mt
+ is that it uses a different dispatcher and has lots of locking, right?
+ <teythoon> braunr: I'm not sure, partly because that wreaked havoc on the
+ whole system
+ <teythoon> it just freezes
+ <teythoon> but it wasn't permanent. once i let it running and it recovered
+ <braunr> consider using a subhurd
+ <teythoon> ah right, I meant to set up one anyway, but my first attempts
+ were not successful, not sure why
+ <teythoon> anyway, is there a way to prevent this in the first place?
+ <teythoon> if one could compare ports that'd be helpful
+ <youpi> Mmm, did you try to simply compare the number?
+ <teythoon> with the bootstrap port I presume?
+ <youpi> Mmm, no, the send port and the receive port would be different
+ <youpi> no, with the receive port
+ <teythoon> ah
+ <braunr> comparing the numbers should work
+ <braunr> youpi: no they should be the same
+ <youpi> braunr: ah, then it should work yes
+ <braunr> that's why there are user ref counts
+ <youpi> ok
+ <braunr> only send-once rights have their own names
+ <teythoon> btw, I'll push my work to darnassus from now on,
+ e.g. http://darnassus.sceen.net/gitweb/?p=teythoon/hurd.git;a=shortlog;h=refs/heads/feature-mtab-translator-v3-wip
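
The deadlock teythoon describes happens when the mtab translator, walking the translator tree from `/`, sends an RPC to itself. The fix youpi and braunr suggest is to compare port names: within one task, all send rights to the same port are held under a single name (send-once rights being the exception), so a plain integer comparison suffices. The sketch below only shows the traversal shape; `port_name` is a hypothetical stand-in for `mach_port_t`, and `struct xlator` is a hypothetical in-memory tree standing in for what a `fsys_get_children` RPC would return. It is not the actual mtab translator code.

```c
#include <stddef.h>

/* Hypothetical stand-in for mach_port_t: within one task, every send
   right to a given port shares one name, so names compare as integers. */
typedef unsigned int port_name;

/* Hypothetical model of a translator and the children it reports. */
struct xlator
{
  const char *path;
  port_name control;
  const struct xlator *children;
  size_t n_children;
};

/* Visit TR and every translator below it.  The translator whose port
   is SELF is still listed (its parent reported it), but it is never
   queried for children: that would be an RPC to ourselves, which
   deadlocks a single-threaded server.  VISIT may be NULL.
   Returns the number of translators listed. */
static size_t
walk_translators (const struct xlator *tr, port_name self,
                  void (*visit) (const struct xlator *, void *), void *ctx)
{
  if (visit != NULL)
    visit (tr, ctx);
  if (tr->control == self)
    return 1;  /* Do not talk to ourselves. */

  size_t count = 1;
  for (size_t i = 0; i < tr->n_children; i++)
    count += walk_translators (&tr->children[i], self, visit, ctx);
  return count;
}
```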
+
+
+## [[open_issues/libnetfs_passive_translators]]
+
+
+## IRC, freenode, #hurd, 2013-07-16
+
+ <teythoon> which port is the receive port of a translator? I mean, how is
+ it called in the source, there is no port in sight named receive anywhere
+ I looked.
+ <braunr> teythoon: what is the "receive port of a translator" ?
+ <teythoon> braunr: we talked yesterday about preventing the mtab deadlock
+ by comparing ports
+ <teythoon> I asked which one to use for the comparison, youpi said the
+ receive port
+ <braunr> i'm not sure what he meant
+ <braunr> it could be the receive port used for the RPC
+ <braunr> but i don't think it's exported past mig stub code
+ <teythoon> weird, I just reread it. I asked if i should use the bootstrap
+ port, and he said receive port, but it might have been addressed to you?
+ <teythoon> you were talking about send and receive ports being singletons
+ or not
+ <teythoon> umm
+ <braunr> no i answered him
+ <braunr> he was wondering if the receive port could actually be used for
+ comparison
+ <braunr> i said it can
+ <braunr> but still, i'm not sure what port
+ <braunr> if it's urgent, send him a mail
+ <teythoon> no, my pipeline is full of stuff I can do instead ;)
+ <braunr> :)
+
+
+## IRC, freenode, #hurd, 2013-07-17
+
+ <teythoon> braunr: btw, comparing ports solved the deadlock in the mtab
+ translator rather easily
+ <braunr> :)
+ <braunr> which port then ?
+ <teythoon> currently I'm stuck though, I'm not sure how to address Neal's
+ concern wrt access permission checks
+ <teythoon> I believe it's called control port
+ <braunr> ok
+ <teythoon> the one one gets from doing the handshake with the parent
+ <braunr> i thought it was the bootstrap port
+ <braunr> but i don't know the details so i may be wrong
+ <braunr> anyway
+ <teythoon> yes
+ <braunr> what is the permission problem again ?
+ <teythoon> 871u73j4zp.wl%neal@walfield.org
+ <braunr> well, you could perform a lookup on the stored path
+ <braunr> as if opening the node
+ <teythoon> if I look at any server implementation of a procedure from
+ fs.defs (say libtrivfs/file-chmod.c [bad example though, that looks wrong
+ to me]), there is permission checking being done
+ <teythoon> any server implementation of a procedure from fsys.defs lacks
+ permission checks, so I guess it's being done somewhere else
+ <braunr> i must say i'm a bit lost in this discussion
+ <braunr> i don't know :/
+ <braunr> can *you* sum up the permission problem please ?
+ <braunr> i mean here, now, in just a few words ?
+ <teythoon> ok, so I'm extending the fsys api with the get_children
+ procedure
+ <teythoon> that one should not return any children x/y if the user doing
+ the request has no read permissions on x
+ <braunr> really ?
+ <braunr> why so ?
+ <teythoon> the same way ls x would not reveal the existence of y
+ <braunr> i could also say unlike cat /proc/mounts
+ <braunr> i can see why we would want that
+ <braunr> i also can see why we could let this behaviour in place
+ <braunr> let's admit we do want it
+ <teythoon> true, but I thought this could easily be addressed
+ <braunr> what you could do is
+ <teythoon> now I'm not sure b/c I cannot even find the permission checking
+ code for any fsys_* function
+ <braunr> for each element in the list of child translators
+ <braunr> perform a lookup on the stored path on behalf of the user
+ <braunr> and add to the returned list if permission checks pass
+ <braunr> teythoon: note that i said lookup on the path, which is an fs
+ interface
+ <braunr> i assume there is no permission checking for the fsys interface
+ because it's done at the file (fs) level
+ <teythoon> i think so too, yes
+ <teythoon> sure, if I only knew who made the request in the first place
+ <teythoon> the file-* options have a convenient credential handle passed in
+ as first parameter
+ <teythoon> s/options/procedures/
+ <teythoon> surely the fsys-* procedures also have a means of retrieving
+ that information, I just don't know how
+ <braunr> mig magic
+ <braunr> teythoon: see file_t in hurd_types.defs
+ <braunr> there is the macro FILE_INTRAN which is defined in subdirectories
+ (or not)
+ <teythoon> ah, retrieving the control port requires permissions, and the
+ fsys-* operations then operate on the control port?
+ <braunr> see libdiskfs/fsmutations.h for example
+ <braunr> uh yes but that's for < braunr> i assume there is no permission
+ checking for the fsys interface because it's done at the file (fs) level
+ <braunr> i'm answering < teythoon> sure, if I only knew who made the
+ request in the first place
+ <braunr> teythoon: do we understand each other or is there still something
+ fuzzy ?
+ <teythoon> braunr: thanks for the pointers, I'll read up on that a bit
+ later
+ <braunr> teythoon: ok
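
teythoon's rule (do not report a child x/y to a user who cannot read x) combined with braunr's proposal (check each entry of the child list on behalf of the user before returning it) can be modelled as a filter over the stored mount point paths. The sketch below is a conservative variant that requires both read and search permission on every directory from the translator's root down to the mount point, mimicking a user who discovers the translator with `ls` and `cd`. How an fsys_* server routine obtains the caller's credentials is still being worked out in the discussion, so credentials are abstracted behind a hypothetical `access_fn` callback; `child_is_visible`, `PERM_READ`, `PERM_SEARCH` and the toy permission table are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical permission bits, standing in for the real checks. */
enum { PERM_READ = 4, PERM_SEARCH = 1 };

/* "May the requesting user do MODE on directory DIR?"  DIR is relative
   to the translator's root; "." is the root itself.  A real server
   would answer this with the caller's credentials. */
typedef int (*access_fn) (const char *dir, int mode, void *ctx);

/* Return 1 if the child translator at MOUNT_POINT may be reported:
   every directory leading to it must be readable and searchable.
   Returns 0 otherwise, including on allocation failure (fail closed). */
static int
child_is_visible (const char *mount_point, access_fn can_access, void *ctx)
{
  const int need = PERM_READ | PERM_SEARCH;
  size_t n = strlen (mount_point) + 1;
  char *buf = malloc (n);
  if (buf == NULL)
    return 0;
  memcpy (buf, mount_point, n);

  int ok = can_access (".", need, ctx);
  for (char *slash = strchr (buf, '/'); ok && slash != NULL;
       slash = strchr (slash + 1, '/'))
    {
      *slash = '\0';  /* Temporarily cut the path after this component. */
      ok = can_access (buf, need, ctx);
      *slash = '/';
    }

  free (buf);
  return ok;
}

/* Toy permission table for trying the function out. */
struct perm_entry
{
  const char *dir;
  int mode;
};

static int
table_access (const char *dir, int mode, void *ctx)
{
  for (const struct perm_entry *e = ctx; e->dir != NULL; e++)
    if (strcmp (e->dir, dir) == 0)
      return (e->mode & mode) == mode;
  return 0;  /* Unknown directories are not accessible. */
}
```

With `home/foo` searchable but not readable, a translator on `home/foo/aZeRtYuyU/foo` is hidden, while one directly below a readable directory is reported.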
+
+
+## IRC, freenode, #hurd, 2013-07-18
+
+ <teythoon> braunr: back to the permission checking problem for the
+ fsys_get_children interface
+ <teythoon> I can see how this could be easily implemented in the mtab
+ translator, it asks the translator for the list of children and then
+ checks if the user has permission to read the parent dir
+ <teythoon> but that is pointless, it has to be implemented in the
+ fsys_get_children server function
+ <braunr> yes
+ <braunr> why is it pointless ?
+ <teythoon> because one could circumvent the restriction by doing the
+ fsys_get_children call w/o the mtab translator
+ <braunr> uh no
+ <braunr> you got it wrong
+ <braunr> what i suggested is that fsys_get_children does it before
+ returning a list
+ <braunr> the problem is that the mtab translator has a different identity
+ from the users accessing it
+ <teythoon> yes, but I cannot see how to do this, b/c at this point I do not
+ have the user credentials
+ <braunr> get them
+ <teythoon> how?
+ <braunr> 16:14 < braunr> mig magic
+ <braunr> 16:15 < braunr> teythoon: see file_t in hurd_types.defs
+ <braunr> 16:16 < braunr> there is the macro FILE_INTRAN which is defined in
+ subdirectories (or not)
+ <braunr> 16:16 < braunr> see libdiskfs/fsmutations.h for example
+ <teythoon> i saw that
+ <braunr> is there a problem i don't see then ?
+ <braunr> i suppose you should define FSYS_INTRAN rather
+ <braunr> but the idea is the same
+ <teythoon> won't that change all the function signatures of the fsys-*
+ family?
+ <braunr> that's probably the only reason not to implement this feature
+ right now
+ <teythoon> then again, that change is probably easy and mechanic in nature,
+ might be an excuse to play around with coccinelle
+ <braunr> why not
+ <braunr> if you have the time
+ <teythoon> right, if this can be done, the mtab translator (if run as root)
+ could get credentials matching the users credentials to make that
+ request, right?
+ <braunr> i suppose
+ <braunr> i'm not sure it's easy to make servers do requests on behalf of
+ users on the hurd
+ <braunr> which makes me wonder if the mtab functionality shouldn't be
+ implemented in glibc eheheh ....
+ <braunr> but probably not
+ <teythoon> well, I'll try out the mig magic thing and see how painful it is
+ to fix everything ;)
+ <braunr> good luck
+ <braunr> honestly, i'm starting to think it's deviating too much from your
+ initial goal
+ <braunr> i'd be fine with a linux-like /proc/mounts
+ <braunr> with a TODO concerning permissions
+ <teythoon> ok, fine with me :)
+ <braunr> confirm it with the other mentors please
+ <braunr> we have to agree quickly on this
+ <teythoon> y?
+
+ <teythoon> braunr: I actually believe that the permission issue can be
+ addressed cleanly and unobtrusively
+ <teythoon> braunr: would you still be opposed to the get_children approach
+ if that is solved?
+ <teythoon> the filesystem is a tree and the translators "creating" that
+ tree are a more coarse version of that tree
+ <teythoon> having a method to traverse that tree seems natural to me
+ <braunr> teythoon: it is natural
+ <braunr> i'm just worried it's a bit too complicated, unnecessary, and
+ out-of-scope for the problem at hand
+ <braunr> (which is /proc/mounts, not to forget it)
+
+
+## IRC, freenode, #hurd, 2013-07-19
+
+ <teythoon> braunr: I think you could be a bit more optimistic and
+ supportive of the decentralized approach
+ <teythoon> I know the dark side has cookies and strong language and it's
+ mighty tempting
+ <teythoon> but both are bad for you :p
+
+
+## IRC, freenode, #hurd, 2013-07-22
+
+ <youpi> teythoon: AIUI, you should be able to run the mtab translator as
+ no-user (i.e. no uid)
+ <teythoon> youpi: yes, that works fine
+
+ <youpi> teythoon: so there is actually no need to define FSYS_INTRAN, doing
+ it by hand as you did is fine, right?
+ <youpi> (/me backlogs mails...)
+ <teythoon> youpi: yes, the main challenge was to figure out what mig does
+ and how the cpp is involved
+ <youpi> heh :)
+ <teythoon> my patch does exactly the same, but only for this one server
+ function
+ <teythoon> youpi: I'm confused by your mail, why are read permissions on
+ all path components necessary?
+ <braunr> teythoon: only execution normally
+ <youpi> teythoon: to avoid letting a user discover a translator running on
+ a hidden directory
+ <teythoon> braunr: exactly, and that is tested
+ <youpi> e.g. ~/home/foo is o+x, but o-r
+ <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU
+ <youpi> I don't want that to show up on /proc/mounts
+ <braunr> youpi: i don't understand either: why isn't execution permission
+ enough ?
+ <teythoon> youpi: but that requires testing for read on the *last*
+ component of the *dirname* of your translator, and that is tested
+ <youpi> let me take another example :)
+ <youpi> e.g. ~/home/foo/aZeRtYuyU is o+x, but o-r
+ <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU/foo
+ <youpi> ergl sorry, I meant this actually:
+ <teythoon> yes, that won't show up then in the mtab for users that are not
+ you and not root
+ <youpi> e.g. ~/home/foo is o+x, but o-r
+ <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU/foo
+ <teythoon> ah
+ <teythoon> hmm, good point
+ <braunr> ?
+ * braunr still confused
+ <teythoon> well, qwfpgjlu is the secret
+ <teythoon> and that is revealed by the fsys_get_children procedure
+ <braunr> then i didn't understand the description of the call right
+ <braunr> > + /* check_access performs the same permission check as is
+ normally
+ <braunr> > + done, i.e. it checks that all but the last path components
+ are
+ <braunr> > + executable by the requesting user and that the last
+ component is
+ <braunr> > + readable. */
+ <teythoon> braunr: youpi argues that this is not enough in this case
+ <braunr> from that, it looks ok to me
+ <youpi> the function and the documentation agree, yes
+ <youpi> but that's not what we want
+ <braunr> and that's where i fail to understand
+ <youpi> again, see my example
+ <braunr> i am
+ <braunr> 10:43 < youpi> e.g. ~/home/foo is o+x, but o-r
+ <braunr> ok
+ <youpi> so the user is not supposed to find out the secret
+ <braunr> then your example isn't enough to describe what's wron
+ <braunr> g
+ <youpi> checking read permission only on ~/home/foo/aZeRtYuyU will not
+ guarantee that
+ <braunr> ah
+ <braunr> i thought foo was the last component
+ <youpi> no, that's why I changed my example
+ <braunr> hum
+ <braunr> 10:43 < youpi> e.g. ~/home/foo is o+x, but o-r
+ <braunr> 10:43 < youpi> and I have a translator running on
+ ~/home/foo/aZeRtYuyU/foo
+ <braunr> i meant, the last foo
+ <teythoon> still, this is easily fixed
+ <youpi> sure
+ <youpi> just has to be :)
+ <teythoon> youpi, braunr: so do you think that this approach will work?
+ <youpi> I believe so
+ <braunr> i still don't see the problem, so don't ask me :)
+ <braunr> i've been sick all week end and hardly slept, which might explain
+ <braunr> in the example, "all but the last path components" is
+ "~/home/foo/aZeRtYuyU"
+ <braunr> right ?
+ <youpi> braunr: well, I haven't looked at the details
+ <youpi> but be it the last, or but-last doesn't change the issue
+ <youpi> if my ~/hidden is o-r,o+x
+ <youpi> and I have a translator on ~/hidden/a/b/c/d/e
+ <youpi> checking only +x on hidden is not ok
+ <braunr> but won't the call also check a b c d ?
+ <youpi> yes, but that's not what matters
+ <youpi> what matters is that hidden is o-r
+ <braunr> hm
+ <youpi> so the mtab translator is not supposed to reveal that there is an
+ "a" in there
+ <braunr> ok i'm starting to understand
+ <braunr> so r must be checked on all components too
+ <youpi> yes
+ <braunr> right
+ <youpi> to simulate the user doing ls, cd, ls, cd, etc.
+ <braunr> well, not cd
+ <braunr> ah
+ <youpi> for being able to do ls, you have to be able to do cd
+ <braunr> as an ordered list of commands
+ <braunr> ok
+ <teythoon> agreed. can you think of any more issues?
+ <braunr> so both x and r must be checked
+ <youpi> so in the end this RPC is really a shortcut for a find + fsysopts
+ script
+ <youpi> teythoon: I don't see any
+ <braunr> teythoon: i couldn't take a clear look at the patch but
+ <braunr> do you perform a lookup on all nodes ?
+ <teythoon> yes, all nodes on the path from the root to the one specified by
+ the mount point entry in the active translator list
+ <braunr> let me rephrase
+ <braunr> do you at some point do a lookup, similar to a find, on all nodes
+ of a translator ?
+ <teythoon> no
+ <braunr> good
+ <teythoon> yes
+ <braunr> iirc, neal raised that concern once
+ <teythoon> and I'll also fix settrans --recursive not to iterate over *all*
+ nodes either
+ <braunr> great
+ <braunr> :)
+ <teythoon> fsys_set_options with do_children=1 currently does that (I've
+ only looked at the diskfs version)
+
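This is a minimal sketch of the rule agreed on above. It models youpi's "ls, cd, ls, cd" walk: every directory from the root down to the mount point's parent must be both listable (r) and searchable (x) by the requesting user. It assumes each directory has already been reduced to two flags for that user. `struct dir_perm` and `mount_point_visible` are hypothetical names, not the `check_access` from teythoon's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-directory permission summary, already evaluated
   for the requesting user (not a Hurd data structure).  */
struct dir_perm
{
  const char *name;
  bool can_read;   /* r: the user may list the directory (ls) */
  bool can_exec;   /* x: the user may traverse the directory (cd) */
};

/* Return true if the user could discover the mount point by walking
   down from the root with ls and cd.  DIRS lists the ancestors of the
   mount point, root first.  x alone lets the user traverse a directory
   without learning the names inside it, so both bits are required on
   every ancestor.  */
static bool
mount_point_visible (const struct dir_perm *dirs, size_t n)
{
  assert (dirs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!dirs[i].can_read || !dirs[i].can_exec)
      return false;
  return true;
}
```

In youpi's example, `~/home/foo` is `o+x` but `o-r`, so the walk fails at `foo` for other users. The secret name `aZeRtYuyU` stays hidden even though every directory on the path is searchable.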
+
+## IRC, freenode, #hurd, 2013-07-27
+
+ <teythoon> youpi: ah, I just found msg_get_init_port, that should make the
+ translator detection feasible
+
+
+## IRC, freenode, #hurd, 2013-07-31
+
+ <teythoon> braunr: can I discover the sender of an rpc message?
+ <braunr> teythoon: no
+ <braunr> teythoon: what do you mean by "sender" ?
+ <teythoon> braunr: well, I'm trying to do permission checks in the
+ S_proc_mark_essential server function
+ <braunr> ok so, the sending user
+ <braunr> that should be doable
+ <teythoon> I've got a struct proc *p courtesy of a mig intran mutation and
+ a port lookup
+ <teythoon> but that is not necessarily the sender, right?
+ <braunr> proc is really the server i know the least :/
+ <braunr> there is permission checking for signals
+ <braunr> it does work
+ <braunr> you should look there
+ <teythoon> yes, there are permission checks there
+ <teythoon> but the only argument the rpc has is a mach_port_t referring to
+ an object in the proc server
+ <braunr> yes
+ <teythoon> anyone can obtain such a handle for any process, no?
+ <braunr> can you tell where it is exactly please ?
+ <braunr> i don't think so, no
+ <teythoon> what?
+ <braunr> 14:42 < teythoon> but the only argument the rpc has is a
+ mach_port_t referring to an object in the proc server
+ <teythoon> ah
+ <braunr> the code you're referring to
+ <braunr> a common way to give privileges to public objects is to provide
+ different types of rights
+ <braunr> a public (usually read-only) right
+ <braunr> and a privileged one, like host_priv which you may have seen
+ <braunr> acting on (modifying) a remote object normally requires the latter
+ <teythoon> http://paste.debian.net/20795/
+ <braunr> i thought you were referring to existing code
+ <teythoon> well, there is existing code doing permission checks the same
+ way I'm doing it there
+ <braunr> where is it please ?
+ <braunr> mgt.c ?
+ <teythoon> proc/mgt.c (S_proc_setowner) for example
+ <teythoon> yes
+ <braunr> that's different
+ <teythoon> but anyone can obtain such a reference by doing proc_pid2proc
+ <braunr> the sender is explicitly giving the new uid
+ <braunr> yes but not anyone is already an owner of the target process
+ <braunr> (although it may look like anyone has the right to clear the owner
+ oO)
+ <teythoon> see, that's what made me worry, it is not checked who's the
+ sender of the message
+ <teythoon> unless i'm missing something here
+ <teythoon> ah
+ <teythoon> I am
+ <teythoon> pid2proc returns EPERM if one is not the owner of the process in
+ question
+ <teythoon> all is well
+ <braunr> ok
+ <braunr> it still requires the caller process though
+ <teythoon> what?
+ <braunr> see check_owner
+ <braunr> the only occurrence i find in the hurd is in libps/procstat.c
+ <braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc (server, ps->pid,
+ &ps->process));
+ <braunr> server being the proc server AIUI
+ <teythoon> yes, most likely
+ <braunr> but pid2proc describes this first argument to be the caller
+ process
+ <teythoon> ah but it is
+ <braunr> ?
+ <teythoon> mig magic :p
+ <teythoon> MIGSFLAGS="-DPROCESS_INTRAN=pstruct_t reqport_find (process_t)"
+ \
+ <braunr> ah nice
+ <braunr> hum no
+ <braunr> this just looks up the proc object from a port name, which is
+ obvious
+ <braunr> what i mean is
+ <braunr> 14:53 < braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc
+ (server, ps->pid, &ps->process));
+ <braunr> this is done in libps
+ <braunr> which can be used by any process
+ <braunr> server is the proc server for this process (it defines the process
+ namespace)
+ <teythoon> yes, but isn't the port to the proc server different for each
+ process?
+ <braunr> no, the port is the same (the name changes only)
+ <braunr> ports are global non-first class objects
+ <teythoon> and the proc server can thus tell with the lookup which process
+ it is talking to?
+ <braunr> that's the thing
+ <braunr> from pid2proc :
+ <braunr> S_proc_pid2proc (struct proc *callerp
+ <braunr> [...]
+ <braunr> if (! check_owner (callerp, p))
+ <braunr> check_owner (struct proc *proc1, struct proc *proc2)
+ <braunr> "Returns true if PROC1 has `owner' privileges over PROC2 (and can
+ thus get its task port &c)."
+ <braunr> callerp looks like it should be the caller process
+ <braunr> but in libps, it seems to be the proc server
+ <braunr> this looks strange to me
+ <teythoon> yep, to me too, hence my confusion
+ <braunr> could be a bug that allows anyone to perform pid2proc
+ <teythoon> braunr: well, proc_pid2proc (getproc (), 1, ...) fails with
+ EPERM as expected for me
+ <braunr> ofc it does with getproc()
+ <braunr> but what forces a process to pass itself as the first argument ?
+ <teythoon> braunr: nothing, but what else would it pass there?
+ <braunr> 14:53 < braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc
+ (server, ps->pid, &ps->process));
+ <braunr> everyone knows the proc server
+ <braunr> ok now, that's weird
+ <braunr> teythoon: does getproc() return the proc server ?
+ <teythoon> I think so, yes
+ <teythoon> damn those distributed systems, all of their sources are so
+ distributed too
+ <braunr> i suspect there is another layer of dark glue in the way
+ <teythoon> I cannot even find getproc :/
+ <braunr> hurdports.c:GETSET (process_t, proc, PROC)
+ <braunr> that's the dark glue :p
+ <teythoon> ah, so it must be true that the ports to the proc server are
+ indeed process specific, right?
+ <braunr> ?
+ <teythoon> well, it is not one port to the proc server that everyone knows
+ <braunr> it is
+ <braunr> what makes you think it's not ?
+ <teythoon> proc_pid2proc (getproc (), 1, ...) fails with EPERM for anyone
+ not being root, but succeeds for root
+ <braunr> hm right
+ <teythoon> if getproc () were to return the same port, the proc server
+ couldn't distinguish these
+ <braunr> indeed
+ <braunr> in which case getproc() actually returns the caller's process
+ object at its proc server
+ <teythoon> yes, that is better worded
+ <braunr> teythoon: i'm not sure it's true actually :/
+ <teythoon> braunr: well, exploit or it didn't happen
+ <braunr> teythoon: getproc() apparently returns a bootstrap port
+ <braunr> we must find the code that sets this port
+ <braunr> i have a hard time doing that :/
+ <pinotree> isn't part of the stuff which is passed to a new process by
+ exec?
+ <teythoon> braunr: I know that feeling
+ <braunr> pinotree: probably
+ <braunr> still hard to find ..
+ <pinotree> search in glibc
+ <teythoon> braunr: exec/exec.c:1654 asks the proc server for the proc
+ object to use for the new process
+ <teythoon> so how much of hurd do I have to rebuild once i changed struct
+ procinfo in hurd_types.h?
+ <teythoon> oh noez, glibc uses it too :/
+
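The resolution above rests on two points. First, as the discussion concludes, the port a process obtains from `getproc ()` resolves to that process's own object in the proc server, through MIG's `PROCESS_INTRAN` translation. Second, `S_proc_pid2proc` then applies `check_owner` to that object. The sketch below is a deliberately simplified, hypothetical model of this decision. `struct sketch_proc`, `sketch_check_owner` and `sketch_pid2proc` are invented names. The real proc server compares uid vectors and differs in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the proc server's per-process object.  */
struct sketch_proc
{
  pid_t pid;
  uid_t euid;    /* effective uid of the process */
  uid_t owner;   /* uid that owns the process */
};

/* Simplified owner test: root owns everything; otherwise the caller's
   effective uid must match the target's owner.  */
static bool
sketch_check_owner (const struct sketch_proc *caller,
                    const struct sketch_proc *target)
{
  return caller->euid == 0 || caller->euid == target->owner;
}

/* Model of pid2proc.  CALLER is whatever process object the request
   port resolved to, so a process cannot pose as another one without
   holding that process's port.  */
static int
sketch_pid2proc (const struct sketch_proc *caller,
                 const struct sketch_proc *target,
                 const struct sketch_proc **out)
{
  assert (caller != NULL && out != NULL);
  if (target == NULL)
    return ESRCH;
  if (!sketch_check_owner (caller, target))
    return EPERM;
  *out = target;
  return 0;
}
```

This mirrors teythoon's observation: `proc_pid2proc (getproc (), 1, ...)` fails with `EPERM` for a regular user and succeeds for root.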
+
+## IRC, freenode, #hurd, 2013-08-01
+
+ <teythoon> I need some pointers on building the libc, specifically how to
+ point libcs build system to my modified hurd headers
+ <teythoon> nlightnfotis: hi
+ <teythoon> nlightnfotis: you rebuild the libc right? do you have any hurd
+ specific pointers for doing so?
+ <nlightnfotis> teythoon, I have not yet rebuild the libc (I was planning
+ to, but I followed other courses of action) Thomas had pointed me to some
+ resources on the Hurd website. I can look them up for you
+ <nlightnfotis> teythoon, here are the instructions
+ http://darnassus.sceen.net/~hurd-web/open_issues/glibc/debian/
+ <nlightnfotis> and the eglibc snapshot is here
+ http://snapshot.debian.org/package/eglibc/
+ <teythoon> nlightnfotis: yeah, I found those. the thing is I changed a
+ struct in the hurd_types.h header, so now I want to rebuild the libc with
+ that header
+ <teythoon> and I cannot figure out how to point libcs build system to my
+ hurd headers
+ <teythoon> :/
+ <nlightnfotis> can you patch eglibc and build that one instead?
+ <pochu> teythoon: put your header in the appropriate /usr/include/ dir
+ <teythoon> pochu: is there no other way?
+ <pinotree> iirc nope
+ <pochu> teythoon: you may be able to pass some flag to configure, but I
+ don't know if that will work in this specific case
+ <teythoon> ouch >,< that explains why I haven't found one
+ <pochu> check ./configure --help, it's usually FOO_CFLAGS (so something
+ like HURD_CFLAGS maybe)
+ <pochu> but then you may need _LIBS as well depending on how you changed
+ the header... so in the end it's just easier to put the header in
+ /usr/include/
+ <braunr> teythoon: did you find the info for your libc build ?
+ <teythoon> braunr: well, i firmlinked my hurd_types.h into /usr/include/...
+ <braunr> ew
+ <braunr> i recommend building debian packages
+ <teythoon> but the build was not successful, looks unrelated to my changes
+ though
+ <teythoon> I tried that last week and the process took more than eight
+ hours and did not finish
+ <braunr> use darnassus
+ <braunr> it takes about 6 hours on it
+ <teythoon> I shall try again and skip the unused variants
+ <braunr> i also suggest you use ./debian/rules build
+ <braunr> and then interrupt the build process once you see it's building
+ object files
+ <braunr> go to the hurd-libc-i386 build dir, and use make lib others
+ <braunr> make lib builds libc, others is for companion libraries like
+ libpthread
+ <braunr> actually building libc takes less than an hour
+ <braunr> so once you validate your build this way, you know building the
+ whole debian package will succedd
+ <braunr> succeed*
+ <teythoon> so how do I get the build system to pick up my hurd_types.h?
+ <braunr> sorry if this is obvious to you, you might be more familiar with
+ debian than i am :)
+ <braunr> patch the hurd package
+ <braunr> append your own version string like +teythoon.hurd.1
+ <braunr> install it
+ <braunr> then build libc
+ <braunr> i'll reboot darnassus so you have a fresh and fast build env
+ <braunr> almost a month of uptime without any major issue :)
+ <teythoon> err, but I cannot install my hurd package on darnassus, can I? I
+ don't think that'd be wise even if it were possible
+ <braunr> teythoon: rebooted, enjoy
+ <braunr> why not ?
+ <braunr> i often do it for my own developments
+ <braunr> teythoon: screen is normally available
+ <braunr> teythoon: be aware that fakeroot-tcp is known to hang when pfinet
+ is out of ports (that's a bug)
+ <braunr> it takes more time to reach that bug since a patch that got in
+ less than a year ago, but it still happens
+ <braunr> the hurd packages are quick to build, and they should only provide
+ the new header, right ?
+ <braunr> you can include the functionality too in the packages if you're
+ confident enough
+ <teythoon> but my latest work on the killing of essential processes issues
+ involves patching hurd_types.h and that in a way that breaks the ABI,
+ hence the need to rebuild the libc (afaiui)
+ <braunr> teythoon: yes, this isn't uncommon
+ <teythoon> braunr: this is much more intrusive than anything I've done so
+ far, so I'm not so confident in my changes for now
+ <braunr> teythoon: show me the patch please
+ <teythoon> braunr: it's not split up yet, so kind of messy:
+ http://paste.debian.net/21403/
+ <braunr> teythoon: did you make sure to add RPCs at the end of defs files ?
+ <teythoon> yes, I got burned by this one on my very first attempt, you
+ pointed out that mistake
+ <braunr> :)
+ <braunr> ok
+ <braunr> you're changing struct procinfo
+ <braunr> this really breaks the abi
+ <teythoon> yes
+ <braunr> i.e. you can't do that
+ <teythoon> I cannot put it at the end b/c of that variable length array
+ <braunr> you probably should add another interface
+ <teythoon> that'd be easier, sure, but this will slow down procfs even
+ more, no?
+ <braunr> that's secondary
+ <braunr> it won't be easier, breaking the abi may break updates
+ <braunr> in which case it's impossible
+ <braunr> another way would be to use a new procinfo struct
+ <braunr> like struct procinfo2
+ <braunr> but then you need a transition step so that all users switch to
+ that new version
+ <braunr> which is the best way to deal with these issues imo, but this time
+ not the easiest :)
+ <teythoon> ok, so I'll introduce another rpc and make sure that one is
+ extensible
+ <braunr> hum no
+ <braunr> this usually involves using a version anyway
+ <teythoon> no? but it is likely that we need to save more addresses of this
+ kind in the future
+ <braunr> in which case it will be handled as an independent problem with a
+ true solution such as the one i mentioned
+ <teythoon> it could return an array of vm_address_ts with a length
+ indicating how many items were returned
+ <braunr> it's ugly
+ <braunr> the code is already confusing enough
+ <braunr> keep names around for clarity
+ <teythoon> ok, point taken
+ <braunr> really, don't mind additional RPCs when first adding new features
+ <braunr> once the interface is stable, a new and improved version becomes a
+ new development of its own
+ <braunr> you're invited to work on that after gsoc :)
+ <braunr> but during gsoc, it just seems like an unnecessary burden
+ <teythoon> ok cool, I really like that way of extending Hurd, it's really
+ easy
+ <teythoon> and feels so natural
+ <braunr> i share your concern about performances, and had a similar problem
+ when adding page cache information to gnumach
+ <braunr> in the end, i'll have to rework that again
+ <braunr> because i tried to extend it beyond what i needed
+ <teythoon> true, I see how that could happen easily
+ <braunr> the real problem is mig
+ <braunr> mig limits subsystems to 100 calls
+ <braunr> it's clearly not enough
+ <braunr> in x15, i intend to use 16 bits for subsystems and 16 bits for
+ RPCs, which should be plenty
+ <teythoon> that limit seems rather artificial, it's not a power of two
+ <braunr> yes it is
+ <teythoon> so let's fix it
+ <braunr> mach had many artificial static limits
+ <braunr> eh :D
+ <braunr> not easy
+ <braunr> replies are encoded by taking the request ID and adding 100
+ <teythoon> uh
+ <braunr> "uh" indeed
+ <teythoon> so we need an intermediate version of mig that accepts both
+ id+100 and dunno id+2^x as replies for id
+ <teythoon> or -id - 1
+ <braunr> that would completely break the abi
+ <teythoon> braunr: how so? the change would be in the *_server functions
+ and be compatible with the old id scheme
+ <braunr> how do you make sure id+2^x doesn't conflict with another id ?
+ <teythoon> oh, the id is added to the subsystem id?
+ <teythoon> to obtain a global message id?
+ <braunr> yes
+ <teythoon> ah, I see
+ <teythoon> ah, but the hurd subsystems are 1000 ids apart
+ <teythoon> so id+100 or id +500 would work
+ <braunr> we need to make sure it's true
+ <braunr> always true
+ <teythoon> so how many bits do we have for the message id in mach?
+ <teythoon> (mig?)
+ <braunr> mach shouldn't care, it's entirely a mig thing
+ <braunr> well yes and no
+ <braunr> mach defines the message header, which includes the message id
+ <braunr> see mach/message.h
+ <braunr> mach_msg_id_t msgh_id;
+ <braunr> typedef integer_t mach_msg_id_t;
+ <teythoon> well, if that is like a 32 bit integer, then allow -id-1 as
+ reply and forbid ids > 2^x / 2
+ <braunr> yes
+ <braunr> seems reasonable
+ <teythoon> that'd give us a smooth upgrade path, no?
+ <braunr> i think so
+
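The two reply-id schemes discussed above can be compared in a few lines. This is a hypothetical sketch, not MIG's generated code. It models `mach_msg_id_t` (an `integer_t`) as a 32-bit signed integer. Legacy replies use `request + 100`. The proposed scheme uses `-request - 1`, with request ids restricted to non-negative values, so a reply can never collide with any request id. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* mach_msg_id_t is an integer_t; model it as a 32-bit signed int.  */
typedef int32_t sketch_msg_id_t;

/* Current MIG convention, per the discussion: reply = request + 100.  */
static sketch_msg_id_t
reply_id_legacy (sketch_msg_id_t request)
{
  assert (request <= INT32_MAX - 100);  /* avoid signed overflow */
  return request + 100;
}

/* Proposed convention: reply = -request - 1.  With non-negative
   requests, replies land in [INT32_MIN, -1] and can never be mistaken
   for a request, whatever the number of routines per subsystem.  */
static sketch_msg_id_t
reply_id_negated (sketch_msg_id_t request)
{
  assert (request >= 0);
  return -request - 1;
}

/* During a transition, a client could accept either form.  */
static bool
is_reply_for (sketch_msg_id_t request, sketch_msg_id_t reply)
{
  return reply == reply_id_legacy (request)
         || reply == reply_id_negated (request);
}
```

Take a hypothetical subsystem starting at id 2000. A 101st routine would get request id 2100, which is exactly the legacy reply id of routine 0. That is the clash behind the 100-routine limit mentioned above. The negated scheme avoids it as long as ids stay non-negative.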
+
+## IRC, freenode, #hurd, 2013-08-28
+
+ <youpi> teythoon: Mmm, your patch series does not make e.g. ext2fs provide
+ a diskfs_get_source, does it?
+
+
+## IRC, freenode, #hurd, 2013-08-29
+
+ <teythoon> youpi: that is correct
+ <youpi> teythoon: Mmm, I must be missing something then: as such the patch
+ series introduces an RPC, but only EOPNOTSUPP is ever returned in all
+ cases for now?
+ <youpi> ah
+ <youpi> /* Guess based on the last argument. */
+ <youpi> since ext2fs & such report their options with store last, it seems
+ ok indeed
+ <youpi> it still seems a bit lame not to return that information in
+ get_source
+ <teythoon> yes
+ <teythoon> well, if it had been just for me, I would not have created that
+ rpc, but only guessing was frowned upon iirc
+ <teythoon> then again, maybe this should be used and then the mtab
+ translator could skip any translators that do not provide this
+ information to filter out non-"filesystem" translators
+ <youpi> guessing is usually trap-prone, yes
+ <youpi> if it is to be used by mtab, then maybe it should be documented as
+ being used by mtab
+ <youpi> otherwise symlink would set a source, for instance
+ <youpi> while we don't really want it here
+ <teythoon> why would the symlink translator answer to such requests? it is
+ not a filesystem-like translator
+ <youpi> no, but the name & documentation of the RPC doesn't tell it's only
+ for filesystem-like translators
+ <youpi> well, the documentation does say "filesystem"
+ <youpi> but it does not clearly specify that one shouldn't implement
+ get_source if one is not a filesystem
+ <youpi> "If the concept of a source is applicable" works for a symlink
+ <youpi> that could be the same for eth-filter, etc.
+ <teythoon> right
+ <youpi> Mmm, that said it's fsys.defs
+ <youpi> not io.defs
+ <youpi> teythoon: it is the fact that we get EOPNOTSUPP (i.e. fsys
+ interface supported, just not that call), and not MIG_BAD_ID (i.e. fsys
+ interface not supported), that filters out symlink & such, right?
+ <teythoon> that's what I was thinking, but that's based on my
+ interpretation of EOPNOTSUPP of course ;)
+ <youpi> teythoon: I believe that for whatever is a bit questionable, even
+ if you put yourself on the side that people will probably agree on, the
+ discussion will still take place so we make sure it's the right side :)
+ <youpi> (re: start/end_code)
+ <teythoon> I'm not sure I follow
+ <teythoon> youpi: /proc/pid/stat seems to be used a lot:
+ http://codesearch.debian.net/search?q=%22%2Fproc%2F.*%2Fstat%22
+ <teythoon> that does not mean that start/endcode is used, but still it
+ seems like a good thing to mimic Linux closely
+ <youpi> stat is used a lot for cpu usage for instance, yes
+ <youpi> start/endcode, I really wonder who is using it
+ <youpi> using it for kernel thread detection looks weird to me :)
+ <youpi> (questionable): I mean that even if you take the time to put
+ yourself on the side that people will probably agree on, the discussion
+ will happen
+ <youpi> it has to happen so people know they agree on it
+ <youpi> I've seen that a lot in various projects (not only CS-related)
+ <teythoon> ok, I think I got it
+ <teythoon> it's to document the reasons for (not) doing something?
+ <youpi> something like this, yes
+ <youpi> even if you look right, people will try to poke holes
+ <youpi> just to make sure :)
+ <teythoon> btw, I think it's rather unusual that our storeio experiments
+ would produce such different results
+ <teythoon> you're right about the block device, no idea why I got a
+ character file there
+ <teythoon> I used settrans -ca /tmp/hello.unzipped /hurd/storeio -T
+ gunzip:file /tmp/hello
+ <teythoon> also I tried stacking the translator on /tmp/hello directly,
+ from what I've gathered that should be possible, but I failed
+ <teythoon> ftr I use the exec server with all my patches, so the unzipping
+ code has been removed from it
+ <youpi> ah, I probably still have it
+ <youpi> it shouldn't matter here, though
+ <teythoon> I agree
+ <youpi> how would you stack it?
+ <youpi> I've never had a look at that
+ <youpi> I'm not sure attaching the translator to the node is done before or
+ after the translator has a chance to open its target
+ <teythoon> right
+ <teythoon> but it could be done, if storeio used the reference to the
+ underlying node, no?
+ <youpi> yes
+ <youpi> btw, you had said at some point that you had issues with running
+ remap. Was the issue what you fixed with your patches?
+ * youpi realizes that he should have shown the remap.c source code during
+ his presentation
+ <teythoon> well, I tried to remap /servers/exec (iirc) and that failed
+ <teythoon> then again, I recently played with remap and all seemed fine
+ <teythoon> but I'm sure it has nothing to do with my patches
+ <youpi> ok
+ <teythoon> those I came up with investigating fakeroot-hurd
+ <teythoon> and I saw that this also applies to remap.sh
+ <teythoon> *while
+ <youpi> yep, they're basically the same
+ <teythoon> btw, I somehow feel settrans is being abused for chroot and
+ friends, there is no translator setting involved
+ <youpi> chroot, the command? or the settrans option?
+ <youpi> I don't understand what you are pointing at
+ <teythoon> the settrans option being used by fakeroot, remap and (most
+ likely) our chroot
+ <youpi> our chroot is just a file_reparent call
+ <youpi> fakeroot and remap do start a translator
+ <teythoon> yes, but it is not being bound to a node, which is (how I
+ understand it) what settrans does
+ <teythoon> the point being that if settrans is being invoked with --chroot,
+ it does something completely different (see the big if (chroot) {...}
+ blocks)
+ <teythoon> to a point that it might be better off in a separate command
+ <youpi> Mmm, indeed, a lot of the options don't make sense for chroot
+
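youpi's reading of the error codes suggests a simple policy for the mtab translator:

* A successful `fsys_get_source` gives the source directly.
* `EOPNOTSUPP` means the translator speaks the fsys protocol but does not implement this call. In that case, fall back to guessing from the last option, as the patch series does.
* `MIG_BAD_ID` means the translator is not an fsys server at all (e.g. a symlink translator), so it is skipped.

The sketch below is a hypothetical model of that policy, not code from the patch. `SKETCH_MIG_BAD_ID` is a local stand-in for Mach's `MIG_BAD_ID`. The enum and function names are invented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for Mach's MIG_BAD_ID ("bad message ID", i.e. the server
   does not implement the interface); mach/mig_errors.h uses -303.  */
#define SKETCH_MIG_BAD_ID (-303)

enum mtab_action
{
  MTAB_USE_SOURCE,   /* fsys_get_source answered */
  MTAB_GUESS_SOURCE, /* fsys interface present, call unsupported */
  MTAB_SKIP          /* not an fsys server (or another error) */
};

/* Decide how a hypothetical mtab translator treats a translator,
   given the error code returned by fsys_get_source.  */
static enum mtab_action
classify_get_source (int err)
{
  if (err == 0)
    return MTAB_USE_SOURCE;
  if (err == EOPNOTSUPP)
    return MTAB_GUESS_SOURCE;
  return MTAB_SKIP;  /* SKETCH_MIG_BAD_ID and anything else */
}

/* Fallback described in the discussion: ext2fs and similar servers
   report their store as the last option, so take the last one.  */
static const char *
guess_source (const char *const *argv, size_t argc)
{
  assert (argv != NULL || argc == 0);
  return argc > 0 ? argv[argc - 1] : NULL;
}
```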
+
+## IRC, freenode, #hurd, 2013-09-06
+
+ <braunr> teythoon: do you personally prefer /proc being able to implement
+ /proc/self on its own, or using the magic server to tell clients to
+ resolve those specific cases themselves ?
+ <pinotree> imho solving the "who's the sender of an rpc" could solve both
+ the SCM_CREDS implementation and the self case in procfs
+
+[[open_issues/SENDMSG_SCM_CREDS]],
+[[hurd/translator/procfs/jkoenig/discussion]], *`/proc/self`*.
+
+ <braunr> pinotree: yes
+ <braunr> but that would require servers impersonating users to some extent
+ <braunr> and this seems against the hurd philosophy
+ <pinotree> and there was also the fact that you could create a
+ fake/different port when sending an rpc
+ <braunr> to fake what ?
+ <pinotree> the sender identiy
+ <pinotree> *identity
+ <braunr> what ?
+ <braunr> you mean intermediate servers can do that
+ <teythoon> braunr: I don't know if I understand all the implications of
+ your question, but the magic server is the only hurd server that actually
+ implements fsys_forward (afaics), so why not use that?
+ <braunr> teythoon: my question was rather about the principle
+ <braunr> do people find it acceptable to entrust a server with their
+ authority or not
+ <braunr> on the hurd, it's clearly wrong
+ <braunr> but then it means you need special cases everywhere, usually
+ handled by glibc
+ <braunr> and that's something i find wrong too
+ <braunr> it restricts extensibility
+ <braunr> the user can always change its libc at runtime, but in practice,
+ it's harder to perform than simply doing it in the server
+ <teythoon> braunr: then I think I didn't get the question at all
+ <braunr> teythoon: it's kind of the same issue that you had with the mtab
+ translator
+ <braunr> about showing or not some entries the user normally doesn't have
+ access to
+ <braunr> this problem occurs when there is more than one server on the
+ execution path and the servers beyond the first one need credentials to
+ reply something meaningful
+ <braunr> the /proc/self case is a perfect one
+ <braunr> (conceptually, it's client -> procfs -> symlink)
+ <braunr> 1/ procfs tells the client it needs to handle this specially,
+ which is what the hurd does with magic
+ <braunr> 2/ procfs assumes the identity of the client and the symlink
+ translator can act as expected because of that
+ <braunr> teythoon: what way do you find better ?
+ <teythoon> braunr: by "procfs assumes the identity" you mean procfs
+ impersonating the user?
+ <braunr> yes
+ <teythoon> braunr: tbh I still do not see how this can be implemented at
+ all b/c the /proc/self symlink is not about identity (which can be
+ derived from the peropen struct initially created by fsys_getroot) but
+ the pid of the callee (which afaics is nowhere to be found)
+ <teythoon> s/callee/caller/
+ <teythoon> the one doing the rpc
+ <braunr> impersonating the user isn't only about identity
+ <braunr> actually, it's impersonating the client
+ <teythoon> yes, client is the term >,<
+ <braunr> so basically, asking proc about the properties of the process
+ being impersonated
+ <teythoon> proc o_O
+ <braunr> it's not hard, it's just a big turn in the way the system would
+ function
+ <braunr> teythoon: ?
+ <teythoon> you lost me somewhere
+ <braunr> the client is the process
+ <braunr> not the user
+ <teythoon> in order to implement /proc/self properly, one has to get the
+ process id of the process doing the /proc/self lookup, right?
+ <braunr> yes
+ <braunr> actually, we would even slice it more and have the client be a
+ thread
+ <teythoon> so how do you get to that piece of information at all?
+ <braunr> the server inherits a special port designating the client, which
+ allows it to query proc about its properties, and assume it's identity in
+ servers such as auth
+ <braunr> its*
+ <teythoon> ah, but that kind of functionality isn't there at the moment, is
+ it?
+ <braunr> it's not, by design
+ <teythoon> right, hence my confusion
+ <braunr> instead, servers use the magic translator to send a "retry with
+ special handling" message to clients
+ <teythoon> right, so the procfs could bounce that back to the libc handler
+ that of course knows its pid
+ <braunr> yes
+ <teythoon> right, so now at last I got the whole question :)
+ <braunr> :)
+ <teythoon> ugh, I just found the FS_RETRY_MAGICAL handler in the libc :-/
+ <braunr> ?
+ <braunr> why "ugh" ?
+ <teythoon> well, I'm inclined to think this is the bad kind of magic ;)
+ <braunr> do i need to look at the code to understand ?
+ <teythoon> ok, so I think option 1/ is easily implemented, option 2/ has
+ consequences that I cannot fully comprehend
+ <braunr> same for me
+ <teythoon> no, but you yourself said that you do not like that kind of
+ logic being implemented in the libc
+ <braunr> well
+ <braunr> easily
+ <braunr> i'm not so sure
+ <braunr> it's easy to code, but i assume checking for magic replies has its
+ cost
+ <teythoon> why not? the code is doing a big switch over the retryname
+ supplied by the server
+ <teythoon> we could stuff getpid() logic in there
+ <braunr> 14:50 < braunr> it's easy to code, but i assume checking for magic
+ replies has its cost
+ <teythoon> what kind of cost? computational cost?
+ <braunr> yes
+ <braunr> the big switch you mentioned
+ <braunr> run every time a client gets a reply
+ <braunr> (unless i'm mistaken)
+ <teythoon> a only for RETRY_MAGICAL replies
+ <braunr> but you need to test for it
+ <teythoon> switch (retryname[0])
+ <teythoon> {
+ <teythoon> case '/':
+ <teythoon> ...
+ <teythoon> that should compile to a jump table, so the cost of adding
+ another case should be minimal, no?
+ <braunr> yes
+ <braunr> but
+ <braunr> it's even less than that
+ <braunr> the real cost is checking for RETRY_MAGICAL
+ <braunr> 14:55 < teythoon> a only for RETRY_MAGICAL replies
+ <braunr> so it's basically a if
+ <braunr> one if, right ?
+ <teythoon> no, it's switch'ing over doretry
+ <teythoon> you should pull up the code and see for yourself. it's in
+ hurd/lookup-retry.c
+ <braunr> ok
+ <braunr> well no, that's not what i'm looking for
+ <teythoon> it's not o_O
+ <braunr> i'm looking for what triggers the call to lookup_retry
+ <braunr> teythoon: hm ok, it's for lookups only, that's decent
+ <braunr> teythoon: 1/ has the least security implications
+ <teythoon> yes
+ <braunr> it could slightly be improved with e.g. a well defined interface
+ so a user could preload a library to extend it
+ <teythoon> extend the whole magic lookup thing?
+ <braunr> yes
+ <teythoon> but that is no immediate concern, you are trying to fix
+ /proc/self, right?
+ <braunr> no, i'm thinking about the big picture for x15/propel, keeping the
+ current design or doing something else
+ <teythoon> oh, okay
+ <braunr> solving /proc/self looks actually very easy
+ <teythoon> well, I'd say this depends a lot on your trust model then
+ <teythoon> do you consider servers trusted?
+ <teythoon> (btw, will there be mutual authentication of clients/servers in
+ propel?)
+ <braunr> there were very interesting discussions about that during the
+ l4hurd project
+ <braunr> iirc, shapiro insisted that using a server without trusting it
+ (and there were specific terminology about trusting/relying/etc..) is
+ nonsense
+ <braunr> teythoon: i haven't thought too much about that yet, for now it's
+ supposed to be similar to what the hurd does
+ <teythoon> hm, then again trust is not an on/off thing imho
+ <braunr> ?
+ <teythoon> trusting someone to impersonate yourself is a very high level of
+ trust
+ <teythoon> s/is/requires/
+ <teythoon> the mobile code paper suggests that mutual authentication might
+ be a good thing, and I tend to agree
+ <braunr> i'll have to read that again
+ <braunr> teythoon: for now (well, when i have time to work on it again
+ .. :))
+ <braunr> i'm focusing on the low level stuff, in a way that won't disturb
+ such high level features
+ <braunr> teythoon: have you found something related to a thread-specific
+ port in the proc server ?
+ <braunr> hurd/process.defs:297: /* You are not expected to understand
+ this. */
+ <braunr> \o/
+ <teythoon> braunr: no, why would I (the thread related question)
+ <teythoon> braunr: yes, that comment also caught my eye :/
+ <braunr> teythoon: because you read a lot of the proc code lately
+ <braunr> so maybe your view of it is better detailed than mine
+
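Option 1/ amounts to one more case in the client-side `switch (retryname[0])` that teythoon quotes. The following is a self-contained sketch, not the code in glibc's `hurd/lookup-retry.c`. It assumes a hypothetical magic retry name `pid`, optionally followed by `/REST`. The client expands that name into its own process id before retrying the lookup, which is how `/proc/self` could be resolved without procfs knowing who its client is. `expand_pid_magic` is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Expand a hypothetical "pid" or "pid/REST" magic retry name into
   "<PID>" or "<PID>/REST" in BUF.  Returns 0 on success, or -1 if
   RETRYNAME is not this magic name or BUF is too small.  */
static int
expand_pid_magic (const char *retryname, pid_t pid,
                  char *buf, size_t bufsize)
{
  assert (retryname != NULL && buf != NULL);
  switch (retryname[0])
    {
    case 'p':
      if (strncmp (retryname, "pid", 3) == 0
          && (retryname[3] == '\0' || retryname[3] == '/'))
        {
          /* retryname + 3 is either "" or "/REST".  */
          int n = snprintf (buf, bufsize, "%ld%s", (long) pid,
                            retryname + 3);
          return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
        }
      return -1;
    default:
      /* Other magic names, such as the '/' case, are not modeled.  */
      return -1;
    }
}
```

A caller would pass `getpid ()` as `pid` and restart the lookup on the expanded name. As braunr notes, the extra case only costs anything on the lookup-retry path, not on every reply.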
+
+## IRC, freenode, #hurd, 2013-09-13
+
+ * youpi crosses fingers
+ <youpi> yay, still boots
+ <youpi> teythoon: I'm getting a few spurious entries in /proc/mounts
+ <youpi> none /servers/socket/26 /hurd/pfinet interface=/dev/eth0, etc.
+ <youpi> /dev/ttyp0 /dev/ttyp0 /hurd/term name,/dev/ptyp0,type,pty-master 0
+ 0
+ <youpi> /dev/sd1 /dev/cons ext2fs
+ writable,no-atime,no-inherit-dir-group,store-type=typed 0 0
+ <youpi> fortunately mount drops most of them
+ <youpi> but not /dev/cons
+ <youpi> spurious entries in df are getting more and more common on linux
+ too anyway...
+ <youpi> ah, after a console restart, I don't have it any more
+ <youpi> I'm getting df: `/dev/cons': Operation not supported instead
+
+
+## IRC, freenode, #hurd, 2013-09-16
+
+ <youpi> teythoon: e2fsck does not seem to be seeing that a given filesystem
+ is mounted
+ <youpi> /dev/sd0s1 on /boot type ext2 (rw,no-inherit-dir-group)
+ <youpi> and still # e2fsck -C 0 /dev/sd0s1
+ <youpi> e2fsck 1.42.8 (20-Jun-2013)
+ <youpi> /dev/sd0s1 was not cleanly unmounted, check forced.
+ <youpi> (yes, both /etc/mtab and /run/mtab point to /proc/mounts)
+ <tschwinge> Yes, that is a "known" problem.
+ <youpi> tschwinge: no, it's supposed to be fixed by the mtab translator :)
+ <pinotree> youpi: glibc's paths.h points to /var/run/mtab (for us)
+ <tschwinge> youpi: Oh. But this is by means of mtab presence, and not by
+ proper locking? (Which is at least something, of course!)
+ <youpi> /var/run points to /run
+ <youpi> tschwinge: yes
+ <youpi> anyway, got to run
+
+
+## IRC, freenode, #hurd, 2013-09-20
+
+ <braunr> teythoon: how come i see three mtab translators running ?
+ <braunr> 6 now oO
+ <braunr> looks like df -h spawns a few every time
+ <teythoon> yes, weird...
+ <braunr> accessing /proc/mounts does actually
+ <braunr> teythoon: more bug fixing for you :)
+
+
+## IRC, freenode, #hurd, 2013-09-23
+
+ <teythoon> so it might be a problem with either libnetfs (which afaics has
+ never supported passive translator records before) or procfs, but tbh I
+ haven't investigated this yet
diff --git a/community/gsoc/project_ideas/object_lookups.mdwn b/community/gsoc/project_ideas/object_lookups.mdwn
index 5075f783..88ffc633 100644
--- a/community/gsoc/project_ideas/object_lookups.mdwn
+++ b/community/gsoc/project_ideas/object_lookups.mdwn
@@ -40,3 +40,32 @@ accurate measurements in a system that lacks modern profiling tools would also
be helpful.
Possible mentors: Richard Braun
+
+
+# IRC, freenode, #hurd, 2013-09-18
+
+In context of [[!message-id "20130918081345.GA13789@dalaran.sceen.net"]].
+
+ <teythoon> braunr: (wrt the gnumach HACK) funny, I was thinking about doing
+ the same for userspace servers, renaming ports to the address of the
+ associated object, saving the need for the hash table...
+ <braunr> teythoon: see
+ http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/object_lookups/
+ <braunr> teythoon: my idea is to allow servers to set a label per port,
+ obtained at message recv time
+ <braunr> because, yes, looking up an object twice is ridiculous
+ <braunr> you normally still want port names to be close to 0 because it
+ allows some data structure optimizations
+ <teythoon> braunr: yes, I feared that ports should normally be smallish
+ integers and contiguous at best
+ <teythoon> braunr: interesting that you say there that libihash suffers
+ from high collision rates
+ <teythoon> I've a theory to why that is, libihash doesn't do any hashing at
+ all
+ <pinotree> there are notes about that in the open_issues section of the
+ wiki
+ <teythoon> but I figured that this is probably ok for port names, as they
+ are small and contiguous
+ <neal> braunr: That's called protected payload.
+ <neal> braunr: The idea is that the kernel appends data to the message in
+ flight.
diff --git a/community/gsoc/project_ideas/sound/discussion.mdwn b/community/gsoc/project_ideas/sound/discussion.mdwn
new file mode 100644
index 00000000..4a95eb62
--- /dev/null
+++ b/community/gsoc/project_ideas/sound/discussion.mdwn
@@ -0,0 +1,47 @@
+[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!taglink open_issue_documentation]]: update [[sound]] page.
+
+
+# IRC, freenode, #hurd, 2013-09-01
+
+ <rekado> I'm new to the hurd but I'd love to learn enough to work on sound
+ support.
+ <rekado>
+ http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/sound/
+ says drivers should be ported to GNU Mach as a first step.
+ <rekado> Is this information still current or should the existing Linux
+ driver be wrapped with DDE instead?
+ <auronandace> if i recall correctly dde is currently only being used for
+ network drivers. i'm not sure how much work would be involved for sound
+ or usb
+
+
+## IRC, freenode, #hurd, 2013-09-02
+
+ <rekado> The sound support proposal
+ (http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/sound/)
+ recommends porting some other kernel's sound driver to GNU Mach. Is this
+ still current or should DDE be used instead?
+ <pinotree> rekado: dde or anything userspace-based is generally preferred
+ <braunr> rekado: both are about porting some other kernel's sound driver
+ <braunr> dde is preferred yes
+ <rekado> This email says that sound drivers are already partly working with
+ DDE: http://os.inf.tu-dresden.de/pipermail/l4-hackers/2009/004291.html
+ <rekado> So, should I just try to get some ALSA kernel parts to compile
+ with DDE?
+ <pinotree> well, what is missing is also the dde←→hurd glue
+ <braunr> rekado: there is also a problem with pci arbitration
+ <rekado> pinotree: I assumed DDEKit works with the hurd and we could use
+ any DDE/<other kernel> glue code with it
+ * rekado looks up pci arbitration
+ <pinotree> only for networking atm
+ <rekado> ah, I see.