Diffstat (limited to 'community')
-rw-r--r-- | community/gsoc/2013/hacklu.mdwn | 1482 |
-rw-r--r-- | community/gsoc/2013/nlightnfotis.mdwn | 2587 |
-rw-r--r-- | community/gsoc/project_ideas/download_backends.mdwn | 11 |
-rw-r--r-- | community/gsoc/project_ideas/mtab/discussion.mdwn | 1167 |
-rw-r--r-- | community/gsoc/project_ideas/object_lookups.mdwn | 29 |
-rw-r--r-- | community/gsoc/project_ideas/sound/discussion.mdwn | 47 |
6 files changed, 5317 insertions, 6 deletions
diff --git a/community/gsoc/2013/hacklu.mdwn b/community/gsoc/2013/hacklu.mdwn
index d0185c60..b7de141b 100644
--- a/community/gsoc/2013/hacklu.mdwn
+++ b/community/gsoc/2013/hacklu.mdwn
@@ -615,3 +615,1485 @@ In context of [[open_issues/libpthread/t/fix_have_kernel_resources]]:
       found that.
     <tschwinge> hacklu: That's how I found it, yes.
     <hacklu> tschwinge: :)
+
+
+# IRC, freenode, #hurd, 2013-07-14
+
+    <hacklu> hi. what is a process's msgport?
+    <hacklu> And where can I find the msg_sig_post_untraced_request()?
+    <hacklu> (msg_sig_post* in [hurd]/hurd/msg_defs)
+    <hacklu> this is my debugger demo code
+      https://github.com/hacklu/HDebugger.git use make test to run the demo. I
+      put a breakpoint before the second printf in hello_world(inferior
+      program). but I can't resume execution from that.
+    <hacklu> could somebody give me some suggestions? thanks so much.
+    <teythoon> hacklu: % make test
+    <teythoon> make: *** No rule to make target `exc_request_S.c', needed by
+      `all'. Stop.
+    <hacklu_> teythoon: updated, forget to git add that file .
+    <teythoon> hacklu_: cool, seems to work now
+    <teythoon> will look into this tomorrow :)
+    <hacklu_> exit
+    <hacklu_> teythoon: not work. the code can,t resume from a breakpoint
+
+
+# IRC, freenode, #hurd, 2013-07-15
+
+    <hacklu> hi, this is my weekly
+      report. http://hacklu.com/blog/gsoc-weekly-report4-148/
+    <hacklu> sadly to unsolve the question of resume from breakpoint.
+    <teythoon> hacklu: have you tried to figure out what gdb does to resume a
+      process?
+    <hacklu> teythoon: hi. em, I have tried, but haven't find the magic in gdb
+      yet.
+    <teythoon> have you tried rpctrace'ing gdb?
+    <hacklu> no, rpctrace has too many noise. I turned on the debug in gdb.
+    <hacklu> I don't want rpctrace start gdb as its child task. if it can
+      attach at some point instead of at start
+    <teythoon> hacklu: you don't need to use gdb interactively, you could pipe
+      some commands to it
+    <hacklu> teythoon: that sounds a possible way.
+      I am try it, thank you
+    <hacklu> youpi: gdb can't work correctlly with rpctrace even in batch
+      mode.
+    <hacklu> get something like this "rpctrace: get an unknown send right from
+      process 2151"
+    <youpi> hacklu: well, ideally, fix rpctrace );
+    <youpi> ;)
+    <youpi> hacklu: but you can also as on the list, perhaps somebody knows
+      what you need
+    <hacklu> ok.
+    <hacklu> or I should debug gdb more deeply.
+    <youpi> do both
+    <youpi> so either of them may win first
+
+    <hacklu> braunr: I have found that, if there is no exception appears, the
+      signal thread will not be createed. Then there is only one thread in the
+      task.
+
+
+# IRC, freenode, #hurd, 2013-07-17
+
+    <hacklu__> braunr: ping
+    <braunr> hacklu__: yes ?
+    <hacklu__> I have reply your email
+    <braunr> i don't understand
+    <braunr> "I used this (&_info)->suspend_count to get the sc value."
+    <braunr> before the thread_info call ?
+    <hacklu__> no, after the call
+    <braunr> but you have a null pointer
+    <braunr> the info should be returned in info, not _info
+    <hacklu__> strange thing is the info is a null pointer. but _info not
+    <braunr> _info isn't a pointer, that's why
+    <braunr> the kernel will use it if the data fits, which is usually the case
+    <hacklu__> in the begin , the info=&_info.
+    <braunr> and it will dynamically allocate memory if it doesn't
+    <braunr> yes
+    <braunr> info should still have that value after the call
+    <hacklu__> but the call had change it. this is what I can;t understand.
+    <braunr> are you completely sure err is 0 on return ?
+    <hacklu__> since the parameter is a pointer to pointer, the thread_info can
+      change it , but I don't think it is a good ideal to set it to null
+      pointer without any err .
+    <hacklu__> yes. i am sure
+    <braunr> info_len is wrong
+    <braunr> it should be the number of integers in _info
+    <braunr> i.e.
+      sizeof(_info) / sizeof(unsigned int)
+    <braunr> i don't think that's the problem though
+    <braunr> yes, THREAD_BASIC_INFO_COUNT is already exactly that
+    <braunr> hm not exactly
+    <braunr> yes, exactly in fact
+    <hacklu__> I try to set it by hand, not use the macro.
+    <braunr> the macro is already defined as #define THREAD_BASIC_INFO_COUNT
+      (sizeof(thread_basic_info_data_t) / sizeof(natural_t))
+    <hacklu__> the info_len is 13. I checked.
+    <braunr> so, i said something wrong
+    <braunr> the call doesn't reallocate thread_info
+    <braunr> it uses the provided storage, nothing else
+    <braunr> yes, your call is wrong
+    <braunr> use thread_info (thread->port, THREAD_BASIC_INFO, (int *) info,
+      &info_len);
+    <hacklu__> em. thread_info (thread->port, THREAD_BASIC_INFO, (int *) &info,
+      &info_len);
+    <braunr> &info would make the kernel erase the memory where info (the
+      pointer) was stored
+    <braunr> info, not &info
+    <braunr> or &_info directly
+    <braunr> i don't see the need for an intermediate pointer here
+    <braunr> ideally, avoid the cast
+    <hacklu__> but in gnu-nat.c line 3338, it use &info.
+    <braunr> use a union with both thread_info_data_t and
+      thread_basic_info_data_t
+    <braunr> well, try it my way
+    <braunr> i think they're wrong
+    <hacklu__> ok, you are right, use info it is ok. the value is the same as
+      &_info after the call.
+    <hacklu__> but the suspend_count is zero again.
+    <braunr> check the rest of the result to see if it's consistent
+    <hacklu__> I think this line need a patch.
+    <hacklu__> what you mean the rest of the result?
+    <braunr> the thread info
+    <braunr> run_state, sleep_time, creation_time
+    <braunr> see if they make sense
+    <hacklu__> ok, I try to dump it
+    <braunr> bbl
+    <hacklu__> braunr: thread [118] suspend_count=0
+    <hacklu__> run_state=3, flags=1, sleep_time=0,
+      creation_time.second=1374079641
+    <hacklu__> something like this, seems no problems.
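The `info` versus `&info` mix-up in the 2013-07-17 log can be illustrated without Mach. What follows is only a sketch under stated assumptions: `fake_thread_info`, `basic_info_t`, `nat_t` and `info_buf_t` are hypothetical stand-ins, not the real `thread_info` interface, and the real basic-info structure has more fields (the log reports a count of 13). The field values are the ones hacklu dumped. It shows the union braunr suggests, so the callee receives the storage itself and no cast is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Mach's natural_t. */
typedef unsigned int nat_t;

/* Hypothetical, shortened imitation of thread_basic_info_data_t. */
typedef struct {
    int run_state;
    int flags;
    int suspend_count;
    int sleep_time;
} basic_info_t;

/* Number of words in the structure, in the style of THREAD_BASIC_INFO_COUNT. */
#define BASIC_INFO_COUNT ((unsigned int) (sizeof(basic_info_t) / sizeof(nat_t)))

/* Generic word buffer, imitating thread_info_data_t. */
#define INFO_MAX 1024
typedef nat_t info_data_t[INFO_MAX];

/* One object viewed either as raw words or as the typed structure. */
typedef union {
    info_data_t raw;
    basic_info_t basic;
} info_buf_t;

/* Hypothetical stand-in for thread_info(): it fills the storage the caller
   provides and updates the count. It never allocates or changes pointers. */
int fake_thread_info(nat_t *out, unsigned int *count)
{
    basic_info_t src = { 3, 1, 0, 0 };  /* run_state=3, flags=1, suspend_count=0 */

    if (*count < BASIC_INFO_COUNT)
        return -1;                      /* caller's buffer is too small */
    memcpy(out, &src, sizeof src);
    *count = BASIC_INFO_COUNT;
    return 0;
}

/* Pass the buffer itself (buf.raw), not the address of a pointer to it. */
int query_run_state(void)
{
    info_buf_t buf;
    unsigned int count = BASIC_INFO_COUNT;

    if (fake_thread_info(buf.raw, &count) != 0)
        return -1;
    return buf.basic.run_state;
}
```

Passing the address of a separate pointer variable instead of `buf.raw` would make the callee write over the pointer itself, which matches the symptom described in the log.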
+
+
+# IRC, freenode, #hurd, 2013-07-18
+
+    <hacklu__> how to get the thread state from TH_STATE_WAITING to
+      TH_STATE_RUNNING
+    <braunr> hacklu__:
+      http://www.gnu.org/software/hurd/gnumach-doc/Thread-Execution.html#Thread-Execution
+    <braunr> hacklu__: ah waiting
+    <braunr> hacklu__: this means the thread is waiting for an event
+    <braunr> so probably waiting for a message
+    <braunr> or an internal kernel event
+    <hacklu__> braunr: so I need to send it a message. I think I maybe forget
+      to send some reply message.
+    <braunr> hacklu__: i'm really not sure about those low level details
+    <braunr> confirm before doing anything
+    <hacklu__> the gdb has called msg_sig_post_untraced_request(), I don't get
+      clear about this function, I just call it as the same, maybe I am wrong .
+    <hacklu__> how will if I send a CONT to the stopped process? maybe I should
+      try this.
+    <hacklu__> when the inferior is in waiting
+      status(TH_STATE_WAITING,suspend_count=0), I use kill to send a CONT. then
+      the become(TH_STATE_STOP,suspend_count=1). when I think I am near the
+      success,I call thread_resume(),inferior turn out to be (TH_STATE_WAITING,
+      suspend_count=0).
+    <braunr> so yes, probably waiting for a message
+    <hacklu__> braunr: after send a CONT to the inferior, then send a -9 to the
+      debugger, the inferior continue!!!
+    <braunr> probably because it was notified there wasn't any sender any more
+    <hacklu__> that's funny, I will look deep into thread_resume and kill
+    <braunr> (gdb being the sender here)
+    <hacklu__> in hurd, when gdb attach a inferior, send signal to the
+      inferior, who will get the signal first? the gdb or the inferior?
+    <hacklu__> quite differnet with linux. seems the inferior get first
+    <braunr> do you mean gdb catches its own signal through ptrace on linux ?
+    <hacklu__> kkk
+    <braunr> ?
+
+
+# IRC, freenode, #hurd, 2013-07-20
+
+    <hacklu> braunr: yeah, on Linux the gdb catch the signal from inferior
+      before the signal handler.
+      And that day my network was broken, I can't
+      say goodbye to you. sorry for that.
+
+
+# IRC, freenode, #hurd, 2013-07-22
+
+    <hacklu> hi all, this is my weekly
+      report. http://hacklu.com/blog/gsoc-weekly-report5-152/
+    <teythoon> good to hear that you got the resume issue figured out
+    <hacklu> teythoon: thanks :)
+    <teythoon> hacklu: so your next step is to port gdbserver to hurd?
+    <hacklu> yep, I am already begin to.
+    <hacklu> before the mid-evaluate, I must submit something. I am far behind
+      my personal expections
+    <tschwinge> hacklu: You've made great progress! Sorry, for not being able
+      to help you very much: currently very busy with work. :-|
+    <tschwinge> hacklu: Working on gdbserver now is fine. I understand you
+      have been working on HDebugger to get an understanding of how everyting
+      works, outside of the huge GDB codebase. It's of course fine to continue
+      working on HDebugger to test things, etc., and that also counts very much
+      for the mid-term evaluation, so nothing to worry about. :-)
+    <hacklu> but I have far away behind my application on GSOC. I haven't
+      submit any patches. is it ok?
+    <tschwinge> hacklu: Don't worry. Before doing the actual work, things
+      always look much simpler than they will be. So I was expecting/planning
+      for that.
+    <tschwinge> The Hurd system is complex, with non-trivial and sometimes
+      asynchronous communication between the different components, and so it
+      takes some time to get an understanding of all that.
+    <hacklu> yes, I haven't get all clear about the signal post. that's too
+      mazy.
+    <tschwinge> hacklu: It surely is, yes.
+    <hacklu> tschwinge: may you help me to understand the msg_sig_post(). I
+      don't want to understand all details now, but I want to get the _right_
+      understanding of the gerneral.
+    <hacklu> as I have mentioned on my weekly report, gdb is listening on the
+      inferior's exception port, then gdb post a signal to that port. That
+      says: gdb post a message to herself, and handle it.
+      is this right?
+    <hacklu> tschwinge: [gdb]/gdb/gnu-nat.c (line 1371), and
+      [glibc]/hurd/hurdsig.c(line 1390)
+    <tschwinge> hacklu: My current understanding is that this is a "real"
+      signal that is sent to the debugged process' signal thread (msgport), and
+      when that process is resumed, it will process that signal.
+    <tschwinge> hacklu: This is different from the Mach kernel sending an
+      exception signal to a thread's exception port, which GDB is listening to.
+    <tschwinge> Or am I confused?
+    <hacklu> is the msgport equal the exception port?
+    <hacklu> in my experience, when the thread haven't cause a exception, the
+      signal thread will not be created. after the exception occured, the
+      signal thread is come out. so somebody create it, who dose? the mach
+      kernel?
+    <tschwinge> hacklu: My understanding is that the signal thread would always
+      be present, because it is set up early in a process' startup.
+    <hacklu> but when I call task_threads() before the exception appears, only
+      on thread returned.
+    <tschwinge> "Interesting" -- another thing to look into.
+    <tschwinge> hacklu: Well, you must be right: GDB must also be listening to
+      the debugged process' msgport, because otherwise it wouldn't be able to
+      catch any signals the process receives. Gah, this is all too complex.
+    <hacklu> tschwinge: that's maybe not. gdb listening on the task's exception
+      port, and the signal maybe handle by the signal thread if it could
+      handle. otherwise the signal thread pass the exception to the task's
+      exception port where gdb catched.
+    <tschwinge> hacklu: Ah, I think I now get it. But let me first verify...
+      ;-)
+
+    <hacklu> something strange. I have write a program to check whether create
+      signal threads at begining, the all created!
+    <hacklu> tschwinge: this is my test code and
+      result.
+      http://pastebin.com/xtM6DUnG
+    cat test.c
+    #define _GNU_SOURCE 1
+    #include <stdlib.h>
+    #include <stdio.h>
+    #include <errno.h>
+    #include <mach.h>
+    #include <mach_error.h>
+    int main(int argc,char** argv)
+    {
+        mach_port_t task_port;
+        thread_array_t threads[5];
+        mach_msg_type_number_t num_threads[5];
+        error_t err;
+        task_port = mach_task_self();
+        int i;
+        int j;
+        for(i=0;i<5;i++)
+            if(task_port){
+                err = task_threads(task_port,&threads[i],&num_threads[i]);
+                if(err)
+                    printf("err\n");
+            }
+        for(i=0;i<5;i++){
+            printf("===============\n");
+            printf("has %d threads now\n",num_threads[i]);
+            for(j=0;j<num_threads[i];j++)
+                printf("thread[%d]=%d\n",j,threads[i][j]);
+        }
+        return 0;
+    }
+
+
+    and the output
+    ./a.out
+    ===============
+    has 2 threads now
+    thread[0]=87
+    thread[1]=97
+    ===============
+    has 2 threads now
+    thread[0]=87
+    thread[1]=97
+    ===============
+    has 2 threads now
+    thread[0]=87
+    thread[1]=97
+    ===============
+    has 2 threads now
+    thread[0]=87
+    thread[1]=97
+    ===============
+    has 2 threads now
+    thread[0]=87
+    thread[1]=97
+    <hacklu> tschwinge: the result is different with HDebugger case.
+
+    <tschwinge> hacklu: It is my understanding that the two sig_post_untraced
+      RPC calls in inf_signal indeed are invoked on the real msgport (signal
+      thread) if the debugged process.
+    <tschwinge> That port is retrieved via the
+      INF_MSGPORT_RPC/INF_RESUME_MSGPORT_RPC macro, which invoked
+      proc_getmsgport on the proc server, and that will return (unless
+      overridden by proc_setmsgport, but that isn't done in GDB) the msgport as
+      set by [glibc]/hurd/hurdinit.c:_hurd_new_proc_init or _hurd_setproc.
+    <tschwinge> inf_signal is called from gnu_resume, which is via
+      [target_ops]->to_resume is called from target.c:target_resume, which is
+      called several places, for example infrun.c:resume which is used to a)
+      just resume the debugged process, or b) resume it and have it handle a
+      Unix signal (such as SIGALRM, or so), when using the GDB command »signal
+      SIGALRM«, for example.
+    <tschwinge> So such a signal would then not be intercepted by GDB itself.
+    <tschwinge> By the way, this is all just from reading the code -- I hope I
+      got it all right.
+
+    <tschwinge> Another thing: In Mach 3 Kernel Principles, the standard
+      sequence described on pages 22, 23 is thread_suspend, thread_abort,
+      thread_set_state, thread_resume, so you should probably do that in
+      HDebugger too, and not call thread_set_state before.
+    <tschwinge> I would hope the GDB code also follows the standard sequence?
+      Can you please check that?
+
+    <tschwinge> The one thing I'm now confused about is where/how GDB
+      intercepts the standard setup (probably in glibc's signaling mess?) so
+      that it receives any signals raised in the debugged process.
+    <tschwinge> But I'll have to continue later.
+
+    <hacklu___> tschwinge: thanks for your detail answers. I don't realize that
+      the gnu_resume will resume for handle a signal, much thanks for point
+      this:)
+    <hacklu___> tschwinge: I am not exactly comply with <Mach 3 kernel
+      principles> when I call thread_set_state. but I have called a
+      task_suspend before. I think it's not too bad:)
+    <tschwinge> hacklu___: Yes, but be aware that gnu_resume is only relevant
+      if a signal is to be forwarded to the debugged process (to be handled
+      there), but not for the case where GDB intercepts the signal (such as
+      SIGSEGV), and handles it itself without then forwarding it to the
+      application. See the »info signals« GDB command.
+    <hacklu___> I also confused about when to start the signal thread. I will
+      do more experiment.
+    <hacklu___> I have found this: when the inferior is stop at a breakpoint, I
+      use kill to send a CONT to it, the HDebugger will get this message who
+      listening on the exception port.
+
+
+# IRC, freenode, #hurd, 2013-07-28
+
+    <hacklu_> how to understand the rpctrace output?
+    <hacklu_> like this. 142<--143(pid15921)->proc_mark_stop_request (19 0)
+      125<--1
+    <hacklu_> 27(pid-1)->msg_sig_post_request (20 5 task108(pid15919));
+    <hacklu_> what is the (pid-1)? the kernel?
+    <teythoon> 1 is /hurd/init
+    <hacklu_> pid-1 not means minus 1?
+    <teythoon> ah, funny, you're right... I dunno then
+    <teythoon> 2 is the kernel though
+    <hacklu_> the 142<--143 is port name?
+    <teythoon> could very well be, but I'm not sure, sorry
+    <hacklu_> the number must be the port name.
+    <teythoon> anyone knows why /hurd/init does not get dead name notifications
+      for /hurd/exec like it does for any other essential server?
+    <teythoon> as far as I can see it successfully asks for them
+    <teythoon> about rpctrace, it poses as the kernel for its children, parses
+      and relays any messages sent over the childrens message port, right?
+
+
+# IRC, freenode, #hurd, 2013-07-29
+
+    <hacklu_> hi. this is my weekly
+      report. http://hacklu.com/blog/gsoc-weekly-report6-156/
+    <teythoon> hacklu_: the inferior voluntarily stops itself if it gets a
+      signal and notifies its tracer?
+    <hacklu_> yes
+    <teythoon> what if it chose not to do so? undebugable program?
+    <hacklu_> debugged program will be set an flag so called
+      hurdsig_traced. normal program will handle the signal by himself.
+    <hacklu_> in my env, I found that when GDB attach a running program, gdb
+      will not catch the signal send to the program. May help me try it?
+    <teythoon> it doesn't? I'll check...
+    <teythoon> hacklu_: yes, you're right
+    <hacklu_> you can just gdb a loop program, and kill -CONT to it. If I do
+      this I will get "Can't wait for pid 12332:NO child processes" warning.
+    <teythoon> yes, I noticed that too
+    <teythoon> does gdb reparent the tracee?
+    <hacklu_> I don't think this is a good behavior. gdb should get inferior's
+      signal
+    <teythoon> absolutely
+    <hacklu_> In linux it does, not sure about hurd. but I think it should.
+    <teythoon> definitively. there is proc_child in process.defs, but that may
+      only be used once to set the parent of a process
+    <hacklu_> gdb doesn't set the inferior as its child process if attached a
+      running procss in HURD.
+
+    <tschwinge> hacklu_: So you figured out this tracing/signal stuff. Great!
+    <hacklu_> tschwinge: Hi. not exactly.
+    <hacklu_> as I have mentioned, gdb can't get signal when attach to a
+      running process.
+    <hacklu_> I also want to know how to build glibc in hurd. I have got this "
+      relocation error: ./libc.so: symbol _dl_find_dso_for_object, version
+      GLIBC_PRIVATE not defined in file ld.so.1 with link time reference" when
+      use LD_PRELOAD=./my_build_glibc/libc.so
+    <tschwinge> hacklu: You can't just preload the new libc.so, but you'll also
+      need to use the new ld.so. Have a look at [glibc-build]/testrun.sh for
+      how to invoke these properly. Or, link with
+      »-Wl,-dynamic-linker=[glibc-build]/elf/ld.so,-rpath,[glibc-build]:[glibc-build]/elf
+      -L [glibc-build] -L [glibc-build]/elf«. If using the latter, I suggest
+      to also add »-Wl,-t« to verify that you're linking against the correct
+      libraries, and »ldd
+    <tschwinge> [executable]« to verify that [€xecutable] will load the correct
+      libraries when invoked.
+    <hacklu> I will try that, and I can't find this call
+      pthread_cond_broadcast(). which will called in the proc_mark_stop
+    <tschwinge> hacklu: Oh, right, you'll also need to add libpthread (I think
+      that's the directory name?) to the rpath and -L commands.
+    <hacklu> is libpthread a part of glibc or hurd?
+    <pinotree> glibc
+    <NlightNFotis> hacklu: it is a different repository available here
+      http://git.savannah.gnu.org/cgit/hurd/libpthread.git/
+    <hacklu> tschwinge: thanks for that, but I don't think I need help about
+      the comiler error now, it just say missing some C file. I will look into
+      the Makefile to verify.
+    <NlightNFotis> but I think it's a part of glibc as a whole
+    <tschwinge> hacklu: OK.
+    <tschwinge> glibc is/was a stand-alone package and library, but in Debian
+      GNU/Hurd is nowadays integrated into glibc's build process.
+    <hacklu> NlightNFotis: thanks. I only add hurd, glibc, gdb,mach code to my
+      cscope file. seems need to add libpthread.
+    <tschwinge> hacklu: If you use the Debian glibc package, our libpthread
+      will be in the libpthread subdirectory.
+    <tschwinge> Ignore nptl, which is used for the Linux kernel.
+    <hacklu> tschwinge:BTW, I have found that, to continue the inferior from a
+      breakpoint, doesn't need to call msg_sig_post_untraced. just call
+      thread_abort and thread_resume is already ok.
+    <hacklu> I get the glibc from http://git.savannah.gnu.org/cgit/hurd.
+    <tschwinge> hacklu: That sounds about right, because you want the inferior
+      to continue normally, instead of explicitly sending a (Unix) signal to
+      it.
+    <tschwinge> hacklu: I suggest you use: »apt-get source eglibc« on your Hurd
+      system.
+    <tschwinge> hacklu: The Savannah repository does not yet have libpthread
+      integrated. I have this on my TODO list...
+    <hacklu> tschwinge: no, apt-get source doesn't work in my Hurd. I got any
+      code from git clone ***
+    <pinotree> you most probably lack the deb-src entry in your sources.list
+    <tschwinge> hacklu: Do you have deb-src lines in /etc/apt/source-list? Or
+      how does it fail?
+    <hacklu> tschwinge: I have deb-src lines. and apt-get complain that: E:
+      Unable to find a source package for eglibc or E: Unable to find a source
+      package for glibc
+    <youpi> hacklu: which deb-src lines do you have?
+    <hacklu> and piece of my source_list : deb
+      http://ftp.debian-ports.org/debian unreleased main deb-src
+      http://ftp.debian-ports.org/debian unreleased main
+    <youpi> you also need a deb-src line with the main archive
+    <youpi> deb-src http://cdn.debian.net/debian unstable main
+    <tschwinge> hacklu: Oh, hmm. And you did run »apt-get update« before?
+      That aside, there also is <http://snapshot.debian.org/package/eglibc/>
+      that you can use. You'll need the *.dsc and *.debian.tar.xz files
+      corresponbding to your version of glibc, and the *.orig.tar.xz file. And
+      then run »dpkg-source -x *.dsc«.
+    <tschwinge> The Debian snapshot is often very helpful if you need source
+      packages that are no longer in the main Debian repository.
+    <youpi> or simply running dget on the dsc url
+    <tschwinge> Oh. Good to know.
+    <youpi> e.g. dget
+      http://cdn.debian.net/debian/pool/main/e/eglibc/eglibc_2.17-7.dsc
+    <hacklu> the network is slowly. and I am in apt-get update.
+    <youpi> I will be away from this evening until sunday, too
+    <hacklu> what the main difference between the source site?
+    <hacklu> is dget means wget?
+    <pinotree> no
+    <hacklu> not exist in linux?
+    <pinotree> it does, in devscripts
+    <pinotree> it's a debian tool
+    <hacklu> oh, yes, I have installed devscripts.
+    <hacklu> I have got the libphread code, thanks.
+
+    <braunr> teythoon: the simple fact that this msg thread exists to receive
+      requests and that these requests are sent by ps and procfs is a potential
+      DoS
+    <teythoon> braunr: but does that mean that on Hurd a process can prevent a
+      debugger from intercepting signals?
+    <braunr> teythoon: yes
+    <braunr> that's not a problem for interactive programs
+    <braunr> it's part of the hurd design that programs have limited trust in
+      each other
+    <braunr> a user can interrupt his debugger if he sees no activity
+    <braunr> that's more of a problem for non interactive system stuff like
+      init scripts
+    <braunr> or procfs
+    <hacklu> why gdb can't get inferior's signal if attach a running process?
+    <braunr> hacklu: try to guess
+    <hacklu> braunr: it is not a reasonable thing. I always think it should
+      catch the signal.
+    <braunr> hacklu: signals are a unix thing built on top of mach
+    <braunr> hacklu: think in terms of ports
+    <braunr> all communication on the hurd goes through ports
+    <hacklu> but when use gdb to start a process and debugg it, this way, gdb
+      can catch the signal
+    <braunr> hacklu: my guess is :
+    <braunr> when starting a process, gdb can act as a proxy, much like
+      rpctrace
+    <braunr> when attaching, it can't
+    <hacklu> braunr: ah, my question should ask like this: why gdb can't set
+      the inferior as its child process when attaching it? or it can not ?
+    <braunr> hacklu: i'm not sure, the proc server is one of the parts i know
+      the less
+    <braunr> but again, i guess there is no facility to update the msg port of
+      a process in the proc server
+    <braunr> check that before taking it as granted
+    <hacklu> braunr: aha, I alway think you know everything:)
+    <tschwinge> braunr: There is: setmsgport or similar.
+    <braunr> if there is one, gdb doesn't use it
+    <tschwinge> hacklu: That is a good question -- I can't answer it off-hand,
+      but it might be possible (by setting the tracing flag, and such things).
+      Perhaps it's just a GDB bug, which omits to do that. Perhaps just a
+      one-line code change, perhaps not. That's a new bug (?) report that we
+      may want to have a look at later on.
+    <tschwinge> hacklu: But also note, this new problem is not really related
+      to your gdbserver work -- but of course you're fine to have a look at it
+      if you'd like to.
+    <hacklu> I just to ask for whether this is a normal behavior. this is
+      related to my gdbserver work, as gdbserver also need to attach a running
+      process...
+    <braunr> gdbserver can start a process just like gdb does
+    <braunr> you may want to focus on that first
+    <tschwinge> Yes.
+    <tschwinge> Attaching to processes that are already running is, I think,
+      always more complicated compared to the case where GDB/gdbserver has
+      complete control about the inferior right from the beginning.
+    <hacklu> yes, I am only focus on start one. the attach way I haven't
+      research now.
+    <tschwinge> hacklu: That's totally fine. You can just say that attaching
+      to processes is not supported yet.
+    <hacklu> that's sound good:)
+    <tschwinge> Ther will likely be more things in gdbserver that you won't be
+      able to easily support, so it's fine to do it step-by-step.
+    <tschwinge> And then later add more features incrementally.
+    <tschwinge> That's also easier for reviewing the patches.
+
+    <hacklu> and one more question I have ask yestoday. what is the rpctrace
+      output (pid-1) mean?
+    <tschwinge> hacklu: Another thing I can't tell off-hand. I'll try to look
+      it up.
+    <teythoon> hacklu, tschwinge: my theory is that it is in fact an error
+      message, maybe the proc server did not now a pid for the task
+    <braunr> hacklu: utsl
+    <hacklu> tschwinge: for saving your time, I will look the code myself, I
+      don;t think this is a real hard question need you to help me by reading
+      the source code.
+    <tschwinge> teythoon, hacklu: Yes, from a quick inspection it looks like
+      task2pid returning a -1 PID -- but I can't tell yet what that is supposed
+      to mean, if it's an actualy bug, or just means there is no data
+      available, or similar.
+    <hacklu> braunr: utsl??
+    <tschwinge> hacklu: http://www.catb.org/~esr/jargon/html/U/UTSL.html
+    <hacklu> tschwinge: thank you. braunr like say abbreviation which I can't
+      google out.
+    <tschwinge> hacklu: Again, if this affects your work, it is fine to have a
+      look at that presumed rpctrace problem, if not, it is fine to have a look
+      at it if you'd like to, and otherwise, we'll file it as a possible bug to
+      be looked at laster.
+    <tschwinge> hacklu: Now you learned that one. :-)
+    <hacklu> tschwinge: ok , this doesn't affect me now. If I have time I will
+      figure out it.
+
+    <teythoon> btw, what about the copyright assignment process?
+    <tschwinge> teythoon, hacklu: You still haven't heard from the FSF about
+      your copyright assignments? What's the latest you have heard?
+    <hacklu> tschwinge: I have wrote a emali to ask for that, but no reply.
+    <teythoon> tschwinge: last and only response I got was on July 1st, the
+      last ping with explicit request for confirmation was on July the 12th
+    <tschwinge> hacklu: When did you send this email?
+    <hacklu> tschwinge: last week.
+    <tschwinge> teythoon: I suggest you send another inquiry, and please put me
+      in CC. And if there'S no answer within a couple days (well, I'm away
+      until Monday...), I'll follow up.
+    <tschwinge> hacklu: Likewise for you; depending on when exactly ;-) you
+      sent the last email. (Always allow for a few days until you exect an
+      answer, but if nothing happend within a week for such rather simple
+      administrative tasks, better ask again, unfrotunately.)
+    <hacklu> tschwinge:ok , I will email more
+
+    <hacklu> how to understand the asyn RPC?
+    <braunr> hacklu: hm ?
+    <hacklu> for instance, [hurd]/proc/main.c proc_server is loop in listening
+      message. and handle it by message_demuxer.
+    <hacklu> but when I send a request like proc_wait_request() to it, will it
+      block in the message_demuxer?
+    <hacklu> and where is the function of
+      ports_manage_port_operations_multithread()?
+    <braunr> this one is in libports
+    <braunr> it's the last thing a server calls after bootstrapping itself
+    <braunr> message_demuxer normally blocks, yes
+    <braunr> but it's not "async"
+    <hacklu> the names seems the proc_server is listening message with many
+      threads?
+    <braunr> every server in the hurd does
+    <braunr> threads are created by ports_manage_port_operations_multithread
+      when incoming messages can't be processed quick enough by the set of
+      already existing threads
+    <hacklu> if too many task send request to the server, will it ddos?
+    <braunr> yes
+    <teythoon> every server but /hurd/init
+    <braunr> (and /hurd/hello)
+    <braunr> hacklu: that's, in my opinion, a major design defect
+    <hacklu> yes, that is reasonable.
+    <braunr> that's what causes what i like to call thread storms on message
+      floods ... :)
+    <braunr> my hurd clone is intended to address such major issues
+    <teythoon> couldn't that be migitated by some kind of heuristic?
+    <braunr> it already is ..
+    <hacklu> I don't image that the port_manage_port_operations_multithread
+      will dynamically create threads. I thought the server will hang if all
+      work thread is in use.
+    <braunr> that would also be a major defect
+    <braunr> creating as many threads as necessary is a good thing
+    <braunr> the problem is the dos
+    <braunr> hacklu: btw, ddos is "distributed" dos, and it doesn't really
+      apply to what can happen on the hurd
+    <hacklu> why not ? as far as I known, the message transport is
+      transparent. hurd has the chance to be DDOSed
+    <braunr> we don't care about the distributed property of the dos
+    <hacklu> oh, I know what you mean.
+    <braunr> it simply doesn't matter
+    <braunr> on thread calling select in an event loop with a low timeout (high
+      frequency) on a bunch of file descriptors is already enough to generate
+      many dead-name notifications
+    <tschwinge> Oh! Based on what I've read in GDB source code, I thought the
+      proc server was single-threaded.
+      However, it no longer is, after 1996's
+      Hurd commit fac6d9a6d59a83e96314103b3181f6f692537014.
+    <braunr> those notifications cause message flooding at servers (usually
+      pflocal/pfinet), which spawn a lot of threads to handle those messages
+    <braunr> one* thread
+    <hacklu> tschwinge: ah, the comment in gnu_nat.c is out of date!
+    <braunr> hacklu: and please, please, clean the hello_world processes you're
+      creating on darnassus
+    <braunr> i had to do it myself again :/
+    <hacklu> braunr: [hacklu@darnassus ~]$ ps ps: No applicable processes
+    <braunr> ps -eflw
+    <braunr> htop
+    <tschwinge> hacklu: Probably the proc_wait_pid and proc_waits_pending stuff
+      could be simplified then? (Not an urgent issue, of course, will file as
+      an improvement for later.)
+    <hacklu> braunr: ps -eflw |grep hacklu
+    <hacklu> 1038 12360 10746 26 26 2 87 22 148M 1.06M 97:21001 S
+      p1 0:00.00 grep --color=auto hacklu
+    <braunr> 15:08 < braunr> i had to do it myself again :/
+    <teythoon> braunr: so as a very common special case, a lot of dead name
+      notifications cause problems for pf*?
+    <braunr> and use your numeric uid
+    <braunr> teythoon: yes
+    <hacklu> braunr: I am so sorry. I only used ps to check. forgive me
+    <braunr> teythoon: simply put, a lot of messages cause problems
+    <braunr> select is one special use case
+    <teythoon> braunr: blocking other requests?
+    <braunr> the other is page cache writeback
+    <braunr> creating lots of threads
+    <braunr> potentially deadlocking on failure
+    <braunr> and in the case of writebacks, simply starving
+    <teythoon> braunr: but dead name notifications should mostly trigger
+      cleanup actions, couldn't those be handled by a different thread(pool)
+      than the rest?
+ <braunr> that's why you can bring down a hurd system with a simple cp + bigfile somewhere, bigfile being a few hundreds MiBs + <braunr> teythoon: it doesn't change the problem + <braunr> threads are per task + <braunr> and the contention would remain the same + <teythoon> hm + <braunr> since dead-name notifications are meant to release resources + created by what would then be "regular" threads + <braunr> don't worry, there is a solution + <braunr> it's simple + <braunr> it's well known + <braunr> it's just hard to directly apply to the hurd + <braunr> and impossible to enforce on mach + <hacklu> tschwinge: I am confuzed after I have look into S_proc_wait() + [hurd/proc/wait.c], it has relate pthread_hurd_cond_wait_np. I can't find + out when it will return. And the signal is report to the debuger by + S_proc_wait. + <teythoon> braunr: a pointer please ;) + <braunr> teythoon: basically, synchronous ipc + <braunr> then, enforcing one server thread per client thread + <braunr> and replace mach-generated notifications with messages sent from + client threads + <braunr> the only kind of notification required by the hurd are no-senders + notifications + <braunr> this happens when a client releases all references it has to a + resource + <braunr> so it's easy to make that synchronous as well + <braunr> trying to design RPCs as closely as system calls on monolithic + kernels helps in viewing how this works + <braunr> the only real additions are address space crossing, and capability + invocation + <teythoon> sounds reasonable, why is it hard to apply to the hurd? most + rpcs are synchonous, no? + <braunr> mach ipc isn't + <hacklu> braunr: When client C send a request to server S, but doesn't wait + for the reply message right now, for a while, C call mach_msg to recieve + reply. Can I think this is a synchronous RPC? 
+ <braunr> a malicious client can still overflow message queues + <braunr> hacklu: no + <teythoon> yes, I can see how this is impossible to enforce, but still we + could all try to play nice :) + <braunr> teythoon: no + <braunr> :) + <braunr> async ipc is heavy, error-prone, less performant than sync ipc + <braunr> some async ipc is necessary to handle asynchronous events, but + something like unix signals is actually a lot more appropriate + <braunr> we're diverging from the gsoc though + <braunr> don't waste too much time on that + <teythoon> 15:13 < braunr> it's just hard to directly apply to the hurd + <teythoon> I wont + <teythoon> why is it hard + <braunr> almost everything is synchronous on the hurd + <braunr> except a few critical bits + <braunr> signals :) + <braunr> and select + <braunr> and pagecache writebacks + <braunr> fixing those parts require some work + <braunr> which isn't trivial + <braunr> for example, select should be rewritten not to use dead-name + notifications + <teythoon> adding a light weight signalling mechanism to mach and using + that instead of async ipc? + <braunr> instead of destroying ports once an event has been received, it + should (synchyronously) remove the requests installed at remote servers + <braunr> uh no + <braunr> well maybe but that would be even harder + <tschwinge> hacklu: This (proc/wait.c) is related to POSIX thread + cancellation -- I don't think you need to be concerned about that. That + function's "real" exit points are earlier above. + <braunr> teythoon: do you understand what i mean about select ? + <teythoon> ^^ is that a no go area? + <braunr> for now it is + <braunr> we don't want to change the mach interface too much + <teythoon> yes, I get the point about select, but I haven't looked at its + implementation yet + <hacklu> tschwinge: when I want to know the child task's state, I call + proc_wait_request(), unless the child's state not change. the + S_proc_wait() will not return? 
+ <braunr> it creates ports, puts them in a port set, gives servers send + rights so they can notify about events + <teythoon> y not? it's not that hurd is portable to another mach, or is it? + and is there another that we want to be compatible with? + <braunr> when an event occurs, all ports are scanned + <braunr> then destroyed + <braunr> on destruction, servers are notified by mach + <braunr> the problem is that the client is free to continue and make more + requests while existing select requests are still being cancelled + <teythoon> uh, yeah, that sounds like a costly way of notifying somewone + <braunr> the cost isn't the issue + <braunr> select must do something like that on a multiserver system, you + can't do much about it + <braunr> but it should be synchronous, so a client can't make more requests + to a server until the current select call is complete + <braunr> and it shouldn't use a server approach at the client side + <braunr> client -> server should be synchronous, and server -> client + should be asynchronous (e.g. using a specific SIGSELECT signal like qnx + does) + <braunr> this is a very clean way to avoid deadlocks and denials of service + <teythoon> yes, I see + <braunr> qnx actually provides excellent documentation about these issues + <braunr> and their ipc interface is extremely simple and benefits from + decades of experience on the subject + <tschwinge> hacklu: This function implements the POSIX wait call, and per + »man 2 wait«: »The wait() system call suspends execution of the calling + process until one of its children terminates.« + <tschwinge> hacklu: This is implemented in glibc in sysdeps/posix/wait.c, + sysdeps/unix/bsd/bsd4.4/waitpid.c, sysdeps/mach/hurd/wait4.c, by invoking + this RPC synchronously. + <tschwinge> hacklu: GDB on the other hand, uses this infrastructure (as I + understand it) to detect (that is, to be informed) when a debuggee exits + (that is, when the inferior process terminates). 
+ <tschwinge> hacklu: Ah, so maybe I miss-poke earlier: the + pthread_hurd_cond_wait_np implements the blocking. And depending on its + return value the operation will be canceled or restarted (»start_over«). + <tschwinge> s%maybe%% + <tschwinge> hacklu: Does this information help? + <hacklu> tschwinge: proc_wait_request is not only to detect the inferior + exit. it also detect the child's state change + <braunr> as tschwinge said, it's wait(2) + <hacklu> tschwinge: and I have see this, when kill a signal to inferior, + the gdb will get the message id=24120 which come from S_proc_wait + <hacklu> braunr: man 2 wait says: wait, waitpid, waitid - wait for process + to change state. (in linux, in hurd there is no man wait) + <braunr> uh + <braunr> there is, it's the linux man page :) + <braunr> make sure you have manpages-dev installed + <hacklu> I always think we are talk about linux's manpage :/ + <hacklu> but regardless the manpage, gdb really call proc_wait_request() to + detect whether inferior's changed states + <braunr> in any case, keep in mind the hurd is intended to be a posix + system + <braunr> which means you can always refer to what wait is expected to do + from the posix spec + <braunr> see + http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html + <hacklu> braunr: even in the manpags under hurd, man 2 wait also says: wait + for process to change state. + <braunr> yes + <braunr> that's what it's for + <braunr> what's the problem ? + <hacklu> the problem is what tschwinge has said I don't understand. like + and per »man 2 wait«: »The wait() system call suspends execution of the + calling process until one of its children terminates.« + <braunr> terminating is a form of state change + <braunr> historically, wait was intended to monitor process termination + only + <hacklu> so the thread become stoped wait also return + <braunr> afterwards, process tracing was added too + <braunr> what ? 
+ <hacklu> so when the child state become stopped, the wait() call will + return? + <braunr> yes + <hacklu> and I don't know this pthread_hurd_cond_wait_np. + <braunr> wait *blocks* until the process it references changes state + <braunr> pthread_hurd_cond_wait_np is the main blocking function in hurd + servers + <braunr> well, pthread_hurd_cond_timedwait_np actually + <braunr> all blocking functions end up there + <braunr> (or in mach_msg) + <braunr> (well pthread_hurd_cond_timedwait_np calls mach_msg too) + <hacklu> since I use proc_wait_request to get the state change, so the + thread in proc_server will be blocked, not me. is that right? + <braunr> no + <braunr> both + <hacklu> this is just a request, why should block me? + <braunr> because you're waiting for the reply afterwards + <braunr> or at least, you should be + <braunr> again, i'm not familiar with those parts + <hacklu> after call proc_wait_request(), gdb does a lot stuffs, and then + call mach_msg to recieve reply. + <braunr> ok + <hacklu> I think it will be blocked only in mach_msg() if need. + <braunr> usually, xxx_request are the async send-only versions of RPCs + <tschwinge> Yes, that'S my understanding too. + <braunr> and xxx_reply the async receive-only + <braunr> so that makes sense + <hacklu> so I have ask you is it a asyn RPC. + <braunr> yes + <braunr> 15:18 < hacklu> braunr: When client C send a request to server S, + but doesn't wait for the reply message right now, for a while, C call + mach_msg to recieve reply. Can I think this is a synchronous RPC? + <braunr> 15:19 < braunr> hacklu: no + <braunr> if it's not synchronous, it's asynchronous + <hacklu> sorry, I spell wrong. missing a 'a' :/ + <tschwinge> S_proc_wait_reply will then be invoked once the procserver + actually answers the "blocking" proc_wait call. + <tschwinge> Putting "blocking" in quotes, because (due to the asyncoronous + RPC invocation), GDB has not actually blocked on this. 
+ <braunr> well, it doesn't call proc_wait + <hacklu> tschwinge: yes, the S_proc_wait_reply is called by + process_reply_server(). + <hacklu> tschwinge: so the "blocked" one is the thread in proc_server . + <tschwinge> braunr: Right. »It requests the proc_wait service.« + <braunr> gdb will also block on mach_msg + <braunr> 16:05 < braunr> both + <hacklu> braunr: yes, if gdb doesn't call mach_msg to recieve reply it will + not be blocked. + <braunr> i expect it will always call mach_msg + <braunr> right ? + <hacklu> braunr: yes, but before it call mach_msg, it does a lot other + things. but finally will call mach_msg + <braunr> that's ok + <braunr> that's the kind of things asynchronous IPC allows + <hacklu> tschwinge: I have make a mistake in my week report. The signal + recive by inferior is notified by the proc_server, not the + send_signal. Because the send_singal send a SIGCHLD to gdb's msgport not + gdbself. That make sense. + + +# IRC, freenode, #hurd, 2013-07-30 + + <hacklu> braunr: before I go to sleep last night, this question pop into my + mind. How do you find my hello_world is still alive on darnassus? The + process is not a CPU-heavy or IO-heavy guy. You will not feel any + performance penalization. I am so curious :) + <teythoon> hacklu: have you looked into patching the proc server to allow + reparenting of processes? + <hacklu> teythoon:not yet + <teythoon> hacklu: i've familiarized myself with proc in the last week, + this should get you started nicely: http://paste.debian.net/19985/
+    diff --git a/proc/mgt.c b/proc/mgt.c
+    index 7af9c1a..a11b406 100644
+    --- a/proc/mgt.c
+    +++ b/proc/mgt.c
+    @@ -159,9 +159,12 @@ S_proc_child (struct proc *parentp,
+       if (!childp)
+         return ESRCH;
+
+    +  /* XXX */
+       if (childp->p_parentset)
+         return EBUSY;
+
+    +  /* XXX if we are reparenting, check permissions. */
+    +
+       mach_port_deallocate (mach_task_self (), childt);
+
+       /* Process identification.
+    @@ -176,6 +179,7 @@ S_proc_child (struct proc *parentp,
+       childp->p_owner = parentp->p_owner;
+       childp->p_noowner = parentp->p_noowner;
+
+    +  /* XXX maybe need to fix refcounts if we are reparenting, not sure */
+       ids_rele (childp->p_id);
+       ids_ref (parentp->p_id);
+       childp->p_id = parentp->p_id;
+    @@ -183,11 +187,14 @@ S_proc_child (struct proc *parentp,
+       /* Process hierarchy. Remove from our current location
+          and place us under our new parent. Sanity check to make sure
+          parent is currently init. */
+    -  assert (childp->p_parent == startup_proc);
+    +  assert (childp->p_parent == startup_proc); /* XXX */
+       if (childp->p_sib)
+         childp->p_sib->p_prevsib = childp->p_prevsib;
+       *childp->p_prevsib = childp->p_sib;
+
+    +  /* XXX we probably want to keep a reference to the old
+    +     childp->p_parent around so that if the debugger dies or detaches,
+    +     we can reparent the process to the old parent again */
+       childp->p_parent = parentp;
+       childp->p_sib = parentp->p_ochild;
+       childp->p_prevsib = &parentp->p_ochild;
+ <teythoon> the code doing the reparenting is already there, but for now it + is only allowed to happen once at process creation time + <hacklu> teythoon: good job. This is in my todo list, when I implement + attach feature to gdbserver I will need this + <braunr> hacklu: i use htop + <teythoon> braunr: why is that process so disruptive? + <braunr> the big problem with those stale processes is that they're in a + state that prevents one important script to complete + <braunr> there is a bug on the hurd with regard to terminals + <braunr> when you log out of an ssh session, the terminal remains open for + some reason (bad reference counting somewhere, but it's quite tricky to + identify) + <braunr> to work around the issue, i have a cron job that calls a script to + kill unused terminals + <braunr> this works by listing processes + <braunr> your hello_world processes block that listing + <teythoon> uh, how so? + <hacklu> braunr: ok. I konw.
+ <braunr> teythoon: probably the denial of service we were talking about + yesterday + <teythoon> select flooding a server? + <braunr> no, a program refusing to answer on its msg port + <braunr> ps has an option -M : + <braunr> -M, --no-msg-port Don't show info that uses a process's + msg port + <braunr> the problem is that my script requires those info + <teythoon> ah, I see, right + <braunr> hacklu being working on gdb, it's not surprising he's messing with + that + <teythoon> yes indeed. couldn't ps use a timeout to detect that? + <hacklu> braunr: yes, once I have found ps will hang when I has run + hello_world in a breakpoint state. + <teythoon> braunr: thanks for explaining the issue, i always wondered why + that process is such big a deal ;) + <braunr> teythoon: how do you tell between processes being slow to answer + and intentionnally refusing to answer ? + <braunr> a timeout is almost never the right solution + <braunr> sometimes it's the only solution though, like for networking + <braunr> but on a system running on a local machine, there is usually + another way + <teythoon> braunr: I don't of course + <braunr> ? + <braunr> ah ok + <braunr> it was rethorical :) + <teythoon> yes I know, and I was implying that I wasn't expecting a timeout + to be the clean solution + <teythoon> and the current behaviour is hardly acceptable + <braunr> i agree + <braunr> it's ok for interactive cases + <braunr> you can use Ctrl-C, which uses a 3 seconds delay to interrupt the + client RPC if nothing happens + <teythoon> braunr: btw, what about *_reply.defs? Should I add a + corresponding reply simpleroutine if I add a routine? + <braunr> normally yes + <braunr> right, forgot about that + <teythoon> so that the procedure ids are kept in sync in case one wants to + do this async at some point in the future? 
+ <braunr> yes + <braunr> this happened with select + <braunr> i had to fix the io interface + <teythoon> ok, noted + + +# IRC, freenode, #hurd, 2013-07-31 + + <hacklu> Do we need write any other report for the mid-evaluation? I have + only submit a question-answer to google. + + +# IRC, freenode, #hurd, 2013-08-05 + + <hacklu> hi, this is my weekly + report. http://hacklu.com/blog/gsoc-weekly-report7build-gdbserver-on-gnuhurd-164/ + <hacklu> youpi: can you show me some suggestions about how to design the + interface and structure of gdbserver? + <youpi> hacklu: well, I've read your blog entry, I was wondering about + tschwinge's opinion, that's why I asked whether he was here + <youpi> I would tend to start from an existing gdbserver, but as I haven't + seen the code at all, I don't know how much that can help + <hacklu> so you mean I shoule get a worked gdbserver then to improve it? + <youpi> I'd say so, but again it's not a very strong opinion + <youpi> I'd rather let tschwinge comment on this + <hacklu> youpi: ok :) + + <youpi> how about the copyright assignments? did hacklu or teythoon receive + any answer? + <teythoon> youpi: I did, the copyright clerk told me that he finally got my + papers and that everything is in order now + <youpi> few! + <youpi> s/f/ph + <youpi> teythoon: you mean all steps are supposed to be done now, or is he + doing the last steps? I don't see your name in the copyright folder yet + <teythoon> youpi: well, he said that he had the papers and they are about + to be signed + <youpi> teythoon: ok, so it's not finished, that's why your name is not on + the list yet + <youpi> this paper stuff is really a pain + <hacklu> youpi: I haven't got any answer from FSF now. + <youpi> did you ping them recently? + <hacklu> I have pinged 2 week ago. + <hacklu> what you mean of ping? I just write an email to him. Is it enough? 
+ <youpi> yes + + +# IRC, freenode, #hurd, 2013-08-12 + + <hacklu> hi, this is my weekly report + http://hacklu.com/blog/gsoc-weekly-report8-168/ . sorry for so late. + + <youpi> hacklu: it seems we misunderstood ourselves last week, I meant to + start from the existing gdbserver implementation + <youpi> but never mind :) + <youpi> starting from the lynxos version was a good idea + <hacklu> youpi: em... yeah, the lynxos port is so clean and simple. + + <hacklu> youpi: aha, the "Remote connection closed" problem has been fixed + after I add a init_registers_i386() and set the structure target_desc. + <hacklu> but I don't get understand with the structure target_desc. I only + know it is auto-generated which configured by the configure.srv. + <tschwinge> Hi! + <tschwinge> hacklu: In gdbserver, you should definitely re-use existing + infrastructure, especially anything that deals with the + protocol/communication with GDB (that is, server.c and its support + files). + <tschwinge> hacklu: Then, for the x86 GNU Hurd port, it should be + implemented in the same way as an existing port. The Linux port is the + obvious choice, of course, but it is also fine to begin with something + simpler (like the LynxOS port you've chosen), and then we can still add + more features later on. That is a very good approach actually. + <tschwinge> hacklu: The x86 GNU Hurd support will basically consist of + three pieces -- exactly as with GDB's native x86 GNU Hurd port: x86 + processor specific (tge existing gdbserver/i386-low.c etc. -- shouldn't + need any modifications (hopefully)), GNU Hurd specific + (gdbserver/gnu-hurd-low.c (or similar)), and x86 GNU Hurd specific + (gdbserver/gnu-hurd-x86-low.c (or similar)). + <tschwinge> s%tge%the + <hacklu> tschwinge: now I have only add a file named gnu-low.c, I should + move some part to the file gnu-i386-low.c I think. + <tschwinge> hacklu: That's fine for the moment. We can move the parts + later (everything with 86 in its name, probably). 
+ <hacklu> that's ok. + <hacklu> tschwinge: Can I copy code from gnu-nat.c to + gdbserver/gnu-hurd-low.c? I think the two file will have many same code. + <tschwinge> hacklu: That's correct. Ideally, the code should be shared + (for example, in a file in common/), but that's an ongoing discussion in + GDB, for other duplicated code. So, for the moment, it is fine to copy + the parts you need. + <tschwinge> hacklu: Oh, but it may be a good idea to add a comment to the + source code, where it is copied from. + <hacklu> maybe I can do a common-part just for hurd gdb port. + <tschwinge> That should make it easier later on, to consolidate the + duplicated code into one place. + <tschwinge> Or you can do that, of course. If it's not too difficult to + do? + <hacklu> I think at the begining it is not difficult. But when the + gdbserver code grow, the difference with gdb is growing either. That will + be too many #if else. + <tschwinge> I think we should check with the GDB maintainers, what they + suggest. + <tschwinge> hacklu: Please send an email To: <gdb@sourceware.org> Cc: + <lgustavo@codesourcery.com>, <thomas@codesourcery.com>, and ask about + this: you need to duplicate code that already exists in gnu-nat.c for new + gdbserver port -- how to share code? + <hacklu> tschwinge: ok, I will send the email right now. + <hacklu> tschwinge: need I cc to hurd mail-list? + <tschwinge> hacklu: Not really for that questions, because that is a + question only relevant to the GDB source code itself. + <hacklu> tschwinge: got it. + +[[!message-id +"CAB8fV=jzv_rPHP3-HQVBA-pCNZNat6PNbh+OJEU7tZgQdKX3+w@mail.gmail.com"]]. + + +# IRC, freenode, #hurd, 2013-08-19 + +<http://hacklu.com/blog/gsoc-weekly-report9-172/>. + + <hacklu__> when and where is the best time and place to get the regitser + value in gdb? + <youpi> well, I'm not sure to understand the question + <youpi> you mean in the gdb source code, right? + <youpi> isn't it already done in gdb? 
+ <youpi> probably similarly to i386? + <youpi> (linux i386 I mean) + <hacklu__> I don't find the fetch_register or relate function implement in + gnu-nat.c + <hacklu__> so I can't make decision how to implement this in gdbserver. + <youpi> it's in i386gnu-nat.c, isn't it? + <hacklu__> yeah. + <youpi> does that answer your issue? + <hacklu__> thank you. I am so stupid + + +# IRC, freenode, #hurd, 2013-08-26 + + < hacklu> hello everyone, this is my week + report. http://hacklu.com/blog/gsoc-weekly-report10-174/ + + < hacklu> btw, my FSF copyright assignment has been concepted. They guy + said, they have recived my mail for a while but forget to handle it. + + < hacklu> but now I face a new problem, when I typed the first continue + command, gdb will continue all the breakpoint, and the inferior will run + until normally exit. + + +# IRC, freenode, #hurd, 2013-08-30 + + <hacklu> tschwinge: hi, does gdb's attach feature work correctlly on Hurd? + <hacklu> on my hurd-box, the gdb can't attach to a running process, after a + attaching, when I continue, gdb complained "can't find pid 12345" + <teythoon> hacklu: attaching works, not sure why gdb is complaining + <hacklu> teythoon: yeah, it can attaching, but can't contine process. + <hacklu> in this case, the debugger is useless if it can't resume execution + <teythoon> hacklu: well, gdb on Linux reacts a little differently, but for + me attaching and then resuming works + <hacklu> teythoon: yes, gdb on linux works well. + <teythoon> % gdb --pid 21506 /bin/sleep + <teythoon> [...] + <teythoon> (gdb) c + <teythoon> Continuing. + <teythoon> warning: Can't wait for pid 21506: No child processes + <teythoon> # pkill -SIGILL sleep + <teythoon> warning: Pid 21506 died with unknown exit status, using SIGKILL. + <hacklu> yes. I used a sleep program to test too. 
+ <teythoon> I believe that the warning and deficiencies with the signal + handling are b/c on Hurd the debuggee cannot be reparented to the + debugger + <hacklu> oh, I remembered, I have asked this before. + <tschwinge> Confirming that attaching to a process in __sleep -> __mach_msg + -> mach_msg_trap works fine, but then after »continue«, I see »warning: + Can't wait for pid 4038: No child processes« and three times »Can't fetch + registers from thread bogus thread id 1: No such thread« and the sleep + process exits (normally, I guess? -- interrupted "system call"). + <tschwinge> If detaching (exit GDB) instead, I see »warning: Can't modify + tracing state for pid 4041: No such process« and the sleep process exits. + <tschwinge> Attaching to and then issueing »continue« in a process that is + not currently in a mach_msg_trap (tested a trivial »while (1);«) seems to + work. + <tschwinge> hacklu: ^ + <hacklu> tschwinge: in my hurdbox, if I just attach a while(1), the system + is near down. nothing can happen, maybe my hardware is slow. + <hacklu> so I can only test on the sleep one. + <hacklu> my gdbserver doesn't support attach feature now. the other basic + feather has implement. I am doing test and review the code now. + <tschwinge> Great! :-) + <tschwinge> It is fine if attaching does not work currently -- can be added + later. + <hacklu> btw, How can I submit my code? put the patch in email directly? + <tschwinge> Did you already run the GDB testsuite using your gdbserver? + <hacklu> no, haven't yet + <tschwinge> Either that, or a Git branch to pull from. + <hacklu> I think I should do more review and test than I submit patches. + <tschwinge> hacklu: See [GDB]/gdb/testsuite/boards/native-gdbserver.exp + (and similar files) for how to run the GDB testsuite with gdbserver. + <hacklu> ok. + <tschwinge> But don't be disappointed if there are still a lot of failures, + etc. It'll already be great if some basic stuff works. 
+ <hacklu> now it can set and remove breakpoint. show register, access + variables. + <tschwinge> ... which already is enogh for a lot of debugging sessions. + :-) + <hacklu> I will continue to make it more powerful. + <hacklu> :) + <tschwinge> Yes, but please first work on polishing the existing code, and + get it integrated upstream. That will be a great milestone. + <tschwinge> No doubt that GDB maintainers will have lots of comments about + proper formatting of the source code, and such things. Trivial, but will + take time to re-work and get right. + <hacklu> oh, I got it. I will give my pathch before this weekend. + <tschwinge> Then once your basic gdbserver is included, you can continue to + implement additional features, piece by piece. + <tschwinge> And then we can run the GDB testsuite with gdbserver and + compare that there are no regressions, etc. + <tschwinge> Heh, »before the weekend« -- that's soon. ;-) + <hacklu> honestly to say, most of the code is copyed from other files, I + haven't write too many code myself. + <tschwinge> Good -- this is what I hoped. Often, more time in software + development is spent on integrating existing things rathen than writing + new code. + <hacklu> but I have spent a lot of time to get known the code and to debug + it to work. + <tschwinge> Thzis is normal, and is good in fact: existing code has already + been tested and documented (in theory, at least...). + <tschwinge> Yes, that's expected too: when relying on/reusing existing + code, you first have to understand it, or at least its interfaces. Doing + that, you're sort of "mentally writing the existing code again". + <tschwinge> So, this sounds all fine. :-) + <hacklu> your words make me happy. + <hacklu> :) + <tschwinge> Well, I am, because this seems to be going well. + <hacklu> thank you. I am going to coding now~~ + + +# IRC, freenode, #hurd, 2013-09-02 + + <hacklu> hi, this is my weekly + report. 
http://hacklu.com/blog/gsoc-weekly-report11-181/ + + <hacklu> please give me any advice on how to use mig to generate stub-files + in gdbserver? + <braunr> hacklu: + http://darnassus.sceen.net/gitweb/rbraun/slabinfo.git/blob/HEAD:/Makefile + <hacklu> braunr: shouldnt' I work like this + https://github.com/hacklu/gdbserver/blob/gdbserver/gdb/config/i386/i386gnu.mh + ? + <braunr> hacklu: seems that you need server code + <braunr> other than that i don't see the difference + <hacklu> gdb use autoconf to generate the Makefile, and part from the *.mh + file, but in gdbserver, there is no .mh like files. + <braunr> hacklu: why can't you reuse /i386gnu.mh ? + <hacklu> braunr: question is that, there are something not need in + /i386gnu.mh. + <braunr> hacklu: like what ? + <hacklu> braunr: like fork-child.o msg_U.o core-regset.o + <braunr> hacklu: well, adjust the dependencies as you need + <braunr> hacklu: do you mean they become useless for gdbserver but are + useful for gdb ? + <hacklu> braunr: yes, so I need another one gnu.mh file. + <hacklu> braunr: but the gdbserver's configure doesn't have any *.mh file, + can I add the first one? + <braunr> or adjust the values of those variables depending on the building + mode + <braunr> maybe + <braunr> tschwinge is likely to better answer those questions + <hacklu> braunr: ok, I will wait for tschwinge's advice. + <luisgpm> hacklu, The gdb/config/ dir is for files related to the native + gdb builds, as opposed to a cross gdb that does not have any native bits + in it. In the latter, gdbserver will be used to touch the native layer, + and GDB will only guide gdbserver through the debugging session... + <luisgpm> hacklu, In case you haven't figured that out already. + <hacklu> luisgpm: I am not very clear with you. According to your words, I + shouldn't use gdb/config for gdbserver? + <luisgpm> hacklu, Correct. You should use configure.srv for gdbserver. + <luisgpm> hacklu, gdb/gdbserver/configure.srv that is. 
+ <luisgpm> hacklu, gdb/configure.tgt for non-native gdb files... + <luisgpm> hacklu, and gdb/config for native gdb files. + <luisgpm> hacklu, The native/non-native separation for gdb is due to the + possibility of having a cross gdb. + <congzhang> what's srv file purpose? + <luisgpm> hacklu, gdbserver, on the other hand, is always native. + <luisgpm> Doing the target-to-object-files mapping. + <hacklu> how can I use configure.srv to config the MIG to generate + stub-files? + <luisgpm> What are stub-files in this context? + <hacklu> On Hurd, some rpc stub file are auto-gen by MIG with *.defs file + <braunr> luisgpm: c source code handling low level ipc stuff + <braunr> mig is the mach interface generator + <tschwinge> luisgpm, hacklu: If that is still helpful by now, in + <http://news.gmane.org/find-root.php?message_id=%3C87ppwqlgot.fsf%40kepler.schwinge.homeip.net%3E> + I described the MIG usage in GDB. (Which also states that ptrace is a + system call which it is not.) + <tschwinge> hacklu: For the moment, it is fine to indeed copy the rules + related to MIG/RPC stubs from gdb/config/i386/i386gnu.mh to a (possibly + new) file in gdbserver. Then, later, we should work out how to properly + share these, as with all the other code that is currently duplicated for + GDB proper and gdbserver. + <luisgpm> hacklu, tschwinge: If there is code gdbserver and native gdb can + use, feel free to put them inside gdb/common for now. + <tschwinge> hacklu, luisgpm: Right, that was the conclusion from + <http://news.gmane.org/find-root.php?message_id=%3CCAB8fV%3Djzv_rPHP3-HQVBA-pCNZNat6PNbh%2BOJEU7tZgQdKX3%2Bw%40mail.gmail.com%3E>. + <hacklu> tschwinge, luisgpm : ok, I got it. + <hacklu> tschwinge: sorry for haven't submit pathes yet, I will try to + submit my patch tomorrow. + +[[!message-id "CAB8fV=iw783uGF8sWyqJNcWR0j_jaY5XO+FR3TyPatMGJ8Fdjw@mail.gmail.com"]]. 
+ + +# IRC, freenode, #hurd, 2013-09-06 + + <hacklu> If I want compile a file which is not in the current directory, + how should I change the Makefile. I have tried that obj:../foo.c, but the + foo.o will be in ../, not in the current directory. + <hacklu> As say, When I build gdbserver, I want to use [gdb]/gdb/gnu-nat.c, + How can I get the gnu-nat.o under gdbserver's directory? + <hacklu> tschwinge: ^^ + <tschwinge> Hi! + <tschwinge> hacklu: Heh, unexpected problem. + <tschwinge> hacklu: How is this handled for the files that are already in + gdb/common/? I think these would have the very same problem? + <hacklu> tschwinge: ah. + <hacklu> I got it + <tschwinge> I see, for example: + <tschwinge> ./gdb/Makefile.in:linux-btrace.o: + ${srcdir}/common/linux-btrace.c + <tschwinge> ./gdb/gdbserver/Makefile.in:linux-btrace.o: + ../common/linux-btrace.c $(linux_btrace_h) $(server_h) + <hacklu> If I have asked before, I won't use soft link to solve this. + <tschwinge> But isn't that what you've been trying? + <hacklu> when this, where the .o file go to? + <tschwinge> Yes, symlinks can't be used, because they're not available on + every (file) system GDB can be built on. + <tschwinge> I would assume the .o files to go into the current working + directory. + <tschwinge> Wonder why this didn't work for you. + <hacklu> in gdbserver/configure.srv, there is a srv_tgtobj="gnu_nat.c ..", + if I change the Makefile.in, it doesn't gdb's way. + <hacklu> So I can't use the variable srv_tgtobj? + <tschwinge> That should be srv_tgtobj="gnu_nat.o [...]"? (Not .c.) + <hacklu> I have try this, srv_tgtobj="../gnu_nat.c", then the gnu_nat.o is + generate in the parent directory. + <hacklu> s/.c/.o + <hacklu> (wrong input) + <hacklu> For my understand now, I should set the srv_tgtobj="", and then + set the gnu_nat.o:../gnu_nat.c in the gdbserver/Makefile.in. right? + <tschwinge> Hmm, I thought you'd need both. + <tschwinge> Have you tried that? + <hacklu> no, haven't yet. I will try soon. 
+ <hacklu> I have met an strange thing. I have this in Makefile, + i386gnu-nat.o:../i386gnu-nat.c $(CC) -c $(CPPFLAGS) $(INTERNAL_CFLAGS) $< + <hacklu> When make, it will complain that: no rules for target + i386gnu-nat.c + <hacklu> but I also have a line gnu-nat.o:../gnu-nat.c ../gnu-nat.h. this + works well. + <tschwinge> hacklu: Does it work if you use $(srcdir)/../i386gnu-nat.c + instead of ../i386gnu-nat.c? + <tschwinge> Or similar. + <hacklu> I have try this, i386gnu-nat.c: echo "" ; then it works. + <hacklu> (try $(srcdir) ing..) + <hacklu> make: *** No rule to make target `.../i386gnu-nat.c', needed by + `i386gnu-nat.o'. Stop. + <hacklu> seems no use. + <hacklu> tschwinge: I have found another thing, if I rename the + i386gnu-nat.o to other one, like i386gnu-nat2.o. It works! + + +# IRC, freenode, #hurd, 2013-09-07 + + <hacklu> hi, I have found many '^L' in gnu-nat.c, should I fix it or keep + origin? + <LarstiQ> hacklu: fix in what sense? + <hacklu> remove the line contains ^L + <LarstiQ> hacklu: see bottom of + http://www.gnu.org/prep/standards/standards.html#Formatting + <LarstiQ> hacklu: "Please use formfeed characters (control-L) to divide the + program into pages at logical places (but not within a function)." + <LarstiQ> hacklu: so unless a reason has come up to deviate from the gnu + coding standards, those ^L's are there by design + <hacklu> LarstiQ: Thank you! I always think that are some format error. I + am stupid. + <LarstiQ> hacklu: not stupid, you just weren't aware + * LarstiQ thought the same when he first encountered them + + +# IRC, freenode, #hurd, 2013-09-09 + + <youpi> hacklu_, hacklu__: I don't know what tschwinge thinks, but I guess + you should work with upstream on integration of your existing work, this + is part of the gsoc goal: submitting one's stuff to projects + <tschwinge> youpi: Which is what we're doing (see the patches recently + posted). :-) + <youpi> ok + <hacklu__> youpi: I always doing what you have suggest. 
:) + <hacklu> I have asked in my new mail, I want to ask at here again. Should + I change the gdb use lwp filed instead of tid field? There are + <hacklu> too many functions use tid. Like + <hacklu> named tid in the structure proc also. + <hacklu> make_proc(),inf_tid_to_thread(),ptid_build(), and there is a field + <hacklu> (sorry for the bad \n ) + <hacklu> and this is my weekly + report. http://hacklu.com/blog/gsoc-weekly-report12-186/ + <hacklu> And in Pedro Alves's reply, he want me to integration only one + back-end for gdb and gdbserver. but the struct target_obs are just + decalre different in both of the two. How can I integrate this? or I got + the mistaken understanding? + <hacklu> tschwinge: ^^ + <tschwinge> hacklu: I will take this to email, so that Pedro et al. can + comment, too. + <tschwinge> hacklu: I'm not sure about your struct target_ops question. + Can you replay to Pedro's email to ask about this? + <hacklu> tschwinge: ok. + <tschwinge> hacklu: I have sent an email about the LWP/TID question. + <hacklu> tschwinge: Thanks for your email, now I know how to fix the + LWP/TID for this moment. + <tschwinge> hacklu: Let's hope that Pedro also is fine with this. :-) + <hacklu> tschwinge: BTW, I have a question, if we just use a locally + auto-generated number to distignuish threads in a process, How can we do + that? + <hacklu> How can we know which thread throwed the exception? + <hacklu> I haven't thought about this before. + <tschwinge> hacklu: make_proc sets up a mapping from Mach threads to GDB's + TIDs. And then, for example inf_tid_to_thread is used to look that up. + <hacklu> tschwinge: oh, yeah. that is. + + +# IRC, freenode, #hurd, 2013-09-16 + + <tschwinge> hacklu: Even when waiting for Pedro (and me) to comment, I + guess you're not out of work, but can continue in parallel with other + things, or improve the patch? + <hacklu> tschwinge: honestly to say, these days I am out of work T_T after + I have update the patch. 
+    <hacklu> I am not sure how to improve the patch beyond your comment in the
+      email. I have just run some testcase and nothing others.
+    <tschwinge> hacklu: I have not yet seen any report on the GDB testsuite
+      results using your gdbserver port (see
+      gdb/testsuite/boards/native-gdbserver.exp). :-D
+    <hacklu> question is, the resule of that testcase is just how many pass how
+      many not pass.
+    <hacklu> and I am not sure whether need to give this information.
+    <tschwinge> Just as a native run of GDB's testsuite, this will create *.sum
+      and *.log files, and these you can diff to those of a native run of GDB's
+      testsuite.
+    <hacklu> https://paste.debian.net/41066/ this is my result
+        === gdb Summary ===
+
+        # of expected passes 15573
+        # of unexpected failures 609
+        # of unexpected successes 1
+        # of expected failures 31
+        # of known failures 57
+        # of unresolved testcases 6
+        # of untested testcases 47
+        # of unsupported tests 189
+        /home/hacklu/code/gdb/gdb/testsuite/../../gdb/gdb version 7.6.50.20130619-cvs -nw -nx -data-directory /home/hacklu/code/gdb/gdb/testsuite/../data-directory
+
+        make[3]: *** [check-single] Error 1
+        make[3]: Leaving directory `/home/hacklu/code/gdb/gdb/testsuite'
+        make[2]: *** [check] Error 2
+        make[2]: Leaving directory `/home/hacklu/code/gdb/gdb'
+        make[1]: *** [check-gdb] Error 2
+        make[1]: Leaving directory `/home/hacklu/code/gdb'
+        make: *** [do-check] Error 2
+    <hacklu> I got a make error so I don't get the *.sum and *.log file.
+    <tschwinge> Well, that should be fixed then?
+    <tschwinge> hacklu: When does university start again for you?
+    <hacklu> My university have start a week ago.
+    <hacklu> but I will fix this,
+    <tschwinge> Oh, OK. So you won't have too much time anymore for GDB/Hurd
+      work?
+    <hacklu> it is my duty to finish my work.
+    <hacklu> time is not the main problem to me, I will shedule it for myself.
+    <tschwinge> hacklu: Thanks!
Of course, we'd be very happy if you stay with
+      us, and continue working on this project (or another one)! :-D
+    <hacklu> I also thanks all of you who helped me and mentor me to improve
+      myself.
+    <hacklu> then, what the next I can do is that fix the testcase failed?
+    <tschwinge> hacklu: It's been our pleasure!
+    <tschwinge> hacklu: A comparison of the GDB testsuite results for a native
+      and gdbserver run would be good to get an understanding of the current
+      status.
+    <hacklu> ok, I will give this comparison soon. BTW,should I compare the
+      native gdb result with the one before my patch
+    <tschwinge> You mean compare the native run before and after your patch?
+      Yes, that also wouldn't hurt to do, to show that your patch doesn't
+      introduce any regressions to the native GDB port.
+    <hacklu> ok, beside this I should compare the native gdb with gdbserver ?
+    <tschwinge> Yes.
+    <hacklu> beside this, what I can do more?
+    <tschwinge> No doubt, there will be differences between the native and
+      gdbserver test runs -- the goal is to reduce these. (This will probably
+      translate to: implement more stuff for the Hurd port of gdbserver.)
+    <hacklu> ok, I know it. Start it now
+    <tschwinge> As time permits. :-)
+    <hacklu> It's ok. :)
+
+
+# IRC, freenode, #hurd, 2013-09-23
+
+    <hacklu_> I have to go out in a few miniutes, will be back at 8pm. I am
+      sorry to miss the meeting this week, I will finishi my report soon.
+    <hacklu_> tschwinge, youpi ^^
diff --git a/community/gsoc/2013/nlightnfotis.mdwn b/community/gsoc/2013/nlightnfotis.mdwn
index 43f9b14c..a9176f51 100644
--- a/community/gsoc/2013/nlightnfotis.mdwn
+++ b/community/gsoc/2013/nlightnfotis.mdwn
@@ -448,3 +448,2590 @@ License|/fdl]]."]]"""]]
     <tschwinge> nlightnfotis: OK, so probably waiting at the FSF office to be
       processed. Let's allow for some more time. After all, this is not
       critical for your progress.
+ + +# IRC, freenode, #hurd, 2013-07-10 + + <nlightnfotis> tschwinge: I have run the diff of the GCC repo on the Hurd + against the one on my host linux os, and there was nothing relevant to + fixcontext and initcontext that are the ones that fail the + compilation. In any case I did recheck out the branch, and I have + attempted a build with it. It fails at the same point. Now I am + attempting a build with the -w (inhibit warnings) flag enabled + <tschwinge> nlightnfotis: Have there been any differences in the diff? + There should be none at all. + <nlightnfotis> tschwinge: there were some small changes due to the repo's + being checked out at different times. It was a large diff however. I + inspected it and didn't find anythign that was of much use. Here it is in + case you might want to see it: + https://www.dropbox.com/s/ilgc3skmhst7lpv/diffs_in_git.txt + <tschwinge> nlightnfotis: Well, the idea of this exercise precisely was to + use the same Git revisions on both sides of the diff -- to show that + there are no spurious differences -- which can't be shown from your + 124486 lines diff. (Even though indeed there is no difference in + libgo/configure that would explain the mis-match, but who knows what else + might be relevant for that. + <tschwinge> Would you please repeat that? + <nlightnfotis> tschwinge: I will do so. It was wrong from me to not diff + against the same revisions, but going through the diff results grepping + for the problematic code didn't yield any results, so I thought that + might not be the issue. + <nlightnfotis> I will perform the diff again tomorrow morning and report on + the results. + <tschwinge> nlightnfotis: Anyway, if you checked out again, the latest + revision, and it still fails in exactly the same way, there is something + wrong. + <tschwinge> nlightnfotis: And -w won't help, as there is a hard error + involved. + <tschwinge> nlightnfotis: Are yous till working on GSoC things today? 
+ <nlightnfotis> tschwinge: yeah I am here. I decided to do the diff today + instead of tomorrow. + <nlightnfotis> It finished now btw + <nlightnfotis> let me tell you + <nlightnfotis> ah and this time, the gits were checked out at the same time + <nlightnfotis> from the same source + <nlightnfotis> and are at the same branch + <tschwinge> nlightnfotis: Coulod you upload the + gccbuild/i686-unknown-gnu0.3/libgo/config.log of the build that failed? + <nlightnfotis> tschwinge: sure. give me a minute + <nlightnfotis> tschwinge: there is something strange going on. The two + repos are at the exact same state (or at least should be, and the logs + indicate them to be) but still the diff output is 4.4 mb + <nlightnfotis> but no presence of initcontext of fixcontext + <nlightnfotis> tschwinge: the config.log file --> + http://pastebin.com/bSCW1JfF + <nlightnfotis> wow! I can see several errors in the config.log file + <nlightnfotis> but I am not so sure about their fatality. Config returns 0 + at the end of the log + <tschwinge> nlightnfotis: As the configure scripts probe for all kings of + features on all kings of strange systems, it's to be expected that some + of these fail on GNU/Hurd. + <tschwinge> What is not expected, however, is: + <tschwinge> configure:15046: checking whether setcontext clobbers TLS + variables + <tschwinge> [...] + <tschwinge> configure:15172: ./conftest + <tschwinge> /root/gcc_new/gcc/libgo/configure: line 1740: 1015 Aborted + ./conftest$ac_exeext + <tschwinge> Hmm. apt-cache policy libc0.3 + <tschwinge> nlightnfotis: ^ + <nlightnfotis> tschwinge: Installed 2.13-39+hurd.3 + <nlightnfotis> Candidate: 2.1-6 + <nlightnfotis> *2.17 + <tschwinge> Bummer. + <tschwinge> nlightnfotis: As indicated in + <http://news.gmane.org/find-root.php?message_id=%3C87li6cvjnl.fsf%40kepler.schwinge.homeip.net%3E> + and thereabouts, you need 2.17-3+hurd.4 or later... + <tschwinge> Well. + <tschwinge> At least that now explains what is going on. 
+ <nlightnfotis> tschwinge: i see. I am in the process of updating my hurd + vm. I saw that libc has also been updated to 2.17 + <nlightnfotis> I will confirm when updating is done + <tschwinge> nlightnfotis: Anyway, is the diff between the two repositories + empty now or are there still differences? + <nlightnfotis> there are differences + <nlightnfotis> and they were checked out at the same time + <nlightnfotis> from the same source + <nlightnfotis> (the official git mirror) + <nlightnfotis> and they are both at the same branch + <nlightnfotis> and still diff output is 4.4 MB + <nlightnfotis> but quick grepping into it and there is not mention of + initcontext or fixcontext + <tschwinge> That's... unexpected. + <nlightnfotis> may be a mistake I am making + <nlightnfotis> but considering that diff run for some time before + completing + <tschwinge> In both Git repositories, »git rev-parse HEAD« shows the same + thing? + <tschwinge> Could you please upload the diff again? + <nlightnfotis> tschwinge: confirmed. libc is now version 2.17-1 + <nlightnfotis> tschwinge: http://pastebin.com/bSCW1JfF + <nlightnfotis> for the rev-parse give me a second + <tschwinge> nlightnfotis: Where is libc0.3 2.17-1 coming from? You need + 2.17-3+hurd.4 or later. + <nlightnfotis> it is 2.17-7+hurd.1 + <tschwinge> OK, good. + <tschwinge> The URL you just have is the config.log file, not the diff. + <tschwinge> s%have%gave + <nlightnfotis> oh my mistake + <nlightnfotis> wait a minute + <nlightnfotis> the two repos have different output to rev-parse + <tschwinge> Phew. + <tschwinge> That explains. + <tschwinge> So the Git branches are at different revisions. + <nlightnfotis> that confused me... when I run git pull -a the branches that + were changed were all updated to the same revision + <nlightnfotis> unless... 
there were some automatic merges in the *host* GCC + repo required during some pulls + <nlightnfotis> but that was some time ago + <nlightnfotis> would it have messed my local history that much? + <nlightnfotis> that's the only thing that may be different between the two + repos + <nlightnfotis> they checkout from the same source + <tschwinge> nlightnfotis: At which revisions are the two + repositories/branches? + <tschwinge> I have never used »put pull -a«. What does that do? + <nlightnfotis> tschwinge: from what I know it does an automatic git fetch + followed by git merge. The -a flag must signal to pull all branches (I + think it's possible to pull only one branch) + <tschwinge> That's the --all option. -a is something different (that I + don't understand off-hand). + <tschwinge> Well, --all means to pull all remotes. + <tschwinge> But you just want the GCC upstream, I guess. + <tschwinge> I always use git fetch and git merge manually. + <nlightnfotis> oh my god! You are write. -a is equivallent to --append + <nlightnfotis> + https://www.kernel.org/pub/software/scm/git/docs/git-pull.html + <nlightnfotis> git pull must be safe though + <nlightnfotis> + http://stackoverflow.com/questions/292357/whats-the-difference-between-git-pull-and-git-fetch + <nlightnfotis> without the -a + <nlightnfotis> *right + <nlightnfotis> why did I even write "right" as "write" above I don't + even... + <nlightnfotis> what did I write in the sentence above + <nlightnfotis> oh my god... 
+ <nlightnfotis> tschwinge: they are indeed on different revisions: The host + repo's last commit was made by me apparently, to merge master into + tschwinge/t/hurd/go, whereas the last commit of the Hurd repo was by you + and it reverted commit 2eb51ea + <nlightnfotis> and that should also explain the large diff file + <nlightnfotis> with master merged into the tschwinge/t/hurd/go branch + <nlightnfotis> I will purge the debian repo and redownload it + <nlightnfotis> *reclone it + <nlightnfotis> that should bring it to a safe state I suppose. + + +# IRC, freenode, #hurd, 2013-07-11 + + <teythoon> nlightnfotis: how's your build going? + <nlightnfotis> I tried one earlier and it seemed to build without any + issues, something that was...strange. I am repeating the build now, but I + am saving the compilation output this time to study it. + <teythoon> it was strange that the build succeeded? that sounds sad :/ + <nlightnfotis> teythoon: considering that 3 weeks now I failed to build it + without errors, it sure seems weird that it builds without errors now :) + <braunr> what did you change ? + <nlightnfotis> braunr: not many things apparently. To be honest the change + that seemed to do the trick was (under thomas' guidance) update of libc + from 2.13 to 2.17 + <braunr> well that can explain + <nlightnfotis> tschwinge: Big update! GCC-go not compiles without errors + under the Hurd. I have done 2 compilations so far, none of which had + issues. Time needed for full build (without bootstrap) is 45 minutes +- 1 + minute. I also run the test suite, and I can confirm your results + <pinotree> s/not/now/, perhaps? + <nlightnfotis> pinotree yeah. I don't know how it came up with not there. I + meant now + <nlightnfotis> tschwinge: link for the go.sum is here --> + https://www.dropbox.com/s/7qze9znhv96t1wj/go.sum + + +# IRC, freenode, #hurd, 2013-07-12 + + <tschwinge> nlightnfotis: Great! So you finally reproduced my results. + :-) + <nlightnfotis> tschwinge: Yep! 
I am now building a blog, so that I can move + my reports there, so that they are more detailed, to allow for greater + transparency of my actions + <tschwinge> nlightnfotis: Did you recently (in email, I think?) indicate + that there is another Go testsuite, for libgo? + <tschwinge> nlightnfotis: As you prefer. + <nlightnfotis> tschwinge: there seemed to be one, at least in linux. I + think I saw one in the Hurd too. + <tschwinge> Oh indeed there is a libgo testsuite, too. + <nlightnfotis> as a matter of fact, make check-go + <nlightnfotis> did check for the lib + <nlightnfotis> but lib was failing + <nlightnfotis> yeah + <tschwinge> So please have a look at that testsuite's results, too, and + compare to the GNU/Linux ones. + <nlightnfotis> sure. I can do that now. + <tschwinge> And for the go.sum you posted, please have a look at the tests + that do not pass (»grep -v ^PASS: < go.sum«), assuming they do pass on + GNU/Linux. + <tschwinge> I suggest you add a list of the differences between GNU/Linux + and GNU/Hurd testresults to the wiki page, + <http://darnassus.sceen.net/~hurd-web/open_issues/gccgo/>, at the end of + the Part I section. + <nlightnfotis> I'm on it. + <tschwinge> For now, please ignore any failing tests that have »select« in + their name -- that is, do file them, but do not spend a lot of time + figuring out what might be wrong there. + <tschwinge> The Hurd's select implementation is a bit of a beast, and I + don't want you -- at this time -- spend a lot of time on that. We + already know there are some deficiencies, so we should postpone that to + later. + <nlightnfotis> tschwinge: noted. + <tschwinge> So what I would like at the moment, is a list of the testresult + differences to GNU/Linux, then from the go.log file any useful + information about the failing test (which perhaps already explains) + what's going wrong, and then a analysis of the failure. 
+ <tschwinge> nlightnfotis: I assume you must be really happy that you + finally got it build fine, and reproduced my results. :-) + <nlightnfotis> tschwinge: yeah! I can not hide from you the fact that + failing all those builds made me really nervous about me missing my + schedule. Having finally built that and revisiting my application I can + see I am on schedule, but I have to intensify my work to compensate for + any potential unforeseen obstacles + <nlightnfotis> , in the futute + <nlightnfotis> *future + + +# IRC, freenode, #hurd, 2013-07-15 + + <youpi> nlightnfotis: btw, do you have a weekly progress report? + <nlightnfotis> youpi: not yet. Will write it shortly and post it here. I + made a new blog to keep track of my progress. + <nlightnfotis> Will report much more frequently now via my blog + <youpi> did you add your blog url to the hurd iwki? + <nlightnfotis> currently I am running gcc tests on both gcc go and libgo to + see what the differences are with Linux + <nlightnfotis> I believe I have done so, let me see + <nlightnfotis> youpi: gccgo passes most of its tests (it fails a small + number, and I am looking into those tests) but libgo fails 130/131 tests + (on the Hurd that is) + <youpi> ok + + <nlightnfotis> guys I wrote my report. This time I made it available on my + personal blog. You can find it here: + www.fotiskoutoulakis.com/blog/2013/07/15/gsoc-week-4-report/ As always, + open to (and encouraging) criticism, suggestions, anything that might + help me. + <nlightnfotis> I also have to mention that now that my personal website is + online, I will report much more frequently, to the scale of reporting day + by day, or every 2-3 days. 
+ <youpi> nlightnfotis: without spending time on select, it'd be good to have + an idea of what is going wrong + <braunr> eh, go having trouble with select + <youpi> select is a beast, but we do have fixed things lately and we don't + currently know any issue still pending + <nlightnfotis> youpi: are you suggesting to not skip the select tests too? + <braunr> select is kind of critical .. + <braunr> as youpi said, if you can determine what's wrong, at the interface + level (not the implementation), it would be a good thing to do + <youpi> so we know what's wrong + <youpi> we're not asking to fix it, though + <nlightnfotis> braunr: youpi: noted. Thanks for the feedback. Is there + something else you might want me to improve? Something with the report + itself? Something you were expecting to see but I failed to provide? + <braunr> no it's ok + <braunr> it's short, readable, and readily answers the questions i might + have had so it's good + <braunr> as you say, now you have to work on the core of your task :) + <youpi> note: the "select" word in the testsuite is not strictly bound to + the C "select" + <youpi> so it is probably really worth digging a bit at least on the go + side + <braunr> but it's really worth doing in the end, as it will probably reveal + some nasty bugs on the way + <nlightnfotis> I appreciate your input. I will start working on it asap + (today) and will report on Wednesday perhaps (or Thursday at worst). + + +# IRC, freenode, #hurd, 2013-07-18 + + <nlightnfotis> braunr: I found out what was causing the fails in the tests + <nlightnfotis> in both libgo and gccgo + <nlightnfotis> it's a assertion: mach_port_t ktid = __mach_thread_self (); + int ok = thread->kernel_thread == ktid; __mach_port_deallocate + ((__mach_task_self_ + 0), ktid); ok; }) + <braunr> is all that the assertion ? + <nlightnfotis> yes + <braunr> please paste the code somewhere + <braunr> or is it in libpthread ? 
+    <nlightnfotis> http://pastebin.com/G2w9d474
+        nonblock.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
+        9 FAIL: go.test/test/chan/nonblock.go execution, -O2 -g
+    <braunr> yes
+    <braunr> that's related to my current work on thread destruction
+
+[[open_issues/libpthread/t/fix_have_kernel_resources]].
+
+    <braunr> thread resources recycling is buggy
+    <braunr> i suggest you make your own thread pool if you can
+    <nlightnfotis> I will look into it further and let you know. Thanks for
+      that.
+
+
+# IRC, freenode, #hurd, 2013-07-22
+
+    <nlightnfotis> tschwinge, I have found what is failing both libgo and gccgo
+      tests, but for the life of me, I can not really find the offending code
+      on any repository.
+    <nlightnfotis> not even the eglibc-source debian package. it's driving me
+      insane.
+    <tschwinge> nlightnfotis: If this is driving you insane, we should quickly
+      have a look at that!
+    <nlightnfotis> thanks tschwinge: I have found that the offending code is an
+      assertion: { mach_port_t ktid = __mach_thread_self (); int ok =
+      thread->kernel_thread == ktid; __mach_port_deallocate
+      ((__mach_task_self_ + 0), ktid); ok; } on a file called pt-create.c
+      under the libpthread on line 167
+    <nlightnfotis> but for the life of me, I can not find that piece of code
+      anywhere. And when I mean anywhere, I mean anywhere. I have looked for it
+      on all of the branches of glibc, libpthread and the source code of
+      eglibc.
+    <nlightnfotis> that's why if you don't mind I would like to write my report
+      in a day or two, when (hopefully) I will have more progress to report on.
+    <youpi> nlightnfotis: isn't that libpthread/sysdeps/mach/pt-thread-start.c
+      ?
+    <youpi> or rather, ./sysdeps/mach/hurd/pt-sysdep.h
+    <nlightnfotis> youpi: let me check this out. If that's it I'm gonna cry.
+ <youpi> which unfortunately is inlined in a lot of places + <youpi> nlightnfotis: does the assertion not tell you the file & line? + <nlightnfotis> youpi: holy smokes! That's the code I was looking for! Oh + boy. Yeah the logs do tell me, but it was very misleading. So misleading, + taht I was actually looking at the wrong place. All logs suggest that + this piece of code is at libpthread/pthread/pt-create.c in line 167 + <youpi> what is that line in your tree? + <youpi> a call to _pthread_self(), isn't it? + <youpi> then it's not actually misleading, this is indeed where the + pt-sysdep.h definition gets inlined + <nlightnfotis> it seems so, yeah. it's err = __pthread_sigstate + (_pthread_self (), 0, 0, &sigset, 0); + <youpi> nlightnfotis: and what is the backtrace? + <nlightnfotis> youpi: _pthread_create_internal: Assertion failed. + <nlightnfotis> The assertion is the one above + <youpi> nlightnfotis: sure, but what is the backtrace? + <nlightnfotis> I don't have the full backtrace. These are the logs from the + compiler. All I can get is: reports like this: nonblock.x: + ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ + mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread + == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); + ok; })' failed. + <youpi> nlightnfotis: you should probably have a look at running the tests + by hand + <youpi> so you can run them in a debugger, and get backtraces etc. + <braunr> nlightnfotis: did i answer that ? + <nlightnfotis> braunr: which one? + <braunr> the problems you're seeing are the pthread resources leaks i've + been trying to fix lately + <braunr> they're not only leaks + <braunr> creation and destruction are buggy + <nlightnfotis> I have read so in + http://www.gnu.org/software/hurd/libpthread.html. I believe it's under + Thread's Death right? 
+ <braunr> nlightnfotis: yes but it's buggy + <braunr> and the description doesn't describe the bugs + <nlightnfotis> so we will either have to find a temporary workaround, or + better yet work on a fix, right? + <braunr> nlightnfotis: i also told you the work around + <braunr> nlightnfotis: create a thread pool + <nlightnfotis> braunr: since thread creation is also buggy, wouldn't the + thread pool be buggy too? + <braunr> nlightnfotis: creation *and* destruction is buggy + <braunr> nlightnfotis: i.e. recycling is buggy + <braunr> nlightnfotis: the hurd servers aren't affected much because the + worker threads are actually never destroyed on debian (because of a + debian specific patch) + + <teythoon> youpi, nlightnfotis, hacklu_: btw, what about the copyright + assignment process + <tschwinge> nlightnfotis just got his on file, so there is progress. + <tschwinge> I have email from Donald R Robertson III + <copyright-clerk@fsf.org> about that -- but it is not yet present in the + FSF copyright.list file... + <tschwinge> I think I received that email because I was CCed on + nlightnfotis' submission. + <nlightnfotis> tschwinge: I have got the papers, and they were signed by + the FSF. They stated delivery date 11 of July, but the documents were + signed on the 10th of July :P + <tschwinge> Ah, no, I received it via hurd-maintainers@gnu.org -- and the + strange thing is that not all assignments that got processed got sent + there... + <tschwinge> At the recent GNU Tools Cauldron we also discussed this in the + GCC context; and their experience was the very same. Emails get lost, + and/or take ages to be processed, etc. + <tschwinge> It seems the FSF is undermanned. + + +# IRC, freenode, #hurd, 2013-07-27 + + <nlightnfotis> I have one question about the Mach sources: I can see it + uses its own scheduler (more like, initializes) and also does the same + for the linux scheduler. Which one does it use? 
+ <youpi> it doesn't use the linux scheduler + <youpi> the linux glue just glues linux scheduling concepts onto the mach + scheduler + <nlightnfotis> ohh I see now. Thanks for that youpi. + + +# IRC, freenode, #hurd, 2013-07-28 + + <nlightnfotis> In the mach kernel source code, does the (void) before a + function call have a semantic meaning, or is it just remnants of the past + (or even documentation) + <pinotree> for example? + <nlightnfotis> pinotree: (void) thread_create (kernel_task, + &startup_thread); + <nlightnfotis> I read on stack overflow that there is only one case where + it has a semantic meaning, most of the times it doesn't + <nlightnfotis> + http://stackoverflow.com/questions/13954517/use-of-void-before-a-function-call + <pinotree> most probably thread_create has a non-void return value, and + this way you're explicitly suppressing its return value (usually because + you don't want/need to care about it) + <nlightnfotis> isn't the value discarded if the (void) is not there? + <pinotree> yes, but depending on extra attributes and/or compiler warning + flags the compiler might warn that the return value is not used while it + ought to + <pinotree> the cast to void should suppress that + <nlightnfotis> oh, okay, thanks for that pinotree + <nlightnfotis> and yes you are right that thread_create actually does + return something + <pinotree> even if there would be no compiler message about that, adding + the explicit cast could mean "yes, i know the function does return + something, but i don't care about it" + <pinotree> ... as hint to other code readers + <nlightnfotis> as a form of documentation then + <pinotree> also + + <nlightnfotis> oh well, I am gonna ask and I hope someone will answer it: + In the Mach's dmesg (/var/log/dmesg) I can see that the version string + along with initial memory mapping information are printed twice, when in + fact they are supposed to be called only once. 
Is this a bug, or some + buffering error, or are they actually called twice for some reason? + + +# IRC, freenode, #hurd, 2013-07-29 + + <nlightnfotis> guys is the evaluation today? + <hacklu_> yes + <teythoon> right + <nlightnfotis> where can we find the evaluation papers on melange? + <hacklu_> wait untill 12pm UTC. + <nlightnfotis> yeah, I just noticed thanks hacklu_ + <hacklu_> nlightnfotis:) + + <NlightNFotis> tschwinge: I only have one question regarding my project. If + I make some changes to libpthread, what's the best way to test them in + the hurd? Rebuild glibc with the updated libpthread? + <tschwinge> NlightNFotis: Yes, you'll have to rebuild glibc. I have a + cheat sheet for that: + http://darnassus.sceen.net/~hurd-web/open_issues/glibc/debian/ + <tschwinge> It may be that the »Run debian/rules patch to apply patches« + step is no longer encessary with the 2.17 glibc packages. + <NlightNFotis> thanks for that tschwinge. :) + <tschwinge> NlightNFotis: Sure. :-) + + <tschwinge> NlightNFotis: Where's your weekly status? + <NlightNFotis> I will write it today at the noon. I have written all the + other ones, and they are available at www.fotiskoutoulakis.com + <NlightNFotis> the next one will be available there as well, later in the + day + <tschwinge> Ack. But please try to finish your report before the meeting, + as discussed. + <NlightNFotis> oh, forgive me for that. I thought it was ok to write my + report a day or so later. Sorry. + <tschwinge> NlightNFotis: Please write your report as soon as possible -- + otherwise there's no useful way for me to know what your status is. + <NlightNFotis> I will. This week I have been mostly going through the + various sources (the Hurd, Mach and libpthread, especially the last two) + in my attempt to get a better understanding for how libpthread + works. Since yesterday I have attempted some small changes on my + libpthread repo that I plan on testing and reporting on them. 
That's why + I still have not written my report. + <tschwinge> NlightNFotis: Things don't need to be finished before you + report about them. It's often more useful to discuss issues *before* you + spend time on implementing them. + #hurd + <braunr> NlightNFotis: what kind of changes do you want to add to + libpthread ? + <tschwinge> Have a look at the assertion failure, I would hope. :-) + <braunr> well no + <braunr> again, i did that + <braunr> and it's not easy to fix + <NlightNFotis> braunr: I was looking into ways that I could create the + thread pool you suggested into libpthread + <braunr> no, don't + <braunr> create it in your application + <braunr> not in libpthread + <braunr> well, this may not be an acceptable solution either .. + <tschwinge> Before doing that we have to understand what exactly the Go + runtime is doing. It may just be a weird interaction with the setcontext + et al. functions that I failed to think about when implementing these? + <NlightNFotis> the other possibility is the go runtime libraries. But I + thought that libpthread might be a better idea, since you told me that + creation *and* destruction are buggy + <hacklu> braunr: you are right, the signal thread is always exist. I have + got a wrong understand before. + <NlightNFotis> tschwinge: I can look into that, now. I will also include + that in my report. + <braunr> NlightNFotis: i don't see how this is a relevant argument .. + <braunr> tschwinge: i'd suggest he first try with a custom pool in the go + runtime, so we exclude what you're suspecting + <braunr> if this pool actually works around the issues NlightNFotis is + having, it will confirm the offending problem comes from libpthread + <tschwinge> So, as a very first step make any thread + destruction/deallocation a no-op. + <braunr> yes + <NlightNFotis> braunr: I originally understood that a thread pool might + skip the thread's destruction, so that we escape the buggy part with the + thread's destruction.
Since that was a problem with libpthread, it sure + affects other threads (instead of go's) too. So I assumed that building + the thread pool into libpthread might help eliminate bugs that may affect + other code too. + <braunr> no, it's not a proper fix + <braunr> it's a work around + <braunr> and i'm working on a proper fix in parallel + <braunr> (when i have the time, that is :/) + <NlightNFotis> oh, I see. So for the time, I had better not touch + libpthread, and take a look at the go run time aye? + <tschwinge> NlightNFotis: Remember: one thing after the other. First + identify what is wrong exactly. Then think and discuss how to solve the + very specific issue. Then implement it. + <braunr> as tschwinge said, make thread destruction a nop in go + <braunr> see if that helps + <tschwinge> NlightNFotis: For example, you surely have noticed (per your + last report), that basically all Go language tests pass (aside from the + handful of those testing select, etc.) -- but all those of the libgo + runtime library fail, literally all of them. + <tschwinge> You noticed they basically all fail with the same assertion + failure. But why do all the Go language ones work fine? + <tschwinge> Don't they execute the program they built, for example? + <tschwinge> (I haven't looked.) + <NlightNFotis> they do execute the program. the language ones that fail + too, fail due to the assertion failure + <tschwinge> Or, what else is different for them? How are they built, which + flags, how are they invoked. + <braunr> how many goroutines ? + <braunr> :p + <tschwinge> Do you also get the assertion failure when you built a small Go + program yourself and run that one. + <tschwinge> Don't get the assertion failure? Then add some more complex + stuff that are likely to involve adding/re-using new threads, such as + goroutines.
+ <NlightNFotis> I didn't get the assertion failure on a small test program, + but now that you suggest it it might be a good idea to build a custom + test suite + <tschwinge> Etc. That way you'll eventually get an understanding what + triggers the assertion failure. + <tschwinge> And that exactly is the kind of analysis I'd like to read in + your weekly report. + <tschwinge> A list of things what you have done, which assumptions you've + made, how that directed your further analysis, what results that gave, + etc. + <NlightNFotis> I will do it. I will try to rush to finish it today before + you leave, so that you can inspect it. God I feel like all that time I + spent this week studying the particular source code (libpthread, and the + Mach) were in vain... + <NlightNFotis> on second thoughts, it was not in vain. I got a pretty good + understanding of how these pieces of software work, but now I will have + to do something completely different. + <tschwinge> Studying code is never in vain. + <tschwinge> Exactly. + <tschwinge> You must have had some motivation to study the code, so that + was surely a valid thing to do. + <tschwinge> But we'd link to understand your reasoning, so that we can + support you and direct you accordingly. + <braunr> but it's better to focus on your goals and determine an + appropriate course of actions, usually starting with good analysis + <tschwinge> Yes. + <pinotree> s/link/like/? + <tschwinge> pinotree: Indeed, thanks. + <braunr> makes me remember when i implemented radix trees to replace splay + trees, only to realize splay trees were barely used .. + <tschwinge> braunr: Yes. It has happened to all of us. ;-P + <tschwinge> NlightNFotis: So, don't worry -- but learn from such things. + :-) + <NlightNFotis> anyway, I will start right away with the courses of action + you suggested, and will try to have finished them by noon. Thanks for + your help, it really means a lot.
+ <tschwinge> In software generally, it is never a good idea to let you be + distracted, and don't follow your focus goal, because there are always so + many different things that could be improved/learned/fixed/etc. + <NlightNFotis> tschwinge, I am only nervous about one thing: the fact that + I have not submitted yet any patch or some piece of code in general. Then + again, the summer of code for me so far has been 70-80% reading about + stuff I didn't know about and 30-20% doing the stuff I should know + about... + <tschwinge> NlightNFotis: That's why we're here, to teach you something. + Which we're happy to do, but we all need to cooperate for that (and I'm + well aware that this is difficult if one is not in the same rooms, and + I'm also aware that my time is pretty limited). + <tschwinge> NlightNFotis: We're also very aware that the Hurd system, as + any operating system project (if you're not just doing "superficial" + things) is difficult, and takes lots of time to learn, and have concepts + and things sink into your brain. + <braunr> i wouldn't worry too much + <tschwinge> We're also still learning every day. + <braunr> go doesn't require a lot from the underlying system, but what is + required is critical + <braunr> once you identify it, coding will be quick + <NlightNFotis> tschwinge: braunr: thanks. I shall begin working following + the directions you gave to me. + <tschwinge> NlightNFotis: So yes, because Google wants us to grade you + based on that, you'll eventually have to write some code, but for + example, a patch to disable thread destruction/deallocation in libgo + would definitely count as such code. And that seems like one of your + next steps. + <NlightNFotis> tschwinge: i need to deliver that instantly, right? seeing + as the evaluation is today. + <tschwinge> NlightNFotis: No. Deliver it when you have something to + deliver. :-) + <NlightNFotis> tschwinge: I am nervous about the evaluation today.
I have + not submitted a single piece of code, only some reports. How negatively + does this influence my performance report? + <tschwinge> NlightNFotis: If I can say so, in the evaluation today, Google + basically asks us mentors whether we want to fail our students right now. + Which I don't plan to do, knowing about the complexity of the Hurd + system, and the learning required before you can do useful code changes. + <NlightNFotis> tschwinge: that really means a lot to me, and it got a + weight off my chest. + <braunr> uh ok, i have to be the rude guy again + <braunr> NlightNFotis: the gsoc is also a way for the student to prepare + for working in software development communities + <braunr> whether free software/open source and/or in companies + <braunr> people involved care a lot less about pathos than actual results + <pinotree> (or to prepare students to be hired by google, but that's + another story) + <braunr> NlightNFotis: in other words, stop apologizing that much, stop + focusing so much on that, and just work as you can + + +# IRC, freenode, #hurd, 2013-07-31 + + <nlightnfotis> teythoon: both samuel and thomas would be missing for the + week right? + <teythoon> nlightnfotis: they do, why? + <teythoon> nlightnfotis: err, they do?? why? + + +# IRC, freenode, #hurd, 2013-08-01 + + <nlightnfotis> braunr: I checked out what you (and Thomas) suggested and + did some research on go on the Hurd. I have found out that go works, + until you need to use anything that has to do with a goroutine. I am now + playing with the go runtime and checking to see if turning thread + destruction to noop will have any difference. + + +# IRC, freenode, #hurd, 2013-08-05 + + <nlightnfotis> youpi: whenever you have time, I would like to report my + progress as well.
+ <youpi> nlightnfotis: sure, go ahead + <youpi> but again, you should report before the meeting + <youpi> so we can read it before coming to the discussion + <nlightnfotis> I have written my report + <youpi> ah + <hacklu> nlightnfotis: I have read your report, these days you have make a + great progress. + <youpi> where is it? + <nlightnfotis> it was available since yesterday + <nlightnfotis> + http://www.fotiskoutoulakis.com/blog/2013/08/05/gsoc-partial-week-7-report/ + <nlightnfotis> thanks hacklu. The particular piece of code I was studying + was very very interesting :) + <hacklu> nlightnfotis: I think you should show your link in here or email + next time. I have spend a bit more time to find that :) + <nlightnfotis> youpi: for a tldr, at the last time I was told to check + gccgo's runtime for clues regarding the go routine failures. + <nlightnfotis> hacklu: will keep that in mind, thanks. + <nlightnfotis> youpi: thing is, gccgo operates on two different thread + types: G's (the goroutines, lightweight threads that are managed by the + runtime) and M's (the "real" kernel threads") + <nlightnfotis> none of which are really "destroyed" + <youpi> ok, makes sense + <nlightnfotis> G's are put in a pool of available goroutines when their + status is changed to "Gdead" so that they can be reused + <nlightnfotis> M's also don't seem to go away. There is always at least one + M (the bootstrap one) and all other M's that get created are also stashed + in a pool of available working threads. 
+ <youpi> you could put some debugging printfs in libpthread, to make sure + whether threads do die or not + <nlightnfotis> I am studying this further as we speak, but they both don't + seem to get "destroyed", so that we can be sure that bugs are triggered + by thread destruction + <nlightnfotis> I was beginning to believe that maybe I was looking in the + wrong direction + <nlightnfotis> but then I looked at my past findings, and I noticed + something else + <nlightnfotis> if you take a look at the first failed go routine, it failed + at the time.sleep function, which puts a goroutine to sleep for ns + nanoseconds. That made me think if it was something that had to do with + the context functions and not the goroutines' creation. + <youpi> nlightnfotis: that's possible + <youpi> nlightnfotis: I'd say you can focus on this very simple example: a + mere sleep + <youpi> that's one of the simplest things a thread scheduler has to do, but + it has to do it right + <youpi> fixing that should fix a lot of other issues + <nlightnfotis> if I have understood correctly, there is at least one G + (Goroutine) and at least one M (kernel thread) running. Sleep does put + that goroutine at a hold, and restarting it might be an issue + <braunr> talking about thread scheduling ? :) + <youpi> nlightnfotis: go's runtime doesn't actually destroy kernel threads, + apparently + <nlightnfotis> youpi: yeah, that's what I have understood so far. And it + neither does destroy goroutines. If there was an issue with thread + creation, then I guess it should be triggered in the beginning of the + program too (seeing as both M's and G's are created there) + <nlightnfotis> the fact that it is triggered when a goroutine goes to sleep + makes me suspect the context functions + <youpi> yes + <nlightnfotis> again I am studying it the last days, in search of + clues. Will keep you all updated. 
+ <nlightnfotis> braunr: I have written my report and it is available here + http://www.fotiskoutoulakis.com/blog/2013/08/05/gsoc-partial-week-7-report/ + If you could read it and tell me if you notice something weird tell me + so. + <braunr> nlightnfotis: ok + <braunr> nlightnfotis: quite busy here so don't worry if i suddenly + disappear + <braunr> nlightnfotis: hum, does go implement its own threads ?? + <nlightnfotis> braunr: yeah. It has 2 threads. Runtime managed (the + goroutines) and "real" (kernel managed) ones. + <braunr> i mean, does it still use libpthread ? + <nlightnfotis> thing is none of them "disappear" so as to explain the bug + with "thread creation **and** destruction) + <nlightnfotis> it must use libpthread for kernel threads as far as creation + goes. + <braunr> ok, good + <braunr> then, it schedules its own threads inside one pthread, right ? + <braunr> using the pthread as a virtual cpu + <nlightnfotis> yes. It matches kernel threads and runtime threads and runs + the kernel threads in reality + <nlightnfotis> the scheduler decides which goroutine will run on each + kernel thread. + <braunr> ew + <braunr> this is pretty much non portable + <braunr> and you're right to suspect context switching functions + <nlightnfotis> yeah my thought for it was the following: thread creation, + if it was buggy, should be triggered as soon as a program starts, seeing + as at least one kernel thread and at least one go routine starts. My + sleep experiment crashes when the goroutine is put on hold + <braunr> did you find the code putting on hold ? 
+ <nlightnfotis> I will give you the exact link, wait a moment + <nlightnfotis> braunr: + https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/time.goc?source=c#L59 + <nlightnfotis> that is the exact location is line 26, which calls the one I + pointed you at + <braunr> ahah, tsleep + <braunr> old ghost from the past + <braunr> nlightnfotis: the real location is probably runtime_park + <nlightnfotis> I will check this out. + + <nlightnfotis> may I ask something non-technical but relevant to summer of + code? + <braunr> sure + <nlightnfotis> would it be okay if I took the day off tomorrow? + <braunr> nlightnfotis: ask tschwinge but i guess it's ok + + <braunr> have you found runtime_park ? + <braunr> i'm downloading your repository from github but it's slow :/ + <nlightnfotis> braunr: not yet. Grepping through the files didn't produce + any meaningful results and github's search is not working + <nlightnfotis> braunr: there is that strange thing with the gccgo sources, + where I can find a function's declaration but not its definition. Funny + thing is those functions are not really extern, so I am playing a hide + and seek game, in which I am not always successful. + <nlightnfotis> runtime_park is declared in runtime.h. I have looked nearly + everywhere for it. There is only one last place I have not looked at. + <nlightnfotis> braunr: I found runtime_park. It's here: + https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/proc.c?source=c#L1372 + + <tschwinge> nlightnfotis: Taking the day off is fine. Have fun! + <nlightnfotis> tschwinge: I am still here; Thanks for that tschwinge. I + will be for the next half hour or something if you would like to ask me + anything + <tschwinge> nlightnfotis: I have no immediate questions (first have to read + your report and discussion in here) -- so feel free to log out and enjoy + the sun outside. :-) + + <teythoon> nlightnfotis, tschwinge: btw, have you seen + http://morsmachine.dk/go-scheduler ?
+ <nlightnfotis> teythoon: thanks for the link. It's really interesting. + + +# IRC, freenode, #hurd, 2013-08-12 + + <nlightnfotis> teythoon did you manage to build the Hurd successfully? + <teythoon> ah yes, the Hurd is relatively easy + <teythoon> the libc is hard + <nlightnfotis> debian glibc or hurd upstream libc? + <teythoon> but my build on darnassus was successful + <nlightnfotis> *debian eglibc + <teythoon> well, I rebuilt the debian package with two tweaks + <nlightnfotis> do you build on linux and rsync on hurd or ...? + <teythoon> I built it on Hurd, though I thought about setting up a cross + compiler + <nlightnfotis> I see. The process was build Mach, build Hurd, and then + build glibc and it's ready or it needed more? + <teythoon> no, I never built Mach + <teythoon> I must admit I'm not sure about the "proper" procedure + <teythoon> if I change one of Hurds RPC definitions, I think the proper way + is to rebuild the libc against the new definitions and then the Hurd + <teythoon> but I found no way to do that, so everyone seems to build the + Hurd, install it, build the libc and then rebuild the Hurd again + <nlightnfotis> I see. Thanks for that :) + + <nlightnfotis> tschwinge, I have also written my report! It's available + here + http://www.fotiskoutoulakis.com/blog/2013/08/12/gsoc-week-8-partial-report/ + <nlightnfotis> I can sum it up if you want me to. + <tschwinge> nlightnfotis: I already read it! :-D + <tschwinge> Oh, I didn't. I read the week 7 one. Let me read week 8. ;-) + <nlightnfotis> ok. I am currently going through the assembly generated for + the sample program I have embedded in my report.
+ <nlightnfotis> the weird thing is that the assembly generated is pretty + much the same for the program with 1 and 2 goroutine functions (with the + obvious difference that the one with 2 goroutine functions has 1 more + goroutine in its assembly code) + <nlightnfotis> I can not understand why it is that when I have 1 goroutine, + an exception is triggered, but when I am having two (which are 99% + identical) it seems to be executed. + <nlightnfotis> and I do not understand why the exception is triggered when + I manually use a goroutine. + <nlightnfotis> To my understanding so far, there is at least 1 (kernel) + thread created at program startup to run main. The same thread gets + created to run a new goroutine (goroutines get associated with kernel + threads) + <nlightnfotis> and it's obvious from the assembly generated. + <nlightnfotis> go_init_main (the main function for go programs) starts with + a .cfi_startproc + <nlightnfotis> the same piece of code (.cfi_startproc) starts a new kernel + thread (on which a goroutine runs) + <tschwinge> nlightnfotis: Re your two-goroutines example: in that case I + assume, you're directly returning from the main function and the program + terminates normally. ;-) + <tschwinge> nlightnfotis: Studying the assembly code for this will be too + verbose, too low-level. What we need is a trace of steps that happen + until the error. + <nlightnfotis> tschwinge, that must be it, but it should trigger the bug, + since it still has at least one goroutine (and one is known to trigger + the bug) + <tschwinge> nlightnfotis: I guess the program exits before the first + goroutine would be scheduled for execution. + <nlightnfotis> the assembly for the goroutines is identical. You can't tell + one from the other.
The only change is that it has 2 of these sections + instead of one + <nlightnfotis> actually it's the same for the first one + <tschwinge> nlightnfotis: I very much assume that the issue is not due to + the code generated by the Go compiler (which you're seeing in the + assembly code), but rather due to the runtime code in the libgo library. + <nlightnfotis> I didn't think of it this way. + <tschwinge> ... that improperly interacts with our libpthread. + <nlightnfotis> so my research should focus on the runtime from now on? + <tschwinge> Improperly may well imply that our libpthread is at fault, of + course, as we discussed. + <tschwinge> Back to the one-goroutine case (that shows the assertion + failure). Simple case: one goroutine, plus the "main" thread. + <tschwinge> We need to get an understanding of the steps that happen until + the error happens. + <tschwinge> As this is a parallel problem, and it is involving "advanced" + things (such as setcontext), I would not trust GDB too much when used on + this code. + <nlightnfotis> I will have to manually step through the source myself, + right? + <tschwinge> What I would do, is add printf's (or similar) into the code at + critical points, to get an understanding of what's going on. + <tschwinge> Such critical points are: pthread_create, setcontext, + swapcontext. + <nlightnfotis> It sounds like a good idea. Anything else to note? + <tschwinge> That way, you can isolate the steps required to trigger the + assertion failure. + <tschwinge> For example, it could be something like: makecontext, + swapcontext, pthread_create, boom. + <nlightnfotis> pthread_create_internal is failing at an assertion. I wonder + what would happen if I remove that assertion. + <tschwinge> Not without understanding what the error is, and why it is + happening (which steps lead to it). We don't usually do »voodoo + computing and programming by coincidence«. + <nlightnfotis> tschwinge, I also figured out something.
If it is a + libpthread issue, it should also get triggered when a simple C program + creates a thread (assuming _pthread_create is causing the issue) + <nlightnfotis> so maybe I should write a C program to test that + functionality and see if it provides any further clues? + <tschwinge> nlightnfotis: That's precisely what the goal of »isolate the + steps required to trigger the assertion failure« is about: reduce the big + libgo code to a few function calls required to reproduce the problem. + <tschwinge> nlightnfotis: A simple C program just doing pthread_create + evidently does not fail. + <tschwinge> nlightnfotis: I assume you have a Go program dynamically linked + to the libgo you build? + <nlightnfotis> yes. To the latest go build from the source (4.9) + <nlightnfotis> *gccgo build from source + <braunr> removing an assertion is usually extremely bad practice + <tschwinge> Then you can just do something like make target-libgo (IIRC) + (or instead: cd i686-pc-gnu/libgo/ && make) to rebuild your changed + libgo, and then re-run the Go program. + <braunr> the thought of randomly removing assertions shouldn't even reach + your mind ! + <nlightnfotis> braunr: even if it is not permanent, but an experiment? + <braunr> yes + <nlightnfotis> can you explain to me why? + <tschwinge> nlightnfotis: <tschwinge> Not without understanding what the + error is, and why it is happening (which steps lead to it). We don't + usually do »voodoo computing and programming by coincidence«.
+ <braunr> an assertion exists to make sure something that should *never* + happen never happens + <braunr> removing it allows such events to silently occur + <teythoon> braunr: that's the theory, yes, to check invariants + <braunr> i don't know what you mean by using assertions for "an experiment" + <teythoon> unfortunately some people use assert for error handling :/ + <braunr> that's wrong + <braunr> and i don't remember it to be the case in libpthread + <braunr> nlightnfotis: can you point the faulting assertion again there + please ? + <nlightnfotis> braunr: sure: Assertion `({ mach_port_t ktid = + __mach_thread_self (); int ok = thread->kernel_thread == ktid; + <nlightnfotis> __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; + })' failed. + <braunr> so basically, thread->kernel_thread != __mach_thread_self() + <braunr> this code is run only for num_threads == 1 + <braunr> but has there been any thread destruction before ? + <nlightnfotis> no. To my understanding kernel threads in the go runtime + never get destroyed (comments seem to support that) + <braunr> IOW: is it certain the only thread left *is* the main thread ? + <braunr> hm + <braunr> intuitively, i'd say this is wrong + <braunr> i'd say go doesn't destroy threads in most cases, but something in + the go runtime must have done it already + <braunr> i'm not even sure the main thread still exists + <braunr> check that + <braunr> where is the go code you're working on ? + <nlightnfotis> there are 3 files of interest + <braunr> i'd like the whole sources please + <nlightnfotis> I will find it in a moment + <tschwinge> braunr: GCC Git clone, tschwinge/t/hurd/go branch.
+ <nlightnfotis> it is <gcc_root>/libgo/runtime/runtime.h + <nlightnfotis> it is <gcc_root>/libgo/runtime/proc.c + <braunr> tschwinge: thanks + <tschwinge> braunr: git://gcc.gnu.org/git/gcc.git + <nlightnfotis> I will provide links on github + <braunr> nlightnfotis: i said the whole sources, why do you insist on + giving me separate files ? + <nlightnfotis> for checking it out quickly + <nlightnfotis> oh I misunderstood that sorry + <nlightnfotis> thought you wanted to check out thread creation and + destruction and that you were interested only in those specific files + <braunr> tschwinge: is it completely contained there or are there external + libraries ? + <tschwinge> braunr: You mean libgo? + <braunr> tschwinge: possibly + <nlightnfotis> tschwinge, I just made sure that yeah programs are + dynamically linked against the compiler's libgo + <nlightnfotis> libgo.so.3 + <braunr> does libgo come from gcc sources ? + <nlightnfotis> yeah + <braunr> ok + <nlightnfotis> go files on gcc sources are split under two directories: go, + which contains the frontend go, and libgo which contains the libraries + and the runtime code + <tschwinge> braunr: darnassus:~tschwinge/tmp/gcc/go.build/ is a recent + build, with sources in $PWD/../go/. + <tschwinge> braunr: libgo is in i686-unknown-gnu0.3/libgo/.libs/ + <nlightnfotis> so tschwinge to roundup for this week I should print debug + around the "hotspots" and see if I can extract more information about + where the specific problem is triggered right? + <tschwinge> nlightnfotis: Yes, for a start. + <braunr> nlightnfotis: identify the main thread, make sure it doesn't exit + <nlightnfotis> noted. + <nlightnfotis> braunr: do you have an idea about the issue I described + earlier? The one with the 1 goroutine triggering the bug, but the 2 + exiting successfully but with no output? + <braunr> nlightnfotis: i didn't read + <nlightnfotis> do you have 2 mins to read my report?
I describe the issue + <braunr> something messed up in the context i suppose + <tschwinge> nlightnfotis: Uhm, I already explained that issue? + <braunr> you did ? + <nlightnfotis> tschwinge, I know, don't worry. I am trying to get all the + insight I can get. + <nlightnfotis> you mentioned that the scheduler might have an issue and + that the main thread returns before the goroutines execu + <nlightnfotis> *execute + <nlightnfotis> right? + <tschwinge> It is the normal thing for a process to terminate normally when + the main function returns. I would expect Go to behave the same way. + <braunr> "Now, if we change one of the say functions inside main to a + goroutine, this happens" + <braunr> how do you change it ? + <tschwinge> Or am I confused? + <braunr> tschwinge: i don't remember exactly + <nlightnfotis> braunr: from say("world") to go say("world") + <nlightnfotis> tschwinge, yeah I get that. What I still have not understood + is what is it specifically about the 2 goroutines that doesn't trigger + the issue when 1 goroutine does. + <nlightnfotis> You said that it might have something to do with the + scheduler; it does seem like a good explanation to me + <tschwinge> nlightnfotis: My understanding still is that the goroutines + don't get executed before the main thread exits. + <braunr> which scheduler ? + <nlightnfotis> braunr: the runtime (go) scheduler. + <nlightnfotis> tschwinge, Yeah, they don't. But still, with 1 goroutine: + you get into main, attempt to execute it, and bam! With two, it should be + the same, but strangely it seems to exit main without an issue + <nlightnfotis> (attempt to execute the goroutine) + <braunr> why should it be the same ? + <nlightnfotis> braunr: seeing as one goroutine has problems, I can't see + why two wouldn't. At least one of the two should result in an exception. + <braunr> nlightnfotis: why ?
+ <braunr> nlightnfotis: they do have the problem + <braunr> they don't run + <braunr> they just don't run into that assertion, probably because there is + more than one thread + <nlightnfotis> wait a minute. You imply that they fail silently? But still + end up in the same situation + <braunr> yes + <braunr> in which case it does look like a go scheduler problem + <nlightnfotis> if I understood it correctly, that assertion fails when it + is only 1 thread? + <braunr> yes + <braunr> and since the main thread is always correct, i expect the main + thread has exited + <braunr> which this happens because the one thread left is *not* the main + thread + <braunr> (which is a libpthread bug) + <braunr> but it's a bug we've not seen because we don't have applications + creating threads while exiting + <nlightnfotis> I think I got it now. + <braunr> try to put something like getchar() in your go program + <braunr> something that introduces a break + <braunr> so that the main thread doesn't exit + <nlightnfotis> oh right. Thanks for that. And sorry tschwinge I reread what + you said, it seems I had misinterpreted what you suggested. + <tschwinge> braunr: If you're interested: for a Go program triggering the + assertion, I don't see any thread exiting (see + darnassus:~tschwinge/tmp/gcc/a.go, run: cd ~tschwinge/tmp/gcc/go.build/ + && ./a.out) -- but perhaps I've been looking for the wrong things in l_. + File l is without a goroutine. Have to leave now, sorry. + <tschwinge> braunr: If you want to rebuild: gcc/gccgo -B gcc -B + i686-unknown-gnu0.3/libgo ../a.go -Li686-unknown-gnu0.3/libgo/.libs + -Wl,-rpath,i686-unknown-gnu0.3/libgo/.libs + <braunr> tschwinge: no i won't touch anything + <braunr> but thanks + + +# IRC, freenode, #hurd, 2013-08-19 + + <youpi> nlightnfotis: how are you going with gcc go? + <nlightnfotis> I was print debugging all the week. + <nlightnfotis> I can tell you I haven't noticed anything weird so far.
+ <nlightnfotis> But I feel I am close to the solution + <nlightnfotis> I have not written my report yet. + <nlightnfotis> I will write it maximum until wednesday + <nlightnfotis> I hope I will have figured it all out until then + <pinotree> a report is not for writing solutions, but for the progress + <youpi> yes + <youpi> it's completely fine to be saying "I've been debugging, not found + anything yet" + <pinotree> results or not, always write your reports on time, so your + mentor(s) know what you are doing + <nlightnfotis> I see. Would you like me to write it right now, or is it + okay to write it a day or two later? + <hacklu__> nlightnfotis: FYI. this week my report is not finished. just + state some problem I face now. + <youpi> nlightnfotis: I'd say better write it now + <nlightnfotis> youpi: Ok I will write it and tell you when I am done with + it. + <nlightnfotis> youpi: here is my partial report describing what my course + of action looked like this + week. http://www.fotiskoutoulakis.com/blog/2013/08/19/gsoc-week-9-partial-report/ + <nlightnfotis> of course, I will write in a day or two (hopefully having + figured out the whole situation) an exhaustive report describing + everything I did in detail + <nlightnfotis> youpi: I have written my (partial) report describing how I + went about this week + http://www.fotiskoutoulakis.com/blog/2013/08/19/gsoc-week-9-partial-report/ + <youpi> nlightnfotis: good, thanks! + <nlightnfotis> youpi: please note that this is not an exhaustive link of my + findings or course of action, it merely acts as an example to demonstrate + the way I think and how I go about every day. + <nlightnfotis> I will write an exhaustive report of everything I did so + far, when I figure out what the issue is, and I feel I am close. 
+ <youpi> well, you don't need to explain all bits in details + <youpi> this is fine to show an example of how you went + <youpi> but please also provide a summary of your other findings + <nlightnfotis> oh okay, I will keep this in mind. :) + + +# IRC, freenode, #hurd, 2013-08-22 + + < nlightnfotis> if I want to rebuild libpthread, I have to embed it into + eglibc's source, then build? + < pinotree> or pick the debian sources, patch libpthread there and rebuild + < nlightnfotis> that's most likely what I am going to do. Thanks pinotree. + < pinotree> yw + < braunr> nlightnfotis: i usually add my patches on top of the debian glibc + ones, yes + < braunr> it requires some tweaking + < braunr> but it's probably the easiest way + < nlightnfotis> braunr: I was studying my issues with gcc, and everyday I + was getting more and more confident it must be a libpthread issue + < nlightnfotis> and I figured out, that I might wanna play with libpthread + this time + < braunr> it probably is but + < braunr> i'm not so sure you should dive there + < nlightnfotis> why not? + < braunr> because it can be worked around in go + < braunr> i had a test for you last time + < braunr> do you remember what it was ? + < nlightnfotis> nope :/ care to remind it? + < braunr> iirc, it was running the go test you did but with an additional + instruction in the main function, that pauses + < braunr> something like getchar() in c + < braunr> to make sure main doesn't exit while the goroutines are still + running + < braunr> i'm almost positive that the bug you're seeing is main returning + and libpthread beleiving it's acting on the main thread because there is + only one left + < nlightnfotis> oh that's easy, I can do it now. But it's probably what + thomas had suggested: go routines may not be running at all. 
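A minimal sketch of the test braunr describes here: keep `main` alive until the goroutines have had a chance to run. It swaps the blocking read discussed in the log for `sync.WaitGroup`, which is an assumption about what is acceptable for the test, not what was actually run. `runWorkers` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines and blocks until all of them have finished.
// It returns how many goroutines actually ran.
func runWorkers(n int) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait() // main does not get past this point before the goroutines ran
	return done
}

func main() {
	fmt.Println("goroutines that ran:", runWorkers(2))
}
```

On a runtime where goroutines are never scheduled, `wg.Wait()` should not return, which would make the failure visible instead of silent.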
+ < braunr> they probably aren't + < braunr> and that's a context bug + < braunr> not a libpthread bug + < braunr> and that's what you should focus on + < braunr> the libpthread bug is minor + < nlightnfotis> which is strange, because I had studied the assembly code + and it the code for the goroutine was there + < nlightnfotis> anyway I will proceed with what you suggested + < braunr> yes please + < braunr> that's becoming important + < nlightnfotis> would you mind me dumping some of my findings for you to + evaluate/ post on opinion on? + < braunr> no + < braunr> please do so + < nlightnfotis> I have found that the go runtime starts with a total number + of threads == 1 + < braunr> nlightnfotis: as all processes + < nlightnfotis> I would guess that's because of using fork () + < nlightnfotis> oh so it's ok + < braunr> there always is a main thread + < braunr> even for non-threaded applications + < nlightnfotis> yeah, that I know. The runtime proceeds to create + immediately one more. + < braunr> then it's 2 + < nlightnfotis> and that's ok, it doesn't have an issue with that + < nlightnfotis> yep + < nlightnfotis> the issue begins when it tries to create the 3rd one + < braunr> hum + < braunr> from what i remember + < nlightnfotis> it happily goes through the go runtime's kernel thread + allocation function (runtime_newm()) + < braunr> you also had an issue with the first goroutine + < nlightnfotis> that's with 1 go routine + < braunr> ok + < braunr> so 1 goroutine == 3 threads + < nlightnfotis> it seems so yes. 
+ < braunr> depending on how the go scheduler is able to assign goroutines to + kernel threads i suppose + < nlightnfotis> mind you, (disclaimer: I am not so sure about that) that go + must be using one extra thread for the runtime scheduler and garbage + collector + < braunr> that's ok + < nlightnfotis> so that's where the two come from + < braunr> and expected from a modern runtime + < nlightnfotis> the third must be the go routime + < nlightnfotis> routine + < braunr> hum have to go + < braunr> brb in a few minutes + < braunr> keep posting + < nlightnfotis> it's ok take your time + < nlightnfotis> I will be here + < braunr> but i may not ;p + < braunr> in fact i will not + < braunr> i have like 15 mins ;) + < braunr> nlightnfotis: ^ + < nlightnfotis> I am trying what you told me to do with go + < nlightnfotis> it's ok if you have to go, I will continue investigating + and be back tomorrow + < braunr> ok + < nlightnfotis> braunr: I tried what you asked me to do, both we waiting to + read a string from stdin and with waiting to read an int from stdin + < nlightnfotis> it never waits, it still aborts with the assertion failure + < nlightnfotis> both with one and two go routines + < nlightnfotis> dumping it here just for the log, running the same code + without waiting for input results in two threads created (1 for main and + 1 for runtime, most likely) and "normal" execution. + < nlightnfotis> normal as in no assertion failure, + < nlightnfotis> it seems to skip the goroutines altogether + + +# IRC, freenode, #hurd, 2013-08-23 + + < braunr> nlightnfotis: can i see your last go test code please ? 
the one + with the read at the end of main + < nlightnfotis> braunr sure + < nlightnfotis> sorry I had gone to the toilet, now I am back + < nlightnfotis> I will send it right now + < nlightnfotis> braunr: http://pastebin.com/DVg3FipE + < nlightnfotis> it crashes when it attempts to create the 3rd thread (the + 1st goroutine), with the assertion fail + < nlightnfotis> if you remove the Scanf it will not fail, return 0, but + only create 2 threads (skip the goroutines alltogether) + < braunr> can you add a print right before main exits please ? + < braunr> so we know when it does + < nlightnfotis> doing it now + < nlightnfotis> braunr: If I enter a print statement right before main + exits, the assertion failure is triggered. If I remove it, it still runs + and creates only 2 threads. + < braunr> i don't understand + < braunr> 14:42 < nlightnfotis> it crashes when it attempts to create the + 3rd thread (the 1st goroutine), with the assertion fail + < braunr> why don't you get that ? + < nlightnfotis> This seems like having to do with the runtime. I mean, I + have seen the emitted assembly from the compiler, and the goroutines are + there. Something in the runtime must be skipping them + < braunr> context switching seems buggy + < nlightnfotis> if it's only goroutines in main + < nlightnfotis> if there's also something else in main, the assertion + failure is triggered. + < braunr> i want you to add a printf right before main exits, from the code + you pasted + < nlightnfotis> I did. It acts the same as before. + < braunr> do you see that last printf ? + < nlightnfotis> no. 
It aborts before that + < nlightnfotis> :q + < braunr> find a way to make sure the output buffer is flushed + < braunr> i don't know how it's done in go + < nlightnfotis> mistype the :q, was supposed to do it vim + < nlightnfotis> braunr will do right away + < nlightnfotis> there is one thing I still can not understand: Why is it + that two threads are ok, but when the next is going to get created, the + assertion is triggered. + < braunr> nlightnfotis: the assertion is triggered because a thread is + being created while there is only one thread left, and this thread isn't + the main thread + < braunr> so basically, the main thread has exited, and another (the last + one) is trying to create one + < nlightnfotis> the other one might be the runtime I guess. Let me check + out quickly what you suggested + < braunr> the main thread shouldn't exit at all + < braunr> so something with context switching is wrong + < nlightnfotis> the thing is: it doesn't seem to exit when this happens. My + debug statements (in the runtime) suggest that there are at least 2 + threads active, kernel threads don't get destroyed in gccgo + < braunr> 14:52 < braunr> so something with context switching is wrong + < braunr> how well have the context switching functions been tested ? + < nlightnfotis> to be honest I have not tested them; up until this point I + trusted they worked. Should I also take a look at them? + < braunr> how can you trust them ? + < braunr> they've never been used .. + < braunr> thomas added them recently if i'm right + < braunr> nothing has been using them except go + < braunr> piece of advice: don't trust anything + < nlightnfotis> I think they were in before, and thomas recently patched + them! 
+ < braunr> they were in, but didn't work + < braunr> (if i'm right) + < braunr> nlightnfotis: you could patch libpthread to monitor the number of + threads + < braunr> or the go runtime, idk + < nlightnfotis> I have done so on the go runtime + < nlightnfotis> that's where I am getting the number of threads I + report. That's straight out from the scheduler's count. + < braunr> threads can exit by calling pthread_exit() or returning from the + thread routine + < braunr> make sure you catch both + < braunr> also check for pthread_cancel(), although i don't expect any in + go + < nlightnfotis> braunr: Should I really do that? I mean, from what I can + see in gccgo's comments, Kernel threads (m) never go away. They are added + to a pool of m's waiting for work if there is no goroutine running on + them + < nlightnfotis> I mean, I am not so sure they exit at all + < braunr> be sure + < braunr> point me the code please + < nlightnfotis> + https://github.com/NlightNFotis/gcc/blob/master/libgo/runtime/proc.c#L224 + < nlightnfotis> this is where it get's stated that m's never go away + < nlightnfotis> and at line 257 you can see the pool + < nlightnfotis> and wait for me to find the code that actually releases an + and places into the pool + < nlightnfotis> yep found it + < nlightnfotis> line 817 mput + < nlightnfotis> puts a kernel thread given as parameter to the pool + < nlightnfotis> another proof of the theory is at line 1177. It states: + "This point is never reached, because scheduler does not release os + threads at the moment." + < braunr> fetching git repository, bit busy, i'll have a look in 5-10 mins + < nlightnfotis> oh it's ok, I had pointed you to the file directly on + github to check it out instantly, but never mind, the file is + <gccroot>/libgo/runtime/proc.c + < braunr> damn github is so slow .. + < braunr> nlightnfotis: i much prefer my own text interface :) + < nlightnfotis> braunr: just out of curiosity what's your setup? 
I use vim + mainly (not that I am a vim expert or anything, I only know the basics, + but I love it) + < braunr> same + < braunr> nlightnfotis: add a trace at that comment to make SURE threads do + not exit + < braunr> you *cannot* get the libpthread assertion with more than 1 thread + < braunr> grep for pthread_exit() too + < nlightnfotis> will do it now. It will take about an hour to compile + though. + < braunr> i don't understand the stack trick at the start of runtime_mstart + < braunr> ah splitstack .. + < nlightnfotis> I think I should try cross compiling gcc, and then move + files on the hurd. It would be so much faster I believe. + < braunr> than what ? + < nlightnfotis> building gcc on the hurd + < nlightnfotis> I remember it taking about 10minutes with make -j4 on the + host + < nlightnfotis> it takes 45-50 minutes on the vm (kvm enabled) + < braunr> but you can merely rebuild the files you've changed + < nlightnfotis> I feel stupid now... + < braunr> nlightnfotis: have you tried setting GOMAXPROCS to 1 ? + < nlightnfotis> not really, but from what I know GOMAXPROCS defaults to 1 + if not set + < braunr> again, check that + < braunr> take the habit of checking things + < nlightnfotis> braunr: yeah sorry for that. 
I have checked these things + out before they don't come out of my head I just don't remember exactly + where I had seen this + < braunr> what you can also do is use gdb to catch the assertion and check + the number of threads at that time, as well as the number of threads as + seen by libpthread + < nlightnfotis> braunr: line 492 file proc.c: runtime_gomaxprocs = 1; + < braunr> also see runtime.LockOSThread + < braunr> to make sure the main thread is locked to its own pthread + < nlightnfotis> I can see in line 529 of the same file that the first + thread is getting locked + < nlightnfotis> the new threads that get initialised are non main threads + < braunr> if(!runtime_sched.lockmain) runtime_UnlockOSThread(); + < braunr> i'm suggesting you set runtime_sched.lockmain + < braunr> so it remains true for the whole execution + < braunr> this code looks like a revamp of plan9 lol + < nlightnfotis> it is + < nlightnfotis> in the paper from Ian Lance Taylor describing gccgo he + states somewhere that the original go compilers (the 3gs) are a modified + version of plan9's C compiler, and that gccgo tries to follow them + < nlightnfotis> they differ in a lot of ways though + < nlightnfotis> the 3gs generate a lot of code during link time + < nlightnfotis> gccgo follows the standard gcc procedures + < braunr> eh :D + < nlightnfotis> go -> gogo -> generic -> gimple -> rtl -> object + < nlightnfotis> that's how it flows as far as I recall + < nlightnfotis> gogo is an internal representation of go's structure inside + the gccgo frontend + < nlightnfotis> that's why you see many functions with gogo in their name + < nlightnfotis> I just revisited the paper: gogo is there to make it easy + to implement whatever analysis might seem desirable. It mirrors however + the Go source code read from the input files + < braunr> nlightnfotis: what are you trying now ? 
+ < nlightnfotis> I am basically studying the runtime's source code while + waiting for gccgo to compile on the Hurd + < nlightnfotis> yes I did the stupid whole recompilation again. :/ + < braunr> nlightnfotis: compile for what ? + < braunr> what test ? + < nlightnfotis> to check out to see if M's really are added to the pool + instead of getting deleted + < braunr> nlightnfotis: but how ? + < nlightnfotis> braunr: I have added a statement in mput if we get there + first, and secondly the number of threads that the runtime scheduler + knows that are waiting (are in the pool of m's waiting for work) + < braunr> ok + < braunr> when you can, i'd really like you to do this test : + < braunr> 15:55 < braunr> what you can also do is use gdb to catch the + assertion and check the number of threads at that time, as well as the + number of threads as seen by libpthread + < nlightnfotis> the number of threads required by libpthread is gonna need + me to recompile the whole eglibc right? + < braunr> no + < braunr> just print it with gdb + < nlightnfotis> oh, ok + < braunr> it's __pthread_num_threads + < nlightnfotis> is gdb reliable? I remember thomas telling me that I can't + trust gdb at this point in time + < braunr> and also __pthread_total + < braunr> really ? + < braunr> i don't see why not :/ + < braunr> youpi: any idea about what nlightnfotis is speaking of ? + < nlightnfotis> I may have misunderstood it; don't take it by heart + < nlightnfotis> I don't wanna put words in other people's mouths because I + misunderstood something + < braunr> sure + < braunr> that's my habit to check things + < youpi> braunr: nope + < braunr> youpi: and am i right when i say we don't use context functions + on the hurd, and they're likely to be incomplete, even with the recent + changes from thomas ? 
+ < braunr> (mcontext, ucontext) + < nlightnfotis> braunr: this is what had been said: 08:46:30< tschwinge> As + this is a parallel problem, and it is involving "advanced" things (such + as setcontext), I would not trust GDB too much when used on this code. + < pinotree> if thomas' changes were complete and polished, i guess he would + have sent them upstream already + < braunr> i see but + < braunr> you can normally trust gdb for global variables + < nlightnfotis> Didn't post it as an objection; I posted it because I felt + bad putting the wrong words in other people's mouths, as I said + before. So I posted his original comment which was more authoritative + than my interpretation of it + < braunr> i wonder if there is a tunable to strictly map one thread to one + goroutine + < braunr> nlightnfotis: more focus on the work, less on the rest please + < nlightnfotis> Did I do something wrong? + < braunr> you waste too much time apologizing + < braunr> for no reason + < braunr> nlightnfotis: i suppose you don't use splitstack, right ? + < nlightnfotis> no I didn't + < nlightnfotis> and here's something interesting: The code I just added, in + mput, to see if threads are added in the pool. It's not there, no matter + what I run + < nlightnfotis> So it seems that the runtime is not reaching mput. + < nlightnfotis> Could this be normal behavior? I mean, on process + termination just release the resources so mput is skipped? + < braunr> i don't know the code well enough to answer that + < braunr> check closer to the lower interface + + +# IRC, freenode, #hurd, 2013-08-25 + + < nlightnfotis> braunr: what is initcontext supposed to be doing? + < braunr> nlightnfotis: didn't look + < braunr> i'll take a look later + < nlightnfotis> braunr: I am baffled by it. It seems to be doing nothing on + the Hurd branch and nothing in the Linux branch either. Why call a + function that does nothing?
(it doesn't only seem to do nothing, I have + confirmed it) + < nlightnfotis> youpi: I was wondering if you could explain me + something. What is the initcontext function supposed to be doing? + < youpi> you mean initcontext ? + < nlightnfotis> yes + < youpi> ergl + < youpi> you mean makecontext? + < nlightnfotis> no initcontext. I am faced with this in the goruntime. It's + called in it, but it is doing nothing. Neither in the Hurd tree, nor in + the Linux one + < youpi> I don't know what initcontext is + < youpi> where do you read it? + < nlightnfotis> youpi: let me show you + < nlightnfotis> + https://github.com/NlightNFotis/gcc/blob/fotisk/goruntime_hurd/libgo/runtime/proc.c#L80 + < nlightnfotis> and it is called in quite a few places + < youpi> it's not doing nothing, see other implementations + < pinotree> if SETCONTEXT_CLOBBERS_TLS is not defined, initcontext and + fixcontext do nothing + < pinotree> otherwise (presuming if setcontext clobbers tls) there are two + implementations for solaris/x86_64 and netbsd + < youpi> I don't think we have the tls clobber bug + < youpi> so these functions being empty is completely fine + < nlightnfotis> pinotree: oh, you mean it's used as a workaround for these + two systems only? + < youpi> yes + < pinotree> yes + < nlightnfotis> That makes sense. Thanks both of you for the help :) + < nlightnfotis> youpi: if this counts as some progress, I have traced the + exact bootstrapping sequence of a new go process. I know a good deal of + what is done from it's spawn to it's end. There are some things I wanna + sort out, and later tonight I will write my report for it to be ready for + tomorrow. 
+ < youpi> good + + +# IRC, freenode, #hurd, 2013-08-26 + + < nlightnfotis> Hi everyone, my report is here + http://www.fotiskoutoulakis.com/blog/2013/08/26/gsoc-week-10-report/ + < youpi> nlightnfotis: you should clearly put printfs inside libpthread + < youpi> to check what is happening with the ktids + < nlightnfotis> youpi: yep, that's my next course of action. I just want to + spend some more time in the go runtime to make sure that I understand the + flow perfectly, and to make sure that it is not the runtime's fault + < braunr> nlightnfotis: did you try gdb to print the number of threads ? + < youpi> nlightnfotis: to build it, the easiest way is to start building + eglibc, and when you see it compiling C files (i.e. run i486-gnu-gcc-4.7 + etc.) + < youpi> stop it + < youpi> and go into build/hurd-i386-libc, and run "make others" from there + < nlightnfotis> braunr: that was my plan for today or tomorrow :) + < braunr> start building *debian* glibc + < youpi> there's perhaps some way to only build libpthread, but I don't + remember + < braunr> nlightnfotis: ok + < braunr> youpi: i suggested he tried gdb first + < youpi> why not + < braunr> if you need quick glibc builds, you can use darnassus + < nlightnfotis> braunr: how much time on average should I expect it to + take? + < youpi> it highly depends on the machine + < youpi> it can be hours + < youpi> or a few minutes + < youpi> depending you already have a built tree, a fast disk, etc. + < braunr> make lib others on darnassus takes around 30 minutes + < braunr> a complete dpkg-buildpackage from fresh sources takes 5-6 hours + < braunr> make others from a built tree is very quick + < braunr> a few minutes at most + < braunr> nlightnfotis: i don't see any trace of thread exiting in your + report, is that normal ? 
+ < nlightnfotis> yeah, I guess, since they don't exit prematurely, they are + released along with other resources at the process' exit + < braunr> i'll rephrase + < braunr> you said last time that you saw a function never got called + < braunr> i assumed it was because a thread exited prematurely + < nlightnfotis> oh I sorted it out with the help of youpi and pinotree + yesterday + < braunr> that's different + < braunr> i'm not talking about the function that does nothing + < braunr> i'm talking about the one never called + < nlightnfotis> oh, go on then, + < braunr> i don't remember its name + < braunr> anyway + < nlightnfotis> abort()? + < braunr> i hope abort doesn't get called :) + < nlightnfotis> it doesn't + < braunr> i thought it was the one right before + < braunr> what i mean is + < nlightnfotis> oh runtime_mstart, it does get called + < braunr> add traces at thread exit points + < nlightnfotis> I sorted it out too + < braunr> make *sure* threads don't exit + < nlightnfotis> it get's called to start the kernel thread created at + process spawn at the runtime_schedinit + < braunr> if they really don't, it's probably a context/tls issue + < nlightnfotis> I will do this right now. + < nlightnfotis> braunr: if it's a context/tls issue it's libpthread's + problem? + + +# IRC, freenode, #hurd, 2013-09-02 + + <nlightnfotis> Hello! My report for this week is online: + http://www.fotiskoutoulakis.com/blog/2013/09/02/gsoc-week-11-report/ + <braunr> nlightnfotis: there always is a signal thread in every hurd + program + <braunr> nlightnfotis: i also pointed out that there are two variables + involved in counting threads in libpthread, the other one being + __pthread_num_threads + <braunr> again, more attention to work and details, less showmanship + <braunr> i'm tired of repeating it + <youpi> nlightnfotis: doesn't backtrace work in gdb to tell you what + 0x01da48ec is? + <youpi> also, do you have libc0.3-dbg installed? 
+ <nlightnfotis> braunr: __pthread_num_threads is 4. + <braunr> then why isn't it in your report ? + <braunr> it's acceptable that you overlook it + <nlightnfotis> and youpi: yeah I have got the backtrace, but 0x01da48ec is + ?? () from /lib/i386-gnu/libc.so.3 + <braunr> it's NOT when someone else has previously mentioned it to you + <youpi> nlightnfotis: only that line, no other line? + <nlightnfotis> it has 8 more youpi, the one after ?? is mach_msg () + from /lib/i386-gnu/libc.so.0.3 + <braunr> yes mach_msg + <braunr> almost everything ends up in mach_msg + <youpi> you should probably pastebin somewhere the output of thread apply + all bt + <braunr> what's before that ? + <nlightnfotis> braunr: I don't know how I even missed it. I skimmed through + the code and only found __pthread_total and assumed that it was the total + number of threads + <braunr> nlightnfotis: i don't know either + <braunr> take notes + <nlightnfotis> before mach_msg is __pthread_timedblock () from + /lib/i386-gnu/libpthread.so.0.3 + <nlightnfotis> I will add it to pastebin in a second + <braunr> i find it very disappointing that after several weeks blocking on + this, despite all the pointers you've been given, you still haven't made + enough progress to reach the context switching functions + <braunr> last week, most progress was made when we talked together + <braunr> then nothing + <braunr> it seems that you disappear, apparently searching on your own + <braunr> but for far too long + <nlightnfotis> braunr: I do search on my own, yes, + <braunr> almost like exploiting being blocked not to make progress on + purpose ... + <braunr> but too much + <nlightnfotis> braunr: I am not doing this on purpose, I believe you are + unfair to me. I am trying to make as much progress as I can alone, and + reach out only when I can't do much more alone + <braunr> then why is it only now that we get replies to questions such as + "how much is __pthread_num_threads" ?
+ <braunr> why do you stop discussions for almost a week, just to find + yourself blocked again ? + <nlightnfotis> I was working on gcc, going through the runtime making sure + about assumptions and going through various other goroutine or not + programs through gdb + <braunr> that doesn't take a week + <braunr> clearly not + <braunr> last time we talked was + <braunr> 10:40 < nlightnfotis> braunr: if it's a context/tls issue it's + libpthread's problem? + <nlightnfotis> it did for me... honestly, what is it you believe I am doing + wrong? I too am frustrated by my lack of progress, but I am doing my best + <braunr> august 26 + <nlightnfotis> yeah, I wanted to make sure about certain assumptions on the + gcc side. I don't want to start hacking on libpthread only to see that it + might have been something I missed on the gcc side + <braunr> i told you + <braunr> it's probably not a libpthread issue + <braunr> the assertion is + <braunr> but it's minor + <braunr> it's not the real problem, only a side effect + <braunr> i told you about __pthread_num_threads, why didn't you look at it + ? + <braunr> i told you about context switching functions, why nothing about it + ? + <braunr> doing a few printfs to check numbers and using gdb to check them + at break points should be quick + <braunr> when we talked we had the results in a few minutes + <nlightnfotis> yeah, because I was guided, and that helped me target my + research. On my own things are quite different. I find out something + about gcc's behavior, then find out I need tons more information, and I + have a lot of things that I need to research to confirm any assumptions + from my side + <braunr> how did you miss the signal thread ? + <braunr> we even talked about it right here with hacklu + <braunr> i'll say it again + <braunr> if blocked more than one day, ask for help + <braunr> 2 days minimum each time is just too long + <nlightnfotis> I'm sorry.
I will be online every day from now on and report + every 10 minutes, on my course of actions. + <nlightnfotis> I recognise that time is of the essence at this point in + time + <braunr> it's also NO + <braunr> NO + <braunr> *SIGH* + <hacklu> nlightnfotis: calm down. braunr just want to help you solve + problem quickly. + <braunr> 10 minutes is the other extreme + <hacklu> nlightnfotis: in my experience, if something block me, I will + keep asking him until I solve the problem. + <braunr> it's also very frustrating to see you answer questions quickly + when you're here, then wait days for unanswered questions that could have + taken little time if you kept being here + <braunr> this just gives the impression that you're doing something else in + parallel that keeps you busy + <braunr> and comfort me in believing you're not being serious enough + about it + <nlightnfotis> yeah, I understand that it gives that impression. The only + thing I can tell you now, is that I am *not* doing something else in + parallel. I am only trying to demonstrate some progress alone, and when + working alone things for me take quite some more time than when I am + guided + <braunr> hacklu: i'm actually the nervous one here + <nlightnfotis> braunr: ok, I understand I have disappointed you. What would + you suggest me to do from now on?
+ <hacklu> braunr: :) + <braunr> manage your time correctly or you'll fail + <braunr> i'm not the main mentor of this project so it's not for me to + decide + <braunr> but if i were, and if i had to wait again for several days before + any notice of progress or blocking, i wouldn't even wait for the end of + the gsoc + <braunr> you're confronted with difficult issues + <braunr> tls, context switching, thread + <braunr> ing + <braunr> they're all complicated + <braunr> unless you're very experienced and/or gifted, don't assume you can + solve it on your own + <braunr> and the biggest concern for me is that it's not even the main + focus of your project + <braunr> you should be working on go + <braunr> on porting + <braunr> any side issues should be solved as quickly as possible + <braunr> and we're now in september ... + <nlightnfotis> go is working quite alright. It's goroutines that have + issues. + <braunr> nlightnfotis: same thing + <braunr> goroutines are part of go as far as i'm concerned + <braunr> and they're working too, something in the hurd isn't + <braunr> so it's a side issue + <braunr> you're very much entitled to ask as much help as you need for side + issues + <braunr> and i strongly feel you didn't + <nlightnfotis> yeah, you're right. I failed on that aspect, mainly because + of the way I work. I wanted to show some progress on my own, and not be + here and spam all day. I felt that spamming questions all day would + demonstrate incompetence from my side + <nlightnfotis> and I wanted to show that I am capable of solving my + problems on my own. + <braunr> well, in a sense it does, but that's not the skills we were + expecting from you so it's perfectly ok + <braunr> nlightnfotis: no development group, even in companies, in their + right mind, would expect you to grasp the low level dark details of an + operating system implementation in a few weeks ... + <nlightnfotis> braunr: ok, may I ask what you suggest to me that my next + course of action is? 
+ <braunr> let me see + <braunr> nlightnfotis: your report mentions runtime_malg + <nlightnfotis> yes, I runtime malg always returns a new goroutine + <braunr> nlightnfotis: what's the problem ? + <nlightnfotis> a new m created is assigned a new goroutine via runtime_malg + <nlightnfotis> what happens to that goroutine? Is it destroyed? Because it + seems to be a bogus goroutine. Why isn't the kernel thread instantly + picking the one goroutine available at the global goroutine pool? + <braunr> let's see if it's that hard to figure out + <nlightnfotis> seeing as m's and g's have a 1:1 (in gccgo) relationship, + and a new kernel thread is created everytime there is a new goroutine + there to run. + <braunr> are you sure about that 1:1 relationship ? + <braunr> i hardly doubt it + <braunr> highly* + <nlightnfotis> yeah, that's what I thought too, but then again, my research + so far shows that when a new goroutine is created, a new kernel thread + creation follows suit + <nlightnfotis> what I have mentioned of course, happens in runtime_newm + <braunr> nlightnfotis: that's when you create a new m, not a new g + <nlightnfotis> yes, a new m is created when you create a new g. My issue is + that during m's creation, a new (bogus) g is created and assigned to the + m. I am looking into what happens to that. + <braunr> nlightnfotis: "a new m is created when you create a new g", can + you point me to the code ? + <nlightnfotis> braunr: matchmg line 1280 or close to that. Creates new m's + to run new g's up to (mcpumax) + <braunr> "Kick off new m's as needed (up to mcpumax)." + <braunr> so basically you have at most mcpumax m + <nlightnfotis> yeah. but for a small number of goroutines (as for example + in my experiments), a new m is created in order to run a new g. + <braunr> runtime_newm is called only if mget(gp)) == nil + <braunr> be rigorous please + <braunr> when i ask + <braunr> 11:01 < braunr> are you sure about that 1:1 relationship ? 
+ <braunr> this conclusively proves it's *false* + <braunr> so don't answer yes to that + <braunr> it's true for a small number of goroutines, ok + <braunr> and at startup + <braunr> because then, mget returns an existing m + <braunr> nlightnfotis: this g0 goroutine is described in the struct as + <braunr> G runtime_g0; // idle goroutine for m0 + <braunr> runtime_malg builds it with just a stack + <braunr> apparently, that's the goroutine an m runs when there are no g + left + <braunr> so yes, the idle one + <braunr> it's not bogus + <nlightnfotis> I thought m0 and g0 where the bootstrap m and g for the + scheduler. + <nlightnfotis> *correction: runtime_m0 and runtime_g0 + <braunr> hm i got a bit fast + <braunr> G* g0; // goroutine with scheduling stack + <nlightnfotis> braunr: scheduling stack with stacksize = -1? + <nlightnfotis> unless it's not used as a parameter + <nlightnfotis> let me investigate that + <nlightnfotis> yeah now that I am seeing it, it might make sense, if it + using a default stack size, #defined as StackMin + <braunr> g0 looks like a placeholder + <braunr> i think it's used to reuse switching code when there is only one + goroutine involved + <braunr> e.g. when starting + <braunr> anyway i don't think we should waste too much time with it + <braunr> nlightnfotis: try to make a real 1:1 mapping + <braunr> that's something else i suggested last time + <nlightnfotis> braunr: ok. Where do you suspect the problem lies? + <braunr> context switching + <nlightnfotis> inside the goruntime? + <braunr> in glibc + <braunr> try to use runtime.LockOSThread + <braunr> http://code.google.com/p/go-wiki/wiki/LockOSThread + <braunr> nlightnfotis: http://golang.org/pkg/runtime/ is probably better + <nlightnfotis> what exactly do you mean by `use runtime.LockOSThread`? 
+ LockOSThread locks the very first m and goroutine as the main threads + during process initialisation + <nlightnfotis> in proc.c line 565 or something + <braunr> i'm not sure it will help, because the problem is likely to occur + before even switching to the goroutine that locks its m, but worth trying + <braunr> 11:28 < braunr> nlightnfotis: http://golang.org/pkg/runtime/ is + probably better + <braunr> the first example is specific to GUIs that have requirements on + the main thread + <braunr> whereas i want every goroutine to run in its own thread + <nlightnfotis> I have also noticed that some context switching happens in + the goruntime even with a low number of goroutines and kernel threads + <braunr> that's expected + <braunr> goroutines must be viewed as works, and ms as worker threads + <braunr> everytime a goroutine sleeps, its m should be switching to useful + work + <braunr> nlightnfotis: i'd make prints (probably using mach_print) of + contexts when saved and restored + <braunr> and try to see if it makes any sense + <braunr> that's not simple to setup but not overly complicated either + <braunr> don't hesitate to ask for help + <nlightnfotis> from inside glibc, right? + <braunr> yes + <braunr> well + <braunr> no from go + <braunr> don't touch glibc from now + <braunr> put these prints near calls to makecontext/swapcontext + <braunr> and setcontext/getcontext + <braunr> wel + <braunr> you'll be using getcontext i think + <nlightnfotis> noted it all. I also have the gdb output you asked me for + http://pastebin.com/LdnMQDh1 + <braunr> i don't see main + <nlightnfotis> some notes first: The main thread is the one with id 4, and + the output on the top is its backtrace. 
+ <braunr> and main.main is run in thread 6 + <nlightnfotis> Remember that main when it comes to go is in the file + go-main.c + <braunr> so main becomes runtime_MHeap_Scavenger + <nlightnfotis> yeah, main.main is the code of the program, (the one the + user wrote, not the runtime) + <nlightnfotis> yeah, it becomes a gc thread + <nlightnfotis> seeing as runtime_starttheworld reports that there is + already one gc thread + <braunr> and how much are __pthread_total and __pthread_num_threads for + that trace ? + <nlightnfotis> they were: __pthread_total = 2, and __pthread_num_threads = + 4 + <braunr> can you paste the assertion again please, just to make sure + <nlightnfotis> a.out: ./pthread/pt-create.c:167: __pthread_create_internal: + Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = + thread->kernel_thread == ktid; + <nlightnfotis> __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; + })' failed. + <braunr> btw, install the -dbg packages too + <nlightnfotis> dbg for which one? gccgo? + <braunr> libc0.3 + <braunr> pthread/pt-create.c:167 is __pthread_sigstate (_pthread_self (), + 0, 0, &sigset, 0); here :/ + <braunr> that assertion should be in __pthread_thread_start + <braunr> let's just say gdb is confused + <pinotree> braunr: apt-get source eglibc ; cd eglibc-* ; debian/rules patch + <braunr> pinotree: i have + <braunr> and that assertion can only trigger if __pthread_total is 1 + <braunr> so let's say it just got to 2 + <nlightnfotis> it does from very early on in process initialisation + <nlightnfotis> let me check this out again + <braunr> hm + <braunr> actually, both __pthread_total and __pthread_num_threads must be 1 + <braunr> the context functions might be fine actually + <nlightnfotis> braunr: __pthread_num_threads = 2 right from the start of + the program + <nlightnfotis> 0x01da48ec is in mach_msg_trap + <braunr> something happened with libpthreads recently .. 
+ <braunr> i can't even start iceweasel + <pinotree> braunr: what's the error? + <braunr> iceweasel: ./pthread/../sysdeps/generic/pt-mutex-timedlock.c:70: + __pthread_mutex_timedlock_internal: Assertion `__pthread_threads' failed. + +But not the [[open_issues/libpthread_dlopen]] issue? + + <braunr> considering __pthread_threads is a global variable, this is tough + <braunr> i wonder if that's the issue with nlightnfotis's work + <braunr> wrong symbol resolution, leading libpthread to consider there is + only one thread running + <pinotree> try with LD_PRELOAD=/lib/i386-gnu/libpthread.so.0 iceweasel + <braunr> same + <braunr> maybe the switch to glibc 2.17 + <braunr> this assertion is triggered by __pthread_self, assert + (__pthread_threads); + <braunr> __pthread_threads being the array of thread pointers + <braunr> so either corrupted (but we hardly changed anything ...) or wrong + resolution + <braunr> __pthread_num_threads includes the signal thread, __pthread_total + doesn't + <nlightnfotis> braunr: I recompiled with the libc debugging symbols and I + have new information + <nlightnfotis> the threads block at mach_msg_trap + <braunr> again, almost everything blocks there + <braunr> mach_msg is mach ipc, the way hurd system calls are implemented + <nlightnfotis> and the next calls (if it didn't block, from what I can see + from eip) are mach_reply_port and mach_thread_self + <braunr> please paste it + <nlightnfotis> yes give me 2 mins plz, brb + <braunr> pinotree: looks different for firefox + <braunr> it seems it calls pthread_key_create before pthread_create + <braunr> something our libpthread doesn't handle correctly + <nlightnfotis> braunr: http://pastebin.com/yNbT7nLn + <pinotree> braunr: what do you mean? 
+ <braunr> pinotree: i mean libpthread needs to be fixed so thread-specific + data can be set even without a call to pthread_create + <braunr> nlightnfotis: hum, we already knew it was blocking in a semaphore + <braunr> nlightnfotis: ok forget the other things i told you to test + <braunr> nlightnfotis: track __pthread_total and __pthread_num_threads + <braunr> add prints (again, with mach_print) to see when (and why) they + change and go back to 1 + <pinotree> braunr: i see that pthread_key_create uses a mutex which in + turns needs _pthread_self(), but shouldn't at least one pthread_create be + done (directly by libc for the main thread)? + <braunr> pinotree: no :) + <braunr> well + <braunr> it should have been for the signal thread indeed + <braunr> and the signal thread exists + <pinotree> and the main thread? + <braunr> not the main, no + <pinotree> how so? + <braunr> a simple test program shows it does indeed work .. + <braunr> so this is again another problem in firefox too + <nlightnfotis> braunr: I don't think I understand this. I mean how can + pthread_total and __pthread_num_thread turn to 1, when , right before and + right after the crash they have numbers between 2, 3, and 4? + <braunr> how did you get their values "right before" the crash ? + <nlightnfotis> I have set a breakpoint to a printing function right before + the go statement + <nlightnfotis> (right before in this context, in the application code, not + the runtime code, but then again, I don't really think they are too far + each other) + <braunr> well, that's the mystery + <nlightnfotis> I am not challenging what you said, I will of course do, + just asking to understand some things + <braunr> they may either turn to 1, or there is some mess with symbol + resolution leading threads to see a value of 1 + <nlightnfotis> *do it + <braunr> there* + <nlightnfotis> braunr: ping + <teythoon> just ask ;) + <nlightnfotis> teythoon: have you used mach_print? 
+ <teythoon> no + <nlightnfotis> I have some questions about it + <teythoon> ask them + <nlightnfotis> I was told to use them inside go's runtime, to print the + values of __pthread_total and __pthread_num_threads. The thing is, these + values (I believe) are unknown to the runtime, they are only known to the + executable (linking time and later) + <teythoon> so? if the requested information is bound to a symbol that is + resolved at link time, you can print it from within the runtime + <teythoon> the same way any function from the libc is not known to the + executable until linking against it, but you can still "use" it in your + executable + <nlightnfotis> yeah, ok I understand that, but these are references that + are resolved at link time. The values I want to print are totally unknown + to the runtime (0 references to them) + <teythoon> if the value you are interested in is bound to the symbol + __pthread_total at link time, then you've got a reference you can use + <teythoon> doesn't printing __pthread_total work? did you try that? + <nlightnfotis> no, whenever I printed these values I did it from gdb. I am + trying to do what you suggested atm + <braunr> nlightnfotis: im here + <braunr> printing those values from libgo will tell us what value libgo + actually sees + <nlightnfotis> I am trying to use mach_print. Could you give me some + pointers on its usage (inside the goruntime?) (I have already read your + document here + http://www.gnu.org/software/hurd/microkernel/mach/gnumach/interface/syscall/mach_print.html + and the example code) + <braunr> and symbol resolution may depend on where it's done from + <braunr> nlightnfotis: first, it only work with -dbg kernels + <braunr> so make sure you're running one + <braunr> actually, i'll write you a patch + <braunr> including a mach_printf function with argument parsing + <nlightnfotis> isn't it on by default? 
I read that on the document you are + discussing mach_printf + <nlightnfotis> ahh ok + <braunr> it's on by default on -dbg kernels + <braunr> i'll make a repository on darnassus too + <braunr> better store it there + <braunr> nlightnfotis: + http://darnassus.sceen.net/gitweb/rbraun/mach_print.git/ + <braunr> nlightnfotis: i suggest you implement mach_print with inline asm + statement in a C file, so that you don't need to alter the build system + configuration + <braunr> i'll make an example of that too + <nlightnfotis> braunr: that wasn't a problem. My only real problem atm is + that __atomic_t isn't recognised as a type, and I can not find the header + file for it on Hurd + <nlightnfotis> it was pt-internal.h in libpthread + <braunr> ah + <braunr> nlightnfotis: just in case, i updated the repository with an + inline assembly version + <braunr> let's see about __atomic_t + <braunr> sysdeps/i386/bits/pt-atomic.h:typedef __volatile int __atomic_t; + <braunr> nlightnfotis: just redeclare it as this locally + <braunr> nlightnfotis: ok ? + <nlightnfotis> I am working on it, because I still haven't found what + __atomic_t is typedefed from. Thinking of typedefing an int to it and see + how it goes + <nlightnfotis> braunr: found it just now: __volatile int + <braunr> "just now" ? + <braunr> 14:19 < braunr> sysdeps/i386/bits/pt-atomic.h:typedef __volatile + int __atomic_t; + <nlightnfotis> I was using cscope all this time + <braunr> why use cscope at all when i tell you where it is ? 
+ <nlightnfotis> because I didn't notice it: your discussion was between + pino's and srs' and I wasn't tagged and thought it had something to do + with their discussion + <pinotree> (sorry) + <nlightnfotis> no it was my bad + <braunr> ok + <braunr> pinotree: there is indeed a special call to + __pthread_create_internal for the main thread + <pinotree> yeah + <pinotree> braunr: if there wouldn't be that libc→pthread bridge, things + like pthread_self() or so wouldn't work for the main thread + <braunr> pinotree: right + <pinotree> braunr: weird thing is that the error you got is usually a sign + that pthread is not linked in explicitly + <braunr> pinotree: yes + <braunr> pinotree: with firefox, gdb can't locate pthread symbols before a + call to a pthread function + <braunr> so yes, libpthread is loaded after main is called + <braunr> nlightnfotis: can you give me a quick procedure to build gcc with + go support from your repository, and then test a go program please ? + <braunr> to i can have a better look at it myself + <braunr> so* + <nlightnfotis> braunr: sure you want access to my go repo? If you already + have gcc repo add my github repo as a remote and checkout + fotisk/goruntime_hurd + <braunr> i have your github repo + <nlightnfotis> git checkout fotisk/goruntime_hurd (You may need to revert a + commit or two, because of my latest endeavour with mach_print + <nlightnfotis> braunr: check it out now, I reverted some messy commits for + you to rebuild + <braunr> nlightnfotis: i won't work on it right now, i'm building glibc to + check some things in libpthread + <braunr> since it seems to be the source of your problems and many others + <nlightnfotis> oh ok then. btw, it compiles ok, but when I try to compile + another program with gccgo collect2 cries about undefined references to + __pthread_num_threads and __pthread_total + <braunr> Oo + <braunr> another program ? 
+ <nlightnfotis> braunr: will I get the same result if I slowly go through it + with gdb + <nlightnfotis> yep + <braunr> i don't understand + <braunr> what compiles ok, what fails ? + <nlightnfotis> gccgo compiles without errors (which is strange) but when I + use it to compile goroutine.go it fails with the errors I reported + <pinotree> (missing linking to pthread?) + <braunr> since when ? + <nlightnfotis> pinotree: perhaps braunr: since I made the changes with + mach_print + <nlightnfotis> pinotree: but what could be missing the link? GCC compiled + programs are getting linked automatically to the shared objects of the + headers they include right? + <nlightnfotis> (assuming it's not a huge program, only a tiny 10 liner for + instance) + <braunr> uh + <braunr> did you declare them as extern + <braunr> ? + <nlightnfotis> yes + <braunr> do you see -lpthread on the link line ? + <nlightnfotis> during gcc's compilation? I will have to rerun it again and + see. + <braunr> log the compilation output somewhere once + <braunr> nlightnfotis: why did you remove volatile from the definition of + __atomic_t ?? + <nlightnfotis> just for testing purposes, because I thought that the GNU + version is volatile with no __ in front of it and that might cause some + issues. + <braunr> i don't understand + <nlightnfotis> it was just an experiment gone wrong + <braunr> nlightnfotis: keep volatile there + <nlightnfotis> just did + <nlightnfotis> braunr: there is -lpthread on some lines. For instance when + libtool is invoked. + <youpi> braunr: the pthread assertion usually happens when libpthread gets + loaded from a plugin, I guess mozilla got rid of libpthread in the main + application recently, simply + <pinotree> youpi: he said that the LD_PRELOAD trick (which used to + workaround the issue in older iceweasel) does not work, though + <youpi> ah? it does work for me + <pinotree> dunno then... 
+ <braunr> youpi: aouch, ok + <braunr> nlightnfotis: what about the specific gcc invocation that fails ? + <braunr> pinotree: /lib/i386-gnu/libpthread.so.0: ERROR: cannot open + `/lib/i386-gnu/libpthread.so.0' (No such file or directory) + <braunr> trying with a working path this time + <braunr> better + <pinotree> sorry, i typed it by hand :p + <braunr> Segmentation fault + <braunr> but no assertion + <nlightnfotis> braunr: gccgo hello.go + <braunr> nlightnfotis: ? + <pinotree> <braunr> nlightnfotis: what about the specific gcc invocation + that fails ? + <braunr> nlightnfotis: i'm asking if -lpthread is present when you have + these undefined reference errors + <nlightnfotis> it is. it seems so + <nlightnfotis> I wrote above that it is present when libtool is called + <nlightnfotis> I don't know what libtool is doing sadly + <braunr> you said some lines + <nlightnfotis> but I from what I've seen I believe it does some kind of + linking + <braunr> paste it somewhere please + <nlightnfotis> yeah it doesn't fail though + <braunr> that's far too vague ... + <braunr> it doesn't fail ? + <nlightnfotis> give me a second + <braunr> i thought it did + <nlightnfotis> no it doesn't + <braunr> 14:53 < nlightnfotis> gccgo compiles without errors (which is + strange) but when I use it to compile goroutine.go it fails with the + errors I reported + <nlightnfotis> yeah gccgo compiles. + <nlightnfotis> when I use the compiler, it fails + <braunr> so it fails running + <braunr> is gccgo built with -lpthread itself ? + <nlightnfotis> http://pastebin.com/1TkFrDcG + <nlightnfotis> check it out + <nlightnfotis> I think it does, but I would take an extra opinion + <nlightnfotis> line 782 + <nlightnfotis> and 784 + <braunr> (are you building as root ?) + <nlightnfotis> yes. 
for now + <pinotree> baaad :p + <nlightnfotis> I never had any particular problems...except that one time + that I rm -rf the source tree :P + <nlightnfotis> I know it's bad d/w + <nlightnfotis> braunr: I found something interesting (I don't know if it's + expected or not; probably not): If I set GOMAXPROCS to 2, and run the + goroutine program, it seems to be running for a while (with the + goroutines!) and then it segfaults. Will look more into it + <braunr> it's interesting, yes + <braunr> nlightnfotis: have you tried the preload trick too ? + <nlightnfotis> ldpreload? no. Could you tell me how to do it? export + LDPRELOAD and a path to libpthread? + <braunr> nlightnfotis: LD_PRELOAD=/lib/i386-gnu/libpthread.so.0.3 ... + <nlightnfotis> braunr: it also produces a very different backtrace. This + one heavily involves mig functions + <tschwinge> braunr, nlightnfotis: Thanks for working together, and sorry + for my lack of time. + <braunr> nlightnfotis: paste please + <nlightnfotis> tschwinge, Hello. It's ok, I am sorry for not showing good + amounts of progress from my part. + <nlightnfotis> braunr: http://pastebin.com/J4q2NN9p + <braunr> nlightnfotis: thread apply all bt full please + <nlightnfotis> braunr: http://pastebin.com/tbRkNzjw + <braunr> looks like an infinite loop of + __mach_port_mod_refs/__mig_dealloc_reply_port + <braunr> ... + <nlightnfotis> yes that's what I got from it too. Keep in mind these + results are with GOMAXPROCS=2 and they result in segmentation fault + <nlightnfotis> and I also can not understand the corrupted stack at the + beginning of the backtrace + <braunr> no please + <nlightnfotis> ? 
+ <braunr> test LD_PRELOAD=/lib/i386-gnu/libpthread.so.0.3 without + GOMAXPROCS=2 + <nlightnfotis> braunr: LD_PRELOAD without GOMAXPROCS results in the usual + assertion failure and abortion of execution after it + <braunr> nlightnfotis: ok + <braunr> nlightnfotis: im sorry, i thought you couldn't launch a test since + you added mach_print + <nlightnfotis> I am not using mach_print, I couldn't fix the issue with the + references and thought I was losing time, so I went back to debugging + with gdb until I can't get anything more out of it + <nlightnfotis> braunr: should I focuse on mach_print? Will it produce very + different results than gdb? + <nlightnfotis> *focus + <nlightnfotis> (btw I didn't delete mach print or anything, it's still + there, in another branch) + <nlightnfotis> braunr: Now I stepped through the program in gdb, and got + something really really weird. Some close to a full execution + <nlightnfotis> Number of gorountines and machine threads according to + runtime was 3, __pthread_num_threads was 4 + <nlightnfotis> it did get SIGILL (illegal instruction some times though) + <nlightnfotis> and it exited with code 02 + <braunr> uh + <braunr> nlightnfotis: try with mach_print yes, it will show the values + from the real execution context, and be as close as what we can get + <braunr> i'm not sure about how gdb finds the values + <nlightnfotis> braunr: ok, will spend the rest of the day to find a way to + make mach_print and the other values work. Did you see my last messages, + with the goroutines that worked under gdb? + <braunr> yes + <nlightnfotis> it seemed to run. Didn't get the expected output, but also + didn't get any errors other than illegal instruction either + <nlightnfotis> braunr: I still have not found an easy way to do what you + asked me to from go's runtime. Would it be ok if I do it from inside + libpthread? + <braunr> nlightnfotis: do what ? 
+ <nlightnfotis> print the values of __pthread_total and + __pthread_num_threads with mach_print. + <braunr> how ? + <braunr> oh wait + <braunr> well yes ofc, they're not exported :/ + <braunr> nlightnfotis: have you been able to use mach_print ? + <nlightnfotis> braunr: not really because of the problems I shared + earlier. I can try to use with in-gcc structures if you want me to, it's + nothing hard to do + <nlightnfotis> actually I will. Hang on + <braunr> proceed with debugging inside libpthread instead + <braunr> using mach_print to avoid deadlocks this time + <braunr> (mach_print was purposely built for debugging such low level code + parts) + <nlightnfotis> ok, I will patch this, but can I build it tomorrow? + <braunr> yes + <braunr> just keep us informed + <nlightnfotis> ok, thanks, and sorry for everything I have done. I want you + to know that I really appreciate that you are helping me. + <braunr> remember: the goal here is to understand why __pthread_total and + __pthread_num_threads have inconsistent values + <nlightnfotis> braunr: whenever you see it, mach_print works as expected + inside gcc. + + +# IRC, freenode, #hurd, 2013-09-03 + + <nlightnfotis> braunr: I have made the changes I want to glibc. After I + build it, how do I install it? make install or is it more involved? + <braunr> nlightnfotis: use LD_LIBRARY_PATH + <braunr> never install an experimental glibc unless you have backups or are + certain of what you're doing + <braunr> nlightnfotis: i didn't understand what you meant about mach_print + yesterday + <nlightnfotis> it works in gcc. + <braunr> what do you mean "in gcc" ? + <braunr> why would you put mach_print in gcc ? + <braunr> we want it in go programs .. + <nlightnfotis> yes, I understand it. gcc was the fastest way to test it's + usage at that moment (for me) and I just wanted to confirm it works. 
I + only had to change its signature to const char * because gcc wouldn't + accept it otherwise + <braunr> doesn't my example include const ? + <braunr> nlightnfotis: why did you rebuild glibc ? + <nlightnfotis> braunr: I have not started yet, will do now, to apply the + changes to libpthread + <braunr> you mean add the print calls there ? + <nlightnfotis> yes + <braunr> ok + <braunr> use debian/rules build, interrupt when you see gcc invocations + <braunr> then switch to the build directory (hurd-libc-i386 iirc), and make + others + <braunr> nlightnfotis: did you send me the instructions to build and test + your work ? + <braunr> so i can reproduce these weird threading problems at my side + <nlightnfotis> braunr: sorry, I was in the toilet, where would you like me + to send the instructions? + <braunr> nlightnfotis: i should be fine i guess, let's check here + <braunr> nlightnfotis: i simply used configure + --enable-languages=c,c++,go,lto + <braunr> and i'll see how it goes + <nlightnfotis> I configure with --enable-languages=go (it automatically + builds c and c++ for that as go depends on them), --disable-bootstrap, + and use a custom prefix to install at a custom location + <braunr> yes + <braunr> ok + <braunr> nlightnfotis: how long does it take you ? + <nlightnfotis> complete non-bootstrap build about 45 minutes. With a build + tree ready and only simple changes, about 2-3 minutes + <nlightnfotis> braunr: In an hour I will go offline for 2-3 hours, I am + gonna move back to my other home in the other city. It won't take long, + the whole process will be about 4 hours, and I will compensate for the + time lost by staying up late until 3 o'clock in the morning + <braunr> i'd prefer you didn't "compensate" + <nlightnfotis> ? + <braunr> work if you want to + <braunr> no one is forcing you to work late at night for gsoc, unless you + want to + <nlightnfotis> no, I do it because I want to.
I **really** really want to + succeed, and time is of the essence for me at this point + <braunr> then ok + <braunr> nlok i have a gccgo compiler + <pinotree> nlok? + <braunr> nl being nlightnfotis but he's gone + <pinotree> oh + * pinotree was trying to parse that as "now" or "look" or the like + <nlightnfotis> braunr: 08:19:56< braunr> use debian/rules build, interrupt + when you see gcc invocations: Are gcc invocations related to + i486-gnu-gcc-4.7? + <nlightnfotis> nvm I'm good now :) + <gnu_srs> of course not, that's only for compiling applications using the + newly built libc + <nlightnfotis> gnu_srs: I didn't exactly understand what you said? Care to + elaborate? which one is for compiling applications using the newly built + libc? -486-gnu-gcc-4.7? + <gnu_srs> when you see gcc ... -llibc.so you know libc.so is built, and + that is sufficient to use it. + <gnu_srs> with LD_PRELOAD or LD_LIBRARY_PATH (after cding and building + others) + <nlightnfotis> gnu_srs: thanks for the tip :) + <gnu_srs> :-D + <nlightnfotis> is anyone else getting glibc build problems? (from apt-get + source glibc, at cxa-finalize.c)? + <gnu_srs> apt-get source eglibc; apt-get build-dep eglibc (as root); + dpkg-buildpackage -b ... + <braunr> nlightnfotis: just debian/rules build + <braunr> to start the glibc build + <nlightnfotis> braunr: oh I have now, it's building without issues so far + <braunr> when you see gcc processes, it means the build process has + switched from configuring to making + <braunr> then interrupt (ctrl-c) + <braunr> cd build-tree/hurd-i386-libc + <braunr> make others + <braunr> or make lib others + <braunr> lib is glibc, others is some addons which include our libpthread + <nlightnfotis> thanks for the tip braunr. + <nlightnfotis> braunr: I have managed to get a working version of glibc and + libpthread with mach_print working. I have also run 2 test programs and + it works as expected.
Will continue researching tomorrow if that's ok + with you, I am too tired to keep on now. + <nlightnfotis> for the record compilation of glibc right from the start was + about 1 hour and 20 - 30 minutes + + +# IRC, freenode, #hurd, 2013-09-04 + + <braunr> i've taken a deeper look at this assertion failure + <braunr> and ... + <braunr> it has nothing to do with pthread_create + <braunr> i assumed it was the one in sysdeps/mach/pt-thread-start.c + <nlightnfotis> pthread_self ()? + <braunr> but it's actually from sysdeps/mach/hurd/pt-sysdep.h, in + _pthread_self() + <braunr> and looking there : + <braunr> thread = *(struct __pthread **)__hurd_threadvar_location + (_HURD_THREADVAR_THREAD); + <braunr> so simply put, context switching doesn't fix up thread specific + data ... + <braunr> it's that simple + <nlightnfotis> wow + <nlightnfotis> today I was running programs all day long with mach_print on + to print __pthread_total and __pthread_num_threads to see when both + become 1 and couldn't find anything + <nlightnfotis> I was nearly desperate. You just made my day! :) + <braunr> now the problem is + <braunr> thread specific data is highly dependent on the stack + <braunr> it's illegal to make a thread switch stack and expect it to keep + working on the hurd + <nlightnfotis> unless split stack is activated? + <nlightnfotis> no wait + <braunr> split stack is completely unsupported on the hurd + <teythoon> uh, why would that be? + <braunr> teythoon: about split stack ? + <teythoon> yes + <braunr> i'm not sure + <nlightnfotis> at least now we do know what the problem is and I can start + working on a solution. + <nlightnfotis> braunr: we should tell tschwinge and youpi about it. 
+ <braunr> nlightnfotis: sure but + <braunr> nlightnfotis: you can also start looking at a workaround + <braunr> nlightnfotis: also, let's make sure that's the reason first + <braunr> nlightnfotis: use mach_print to display the stack pointer when + switching + <braunr> nlightnfotis: + http://stackoverflow.com/questions/1880262/go-forcing-goroutines-into-the-same-thread + <braunr> " I believe runtime.LockOSThread() is necessary if you are + creating a library binding from C code which uses thread-local storage" + <braunr> oh, a paper about the go runtime scheduler + <braunr> let's have a look .. + <teythoon> braunr: have you seen the high level overview presented in that + blog post I once posted here? + <braunr> no + <nlightnfotis> braunr, just came back, and read the log. Which paper are + you reading? The one from Columbia University? + <braunr> but i need to know about details here, specifically, if threads do + change stack + <braunr> nlightnfotis: yes + <teythoon> braunr: ok + <braunr> this could be caused either by true stack switching, or by "stack + segmentation" as implemented by go + <braunr> it is interesting that there are stack related members per + goroutine + <braunr> nlightnfotis: in particular, pthread_attr_setstacksize() doesn't + work on the hurd + <nlightnfotis> <braunr> it is interesting that there are stack related + members per goroutine -> I think that's go's policy.
All goroutines run + on a shared address space (that is the kernel thread's address space) + <braunr> nlightnfotis: that's obvious + <braunr> and not the problem + <braunr> and yes, it's "stack segmentation" + <braunr> and on linux, and probably other archs, switching stack may be + perfectly legit + <braunr> on the hurd, we still have threadvars + <braunr> which are the hurd specific thread local storage mechanism + <braunr> it means 1/ all stacks in a process must have the same size + <braunr> 2/ stack size must be a power of two + <braunr> 3/ threads can't switch stack + <braunr> this strongly prevents goroutines from being run by just any thread + <braunr> i see there are already hurd specific changes about stack + handling + <nlightnfotis> so we should only make changes to the specific gccgo + scheduler as a workaround under the Hurd right? + <braunr> i don't know + <braunr> this might also push the switch to tls + <nlightnfotis> this sounds better as a long term fix + <nlightnfotis> but it must also involve a great amount of work, right? + <braunr> most of it has already been done + <braunr> by youpi and tschwinge + <nlightnfotis> with the changes to tls early in the summer? + <braunr> maybe + <braunr> 14:36 < braunr> nlightnfotis: also, let's make sure that's the + reason first + <braunr> 14:36 < braunr> nlightnfotis: use mach_print to display the stack + pointer when switching + <braunr> check what goes wrong with the stack + <braunr> then we'll see + <braunr> as a very simple workaround, i expect locking g's on m's to be a + good first step + <nlightnfotis> braunr: noted everything. that's my work for tonight. I + expect myself to stay up late like yesterday and have this all figured + out by tomorrow. + <braunr> nlightnfotis: why not now ? + <nlightnfotis> I am starting from now, but I expect myself to stop about 6 + o clock here (2 hours) because I have an appointment with a doctor.
+ <nlightnfotis> and keep on when I come back home + <braunr> well adding a few printfs to track the stack should be doable + before 2 hours + <nlightnfotis> braunr: I am doing it now. Will report as soon as I have + results :) + <nlightnfotis> braunr: have I messed up with the way I read esp's value? + https://github.com/NlightNFotis/glibc/commit/fdab1f5d45a43db5c5c288c4579b3d8251ee0f64#L1R67 + <braunr> nlightnfotis: +unsigned + <braunr> nlightnfotis: using gdb : + <braunr> (gdb) info registers + <braunr> esp 0x203ff7c0 0x203ff7c0 + <braunr> (gdb) print thread->stackaddr + <braunr> $2 = (void *) 0x2000000 + <nlightnfotis> oh yes, I know about gdb, I thought you wanted me to use + mach_print + <braunr> nlightnfotis: yes + <braunr> this is just my own attempt + <braunr> and it does show the stack pointer is completely outside the + thread stack + <braunr> nlightnfotis: in your code, i suggest using + __builtin_frame_address() + <braunr> well __builtin_frame_address(0) + <braunr> see + http://gcc.gnu.org/onlinedocs/gcc-4.7.3/gcc/Return-Address.html#Return-Address + <braunr> it's not exactly the stack pointer but close enough, unless of + course the stack is changed in the middle of the function + <nlightnfotis> I see. I am gonna try one more time with esp the way I + worked it and if it fails to work, I am gonna use return address + <braunr> nlightnfotis: be very careful about signed/unsigned and type + widths + <braunr> not return address, frame address + <braunr> return address is code, frame address is data (stack) + <nlightnfotis> ah, I see, thanks for the correction. 
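The check discussed above (is the current stack position still inside the stack that libpthread recorded for the thread?) can be sketched roughly as follows. This is only an illustration: `addr_in_stack` and `current_frame` are hypothetical helper names, not glibc functions, and the stack size used in the usage note is an assumption since the log only gives the stack base. `__builtin_frame_address (0)` is the GCC builtin suggested in the log; it yields the frame address, which is close to the stack pointer but not identical to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return nonzero if ADDR lies inside the stack
   [STACK_BASE, STACK_BASE + STACK_SIZE).  Everything is done on
   unsigned integers of pointer width, so that high addresses are not
   misordered by a signed comparison.  */
static int
addr_in_stack (const void *addr, const void *stack_base, size_t stack_size)
{
  uintptr_t a = (uintptr_t) addr;
  uintptr_t base = (uintptr_t) stack_base;

  assert (stack_size > 0);
  return a >= base && a - base < stack_size;
}

/* Hypothetical helper: approximate the current stack position with the
   frame address of this function, using the GCC builtin.  */
static void *
current_frame (void)
{
  return __builtin_frame_address (0);
}
```

With the values from the gdb session above (esp 0x203ff7c0, thread->stackaddr 0x2000000) and an assumed stack size of 0x200000, `addr_in_stack` reports that the stack pointer is outside the thread stack, which matches what was observed.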
+ <braunr> youpi: not sure you catched it earlier, the problem fotis has been + having with goroutines is about threadvars + <braunr> simply put, threads use setcontext functions to save/restore + goroutines state, which make them switch stack, rendering the location of + threadvars invalid, and making _pthread_self() choke + + +# IRC, freenode, #hurd, 2013-09-05 + + <nlightnfotis> I am having very weird behavior with my code, something that + I can not explain and seems likely to be a bug, could someone else take a + look? + <nlightnfotis> pinotree are you available at the moment to take a look at + something? + <pinotree> nlightnfotis: dont ask to ask, just ask + <nlightnfotis> I have made some modifications to pthread_self as also + suggested by braunr to see if the stack pointer is within the bounds of + the frame address after context switching. I can get the values of both + esp and frame_address to be shown before the context switch, but I can + only get the value of esp to be shown after the context switch, and it + always results to the program getting killed + <nlightnfotis> + https://github.com/NlightNFotis/glibc/blob/7e72da09a42b1518865f6f4882d68689e681f25b/libpthread/sysdeps/mach/hurd/pt-sysdep.h#L97 + <nlightnfotis> thing is a dummy print value I have right after the code + that was supposed to print the frame_address after the context switching + is executing without any issues. + <pinotree> oh assembler... cannot help, sorry :/ + <nlightnfotis> oh no, I am not asking for assembler help, that part works + quite alright. I am asking why from the 4 identical pieces of code that + print debugging values the last one doesn't work. I am on it all day, and + still have not found an answer + <braunr> nlightnfotis: i can + <nlightnfotis> hello braunr, + <braunr> nlightnfotis: do you have a backtrace ? + <braunr> uh + <nlightnfotis> nope, it crashes right after I execute something. 
Let me + compile glibc once again and see if a fix I attempted works + <braunr> malloc and free use locks + <braunr> so they probably use _pthread_self + <braunr> don't use them + <braunr> for debugging, a simple statically allocated buffer on the stack + will do + <braunr> nlightnfotis: so ? + <nlightnfotis> Ι got past my original problem, but now I am trying to get + past the sigkills that kill the program at the beginning + <nlightnfotis> i remember not having this problem, so I am compiling my + master branch to see if it is reproducible. If it is, it means something + is very wrong. If it's not, it means I screwed up somewhere + <braunr> i don't understand, how do you know if you get past the problem if + you still have trouble reaching that code ? + <nlightnfotis> braunr: I fixed all my problems now. I can see that both esp + and the frame_address are the same after context switching though? + <braunr> always ? + <braunr> for all goroutines ? + <nlightnfotis> for all kernel threads, not go routines. We are in + libpthread + <braunr> if they're the same after a context switch, it usually means the + scheduler didn't switch + <braunr> well obviously + <braunr> but what i asked you was to trace calls to setcontext functions + <nlightnfotis> I will run some tests again. May I show you my code to see + if there is anything wrong with it? + <braunr> what address do you have ? + <braunr> not yet + <braunr> i'm not sure you understand what i want to check + <braunr> do you see how threadvars work basically ? + <nlightnfotis> I think so yes, they keep in the stack the local variables + of a thread right? + <nlightnfotis> and the globals + <nlightnfotis> or + <nlightnfotis> wait a minute... + <braunr> yes but do you see how the thread specific data are fetched ? + <nlightnfotis> with __hurd_threadvar_location_from_sp? + <braunr> yes but "basically", what does it do ? 
+ <nlightnfotis> it get's a stack pointer as a parameter, and returns the + location of that specific data based on that stack pointer, right? + <braunr> and how ? + <nlightnfotis> I believe it must compare the base value of the stack and + the value of the end of the stack, and if the results are consistent, it + returns a pointer to the data? + <braunr> and how does it determine the start and end of the stack ? + <nlightnfotis> stack_pointer must be pointing at the base of the + stack. That + stack_size must be the stack limit I guess. + <braunr> so you're saying the caller of __hurd_threadvar_location_from_sp + knows the stack base ? + <nlightnfotis> I am not so sure I understand this question. + <braunr> i want to know if you understand how threadvars work + <braunr> apparently you don't + <braunr> the caller only has its current stack pointer + <braunr> which does *not* point to the stack base + <braunr> threadvars work by assuming a *fixed* stack size, power of two, + aligned (obviously) + <braunr> in our case, 2MiB (except in hurd servers where a kludge reduces + that to 64k) + <braunr> this is why stack size can't be changed + <braunr> this is also why the stack pointer can't ever point outside the + initial stack + <braunr> i want you to make sure go violates this last assumption + <braunr> so 1/ show the initial stack boundaries of your threads, then show + that, after loading a goroutine, the stack pointer is outside + <braunr> which is what, if i'm right, triggers the assertion + <braunr> ask if there is anything confusing + <braunr> this is important, it should already have been done + <nlightnfotis> ok, I noted it all, I am starting to work on it right now. I + only have one question. My results, the ones with the stack pointer and + the frame address, are expected or unexpected? 
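The check requested above can be sketched as follows. This is an illustration only, assuming the 2 MiB fixed stack size mentioned in the discussion and a stack base aligned to that size. `STACK_SIZE`, `stack_base_from_sp` and `sp_in_stack` are hypothetical names, and the real lookup in glibc's `hurd/hurd/threadvar.h` is not reproduced here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: fixed, power-of-two stack size (2 MiB).  */
#define STACK_SIZE ((uintptr_t) 2 * 1024 * 1024)

/* Base of the STACK_SIZE-aligned region that contains SP.  Because the
   size is a power of two, clearing the low bits of SP is enough.  */
uintptr_t
stack_base_from_sp (uintptr_t sp)
{
  return sp & ~(STACK_SIZE - 1);
}

/* True if SP lies inside the stack that starts at STACKADDR, assuming
   STACKADDR is itself aligned to STACK_SIZE.  */
bool
sp_in_stack (uintptr_t sp, uintptr_t stackaddr)
{
  return stack_base_from_sp (sp) == stackaddr;
}
```

With the values from the gdb session quoted earlier in this log (`esp` 0x203ff7c0, `thread->stackaddr` 0x2000000), `sp_in_stack` returns false, which is consistent with the observation that the stack pointer lies outside the thread stack.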
+ <braunr> i don't know + <braunr> show me the code again please + <braunr> and explain your intent + <nlightnfotis> + https://github.com/NlightNFotis/glibc/blob/7fe202317db4c3947f8ae1d1a4e52f7f0642e9ed/libpthread/sysdeps/mach/hurd/pt-sysdep.h + <nlightnfotis> At first I print the value of esp and the frame_address + before the context switching and after the context switching. + <nlightnfotis> The different variables were introduced as part of a test to + see if my results were consistent, + <braunr> what context switch ? + <nlightnfotis> in hurd_threadvar_location + <braunr> what makes you think this is a context switch ? + <nlightnfotis> in threadvar.h, it calls __hurd_threadvar_location_from_sp. + <nlightnfotis> the full path for it is glibc/hurd/hurd/threadvar.h + <braunr> i don't see how giving me the path will explain why it's a context + switch + <braunr> and i can tell you right away it's not + <braunr> hurd_threadvar_location is basically a lookup returning the + address of the thread specific data + <nlightnfotis> wait a minute...does this mean that + hurd_threadvar_location_from_sp is also a lookup function for the same + reason + <nlightnfotis> ? + <braunr> yes + <braunr> isn't the name meaningful enough ? + <braunr> "location of the threadvars from stack pointer" + <nlightnfotis> I guess I made wrong deductions from when you originally + shared your findings... + <nlightnfotis> <braunr> thread = *(struct __pthread + **)__hurd_threadvar_location (_HURD_THREADVAR_THREAD); + <nlightnfotis> <braunr> so simply put, context switching doesn't fix up + thread specific data ... + <nlightnfotis> I thought that hurd_threadvar_location was doing the context + switching + <braunr> nlightnfotis: by context switching, i mean setcontext functions + <nlightnfotis> braunr: You mean the one in sysdeps/mach/hurd/i386? + <braunr> yes + <braunr> but + <braunr> do you understand what i want you to check now ? 
+ <nlightnfotis> I think I got this time: Let me explain it: + <nlightnfotis> You suggested that stack sizes are fixed. That is the main + reason that the stack pointer should not be able to point outside of it. + <braunr> no + <braunr> locating threadvars is done by applying a mask, computed from the + stack size, on the stack pointer, to determine its base + <nlightnfotis> yeah, what __hurd_threadvar_location_from_sp is doing + <braunr> if size is a power of two, size - 1 is a mask that, if + complemented, aligns the address + <braunr> yes + <braunr> so, threadvars expect the stack pointer to always point to the + initial stack + <nlightnfotis> and we wanna prove that go violates this rule right? That + the stack pointer is not pointing at the initial stack + <braunr> yes diff --git a/community/gsoc/project_ideas/download_backends.mdwn b/community/gsoc/project_ideas/download_backends.mdwn index f794e814..c0bdc5b2 100644 --- a/community/gsoc/project_ideas/download_backends.mdwn +++ b/community/gsoc/project_ideas/download_backends.mdwn @@ -1,12 +1,12 @@ -[[!meta copyright="Copyright © 2009 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2013 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] [[!meta title="Use Internet Protocol Translators (ftpfs etc.) as Backends for Other Programs"]] @@ -19,8 +19,9 @@ Download protocols like FTP, HTTP, BitTorrent etc. 
are very good candidates for this kind of modularization: a program could simply use the download functionality by accessing FTP, HTTP etc. translators. -There is already an ftpfs translator in the Hurd tree, as well as an [httpfs -translator on hurdextras](http://www.nongnu.org/hurdextras/#httpfs); however, +There is already an [[hurd/translator/ftpfs]] translator in the Hurd tree, as +well as an [[hurd/translator/httpfs]] on +[hurdextras](http://www.nongnu.org/hurdextras/); however, these are only suitable for very simple use cases: they just provide the actual file contents downloaded from the URL, but no additional status information that are necessary for interactive use. (Progress indication, error codes, HTTP diff --git a/community/gsoc/project_ideas/mtab/discussion.mdwn b/community/gsoc/project_ideas/mtab/discussion.mdwn index 0e322c11..716fb492 100644 --- a/community/gsoc/project_ideas/mtab/discussion.mdwn +++ b/community/gsoc/project_ideas/mtab/discussion.mdwn @@ -106,7 +106,7 @@ License|/fdl]]."]]"""]] # IRC, freenode, #hurd, 2013-06-25 -In context of [[microkernel/mach/mig/documentation/structured_data]]. +In context of [[open_issues/mig_portable_rpc_declarations]]. <teythoon> should I go for an iterator like interface instead? <teythoon> btw, what's the expected roundtrip time? @@ -905,3 +905,1168 @@ In context of [[microkernel/mach/mig/documentation/structured_data]]. <teythoon> ah, i think so <braunr> then you don't need to do it again <teythoon> right, I overlooked that + + +## IRC, freenode, #hurd, 2013-07-12 + + <teythoon> recursively traversing all translators from / turns out to be + more dangerous than I expected + <teythoon> ... if done by a translator bound somewhere below /... + <teythoon> my interpretation is that the mtab translator tries to talk to + itself and deadlocks + <teythoon> (and as a side effect the whole system kinda just stops...) 
+ + +## IRC, freenode, #hurd, 2013-07-15 + + <youpi> teythoon: did you discuss with braunr about returning port vs path + in fsys_get_children? + <teythoon> youpi: we did + <teythoon> as I wrote I looked at the getcwd source you pointed me at + <teythoon> and I started to code up something similar + <teythoon> but as far as I can see there's no way to tell from a port + referencing a file the directory this file is located in + <youpi> ah, right, there was a [0] mail + <youpi> teythoon: because it doesn't have a "..", right + <teythoon> about Neals concerns, he's right about not covering passive + translators very well + <teythoon> but the solution he proposed was similar to what I tried to do + first + <youpi> I don't like half-covering passive translators at all, to be honest + :) + <youpi> either covering them completely, or not at all, would be fine + <teythoon> and then braunr convinced me that the "recursive" approach is + more elegant and hurdish, and I came to agree with him + <teythoon> youpi: one could scan the filesystem at translator startup and + populate the list + <youpi> by "Neal's solution", you mean an mtab registry? + <teythoon> yes + <braunr> so, let's see what linux does when renaming parent directories + <teythoon> mount points you mean? 
+ <youpi> teythoon: browsing the whole filesystem just to find passive + translators is costly + <youpi> teythoon, braunr: and that won't prevent the user from unexpectedly + starting other translators at will + <braunr> scary + <teythoon> youpi: but that requires the privilege to open the device + <youpi> the fact that a passive translator is set is nothing more than a + user having the intent of starting a translator + <braunr> linux retains the original path in the mount table + <youpi> heh + <teythoon> youpi: any unprivileged user can trigger a translator startup + <youpi> sure, but root can do that too + <youpi> and expect the system to behave nicely + <teythoon> but if I'm root and want to fsck something, I won't start + translators accessing the device just before that + <teythoon> but if there's a passive translator targetting the device, + someone else might do that + <youpi> root does not always completely control what he's doing + <youpi> linux for instance does prevent from mounting a filesystem being + checked + <teythoon> but still, including passive translators in the list would at + least prevent anyone starting an translator by accident, isn't that worth + doing then? + <youpi> if there's a way to prevent root too, that's better than having a + half-support for something which we don't necessarily really want + <youpi> (i.e. 
an exclusive lock on the underlying device) + <teythoon> right, that would also do the trick + <teythoon> btw, some programs or scripts seem to hardcode /proc/mounts and + procfs and I cannot bind a translator to /proc/mounts since it is + read-only and the node does not exist + <kilobug> IMHO automatically starting translators is a generic feature, and + passive translator is just a specific instance of it; but we could very + well have, like an "autofs" that automatically start translators in tar + archives and iso images, allowing to cd into any tar/iso on the system; + implementing such things is part of the Hurd flexibility, the "core + system" shouldn't be too aware on how translators are started + <youpi> so in the end, storing where the active translator was started + first seems okayish according to what linux has been exposing for decades + <youpi> kilobug: indeed + <teythoon> it could serve a mounts with a passive translator by default, or + a link to /run/mtab, or an simple file so we could bind a translator to + that node + <youpi> I'd tend to think that /proc/mounts should be a passive translator + and /run/mtab / /etc/mtab a symlink to it + <youpi> not being to choose the translator is a concern however + <teythoon> ok, I'll look into that + <youpi> it could be an empty file, and people be able to set a translator + on it + <teythoon> if it had a passive translator, people still could bind their + own translator to it later on, right? + <teythoon> afaics the issue currently is mostly, that there is no mounts + node and it is not possible to create one + <youpi> right + <teythoon> cool + <youpi> so with the actual path, you can even check for caller's permission + to read the path + <youpi> i.e. 
not provide any more information than the user would be able + to get from browsing by hand + <teythoon> sure, that concern of Neil's is easy to address + <youpi> I'm not so much concerned by stale paths being shown in mtab + <youpi> the worst that can happen is a user not being able to umount the + path + <youpi> but he can settrans -g it + <youpi> (which he can't on linux ;) ) + <teythoon> yes, and the device information is still valid + <youpi> yes + <braunr> despite the parent dir being renamed, linux is still able to + umount the new path + <teythoon> and so is our current umount + <braunr> good + <teythoon> (if one uses the mount point as argument) + <braunr> what's the current plan concerning /proc/mounts ? + <teythoon> serving a node with a passive translator record + <braunr> ? + <teythoon> so that /hurd/mtab / is started on access + <braunr> i mean, still planning on using the recursive approach instead of + a registry ? + <teythoon> ah + <teythoon> I do not feel confident enough to decide this, but I agree with + you, it feels elegant + <teythoon> and it works :) + <teythoon> modulo the translator deadlocking if it talks to itself, any + thoughts on that? + <youpi> it is a non-threaded translator I guess? + <teythoon> currently yes + <youpi> making it threaded should fix the issue + <teythoon> I tried to make the mtab translator multithreaded but that + didn't help + <youpi> that's odd + <teythoon> maybe I did it wrong + <braunr> i don't find it surprising + <braunr> well, not that surprising :p + <braunr> on what lock does it block ? + <teythoon> as far as i can see the only difference of hello and hellot-mt + is that it uses a different dispatcher and has lot's of locking, right? + <teythoon> braunr: I'm not sure, partly because that wrecked havoc on the + whole system + <teythoon> it just freezes + <teythoon> but it wasn't permanent. 
once i let it running and it recovered + <braunr> consider using a subhurd + <teythoon> ah right, I ment to set up one anyway, but my first attempts + were not successful, not sure why + <teythoon> anyway, is there a way to prevent this in the first place? + <teythoon> if one could compare ports that'd be helpful + <youpi> Mmm, did you try to simply compare the number? + <teythoon> with the bootstrap port I presume? + <youpi> Mmm, no, the send port and the receive port would be different + <youpi> no, with the receive port + <teythoon> ah + <braunr> comparing the numbers should work + <braunr> youpi: no they should be the same + <youpi> braunr: ah, then it should work yes + <braunr> that's why there are user ref counts + <youpi> ok + <braunr> only send-once rights have their own names + <teythoon> btw, I'll push my work to darnassus from now on, + e.g. http://darnassus.sceen.net/gitweb/?p=teythoon/hurd.git;a=shortlog;h=refs/heads/feature-mtab-translator-v3-wip + + +## [[open_issues/libnetfs_passive_translators]] + + +## IRC, freenode, #hurd, 2013-07-16 + + <teythoon> which port is the receive port of a translator? I mean, how is + it called in the source, there is no port in sight named receive anywhere + I looked. + <braunr> teythoon: what is the "receive port of a translator" ? + <teythoon> braunr: we talked yesterday about preventing the mtab deadlock + by comparing ports + <teythoon> I asked which one to use for the comparison, youpi said the + receive port + <braunr> i'm not sure what he meant + <braunr> it could be the receive port used for the RPC + <braunr> but i don't think it's exported past mig stub code + <teythoon> weird, I just reread it. I asked if i should use the bootstrap + port, and he said receive port, but it might have been addressed to you? 
+ <teythoon> you were talking about send and receive ports being singletons + or not + <teythoon> umm + <braunr> no i answered him + <braunr> he was wondering if the receive port could actually be used for + comparison + <braunr> i said it can + <braunr> but still, i'm not sure what port + <braunr> if it's urgent, send him a mail + <teythoon> no, my pipeline is full of stuff I can do instead ;) + <braunr> :) + + +## IRC, freenode, #hurd, 2013-07-17 + + <teythoon> braunr: btw, comparing ports solved the deadlock in the mtab + translator rather easily + <braunr> :) + <braunr> which port then ? + <teythoon> currently I'm stuck though, I'm not sure how to address Neals + concern wrt to access permission checks + <teythoon> I believe it's called control port + <braunr> ok + <teythoon> the one one gets from doing the handshake with the parent + <braunr> i thought it was the bootstrap port + <braunr> but i don't know the details so i may be wrong + <braunr> anyway + <teythoon> yes + <braunr> what is the permission problem again ? + <teythoon> 871u73j4zp.wl%neal@walfield.org + <braunr> well, you could perform a lookup on the stored path + <braunr> as if opening the node + <teythoon> if I look at any server implementation of a procedure from + fs.defs (say libtrivfs/file-chmod.c [bad example though, that looks wrong + to me]), there is permission checking being done + <teythoon> any server implementation of a procedure from fsys.defs lacks + permission checks, so I guess it's being done somewhere else + <braunr> i must say i'm a bit lost in this discussion + <braunr> i don't know :/ + <braunr> can *you* sum up the permission problem please ? + <braunr> i mean here, now, in just a few words ? + <teythoon> ok, so I'm extending the fsys api with the get_children + procedure + <teythoon> that one should not return any children x/y if the user doing + the request has no read permissions on x + <braunr> really ? + <braunr> why so ? 
+ <teythoon> the same way ls x would not reveal the existence of y + <braunr> i could also say unlike cat /proc/mounts + <braunr> i can see why we would want that + <braunr> i also can see why we could let this behaviour in place + <braunr> let's admit we do want it + <teythoon> true, but I thought this could easily be addressed + <braunr> what you could do is + <teythoon> now I'm not sure b/c I cannot even find the permission checking + code for any fsys_* function + <braunr> for each element in the list of child translators + <braunr> perform a lookup on the stored path on behalf of the user + <braunr> and add to the returned list if permission checks pass + <braunr> teythoon: note that i said lookup on the path, which is an fs + interface + <braunr> i assume there is no permission checking for the fsys interface + because it's done at the file (fs) level + <teythoon> i think so too, yes + <teythoon> sure, if I only knew who made the request in the first place + <teythoon> the file-* options have a convenient credential handle passed in + as first parameter + <teythoon> s/options/procedures/ + <teythoon> surely the fsys-* procedures also have a means of retrieving + that information, I just don't know how + <braunr> mig magic + <braunr> teythoon: see file_t in hurd_types.defs + <braunr> there is the macro FILE_INTRAN which is defined in subdirectories + (or not) + <teythoon> ah, retrieving the control port requires permissions, and the + fsys-* operations then operate on the control port? + <braunr> see libdiskfs/fsmutations.h for example + <braunr> uh yes but that's for < braunr> i assume there is no permission + checking for the fsys interface because it's done at the file (fs) level + <braunr> i'm answering < teythoon> sure, if I only knew who made the + request in the first place + <braunr> teythoon: do we understand each other or is there still something + fuzzy ? 
+ <teythoon> braunr: thanks for the pointers, I'll read up on that a bit + later + <braunr> teythoon: ok + + +## IRC, freenode, #hurd, 2013-07-18 + + <teythoon> braunr: back to the permission checking problem for the + fsys_get_children interface + <teythoon> I can see how this could be easily implemented in the mtab + translator, it asks the translator for the list of children and then + checks if the user has permission to read the parent dir + <teythoon> but that is pointless, it has to be implemented in the + fsys_get_children server function + <braunr> yes + <braunr> why is it pointless ? + <teythoon> because one could circumvent the restriction by doing the + fsys_get_children call w/o the mtab translator + <braunr> uh no + <braunr> you got it wrong + <braunr> what i suggested is that fsys_get_children does it before + returning a list + <braunr> the problem is that the mtab translator has a different identity + from the users accessing it + <teythoon> yes, but I cannot see how to do this, b/c at this point I do not + have the user credentials + <braunr> get them + <teythoon> how? + <braunr> 16:14 < braunr> mig magic + <braunr> 16:15 < braunr> teythoon: see file_t in hurd_types.defs + <braunr> 16:16 < braunr> there is the macro FILE_INTRAN which is defined in + subdirectories (or not) + <braunr> 16:16 < braunr> see libdiskfs/fsmutations.h for example + <teythoon> i saw that + <braunr> is there a problem i don't see then ? + <braunr> i suppose you should define FSYS_INTRAN rather + <braunr> but the idea is the same + <teythoon> won't that change all the function signatures of the fsys-* + family? 
+ <braunr> that's probably the only reason not to implement this feature + right now + <teythoon> then again, that change is probably easy and mechanic in nature, + might be an excuse to play around with coccinelle + <braunr> why not + <braunr> if you have the time + <teythoon> right, if this can be done, the mtab translator (if run as root) + could get credentials matching the users credentials to make that + request, right? + <braunr> i suppose + <braunr> i'm not sure it's easy to make servers do requests on behalf of + users on the hurd + <braunr> which makes me wonder if the mtab functionality shouldn't be + implemented in glibc eheheh .... + <braunr> but probably not + <teythoon> well, I'll try out the mig magic thing and see how painful it is + to fix everything ;) + <braunr> good luck + <braunr> honestly, i'm starting to think it's deviating too much from your + initial goal + <braunr> i'd be fine with a linux-like /proc/mounts + <braunr> with a TODO concerning permissions + <teythoon> ok, fine with me :) + <braunr> confirm it with the other mentors please + <braunr> we have to agree quickly on this + <teythoon> y? + + <teythoon> braunr: I actually believe that the permission issue can be + addressed cleanly and unobstrusively + <teythoon> braunr: would you still be opposed to the get_children approach + if that is solved? 
+ <teythoon> the filesystem is a tree and the translators "creating" that + tree are a more coarse version of that tree + <teythoon> having a method to traverse that tree seems natural to me + <braunr> teythoon: it is natural + <braunr> i'm just worried it's a bit too complicated, unnecessary, and + out-of-scope for the problem at hand + <braunr> (which is /proc/mounts, not to forget it) + + +## IRC, freenode, #hurd, 2013-07-19 + + <teythoon> braunr: I think you could be a bit more optimistic and + supportive of the decentralized approach + <teythoon> I know the dark side has cookies and strong language and it's + mighty tempting + <teythoon> but both are bad for you :p + + +## IRC, freenode, #hurd, 2013-07-22 + + <youpi> teythoon: AIUI, you should be able to run the mtab translator as + no-user (i.e. no uid) + <teythoon> youpi: yes, that works fine + + <youpi> teythoon: so there is actually no need to define FSYS_INTRAN, doing + it by hand as you did is fine, right? + <youpi> (/me backlogs mails...) + <teythoon> youpi: yes, the main challenge was to figure out what mig does + and how the cpp is involved + <youpi> heh :) + <teythoon> my patch does exactly the same, but only for this one server + function + <teythoon> youpi: I'm confused by your mail, why are read permissions on + all path components necessary? + <braunr> teythoon: only execution normally + <youpi> teythoon: to avoid letting a user discover a translator running on + a hidden directory + <teythoon> braunr: exactly, and that is tested + <youpi> e.g. ~/home/foo is o+x, but o-r + <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU + <youpi> I don't want that to show up on /proc/mounts + <braunr> youpi: i don't understand either: why isn't execution permission + enough ? + <teythoon> youpi: but that requires testing for read on the *last* + component of the *dirname* of your translator, and that is tested + <youpi> let me take another example :) + <youpi> e.g. 
~/home/foo/aZeRtYuyU is o+x, but o-r + <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU/foo + <youpi> ergl sorry, I meant this actually: + <teythoon> yes, that won't show up then in the mtab for users that are not + you and not root + <youpi> e.g. ~/home/foo is o+x, but o-r + <youpi> and I have a translator running on ~/home/foo/aZeRtYuyU/foo + <teythoon> ah + <teythoon> hmm, good point + <braunr> ? + * braunr still confused + <teythoon> well, qwfpgjlu is the secret + <teythoon> and that is revealed by the fsys_get_children procedure + <braunr> then i didn't understand the description of the call right + <braunr> > + /* check_access performs the same permission check as is + normally + <braunr> > + done, i.e. it checks that all but the last path components + are + <braunr> > + executable by the requesting user and that the last + component is + <braunr> > + readable. */ + <teythoon> braunr: youpi argues that this is not enough in this case + <braunr> from that, it looks ok to me + <youpi> the function and the documentation agree, yes + <youpi> but that's not what we want + <braunr> and that's where i fail to understand + <youpi> again, see my example + <braunr> i am + <braunr> 10:43 < youpi> e.g. ~/home/foo is o+x, but o-r + <braunr> ok + <youpi> so the user is not supposed to find out the secret + <braunr> then your example isn't enough to describe what's wron + <braunr> g + <youpi> checking read permission only on ~/home/foo/aZeRtYuyU will not + garantee that + <braunr> ah + <braunr> i thought foo was the last component + <youpi> no, that's why I changed my example + <braunr> hum + <braunr> 10:43 < youpi> e.g. ~/home/foo is o+x, but o-r + <braunr> 10:43 < youpi> and I have a translator running on + ~/home/foo/aZeRtYuyU/foo + <braunr> i meant, the last foo + <teythoon> still, this is easily fixed + <youpi> sure + <youpi> just has to be :) + <teythoon> youpi, braunr: so do you think that this approach will work? 
+ <youpi> I believe so + <braunr> i still don't see the problem, so don't ask me :) + <braunr> i've been sick all week end and hardly slept, which might explain + <braunr> in the example, "all but the last path components" is + "~/home/foo/aZeRtYuyU" + <braunr> right ? + <youpi> braunr: well, I haven't looked at the details + <youpi> but be it the last, or but-last doesn't change the issue + <youpi> if my ~/hidden is o-r,o+x + <youpi> and I have a translator on ~/hidden/a/b/c/d/e + <youpi> checking only +x on hidden is not ok + <braunr> but won't the call also check a b c d ? + <youpi> yes, but that's not what matters + <youpi> what matters is that hidden is o-r + <braunr> hm + <youpi> so the mtab translator is not supposed to reveal that there is an + "a" in there + <braunr> ok i'm starting to understand + <braunr> so r must be checked on all components too + <youpi> yes + <braunr> right + <youpi> to simulate the user doing ls, cd, ls, cd, etc. + <braunr> well, not cd + <braunr> ah + <youpi> for being able to do ls, you have to be able to do cd + <braunr> as an ordered list of commands + <braunr> ok + <teythoon> agreed. can you think of any more issues? + <braunr> so both x and r must be checked + <youpi> so in the end this RPC is really a shortcut for a find + fsysopts + script + <youpi> teythoon: I don't see any + <braunr> teythoon: i couldn't take a clear look at the patch but + <braunr> do you perform a lookup on all nodes ? + <teythoon> yes, all nodes on the path from the root to the one specified by + the mount point entry in the active translator list + <braunr> let me rephrase + <braunr> do you at some point do a lookup, similar to a find, on all nodes + of a translator ? 
+ <teythoon> no + <braunr> good + <teythoon> yes + <braunr> iirc, neal raised that concern once + <teythoon> and I'll also fix settrans --recursive not to iterate over *all* + nodes either + <braunr> great + <braunr> :) + <teythoon> fsys_set_options with do_children=1 currently does that (I've + only looked at the diskfs version) + + +## IRC, freenode, #hurd, 2013-07-27 + + <teythoon> youpi: ah, I just found msg_get_init_port, that should make the + translator detection feasible + + +## IRC, freenode, #hurd, 2013-07-31 + + <teythoon> braunr: can I discover the sender of an rpc message? + <braunr> teythoon: no + <braunr> teythoon: what do you mean by "sender" ? + <teythoon> braunr: well, I'm trying to do permission checks in the + S_proc_mark_essential server function + <braunr> ok so, the sending user + <braunr> that should be doable + <teythoon> I've got a struct proc *p courtesy of a mig intran mutation and + a port lookup + <teythoon> but that is not necessarily the sender, right? + <braunr> proc is really the server i know the least :/ + <braunr> there is permission checking for signals + <braunr> it does work + <braunr> you should look there + <teythoon> yes, there are permission checks there + <teythoon> but the only argument the rpc has is a mach_port_t refering to + an object in the proc server + <braunr> yes + <teythoon> anyone can obtain such a handle for any process, no? + <braunr> can you tell where it is exactly please ? + <braunr> i don't think so, no + <teythoon> what? 
+ <braunr> 14:42 < teythoon> but the only argument the rpc has is a + mach_port_t refering to an object in the proc server + <teythoon> ah + <braunr> the code you're referring to + <braunr> a common way to give privileges to public objects is to provide + different types of rights + <braunr> a public (usually read-only) right + <braunr> and a privileged one, like host_priv which you may have seen + <braunr> acting on (modifying) a remote object normally requires the latter + <teythoon> http://paste.debian.net/20795/ + <braunr> i thought you were referring to existing code + <teythoon> well, there is existing code doing permission checks the same + way I'm doing it there + <braunr> where is it please ? + <braunr> mgt.c ? + <teythoon> proc/mgt.c (S_proc_setowner) for example + <teythoon> yes + <braunr> that's different + <teythoon> but anyone can obtain such a reference by doing proc_pid2proc + <braunr> the sender is explicitely giving the new uid + <braunr> yes but not anyone is already an owner of the target process + <braunr> (although it may look like anyone has the right to clear the owner + oO) + <teythoon> see, that's what made me worry, it is not checked who's the + sender of the message + <teythoon> unless i'm missing something here + <teythoon> ah + <teythoon> I am + <teythoon> pid2proc returns EPERM if one is not the owner of the process in + question + <teythoon> all is well + <braunr> ok + <braunr> it still requires the caller process though + <teythoon> what? + <braunr> see check_owner + <braunr> the only occurrence i find in the hurd is in libps/procstat.c + <braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc (server, ps->pid, + &ps->process)); + <braunr> server being the proc server AIUI + <teythoon> yes, most likely + <braunr> but pid2proc describes this first argument to be the caller + process + <teythoon> ah but it is + <braunr> ? 
+ <teythoon> mig magic :p + <teythoon> MIGSFLAGS="-DPROCESS_INTRAN=pstruct_t reqport_find (process_t)" + \ + <teythoon> MIGSFLAGS="-DPROCESS_INTRAN=pstruct_t reqport_find (process_t)" + \ + <braunr> ah nice + <braunr> hum no + <braunr> this just looks up the proc object from a port name, which is + obvious + <braunr> what i mean is + <braunr> 14:53 < braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc + (server, ps->pid, &ps->process)); + <braunr> this is done in libps + <braunr> which can be used by any process + <braunr> server is the proc server for this process (it defines the process + namespace) + <teythoon> yes, but isn't the port to the proc server different for each + process? + <braunr> no, the port is the same (the name changes only) + <braunr> ports are global non-first class objects + <teythoon> and the proc server can thus tell with the lookup which process + it is talking to? + <braunr> that's the thing + <braunr> from pid2proc : + <braunr> S_proc_pid2proc (struct proc *callerp + <braunr> [...] + <braunr> if (! check_owner (callerp, p)) + <braunr> check_owner (struct proc *proc1, struct proc *proc2) + <braunr> "Returns true if PROC1 has `owner' privileges over PROC2 (and can + thus get its task port &c)." + <braunr> callerp looks like it should be the caller process + <braunr> but in libps, it seems to be the proc server + <braunr> this looks strange to me + <teythoon> yep, to me too, hence my confusion + <braunr> could be a bug that allows anyone to perform pid2proc + <teythoon> braunr: well, proc_pid2proc (getproc (), 1, ...) fails with + EPERM as expected for me + <braunr> ofc it does with getproc() + <braunr> but what forces a process to pass itself as the first argument ? + <teythoon> braunr: nothing, but what else would it pass there? 
+ <braunr> 14:53 < braunr> MGET(PSTAT_PROCESS, PSTAT_PID, proc_pid2proc + (server, ps->pid, &ps->process)); + <braunr> everyone knows the proc server + <braunr> ok now, that's weird + <braunr> teythoon: does getproc() return the proc server ? + <teythoon> I think so, yes + <teythoon> damn those distributed systems, all of their sources are so + distributed too + <braunr> i suspect there is another layer of dark glue in the way + <teythoon> I cannot even find getproc :/ + <braunr> hurdports.c:GETSET (process_t, proc, PROC) + <braunr> that's the dark glue :p + <teythoon> ah, so it must be true that the ports to the proc server are + indeed process specific, right? + <braunr> ? + <teythoon> well, it is not one port to the proc server that everyone knows + <braunr> it is + <braunr> what makes you think it's not ? + <teythoon> proc_pid2proc (getproc (), 1, ...) fails with EPERM for anyone + not being root, but succeeds for root + <braunr> hm right + <teythoon> if getproc () were to return the same port, the proc server + couldn't distinguish these + <braunr> indeed + <braunr> in which case getproc() actually returns the caller's process + object at its proc server + <teythoon> yes, that is better worded + <braunr> teythoon: i'm not sure it's true actually :/ + <teythoon> braunr: well, exploit or it didn't happen + <braunr> teythoon: getproc() apparently returns a bootstrap port + <braunr> we must find the code that sets this port + <braunr> i have a hard time doing that :/ + <pinotree> isn't part of the stuff which is passed to a new process by + exec? + <teythoon> braunr: I know that feeling + <braunr> pinotree: probably + <braunr> still hard to find .. + <pinotree> search in glibc + <teythoon> braunr: exec/exec.c:1654 asks the proc server for the proc + object to use for the new process + <teythoon> so how much of hurd do I have to rebuild once i changed struct + procinfo in hurd_types.h? 
+ <teythoon> oh noez, glibc uses it too :/ + + +## IRC, freenode, #hurd, 2013-08-01 + + <teythoon> I need some pointers on building the libc, specifically how to + point libcs build system to my modified hurd headers + <teythoon> nlightnfotis: hi + <teythoon> nlightnfotis: you rebuild the libc right? do you have any hurd + specific pointers for doing so? + <nlightnfotis> teythoon, I have not yet rebuild the libc (I was planning + to, but I followed other courses of action) Thomas had pointed me to some + resources on the Hurd website. I can look them up for you + <nlightnfotis> teythoon, here are the instructions + http://darnassus.sceen.net/~hurd-web/open_issues/glibc/debian/ + <nlightnfotis> and the eglibc snapshot is here + http://snapshot.debian.org/package/eglibc/ + <teythoon> nlightnfotis: yeah, I found those. the thing is I changed a + struct in the hurd_types.h header, so now I want to rebuild the libc with + that header + <teythoon> and I cannot figure out how to point libcs build system to my + hurd headers + <teythoon> :/ + <nlightnfotis> can you patch eglibc and build that one instead? + <pochu> teythoon: put your header in the appropriate /usr/include/ dir + <teythoon> pochu: is there no other way? + <pinotree> iirc nope + <pochu> teythoon: you may be able to pass some flag to configure, but I + don't know if that will work in this specific case + <teythoon> ouch >,< that explains why I haven't found one + <pochu> check ./configure --help, it's usually FOO_CFLAGS (so something + like HURD_CFLAGS maybe) + <pochu> but then you may need _LIBS as well depending on how you changed + the header... so in the end it's just easier to put the header in + /usr/include/ + <braunr> teythoon: did you find the info for your libc build ? + <teythoon> braunr: well, i firmlinked my hurd_types.h into /usr/include/... 
+ <braunr> ew + <braunr> i recommend building debian packages + <teythoon> but the build was not successful, looks unrelated to my changes + though + <teythoon> I tried that last week and the process took more than eight + hours and did not finish + <braunr> use darnassus + <braunr> it takes about 6 hours on it + <teythoon> I shall try again and skip the unused variants + <braunr> i also suggest you use ./debian/rules build + <braunr> and then interrupt the build process one you see it's building + object files + <braunr> go to the hurd-libc-i386 build dir, and use make lib others + <braunr> make lib builds libc, others is for companion libraries lik + libpthread + <braunr> actually building libc takes less than an hour + <braunr> so once you validate your build this way, you know building the + whole debian package will succedd + <braunr> succeed* + <teythoon> so how do I get the build system to pick up my hurd_types.h? + <braunr> sorry if this is obvious to you, you might be more familiar with + debian than i am :) + <braunr> patch the hurd package + <braunr> append your own version string like +teythoon.hurd.1 + <braunr> install it + <braunr> then build libc + <braunr> i'll reboot darnassus so you have a fresh and fast build env + <braunr> almost a month of uptime without any major issue :) + <teythoon> err, but I cannot install my hurd package on darnassus, can I? I + don't think that'd be wise even if it were possible + <braunr> teythoon: rebooted, enjoy + <braunr> why not ? + <braunr> i often do it for my own developments + <braunr> teythoon: screen is normally available + <braunr> teythoon: be aware that fakeroot-tcp is known to hang when pfinet + is out of ports (that's a bug) + <braunr> it takes more time to reach that bug since a patch that got in + less than a year ago, but it still happens + <braunr> the hurd packages are quick to build, and they should only provide + the new header, right ? 
+ <braunr> you can include the functionality too in the packages if you're + confident enough + <teythoon> but my latest work on the killing of essential processes issues + involves patching hurd_types.h and that in a way that breaks the ABI, + hence the need to rebuild the libc (afaiui) + <braunr> teythoon: yes, this isn't uncommon + <teythoon> braunr: this is much more intrusive than anything I've done so + far, so I'm not so confident in my changes for now + <braunr> teythoon: show me the patch please + <teythoon> braunr: it's not split up yet, so kind of messy: + http://paste.debian.net/21403/ + <braunr> teythoon: did you make sure to add RPCs at the end of defs files ? + <teythoon> yes, I got burned by this one on my very first attempt, you + pointed out that mistake + <braunr> :) + <braunr> ok + <braunr> you're changing struct procinfo + <braunr> this really breaks the abi + <teythoon> yes + <braunr> i.e. you can't do that + <teythoon> I cannot put it at the end b/c of that variable length array + <braunr> you probably should add another interface + <teythoon> that'd be easier, sure, but this will slow down procfs even + more, no? + <braunr> that's secondary + <braunr> it won't be easier, breaking the abi may break updates + <braunr> in which case it's impossible + <braunr> another way would be to ues a new procinfo struct + <braunr> like struct procinfo2 + <braunr> but then you need a transition step so that all users switch to + that new version + <braunr> which is the best way to deal with these issues imo, but this time + not the easiest :) + <teythoon> ok, so I'll introduce another rpc and make sure that one is + extensible + <braunr> hum no + <braunr> this usually involves using a version anyway + <teythoon> no? 
but it is likely that we need to save more addresses of this + kind in the future + <braunr> in which case it will be hanlded as an independant problem with a + true solution such as the one i mentioned + <teythoon> it could return an array of vm_address_ts with a length + indicating how many items were returned + <braunr> it's ugly + <braunr> the code is already confusing enough + <braunr> keep names around for clarity + <teythoon> ok, point taken + <braunr> really, don't mind additional RPCs when first adding new features + <braunr> once the interface is stable, a new and improved version becomes a + new development of its own + <braunr> you're invited to work on that after gsoc :) + <braunr> but during gsoc, it just seems like an unnecessary burden + <teythoon> ok cool, I really like that way of extending Hurd, it's really + easy + <teythoon> and feels so natural + <braunr> i share your concern about performances, and had a similar problem + when adding page cache information to gnumach + <braunr> in the end, i'll have to rework that again + <braunr> because i tried to extend it beyond what i needed + <teythoon> true, I see how that could happen easily + <braunr> the real problem is mig + <braunr> mig limits subsystems to 100 calls + <braunr> it's clearly not enough + <braunr> in x15, i intend to use 16 bits for subsystems and 16 bits for + RPCs, which should be plenty + <teythoon> that limit seems rather artificial, it's not a power of two + <braunr> yes it is + <teythoon> so let's fix it + <braunr> mach had many artificial static limits + <braunr> eh :D + <braunr> not easy + <braunr> replies are encoded by taking the request ID and adding 100 + <teythoon> uh + <braunr> "uh" indeed + <teythoon> so we need an intermediate version of mig that accepts both + id+100 and dunno id+2^x as replies for id + <teythoon> or -id - 1 + <braunr> that would completely break the abi + <teythoon> braunr: how so? 
the change would be in the *_server functions + and be compatible with the old id scheme + <braunr> how do you make sure id+2^x doesn't conflict with another id ? + <teythoon> oh, the id is added to the subsystem id? + <teythoon> to obtain a global message id? + <braunr> yes + <teythoon> ah, I see + <teythoon> ah, but the hurd subsystems are 1000 ids apart + <teythoon> so id+100 or id +500 would work + <braunr> we need to make sure it's true + <braunr> always true + <teythoon> so how many bits do we have for the message id in mach? + <teythoon> (mig?) + <braunr> mach shouldn't care, it's entirely a mig thing + <braunr> well yes and no + <braunr> mach defines the message header, which includes the message id + <braunr> see mach/message.h + <braunr> mach_msg_id_t msgh_id; + <braunr> typedef integer_t mach_msg_id_t; + <teythoon> well, if that is like a 32 bit integer, then allow -id-1 as + reply and forbid ids > 2^x / 2 + <braunr> yes + <braunr> seems reasonable + <teythoon> that'd give us an smooth upgrade path, no? + <braunr> i think so + + +## IRC, freenode, #hurd, 2013-08-28 + + <youpi> teythoon: Mmm, your patch series does not make e.g. ext2fs provide + a diskfs_get_source, does it? + + +## IRC, freenode, #hurd, 2013-08-29 + + <teythoon> youpi: that is correct + <youpi> teythoon: Mmm, I must be missing something then: as such the patch + series introduces an RPC, but only EOPNOTSUPP is ever returned in all + cases for now? + <youpi> ah + <youpi> /* Guess based on the last argument. 
*/ + <youpi> since ext2fs & such report their options with store last, it seems + ok indeed + <youpi> it still seems a bit lame not to return that information in + get_source + <teythoon> yes + <teythoon> well, if it had been just for me, I would not have created that + rpc, but only guessing was frowned uppon iirc + <teythoon> then again, maybe this should be used and then the mtab + translator could skip any translators that do not provide this + information to filter out non-"filesystem" translators + <youpi> guessing is usually trap-prone, yes + <youpi> if it is to be used by mtab, then maybe it should be documented as + being used by mtab + <youpi> otherwise symlink would set a source, for instance + <youpi> while we don't really want it here + <teythoon> why would the symlink translator answer to such requests? it is + not a filesystem-like translator + <youpi> no, but the name & documentation of the RPC doesn't tell it's only + for filesystem-like translators + <youpi> well, the documentation does say "filesystem" + <youpi> but it does not clearly specify that one shouldn't implement + get_source if one is not a filesystme + <youpi> "If the concept of a source is applicable" works for a symlink + <youpi> that could be the same for eth-filter, etc. + <teythoon> right + <youpi> Mmm, that said it's fsys.defs + <youpi> not io.defs + <youpi> teythoon: it is the fact that we get EOPNOTSUPP (i.e. fsys + interface supported, just not that call), and not MIG_BAD_ID (i.e. fsys + interface not supported), that filters out symlink & such, right? 
+ <teythoon> that's what I was thinking, but that's based on my + interpretation of EOPNOPSUPP of course ;) + <youpi> teythoon: I believe that for whatever is a bit questionable, even + if you put yourself on the side that people will probably agree on, the + discussion will still take place so we make sure it's the right side :) + <youpi> (re: start/end_code) + <teythoon> I'm not sure I follow + <teythoon> youpi: /proc/pid/stat seems to be used a lot: + http://codesearch.debian.net/search?q=%22%2Fproc%2F.*%2Fstat%22 + <teythoon> that does not mean that start/endcode is used, but still it + seems like a good thing to mimic Linux closely + <youpi> stat is used a lot for cpu usage for instance, yes + <youpi> start/endcode, I really wonder who is using it + <youpi> using it for kernel thread detection looks weird to me :) + <youpi> (questionable): I mean that even if you take the time to put + yourself on the side that people will probably agree on, the discussion + will happen + <youpi> it has to happen so people know they agree on it + <youpi> I've seen that a lot in various projects (not only CS-related) + <teythoon> ok, I think I got it + <teythoon> it's to document the reasons for (not) doing something? 
+ <youpi> something like this, yes + <youpi> even if you look right, people will try to poke holes + <youpi> just to make sure :) + <teythoon> btw, I think it's rather unusual that our storeio experiments + would produce such different results + <teythoon> you're right about the block device, no idea why I got a + character file there + <teythoon> I used settrans -ca /tmp/hello.unzipped /hurd/storeio -T + gunzip:file /tmp/hello + <teythoon> also I tried stacking the translator on /tmp/hello directly, + from what I've gathered that should be possible, but I failed + <teythoon> ftr I use the exec server with all my patches, so the unzipping + code has been removed from it + <youpi> ah, I probably still have it + <youpi> it shouldn't matter here, though + <teythoon> I agree + <youpi> how would you stack it? + <youpi> I've never had a look at that + <youpi> I'm not sure attaching the translator to the node is done before or + after the translator has a change to open its target + <teythoon> right + <teythoon> but it could be done, if storeio used the reference to the + underlying node, no? + <youpi> yes + <youpi> btw, you had said at some point that you had issues with running + remap. Was the issue what you fixed with your patches? + * youpi realizes that he should have shown the remap.c source code during + his presentation + <teythoon> well, I tried to remap /servers/exec (iirc) and that failed + <teythoon> then again, I recently played with remap and all seemed fine + <teythoon> but I'm sure it has nothing to do with my patches + <youpi> ok + <teythoon> those I came up with investigating fakeroot-hurd + <teythoon> and I saw that this also aplies to remap.sh + <teythoon> *while + <youpi> yep, they're basically the same + <teythoon> btw, I somehow feel settrans is being abused for chroot and + friends, there is no translator setting involved + <youpi> chroot, the command? or the settrans option? 
+ <youpi> I don't understand what you are pointing at + <teythoon> the settrans option being used by fakeroot, remap and (most + likely) our chroot + <youpi> our chroot is just a file_reparent call + <youpi> fakeroot and remap do start a translator + <teythoon> yes, but it is not being bound to a node, which is (how I + understand it) what settrans does + <teythoon> the point being that if settrans is being invoked with --chroot, + it does something completely different (see the big if (chroot) {...} + blocks) + <teythoon> to a point that it might be better of in a separate command + <youpi> Mmm, indeed, a lot of the options don't make sense for chroot + + +## IRC, freenode, #hurd, 2013-09-06 + + <braunr> teythoon: do you personally prefer /proc being able to implement + /proc/self on its own, or using the magic server to tell clients to + resolve those specific cases themselves ? + <pinotree> imho solving the "who's the sender of an rpc" could solve both + the SCM_CREDS implementation and the self case in procfs + +[[open_issues/SENDMSG_SCM_CREDS]], +[[hurd/translator/procfs/jkoenig/discussion]], *`/proc/self`*. + + <braunr> pinotree: yes + <braunr> but that would require servers impersonating users to some extent + <braunr> and this seems against the hurd philosophy + <pinotree> and there was also the fact that you could create a + fake/different port when sending an rpc + <braunr> to fake what ? + <pinotree> the sender identiy + <pinotree> *identity + <braunr> what ? + <braunr> you mean intermediate servers can do that + <teythoon> braunr: I don't know if I understand all the implications of + your question, but the magic server is the only hurd server that actually + implements fsys_forward (afaics), so why not use that? 
+ <braunr> teythoon: my question was rather about the principle + <braunr> do people find it acceptable to entrust a server with their + authority or not + <braunr> on the hurd, it's clearly wrong + <braunr> but then it means you need special cases everywhere, usually + handled by glibc + <braunr> and that's something i find wrong too + <braunr> it restricts extensibility + <braunr> the user can always change its libc at runtime, but in practice, + it's harder to perform than simply doing it in the server + <teythoon> braunr: then I think I didn't get the question at all + <braunr> teythoon: it's kind of the same issue that you had with the mtab + translator + <braunr> about showing or not some entries the user normally doesn't have + access to + <braunr> this problem occurs when there is more than one server on the + execution path and the servers beyond the first one need credentials to + reply something meaningful + <braunr> the /proc/self case is a perfect one + <braunr> (conceptually, it's client -> procfs -> symlink) + <braunr> 1/ procfs tells the client it needs to handle this specially, + which is what the hurd does with magic + <braunr> 2/ procfs assumes the identity of the client and the symlink + translator can act as expected because of that + <braunr> teythoon: what way do you find better ? + <teythoon> braunr: by "procfs assumes the identity" you mean procfs + impersonating the user? 
+ <braunr> yes + <teythoon> braunr: tbh I still do not see how this can be implemented at + all b/c the /proc/self symlink is not about identity (which can be + derived from the peropen struct initially created by fsys_getroot) but + the pid of the callee (which afaics is nowhere to be found) + <teythoon> s/callee/caller/ + <teythoon> the one doing the rpc + <braunr> impersonating the user isn't only about identity + <braunr> actually, it's impersonating the client + <teythoon> yes, client is the term >,< + <braunr> so basically, asking proc about the properties of the process + being impersonated + <teythoon> proc o_O + <braunr> it's not hard, it's just a big turn in the way the system would + function + <braunr> teythoon: ? + <teythoon> you lost me somewhere + <braunr> the client is the process + <braunr> not the user + <teythoon> in order to implement /proc/self properly, one has to get the + process id of the process doing the /proc/self lookup, right? + <braunr> yes + <braunr> actually, we would even slice it more and have the client be a + thread + <teythoon> so how do you get to that piece of information at all? + <braunr> the server inherits a special port designating the client, which + allows it to query proc about its properties, and assume it's identity in + servers such as auth + <braunr> its* + <teythoon> ah, but that kind of functionality isn't there at the moment, is + it? + <braunr> it's not, by design + <teythoon> right, hence my confusion + <braunr> instead, servers use the magic translator to send a "retry with + special handling" message to clients + <teythoon> right, so the procfs could bounce that back to the libc handler + that of course knows its pid + <braunr> yes + <teythoon> right, so now at last I got the whole question :) + <braunr> :) + <teythoon> ugh, I just found the FS_RETRY_MAGICAL handler in the libc :-/ + <braunr> ? + <braunr> why "ugh" ? 
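Option 1/ as discussed here (the server answers with a "retry with special handling" reply and the client-side handler, which knows its own pid, fills in the rest) might look roughly like the sketch below. It is a toy model under stated assumptions, not glibc's actual `FS_RETRY_MAGICAL` handler: the function name and the `"pid"` retry name are hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical client-side expansion of a magic retry name.  If
   RETRYNAME is "pid" or starts with "pid/", write the caller's own
   pid followed by the rest of the name into OUT and return 0.
   Return -1 for any other name or if OUT is too small.  */
int
expand_magic_retry (const char *retryname, pid_t self,
                    char *out, size_t outlen)
{
  if (strncmp (retryname, "pid", 3) == 0
      && (retryname[3] == '\0' || retryname[3] == '/'))
    {
      /* The client knows its own pid, so no server has to be
         trusted with the client's identity.  */
      int n = snprintf (out, outlen, "%ld%s", (long) self, retryname + 3);
      return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
    }
  return -1;
}
```

In this model a lookup of `/proc/self/stat` would have procfs reply with the magic name `pid/stat`, and the client would retry the lookup with its own pid substituted for `pid`.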
+ <teythoon> well, I'm inclined to think this is the bad kind of magic ;) + <braunr> do i need to look at the code to understand ? + <teythoon> ok, so I think option 1/ is easily implemented, option 2/ has + consequences that I cannot fully comprehend + <braunr> same for me + <teythoon> no, but you yourself said that you do not like that kind of + logic being implemented in the libc + <braunr> well + <braunr> easily + <braunr> i'm not so sure + <braunr> it's easy to code, but i assume checking for magic replies has its + cost + <teythoon> why not? the code is doing a big switch over the retryname + supplied by the server + <teythoon> we could stuff getpid() logic in there + <braunr> 14:50 < braunr> it's easy to code, but i assume checking for magic + replies has its cost + <teythoon> what kind of cost? computational cost? + <braunr> yes + <braunr> the big switch you mentioned + <braunr> run every time a client gets a reply + <braunr> (unless i'm mistaken) + <teythoon> a only for RETRY_MAGICAL replies + <braunr> but you need to test for it + <teythoon> switch (retryname[0]) + <teythoon> { + <teythoon> case '/': + <teythoon> ... + <teythoon> that should compile to a jump table, so the cost of adding + another case should be minimal, no? + <braunr> yes + <braunr> but + <braunr> it's even less than that + <braunr> the real cost is checking for RETRY_MAGICAL + <braunr> 14:55 < teythoon> a only for RETRY_MAGICAL replies + <braunr> so it's basically a if + <braunr> one if, right ? + <teythoon> no, it's switch'ing over doretry + <teythoon> you should pull up the code and see for yourself. it's in + hurd/lookup-retry.c + <braunr> ok + <braunr> well no, that's not what i'm looking for + <teythoon> it's not o_O + <braunr> i'm looking for what triggers the call to lookup_retry + <braunr> teythoon: hm ok, it's for lookups only, that's decent + <braunr> teythoon: 1/ has the least security implications + <teythoon> yes + <braunr> it could slightly be improved with e.g. 
a well defined interface + so a user could preload a library to extend it + <teythoon> extend the whole magic lookup thing? + <braunr> yes + <teythoon> but that is no immediate concern, you are trying to fix + /proc/self, right? + <braunr> no, i'm thinking about the big picture for x15/propel, keeping the + current design or doing something else + <teythoon> oh, okay + <braunr> solving /proc/self looks actually very easy + <teythoon> well, I'd say this depends a lot on your trust model then + <teythoon> do you consider servers trusted? + <teythoon> (btw, will there be mutual authentication of clients/servers in + propel?) + <braunr> there were very interesting discussions about that during the + l4hurd project + <braunr> iirc, shapiro insisted that using a server without trusting it + (and there were specific terminology about trusting/relying/etc..) is + nonsense + <braunr> teythoon: i haven't thought too much about that yet, for now it's + supposed to be similar to what the hurd does + <teythoon> hm, then again trust is not an on/off thing imho + <braunr> ? + <teythoon> trusting someone to impersonate yourself is a very high level of + trust + <teythoon> s/is/requires/ + <teythoon> the mobile code paper suggests that mutual authentication might + be a good thing, and I tend to agree + <braunr> i'll have to read that again + <braunr> teythoon: for now (well, when i have time to work on it again + .. :)) + <braunr> i'm focusing on the low level stuff, in a way that won't disturb + such high level features + <braunr> teythoon: have you found something related to a thread-specific + port in the proc server ? + <braunr> hurd/process.defs:297: /* You are not expected to understand + this. 
*/ + <braunr> \o/ + <teythoon> braunr: no, why would I (the thread related question) + <teythoon> braunr: yes, that comment also cought my eye :/ + <braunr> teythoon: because you read a lot of the proc code lately + <braunr> so maybe your view of it is better detailed than mine + + +## IRC, freenode, #hurd, 2013-09-13 + + * youpi crosses fingers + <youpi> yay, still boots + <youpi> teythoon: I'm getting a few spurious entries in /proc/mounts + <youpi> none /servers/socket/26 /hurd/pfinet interface=/dev/eth0, etc. + <youpi> /dev/ttyp0 /dev/ttyp0 /hurd/term name,/dev/ptyp0,type,pty-master 0 + 0 + <youpi> /dev/sd1 /dev/cons ext2fs + writable,no-atime,no-inherit-dir-group,store-type=typed 0 0 + <youpi> fortunately mount drops most of them + <youpi> but not /dev/cons + <youpi> spurious entries in df are getting more and more common on linux + too anyway... + <youpi> ah, after a console restart, I don't have it any more + <youpi> I'm getting df: `/dev/cons': Operation not supported instead + + +## IRC, freenode, #hurd, 2013-09-16 + + <youpi> teythoon: e2fsck does not seem to be seeing that a given filesystem + is mounted + <youpi> /dev/sd0s1 on /boot type ext2 (rw,no-inherit-dir-group) + <youpi> and still # e2fsck -C 0 /dev/sd0s1 + <youpi> e2fsck 1.42.8 (20-Jun-2013) + <youpi> /dev/sd0s1 was not cleanly unmounted, check forced. + <youpi> (yes, both /etc/mtab and /run/mtab point to /proc/mounts) + <tschwinge> Yes, that is a "known" problem. + <youpi> tschwinge: no, it's supposed to be fixed by the mtab translator :) + <pinotree> youpi: glibc's paths.h points to /var/run/mtab (for us) + <tschwinge> youpi: Oh. But this is by means of mtab presence, and not by + proper locking? (Which is at least something, of course!) + <youpi> /var/run points to /run + <youpi> tschwinge: yes + <youpi> anyway, got to run + + +## IRC, freenode, #hurd, 2013-09-20 + + <braunr> teythoon: how come i see three mtab translators running ? 
+ <braunr> 6 now oO + <braunr> looks like df -h spawns a few every time + <teythoon> yes, weird... + <braunr> accessing /proc/mounts does actually + <braunr> teythoon: more bug fixing for you :) + + +## IRC, freenode, #hurd, 2013-09-23 + + <teythoon> so it might be a problem with either libnetfs (which afaics has + never supported passive translator records before) or procfs, but tbh I + haven't investigated this yet diff --git a/community/gsoc/project_ideas/object_lookups.mdwn b/community/gsoc/project_ideas/object_lookups.mdwn index 5075f783..88ffc633 100644 --- a/community/gsoc/project_ideas/object_lookups.mdwn +++ b/community/gsoc/project_ideas/object_lookups.mdwn @@ -40,3 +40,32 @@ accurate measurements in a system that lacks modern profiling tools would also be helpful. Possible mentors: Richard Braun + + +# IRC, freenode, #hurd, 2013-09-18 + +In context of [[!message-id "20130918081345.GA13789@dalaran.sceen.net"]]. + + <teythoon> braunr: (wrt the gnumach HACK) funny, I was thinking about doind + the same for userspace servers, renaming ports to the address of the + associated object, saving the need for the hash table... 
+ <braunr> teythoon: see + http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/object_lookups/ + <braunr> teythoon: my idea is to allow servers to set a label per port, + obtained at mesage recv time + <braunr> because, yes, looking up an object twice is ridiculous + <braunr> you normally still want port names to be close to 0 because it + allows some data structure optimizations + <teythoon> braunr: yes, I feared that ports should normally be smallish + integers and contigious at best + <teythoon> braunr: interesting that you say there that libihash suffers + from high collision rates + <teythoon> I've a theory to why that is, libihash doesn't do any hashing at + all + <pinotree> there are notes about that in the open_issues section of the + wiki + <teythoon> but I figured that this is probably ok for port names, as they + are small and contigious + <neal> braunr: That's called protected payload. + <neal> braunr: The idea is that the kernel appends data to the message in + flight. diff --git a/community/gsoc/project_ideas/sound/discussion.mdwn b/community/gsoc/project_ideas/sound/discussion.mdwn new file mode 100644 index 00000000..4a95eb62 --- /dev/null +++ b/community/gsoc/project_ideas/sound/discussion.mdwn @@ -0,0 +1,47 @@ +[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!taglink open_issue_documentation]]: update [[sound]] page. 
+ + +# IRC, freenode, #hurd, 2013-09-01 + + <rekado> I'm new to the hurd but I'd love to learn enough to work on sound + support. + <rekado> + http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/sound/ + says drivers should be ported to GNU Mach as a first step. + <rekado> Is this information still current or should the existing Linux + driver be wrapped with DDE instead? + <auronandace> if i recall correctly dde is currently only being used for + network drivers. i'm not sure how much work would be involved for sound + or usb + + +## IRC, freenode, #hurd, 2013-09-02 + + <rekado> The sound support proposal + (http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/sound/) + recommends porting some other kernel's sound driver to GNU Mach. Is this + still current or should DDE be used instead? + <pinotree> rekado: dde or anything userspace-based is generally preferred + <braunr> rekado: both are about porting some other kernel's sound driver + <braunr> dde is preferred yes + <rekado> This email says that sound drivers are already partly working with + DDE: http://os.inf.tu-dresden.de/pipermail/l4-hackers/2009/004291.html + <rekado> So, should I just try to get some ALSA kernel parts to compile + with DDE? + <pinotree> well, what is missing is also the dde←→hurd glue + <braunr> rekado: there is also a problem with pci arbitration + <rekado> pinotree: I assumed DDEKit works with the hurd and we could use + any DDE/<other kernel> glue code with it + * rekado looks up pci arbitration + <pinotree> only for networking atm + <rekado> ah, I see.