[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!toc]]


# IRC, freenode, #hurd, 2013-06-23

    braunr: sorry for the late reply. To be honest, school work has taken most of my time these days. I haven't made any significant progress yet. I am trying to write a little debugger demo on Hurd.
    braunr: things are harder than I thought; there are some differences between ptrace() on Hurd and Linux. I am trying to solve this.


# IRC, freenode, #hurd, 2013-06-24

    this is my weekly report http://hacklu.com/blog/gsoc-weekly-report1-117/.
    and I have two main questions from reading the gdb source code. 1/ What is S_exception_raise_request()? 2/ What is the role of ptrace in the gdb port on Hurd?
    hacklu: where did you see S_exception_raise_request?
    in gdb/gnu-nat.c
    ah, in gdb
    yeah. and I have read the . it says a name starting with S_ means a server stub.
    yes
    what happens is that gnu_wait keeps calling mach_msg to get a message
    then it passes that message to the various stubs servers
    see just below, it calls exc_server, among others
    and that's exc_server which ends up calling S_exception_raise_request, if the message is an exception_raise request
    exc_server is a mere multiplexer, actually
    S_exception_raise_request is the implementation of the request part (so one half of a typical RPC) of the Mach exception interface. See gdb/exc_request.defs in GDB and include/mach/exc.defs in Mach.
    youpi: how does gnu_wait pass a message to exc_server? in which function?
    in gnu_wait()
    && !exc_server (&msg.hdr, &reply.hdr)
    oh, I see it. at first I thought it was simply a type check.
    see the comment: "handle what we got"
    The Hurd's proc server also is involved in the exception passing protocol (see its source code).
    tschwinge: I will check the source code later. does the exception take place in this way:
    1. the inferior calls ptrace(TRACE_ME). 2. gdb calls task_set_exception_port. 3. mach sends a notification to the exception port set before. 4. gdb takes some action.
    hacklu: Yes, that's it, roughly. The idea is that GDB replaces a process' standard exception port, and replaces it "with itself", so that when the process that is debugged receives an exception (by Mach sending an exception_raise RPC), GDB can then catch that and act accordingly.
    hacklu: As for your other questions, about ptrace: As you can see in [glibc]/sysdeps/mach/hurd/ptrace.c, ptrace on Hurd is simply a wrapper around vm_read/write and more interfaces.
    hacklu: As the GDB port for Hurd is specific to Hurd by definition, you can also directly use these calls in GDB for Hurd.
    ..., as it is currently done.
    and in detail, part 3 (mach sends a notification to the exception port) is like this: gnu_wait gets the message in mach_msg, then passes it on by calling exc_server(), then exc_server calls S_exception_raise_request()?
    ?
    tschwinge: yeah, I have seen ptrace.c. I was wondering why nobody uses ptrace in Hurd except TRACEME...
    hacklu: Right about »and in detail, [...]«.
    hacklu: It would be very good (and required for your understanding anyway), if you could write up a list of things that happen when a process (both under the control of GDB as well as without GDB) is sent an exception (due to a breakpoint instruction, for example).
    Let me look something up.
    tschwinge: what's the function of exc_server? if I can get the notification in mach_msg().
    to multiplex the message
    i.e. decoding it, etc.
    up to calling the S_ function with the proper parameters
    exc_server being automatically generated, that saves a lot of code
    That is generated by MIG from the gdb/exc_request.defs file. You'll find the generated file in the GDB build directory.
    I have written down the filenames. I will check that afterwards.
    hacklu: I suggest you also have a look at the Mach 3 Kernel Principles book, . This also has some explanation of the thread/task's exception mechanism. And of course, explains the RPC mechanism, which the exception mechanism is built upon.
    And then, really make a step-by-step list of what happens; this should help to better visualize what's going on.
    ok. later I will update this list on my blog.
    hacklu: I cannot tell off-hand why GDB on Hurd is using ptrace(PTRACE_TRACEME) instead of doing these calls manually. I will have to look that up, too.
    tschwinge: thanks.
    hacklu: Anyway -- you're asking sensible questions, so it seems you're making progress/are on track. :-)
    tschwinge: some things are harder than I had thought, I haven't made any meaningful progress. sorry for this.
    hacklu: That is fine, and was expected. :-) (Also, you're still busy with university.)
    I will put more time and enthusiasm into this.
    hacklu: Oh, and one thing that may be confusing: as you may have noticed, the names of the same RPC functions are sometimes slightly different in different *.defs files. What is important is the subsystem number, such as 2400 in [GDB]/gdb/exc_request.defs (and then incremented per each routine/simpleroutine/skip directive).
    hacklu: Just for completeness, [hurd]/hurd/subsystems has a list of RPC subsystems we're using.
    And the name given to routine 2400, for example, is just a "friendly name" that is then used locally in the code where the *.defs file has been processed by MIG.
    What a clumsy explanation of mine. But you'll get the idea, I think.
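
The subsystem numbering described above can be sketched in plain C. This is only an illustration, assuming MIG's usual convention that entries are numbered consecutively from the subsystem base and that a reply carries the request id plus 100. The helper names `mig_request_id` and `mig_reply_id` are hypothetical and are not part of MIG, GDB, or glibc.

```c
/* Base taken from the discussion: subsystem number 2400 in
   [GDB]/gdb/exc_request.defs.  */
#define EXC_SUBSYSTEM_BASE 2400

/* Hypothetical helper: each routine/simpleroutine/skip entry in a
   *.defs file takes the next id, counting from the subsystem base.  */
static int
mig_request_id (int subsystem_base, int routine_index)
{
  return subsystem_base + routine_index;
}

/* Hypothetical helper: assumed MIG convention that the reply to a
   request uses the request id plus 100.  */
static int
mig_reply_id (int request_id)
{
  return request_id + 100;
}
```

Under these assumptions, `mig_request_id (EXC_SUBSYSTEM_BASE, 0)` is the id that the "friendly name" of the first routine stands for, whatever that name is in a given *.defs file.
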
    ;-)
    hacklu: And don't worry about your progress -- you're making a lot of progress already (even if it doesn't look like it, because you're not writing code), but the time spent on understanding these complex issues (such as the RPC mechanism) definitely counts as progress, too.
    tschwinge: I don't get it clearly, as I am not familiar with MIG's grammar. But I understand that exc is an alias name for routine 2400?
    hacklu: I'd like to have you spend enough time to understand these fundamental concepts now, and then switch to "hacking mode" (write code) later, instead of now writing code but not understanding the concepts behind it.
    I have written a bit of code to validate my understanding while reading the source code. But the code does not run. http://pastebin.com/r3wC5hUp
    The subsystem directive [...]. As well, let me just point you to the documentation: , MIG - THE MACH INTERFACE GENERATOR, chapter 2.1 Subsystem identification.
    hacklu: Yes, writing such code for testing also is a good approach. I will have to look at that in more detail, too.
    * tschwinge guesses hacklu is probably laughing when seeing the years these documents were written in (1989, etc.). ;-)
    mach_msg makes no sense in my code, and the process just hangs. kill -9 can't stop it either.
    hacklu: do you understand why kill -KILL might not work now ?
    braunr: no, but I found I can use gdb to attach to that process, then quit in gdb, and the process quits too.
    maybe that process was waiting for a resume.
    something like that yes
    iirc it's related to a small design choice in the proc server
    something that put processes in an uninterruptible state when being debugged
    iirc ?
    if i recall cl=orrectly
    correctly*
    like D status in linux?
    or T
    there has been a lot of improvements regarding signal handling in linux over time
    so it's not really comparable now
    but that's the idea
    in ps, i see the process STAT is THumx
    did you see that every process on the hurd has at least two threads ?
    no, but I have seen that in hurd, the exception handler can't live in the same context as the victim. so there must be at least two threads, I think
    hacklu: yes
    that thread also handles regular signals in addition to mach exceptions
    (there are two levels of multiplexing in servers, first locating the subsystem, then the server function)
    hacklu: if what i wrote is confusing, don't hesitate to ask for clarifications (i really don't intend to make things confusing)
    braunr: I don't understand what you said about the "multiplexing in servers". For instance, does it mean how a message is passed from mach_msg to exc_server in gnu_wait()?
    hacklu: i said that the "message thread" handles both mach exceptions and unix signals
    hacklu: these are two different interfaces (and in mach terms, subsystems)
    hacklu: see hurd/msg.defs for the msg interface (which handles unix signals)
    hacklu: to handle multiple interfaces in the same thread, servers need to first find the right subsystem
    this is done by successively calling all demux functions until one returns true
    (finding the right server function is done by these demux functions)
    hacklu: see hurd/msgportdemux.c in glibc to see how it's done there
    it's short actually, i'll paste it here :
    return (_S_exc_server (inp, outp) ||
    _S_msg_server (inp, outp));
    hacklu: did that help ?
    braunr: a bit more confusing. one "message thread" handles exceptions and signals, which means the message thread needs to receive messages from two ports. then it passes the message to the right server which handles the message. the server also should pick the right subsystem from a lot of subsystems to handle the msg. is that it ?
    the message thread is a server thread
    (which means every normal process is actually also a server, receiving exceptions and signals)
    there may be only two ports, or more, it doesn't matter much, the port set abstraction takes care of that
    so the message thread directly passes the msg to the right subsystem?
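
The demux chain quoted above from hurd/msgportdemux.c can be sketched in portable C. This is a simplified model, not the glibc code: `struct msg_header` is a hypothetical stand-in for the Mach message header, the two demux functions are hand-written instead of being generated by MIG, and the id ranges (2400 for exceptions, 23000 for msg) are used only as illustrative subsystem bases.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Mach message header: only the
   message id matters for demultiplexing in this sketch.  */
struct msg_header { int id; };

/* True if ID falls into the request range of the subsystem at BASE.  */
static bool
in_subsystem (int id, int base)
{
  return id >= base && id < base + 100;
}

/* Each demux function claims one subsystem.  It returns false without
   touching the reply when the message is not for it.  */
static bool
exc_demux (const struct msg_header *in, struct msg_header *out)
{
  if (!in_subsystem (in->id, 2400))
    return false;
  out->id = in->id + 100;   /* assumed MIG convention for reply ids */
  return true;
}

static bool
msg_demux (const struct msg_header *in, struct msg_header *out)
{
  if (!in_subsystem (in->id, 23000))
    return false;
  out->id = in->id + 100;
  return true;
}

/* Same shape as the quoted line: run each demux function in turn, in
   the calling thread; || stops at the first one that returns true.  */
static bool
msgport_demux (const struct msg_header *in, struct msg_header *out)
{
  return exc_demux (in, out) || msg_demux (in, out);
}
```

Nothing is sent anywhere in this model: the thread that received the message simply runs the demux functions one after the other, which is the point made about wording in the discussion.
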
    not directly
    as you can see
    it tries them all until one is able to handle the incoming message
    i'm not sure it will help you with gdb, but it's important to understand for a better general understanding of the system
    ugly sentence
    ah, I see. like this in gnu-nat.c if(!notify_server(&msg.hdr, &reply.hdr) && !exc_server(&msg.hdr...)
    yes
    the thread just asks them one by one.
    be careful about the wording
    the thread doesn't "send requests"
    it runs functions
    (one might be tempted to think there are other worker threads waiting for a "main thread" to handle demultiplexing messages)
    I got it.
    the notify_server function just runs in the same context, in the "message thread", and there is no RPC here.
    yes
    and the notify_server code is generated by mig automatically.
    yes


# IRC, freenode, #hurd, 2013-06-29

[[!tag open_issue_documentation]]

    I just failed to build the demo on this. http://walfield.org/pub/people/neal/papers/hurd-misc/ipc-hello.c
    or, example in machsys.doc called simp_ipc.c
    we don't use cthreads anymore, but pthreads
    pinotree: em.. and I also failed to find the in example of that
    i don't know
    maybe the code in that book is out of date
    hacklu: mig and mach ipc documentation is quite dated unfortunately, and so are many examples floating around the net
    btw, I have one more question. when I read . I find this state: When an exception occurs in a thread, the thread sends an exception message to its exception port, blocking in the kernel waiting for the receipt of a reply. It is assumed that some task is listening to this port, using the exc_server function to decode the messages and then call the linked in catch_exception_raise. It is the job of catch_exception_raise to handle the exception and decide the course of action for thread.
    that says it is assumed that another task receives the msg sent to a thread's exception port. why another task?
    I remember there are at least two threads in one task, and one of them handles the exception stuff.
    there are various reasons
    first is, the thread causing the exception is usually not waiting for a message
    next, it probably doesn't have all the info needed to resolve the exception
    (depending on the system design)
    and yes, the second thread in every hurd process is the msg thread, handling both mach exceptions and hurd signals
    but in this state, I can't find anything about the so called msg thread
    ?
    if there is a task to do the work, why do we need this thread?
    this thread is the "task" ?
    the msg thread is the thread handling exceptions for the other threads in one task
    wording is important here
    a task is a collection of resources
    so i'm only talking about threads really
    14:11 < hacklu> assumed that some task is listening to this
    this is wrong
    a task can't listen
    only a thread can
    in your words, the two threads are in the same task?
    yes
    14:32 < braunr> and yes, the second thread in every hurd process is the msg thread, handling both mach exceptions and hurd signals
    process == task here
    yeah, I always thought the two threads stay in one task. but I found that state in . so I confuzed
    s/confuzed/confused
    statement you mean
    if two threads stay in the same task, and the main thread throws an exception, the other thread handles it?
    depends on how it's configured
    the thread receiving the exceptions might not be in the same task at all
    on the hurd, only the second thread of a task receives exceptions
    I just wonder how the second thread can catch the exception from its containing task
    forget about tasks
    tasks are resource containers
    they don't generate or catch exceptions
    only threads do
    for each thread, there is an exception port
    that is, one receive right, and potentially many send rights
    the kernel uses a send right to send exceptions
    the msg thread waits for messages on the receive right
    that's all
    ok. if I divide by zero in the main thread, the kernel will send a msg to the main thread's exception port. and then, the second thread (in the same task) is waiting on that port.
    so it gets the msg. is that right?
    don't focus on main versus msg thread
    it applies to all other threads as well
    otherwise, you're right
    ok, just s/main/first
    no
    main *and* all others except msg
    main *and* all others except msg ?
    the msg thread gets exception messages for all other threads in its task
    (at least, that's how the hurd configures things)
    got it.
    if the msg thread throws an exception too, who serves it?
    i'm not sure but i guess it's simply forbidden
    i used gdb to attach to a little program which just contains a division by zero. and I only found that the msg thread is in glibc.
    yes
    where is the msg thread located?
    it's created by glibc
    is it glibc/hurd/catch-exc.c?
    that's the exception handling code, yes
    there are some differences between the code and the state in .
    state or statement ?
    statement
    which one ?
    http://pastebin.com/ZTBrUAsV
    When an exception occurs in a thread, the thread sends an exception message to its exception port, blocking in the kernel waiting for the receipt of a reply. It is assumed that some task is listening (most likely with mach_msg_server) to this port, using the exc_server function to decode the messages and then call the linked in catch_exception_raise. It is the job of catch_exception_raise to handle the exception and decide the course of action for thread. The state of the blocked thread can be examined with thread_get_state.
    what difference ?
    in the code, I can't find things like exc_server, mach_msg_server
    uh
    ok it's a little tangled
    but not that much
    you found the exception handling code, and now you're looking for what calls it
    simple
    see _hurdsig_fault_init
    from that statement I thought there was another _task_ doing the exception things for all of the system's threads, before you told me the task means the msg thread.
    again
    14:47 < braunr> forget about tasks
    14:47 < braunr> tasks are resource containers
    14:47 < braunr> they don't generate or catch exceptions
    14:47 < braunr> only threads do
    yeah, I think that document needs an update.
    no
    it's a common misnomer
    once you're used to mach concepts, the statement is obvious
    braunr: so I need to read more :)
    _hurdsig_fault_init sends exceptions for the signal thread to the proc server?
    why does the _proc_ server come in?
    no
    it gives the proc server a send right for signals
    exceptions are a mach thing, signals are a hurd thing
    the important part is
    err = __thread_set_special_port (_hurd_msgport_thread,
    THREAD_EXCEPTION_PORT, sigexc);
    this one sets the exception port?
    yes
    hm wait
    actually no, wrong part :)
    this sets the exception port for the msg thread (which i will call the signal thread as mentioned in glibc)
    but the comment above this line, Direct signal thread exceptions to the proc server means what?
    that the proc server handles exceptions on the signal thread
    the term signal thread equals the term msg thread?
    yes
    so, the proc server handles the exceptions thrown by the msg thread?
    looks that way
    feels a little strange.
    why ?
    this thread isn't supposed to cause exceptions
    if it does, something is deeply wrong, and something must clean that task up
    and the proc server seems to be the most appropriate place from where to do it
    why do we need a special server just to serve the msg thread? I don't think that thread will throw exceptions frequently
    what does frequency have to do with anything here ?
    ok the appropriate code is _hurdsig_init
    the port for receiving exceptions is _hurd_msgport
    the body of the signal thread is _hurd_msgport_receive
    aha, in _hurd_msgport_receive I have finally found the while(1) loop mach_msg_server().
    so the code conforms to the documents.
    braunr: [21:18] what does frequency have to do with anything here ? yes, I have totally understood your words now. thank you very much.
    :)


# IRC, freenode, #hurd, 2013-07-01

    hi.
    this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report2-124/
    any comments are welcome
    teythoon: I only got clear about the rpc stuff. it seems I am a lot behind my plan
    good progress :)
    I have written up the details of the exception handling which tschwing_ asked for last week. Did I get it right in my post?
    hacklu: as far as I understand signals, yes :)
    youpi: thank god, I am finally on the right track... :)
    the mig book says simpleroutine is the one used to implement async RPCs which don't expect a reply. But I have found a place that passes a reply port to an RPC interface which has been declared as simpleroutine
    hacklu: probably the simpleroutine hardcodes a reply port?
    hacklu: about _hurd_internal_post_signal, this is the hairiest part of GNU/Hurd, signal handling
    simply because it's the hairiest part of POSIX :)
    you probably want to just understand that it implements the POSIXity of signal delivering
    i.e. deliver/kill/suspend the process as appropriate
    I don't think you'll need to dive more
    aha.
    it will save a lot of time.
    it seems like wait_for_inferior() in gdb, which also has too many lines and too many gotos
    hacklu: btw, which simpleroutine were you talking about ?
    I forget where it is, I am looking for it now.
    which version of gdb are you looking the source of?
    (in mine, wait_for_inferior is only 45 lines long)
    I don't know how to pick the version, I just use the git version.
    maybe I gave a wrong name.
    ok
    youpi: I remember now, my impression comes from here http://www.aosabook.org/en/gdb.html. (All of this activity is managed by wait_for_inferior. Originally this was a simple loop, waiting for the target to stop and then deciding what to do about it, but as ports to various systems needed special handling, it grew to a thousand lines, with goto statements criss-crossing it for poorly understood reasons.)
    youpi: the simpleroutine is in gdb/gdb/exc_request.defs
    so there is indeed an explicit reply port
    but simpleroutine is for no-reply use. why use a reply port here?
    AIUI, it's simply a way to make the request asynchronous, but still permit an answer
    ok, I will read the mig book carefully.
    hacklu: as youpi says
    a routine can be broken into two simpleroutines
    that's why some interfaces have interface.defs, interface_request.defs and interface_reply.defs files
    nlightnfotis: in mach terminology, a right *is* a capability
    the only thing mach doesn't easily provide is a way to revoke them individually
    braunr: Right. And ports are associated with the process server and the kernel right? I mean, from what I have understood, if a process wants to send a signal to another one, it has to do so via the ports to that process held by the process server and it has to establish its identity before doing so, so that it can be checked if it has the right to send to that port.
    yes
    do process own any ports? or are all their ports associated with the process server?
    *processes
    mach ports were intended for a lot of different uses
    but in the hurd, they mostly act as object references
    the process owning the receive right (one at most per port) implements the object
    processes owning send rights invoke methods on the object
    use portinfo to find out about the rights in a task
    (process is the unix terminology, task is the mach terminology)
    i use them almost interchangeably
    ahh yes, I remember about the last bit. And mach tasks have a 1 to 1 association with user level processes (the ones associated with the process server)
    the proc server is a bit special because it has to know about all processes
    yes

In context of [[open_issues/libpthread/t/fix_have_kernel_resources]]:

    hacklu: if you ever find out about either glibc or the proc server creating one receive right for each thread, please let me know


# IRC, freenode, #hurd, 2013-07-07

    how does fork() work?
    see sysdeps/mach/hurd/fork.c in glibc's sources
    when the father has two threads (main thread and the signal thread), if the father calls fork, then the child immediately calls exev() to change the executable file.
    how many threads are in the child? For instance, the new executable also has two threads. will exev() destroy the two threads and then create two new ones?
    s/exev()/excv()
    s/exev()/exec() :)
    what does libhurduser-2.13.so do? where can I find its source?
    contains all the client stubs for hurd-specific RPCs
    it is generated and built automatically within the glibc build process
    and what is the "proc" server?
    what handles processes in user space
    so if I call proc_wait_request(), I will go into S_proc_wait_reply?
    thanks, I have found that.


# IRC, freenode, #hurd, 2013-07-08

    hi, this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report3-137/
    this week I have met a lot of obstacles. And I am very eager to participate in this meeting.
    hacklu: So from your report, the short version is: you've been able to figure out how the things work that you were looking at (good!), and now there are some new open questions that you're working on now.
    hacklu: That sounds good. We can of course try to help with your open questions, if you're stuck figuring them out on your own.
    tschwinge: the main questions are: what is the proc server? why do I need to call proc_get_request() before mach_msg()? and is there any specific running sequence between father and child task after fork()? And I found the inferior always calls trace_me() at the same time (the trace me printf is always on the same line of the output log), which I have posted in my report.
    hacklu: The fork man-page can provide a high-level answer to your Q3: »The child process is created with a single thread -- the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects [...]«
    hacklu: What happens in GNU Hurd is that the signal thread is also "cloned" (additionally to the thread which called fork), but then it (the signal thread) is re-started from the beginning.
    (So this is very much equivalent to creating a new signal thread.)
    hacklu: Then, upon exec, a new memory image is created/loaded, replacing the previous one. [glibc]/sysdeps/mach/hurd/execve.c. What actually happens with the existing thread (in particular, the signal thread) I don't know off-hand. The answer is probably found in [glibc]/hurd/hurdexec.c -- and perhaps some code of the exec server ([hurd]/exec/).
    I have checked the status of my registered mail to the FSF. it says it has arrived in the USA.
    hacklu: OK, good.
    hacklu: This is some basic information about the observer_* functions in GDB: http://sourceware.org/gdb/current/onlinedocs/gdbint/Algorithms.html#index-notifications-about-changes-in-internals-57 »3.10 Observing changes in gdb internals«.
    tschwinge: not too clear. I will think about this later.
    and what is the proc server?
    hacklu: /hurd/proc, maps unix processes to mach threads afaiui
    teythoon: the question is, mach_msg() will never return unless I call proc_wait_request() first.
    hacklu: sorry, I've no idea ;)
    teythoon: :)
    hacklu: I will have to look into that myself, too; don't know the answer off-hand.
    hacklu: In your blog you write proc_get_request -- but such a function doesn't seem to exist?
    tschwinge: s/proc_get_request/proc_wait_request called in gnu_wait() [gnu-nat.c]
    hacklu: Perhaps the wait man-page's description of WUNTRACED gives a clue: »also return if a child has stopped [...]«. But it also to me is not yet clear, how this relates to the mach_msg call, and how the proc server exactly is involved in it.
    I'm reading various source code files.
    At least, I don't understand why it is required for an exception to be forwarded.
    do I need to read the proc server source code?
    I can see how it becomes relevant for the case that GDB has to be informed that the debuggee has exited normally.
    hacklu: Yeah, probably you should spend some time with that, as it will likely help to get a clearer picture of the situation, and is relevant for other interactions in GDB, too.
    hacklu: By the way, if you find that pieces of the GDB source code (especially the Hurd files of it) are insufficiently documented, it's a very good idea, once you have figured out something, to add more source code comments to the existing code. Or write these down separately, if that is easier.
    which is the proc server? hurd/exec ?
    that's ok, I already put comments in my notes.
    hacklu: [Hurd]/proc/
    hacklu: And [Hurd]/hurd/process*.defs
    got it
    hacklu: I'll have to experiment a bit with your HDebugger example, but I'm out of time right now, sorry. Will continue later.
    tschwinge: yep, the HDebugger has a problem, if you put the sleep() after the printf in just_print(), things will hang.
    tschwinge: and I am a little curious about how you found my code? I don't remember having mentioned it :)
    tschwinge: I posted my github link in last week's report, I found it.
    hacklu: That's how I found it, yes.
    tschwinge: :)


# IRC, freenode, #hurd, 2013-07-14

    hi. what is a process's msgport?
    And where can I find msg_sig_post_untraced_request()?
    (msg_sig_post* in [hurd]/hurd/msg_defs)
    this is my debugger demo code https://github.com/hacklu/HDebugger.git use make test to run the demo. I put a breakpoint before the second printf in hello_world (the inferior program). but I can't resume execution from there.
    could somebody give me some suggestions? thanks so much.
    hacklu: % make test
    make: *** No rule to make target `exc_request_S.c', needed by `all'. Stop.
    teythoon: updated, I forgot to git add that file.
    hacklu_: cool, seems to work now
    will look into this tomorrow :)
    exit
    teythoon: it doesn't work. the code can't resume from a breakpoint


# IRC, freenode, #hurd, 2013-07-15

    hi, this is my weekly report.
    http://hacklu.com/blog/gsoc-weekly-report4-148/
    sadly, the problem of resuming from a breakpoint is still unsolved.
    hacklu: have you tried to figure out what gdb does to resume a process?
    teythoon: hi. em, I have tried, but haven't found the magic in gdb yet.
    have you tried rpctrace'ing gdb?
    no, rpctrace has too much noise. I turned on the debug output in gdb.
    I don't want rpctrace to start gdb as its child task. if only it could attach at some point instead of at start
    hacklu: you don't need to use gdb interactively, you could pipe some commands to it
    teythoon: that sounds like a possible way. I will try it, thank you
    youpi: gdb can't work correctly with rpctrace even in batch mode.
    I get something like this "rpctrace: get an unknown send right from process 2151"
    hacklu: well, ideally, fix rpctrace );
    ;)
    hacklu: but you can also ask on the list, perhaps somebody knows what you need
    ok.
    or I should debug gdb more deeply.
    do both
    so either of them may win first
    braunr: I have found that, if no exception occurs, the signal thread will not be created. Then there is only one thread in the task.


# IRC, freenode, #hurd, 2013-07-17

    braunr: ping
    hacklu__: yes ?
    I have replied to your email
    i don't understand
    "I used this (&_info)->suspend_count to get the sc value."
    before the thread_info call ?
    no, after the call
    but you have a null pointer
    the info should be returned in info, not _info
    the strange thing is that info is a null pointer, but _info is not
    _info isn't a pointer, that's why
    the kernel will use it if the data fits, which is usually the case
    in the beginning, info=&_info.
    and it will dynamically allocate memory if it doesn't
    yes
    info should still have that value after the call
    but the call has changed it. this is what I can't understand.
    are you completely sure err is 0 on return ?
    since the parameter is a pointer to pointer, thread_info can change it, but I don't think it is a good idea to set it to a null pointer without any err.
    yes.
    i am sure
    info_len is wrong
    it should be the number of integers in _info
    i.e. sizeof(_info) / sizeof(unsigned int)
    i don't think that's the problem though
    yes, THREAD_BASIC_INFO_COUNT is already exactly that
    hm
    not exactly
    yes, exactly
    in fact I tried to set it by hand, not using the macro.
    the macro is already defined as #define THREAD_BASIC_INFO_COUNT (sizeof(thread_basic_info_data_t) / sizeof(natural_t))
    the info_len is 13. I checked.
    so, i said something wrong
    the call doesn't reallocate thread_info
    it uses the provided storage, nothing else
    yes, your call is wrong
    use thread_info (thread->port, THREAD_BASIC_INFO, (int *) info, &info_len);
    em. thread_info (thread->port, THREAD_BASIC_INFO, (int *) &info, &info_len);
    &info would make the kernel erase the memory where info (the pointer) was stored
    info, not &info
    or &_info directly
    i don't see the need for an intermediate pointer here
    ideally, avoid the cast
    but in gnu-nat.c line 3338, it uses &info.
    use a union with both thread_info_data_t and thread_basic_info_data_t
    well, try it my way
    i think they're wrong
    ok, you are right, using info is ok. the value is the same as &_info after the call. but the suspend_count is zero again.
    check the rest of the result to see if it's consistent
    I think this line needs a patch.
    what do you mean by the rest of the result?
    the thread info
    run_state, sleep_time, creation_time
    see if they make sense
    ok, I will try to dump it
    bbl
    braunr: thread [118] suspend_count=0
    run_state=3, flags=1, sleep_time=0, creation_time.second=1374079641
    something like this, seems no problems.


# IRC, freenode, #hurd, 2013-07-18

    how to get the thread state from TH_STATE_WAITING to TH_STATE_RUNNING
    hacklu__: http://www.gnu.org/software/hurd/gnumach-doc/Thread-Execution.html#Thread-Execution
    hacklu__: ah waiting
    hacklu__: this means the thread is waiting for an event
    so probably waiting for a message
    or an internal kernel event
    braunr: so I need to send it a message. I think I may have forgotten to send some reply message.
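
Two points from the thread_info exchange of 2013-07-17 can be shown in a portable sketch: the count is the size of the info structure in natural_t-sized units, and the caller passes the storage itself rather than the address of a pointer to it. Everything below is a mock written for illustration. `struct basic_info`, `BASIC_INFO_COUNT`, `mock_thread_info` and `read_run_state` are hypothetical names, the structure is much smaller than the real thread_basic_info_data_t, and the values merely echo the dump quoted in the log.

```c
#include <string.h>

typedef unsigned int natural_t;   /* stand-in for Mach's natural_t */

/* Hypothetical, reduced stand-in for thread_basic_info_data_t.  */
struct basic_info
{
  int suspend_count;
  int run_state;
  int flags;
  int sleep_time;
};

/* Same shape as the THREAD_BASIC_INFO_COUNT macro quoted above.  */
#define BASIC_INFO_COUNT \
  ((unsigned int) (sizeof (struct basic_info) / sizeof (natural_t)))

/* Mock of a thread_info-style call: it only fills the storage the
   caller provides and never reallocates it.  Returns 0 on success,
   -1 if the provided storage is too small.  */
static int
mock_thread_info (int *info_out, unsigned int *count)
{
  /* Values echoing the dump in the log: suspend_count=0,
     run_state=3, flags=1, sleep_time=0.  */
  struct basic_info src = { 0, 3, 1, 0 };

  if (*count < BASIC_INFO_COUNT)
    return -1;
  memcpy (info_out, &src, sizeof src);
  *count = BASIC_INFO_COUNT;
  return 0;
}

/* Caller side: pass the address of the structure itself, with the
   count set to its size in natural_t units.  */
static int
read_run_state (void)
{
  struct basic_info info;
  unsigned int count = BASIC_INFO_COUNT;

  if (mock_thread_info ((int *) &info, &count) != 0)
    return -1;
  return info.run_state;
}
```

The cast in `read_run_state` mirrors the call discussed in the log; the union suggested there would be one way to avoid it.
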
    hacklu__: i'm really not sure about those low level details
    confirm before doing anything
    gdb calls msg_sig_post_untraced_request(); I am not clear about this function, I just call it the same way, maybe I am wrong.
    what will happen if I send a CONT to the stopped process? maybe I should try this.
    when the inferior is in waiting status (TH_STATE_WAITING, suspend_count=0), I use kill to send a CONT. then it becomes (TH_STATE_STOP, suspend_count=1). when I think I am near success, I call thread_resume(), and the inferior turns out to be (TH_STATE_WAITING, suspend_count=0).
    so yes, probably waiting for a message
    braunr: after sending a CONT to the inferior, then sending a -9 to the debugger, the inferior continues!!!
    probably because it was notified there wasn't any sender any more
    that's funny, I will look deeper into thread_resume and kill
    (gdb being the sender here)
    in hurd, when gdb attaches to an inferior and a signal is sent to the inferior, who will get the signal first? gdb or the inferior?
    quite different from linux. it seems the inferior gets it first
    do you mean gdb catches its own signal through ptrace on linux ?
    kkk
    ?


# IRC, freenode, #hurd, 2013-07-20

    braunr: yeah, on Linux gdb catches the signal from the inferior before the signal handler. And that day my network was broken, I couldn't say goodbye to you. sorry for that.


# IRC, freenode, #hurd, 2013-07-22

    hi all, this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report5-152/
    good to hear that you got the resume issue figured out
    teythoon: thanks :)
    hacklu: so your next step is to port gdbserver to hurd?
    yep, I have already begun.
    before the mid-term evaluation, I must submit something. I am far behind my personal expectations
    hacklu: You've made great progress! Sorry, for not being able to help you very much: currently very busy with work. :-|
    hacklu: Working on gdbserver now is fine. I understand you have been working on HDebugger to get an understanding of how everything works, outside of the huge GDB codebase.
It's of course fine to continue working on HDebugger to test things, etc., and that also counts very much for the mid-term evaluation, so nothing to worry about. :-) but I have far away behind my application on GSOC. I haven't submit any patches. is it ok? hacklu: Don't worry. Before doing the actual work, things always look much simpler than they will be. So I was expecting/planning for that. The Hurd system is complex, with non-trivial and sometimes asynchronous communication between the different components, and so it takes some time to get an understanding of all that. yes, I haven't get all clear about the signal post. that's too mazy. hacklu: It surely is, yes. tschwinge: may you help me to understand the msg_sig_post(). I don't want to understand all details now, but I want to get the _right_ understanding of the gerneral. as I have mentioned on my weekly report, gdb is listening on the inferior's exception port, then gdb post a signal to that port. That says: gdb post a message to herself, and handle it. is this right? tschwinge: [gdb]/gdb/gnu-nat.c (line 1371), and [glibc]/hurd/hurdsig.c(line 1390) hacklu: My current understanding is that this is a "real" signal that is sent to the debugged process' signal thread (msgport), and when that process is resumed, it will process that signal. hacklu: This is different from the Mach kernel sending an exception signal to a thread's exception port, which GDB is listening to. Or am I confused? is the msgport equal the exception port? in my experience, when the thread haven't cause a exception, the signal thread will not be created. after the exception occured, the signal thread is come out. so somebody create it, who dose? the mach kernel? hacklu: My understanding is that the signal thread would always be present, because it is set up early in a process' startup. but when I call task_threads() before the exception appears, only on thread returned. "Interesting" -- another thing to look into. 
    hacklu: Well, you must be right: GDB must also be listening to the debugged process' msgport, because otherwise it wouldn't be able to catch any signals the process receives.
    Gah, this is all too complex.
    tschwinge: maybe not. gdb is listening on the task's exception port, and the signal may be handled by the signal thread if it can handle it. otherwise the signal thread passes the exception to the task's exception port, where gdb catches it.
    hacklu: Ah, I think I now get it. But let me first verify... ;-)
    something strange. I have written a program to check whether the signal threads are created at the beginning, and they are all created!
    tschwinge: this is my test code and result. http://pastebin.com/xtM6DUnG
    cat test.c
    #define _GNU_SOURCE 1
    #include
    #include
    #include
    #include
    #include
    int main(int argc,char** argv)
    {
        mach_port_t task_port;
        thread_array_t threads[5];
        mach_msg_type_number_t num_threads[5];
        error_t err;
        task_port = mach_task_self();
        int i;
        int j;
        for(i=0;i<5;i++)
            if(task_port){
                err = task_threads(task_port,&threads[i],&num_threads[i]);
                if(err)
                    printf("err\n");
            }
        for(i=0;i<5;i++){
            printf("===============\n");
            printf("has %d threads now\n",num_threads[i]);
            for(j=0;j
    tschwinge: the result is different from the HDebugger case.
    hacklu: It is my understanding that the two sig_post_untraced RPC calls in inf_signal indeed are invoked on the real msgport (signal thread) of the debugged process. That port is retrieved via the INF_MSGPORT_RPC/INF_RESUME_MSGPORT_RPC macro, which invokes proc_getmsgport on the proc server, and that will return (unless overridden by proc_setmsgport, but that isn't done in GDB) the msgport as set by [glibc]/hurd/hurdinit.c:_hurd_new_proc_init or _hurd_setproc.
inf_signal is called from gnu_resume, which is via [target_ops]->to_resume is called from target.c:target_resume, which is called several places, for example infrun.c:resume which is used to a) just resume the debugged process, or b) resume it and have it handle a Unix signal (such as SIGALRM, or so), when using the GDB command »signal SIGALRM«, for example. So such a signal would then not be intercepted by GDB itself. By the way, this is all just from reading the code -- I hope I got it all right. Another thing: In Mach 3 Kernel Principles, the standard sequence described on pages 22, 23 is thread_suspend, thread_abort, thread_set_state, thread_resume, so you should probably do that in HDebugger too, and not call thread_set_state before. I would hope the GDB code also follows the standard sequence? Can you please check that? The one thing I'm now confused about is where/how GDB intercepts the standard setup (probably in glibc's signaling mess?) so that it receives any signals raised in the debugged process. But I'll have to continue later. tschwinge: thanks for your detail answers. I don't realize that the gnu_resume will resume for handle a signal, much thanks for point this:) tschwinge: I am not exactly comply with when I call thread_set_state. but I have called a task_suspend before. I think it's not too bad:) hacklu___: Yes, but be aware that gnu_resume is only relevant if a signal is to be forwarded to the debugged process (to be handled there), but not for the case where GDB intercepts the signal (such as SIGSEGV), and handles it itself without then forwarding it to the application. See the »info signals« GDB command. I also confused about when to start the signal thread. I will do more experiment. I have found this: when the inferior is stop at a breakpoint, I use kill to send a CONT to it, the HDebugger will get this message who listening on the exception port. # IRC, freenode, #hurd, 2013-07-28 how to understand the rpctrace output? like this. 
142<--143(pid15921)->proc_mark_stop_request (19 0) 125<--1 27(pid-1)->msg_sig_post_request (20 5 task108(pid15919)); what is the (pid-1)? the kernel? 1 is /hurd/init pid-1 not means minus 1? ah, funny, you're right... I dunno then 2 is the kernel though the 142<--143 is port name? could very well be, but I'm not sure, sorry the number must be the port name. anyone knows why /hurd/init does not get dead name notifications for /hurd/exec like it does for any other essential server? as far as I can see it successfully asks for them about rpctrace, it poses as the kernel for its children, parses and relays any messages sent over the childrens message port, right? # IRC, freenode, #hurd, 2013-07-29 hi. this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report6-156/ hacklu_: the inferior voluntarily stops itself if it gets a signal and notifies its tracer? yes what if it chose not to do so? undebugable program? debugged program will be set an flag so called hurdsig_traced. normal program will handle the signal by himself. in my env, I found that when GDB attach a running program, gdb will not catch the signal send to the program. May help me try it? it doesn't? I'll check... hacklu_: yes, you're right you can just gdb a loop program, and kill -CONT to it. If I do this I will get "Can't wait for pid 12332:NO child processes" warning. yes, I noticed that too does gdb reparent the tracee? I don't think this is a good behavior. gdb should get inferior's signal absolutely In linux it does, not sure about hurd. but I think it should. definitively. there is proc_child in process.defs, but that may only be used once to set the parent of a process gdb doesn't set the inferior as its child process if attached a running procss in HURD. hacklu_: So you figured out this tracing/signal stuff. Great! tschwinge: Hi. not exactly. as I have mentioned, gdb can't get signal when attach to a running process. I also want to know how to build glibc in hurd. 
I have got this " relocation error: ./libc.so: symbol _dl_find_dso_for_object, version GLIBC_PRIVATE not defined in file ld.so.1 with link time reference" when use LD_PRELOAD=./my_build_glibc/libc.so hacklu: You can't just preload the new libc.so, but you'll also need to use the new ld.so. Have a look at [glibc-build]/testrun.sh for how to invoke these properly. Or, link with »-Wl,-dynamic-linker=[glibc-build]/elf/ld.so,-rpath,[glibc-build]:[glibc-build]/elf -L [glibc-build] -L [glibc-build]/elf«. If using the latter, I suggest to also add »-Wl,-t« to verify that you're linking against the correct libraries, and »ldd [executable]« to verify that [€xecutable] will load the correct libraries when invoked. I will try that, and I can't find this call pthread_cond_broadcast(). which will called in the proc_mark_stop hacklu: Oh, right, you'll also need to add libpthread (I think that's the directory name?) to the rpath and -L commands. is libpthread a part of glibc or hurd? glibc hacklu: it is a different repository available here http://git.savannah.gnu.org/cgit/hurd/libpthread.git/ tschwinge: thanks for that, but I don't think I need help about the comiler error now, it just say missing some C file. I will look into the Makefile to verify. but I think it's a part of glibc as a whole hacklu: OK. glibc is/was a stand-alone package and library, but in Debian GNU/Hurd is nowadays integrated into glibc's build process. NlightNFotis: thanks. I only add hurd, glibc, gdb,mach code to my cscope file. seems need to add libpthread. hacklu: If you use the Debian glibc package, our libpthread will be in the libpthread subdirectory. Ignore nptl, which is used for the Linux kernel. tschwinge:BTW, I have found that, to continue the inferior from a breakpoint, doesn't need to call msg_sig_post_untraced. just call thread_abort and thread_resume is already ok. I get the glibc from http://git.savannah.gnu.org/cgit/hurd. 
hacklu: That sounds about right, because you want the inferior to continue normally, instead of explicitly sending a (Unix) signal to it. hacklu: I suggest you use: »apt-get source eglibc« on your Hurd system. hacklu: The Savannah repository does not yet have libpthread integrated. I have this on my TODO list... tschwinge: no, apt-get source doesn't work in my Hurd. I got any code from git clone *** you most probably lack the deb-src entry in your sources.list hacklu: Do you have deb-src lines in /etc/apt/source-list? Or how does it fail? tschwinge: I have deb-src lines. and apt-get complain that: E: Unable to find a source package for eglibc or E: Unable to find a source package for glibc hacklu: which deb-src lines do you have? and piece of my source_list : deb http://ftp.debian-ports.org/debian unreleased main deb-src http://ftp.debian-ports.org/debian unreleased main you also need a deb-src line with the main archive deb-src http://cdn.debian.net/debian unstable main hacklu: Oh, hmm. And you did run »apt-get update« before? That aside, there also is that you can use. You'll need the *.dsc and *.debian.tar.xz files corresponbding to your version of glibc, and the *.orig.tar.xz file. And then run »dpkg-source -x *.dsc«. The Debian snapshot is often very helpful if you need source packages that are no longer in the main Debian repository. or simply running dget on the dsc url Oh. Good to know. e.g. dget http://cdn.debian.net/debian/pool/main/e/eglibc/eglibc_2.17-7.dsc the network is slowly. and I am in apt-get update. I will be away from this evening until sunday, too what the main difference between the source site? is dget means wget? no not exist in linux? it does, in devscripts it's a debian tool oh, yes, I have installed devscripts. I have got the libphread code, thanks. 
teythoon: the simple fact that this msg thread exists to receive requests and that these requests are sent by ps and procfs is a potential DoS braunr: but does that mean that on Hurd a process can prevent a debugger from intercepting signals? teythoon: yes that's not a problem for interactive programs it's part of the hurd design that programs have limited trust in each other a user can interrupt his debugger if he sees no activity that's more of a problem for non interactive system stuff like init scripts or procfs why gdb can't get inferior's signal if attach a running process? hacklu: try to guess braunr: it is not a reasonable thing. I always think it should catch the signal. hacklu: signals are a unix thing built on top of mach hacklu: think in terms of ports all communication on the hurd goes through ports but when use gdb to start a process and debugg it, this way, gdb can catch the signal hacklu: my guess is : when starting a process, gdb can act as a proxy, much like rpctrace when attaching, it can't braunr: ah, my question should ask like this: why gdb can't set the inferior as its child process when attaching it? or it can not ? hacklu: i'm not sure, the proc server is one of the parts i know the less but again, i guess there is no facility to update the msg port of a process in the proc server check that before taking it as granted braunr: aha, I alway think you know everything:) braunr: There is: setmsgport or similar. if there is one, gdb doesn't use it hacklu: That is a good question -- I can't answer it off-hand, but it might be possible (by setting the tracing flag, and such things). Perhaps it's just a GDB bug, which omits to do that. Perhaps just a one-line code change, perhaps not. That's a new bug (?) report that we may want to have a look at later on. hacklu: But also note, this new problem is not really related to your gdbserver work -- but of course you're fine to have a look at it if you'd like to. 
I just to ask for whether this is a normal behavior. this is related to my gdbserver work, as gdbserver also need to attach a running process... gdbserver can start a process just like gdb does you may want to focus on that first Yes. Attaching to processes that are already running is, I think, always more complicated compared to the case where GDB/gdbserver has complete control about the inferior right from the beginning. yes, I am only focus on start one. the attach way I haven't research now. hacklu: That's totally fine. You can just say that attaching to processes is not supported yet. that's sound good:) Ther will likely be more things in gdbserver that you won't be able to easily support, so it's fine to do it step-by-step. And then later add more features incrementally. That's also easier for reviewing the patches. and one more question I have ask yestoday. what is the rpctrace output (pid-1) mean? hacklu: Another thing I can't tell off-hand. I'll try to look it up. hacklu, tschwinge: my theory is that it is in fact an error message, maybe the proc server did not now a pid for the task hacklu: utsl tschwinge: for saving your time, I will look the code myself, I don;t think this is a real hard question need you to help me by reading the source code. teythoon, hacklu: Yes, from a quick inspection it looks like task2pid returning a -1 PID -- but I can't tell yet what that is supposed to mean, if it's an actualy bug, or just means there is no data available, or similar. braunr: utsl?? hacklu: http://www.catb.org/~esr/jargon/html/U/UTSL.html tschwinge: thank you. braunr like say abbreviation which I can't google out. hacklu: Again, if this affects your work, it is fine to have a look at that presumed rpctrace problem, if not, it is fine to have a look at it if you'd like to, and otherwise, we'll file it as a possible bug to be looked at laster. hacklu: Now you learned that one. :-) tschwinge: ok , this doesn't affect me now. If I have time I will figure out it. 
how to understand the asyn RPC? hacklu: hm ? for instance, [hurd]/proc/main.c proc_server is loop in listening message. and handle it by message_demuxer. but when I send a request like proc_wait_request() to it, will it block in the message_demuxer? and where is the function of ports_manage_port_operations_multithread()? this one is in libports it's the last thing a server calls after bootstrapping itself message_demuxer normally blocks, yes but it's not "async" the names seems the proc_server is listening message with many threads? every server in the hurd does threads are created by ports_manage_port_operations_multithread when incoming messages can't be processed quick enough by the set of already existing threads if too many task send request to the server, will it ddos? yes every server but /hurd/init (and /hurd/hello) hacklu: that's, in my opinion, a major design defect yes, that is reasonable. that's what causes what i like to call thread storms on message floods ... :) my hurd clone is intended to address such major issues couldn't that be migitated by some kind of heuristic? it already is .. I don't image that the port_manage_port_operations_multithread will dynamically create threads. I thought the server will hang if all work thread is in use. that would also be a major defect creating as many threads as necessary is a good thing the problem is the dos hacklu: btw, ddos is "distributed" dos, and it doesn't really apply to what can happen on the hurd why not ? as far as I known, the message transport is transparent. hurd has the chance to be DDOSed we don't care about the distributed property of the dos oh, I know what you mean. it simply doesn't matter on thread calling select in an event loop with a low timeout (high frequency) on a bunch of file descriptors is already enough to generate many dead-name notifications Oh! Based on what I've read in GDB source code, I thought the proc server was single-threaded. 
However, it no longer is, after 1996's Hurd commit fac6d9a6d59a83e96314103b3181f6f692537014. those notifications cause message flooding at servers (usually pflocal/pfinet), which spawn a lot of threads to handle those messages one* thread tschwinge: ah, the comment in gnu_nat.c is out of date! hacklu: and please, please, clean the hello_world processes you're creating on darnassus i had to do it myself again :/ braunr: [hacklu@darnassus ~]$ ps ps: No applicable processes ps -eflw htop hacklu: Probably the proc_wait_pid and proc_waits_pending stuff could be simplified then? (Not an urgent issue, of course, will file as an improvement for later.) braunr: ps -eflw |grep hacklu 1038 12360 10746 26 26 2 87 22 148M 1.06M 97:21001 S p1 0:00.00 grep --color=auto hacklu 15:08 < braunr> i had to do it myself again :/ braunr: so as a very common special case, a lot of dead name notifications cause problems for pf*? and use your numeric uid teythoon: yes braunr: I am so sorry. I only used ps to check. forgive me teythoon: simply put, a lot of messages cause problems select is one special use case braunr: blocking other requests? the other is page cache writeback creating lots of threads potentially deadlocking on failure and in the case of writebacks, simply starving braunr: but dead name notifications should mostly trigger cleanup actions, couldn't those be handled by a different thread(pool) than the rest? 
that's why you can bring down a hurd system with a simple cp bigfile somewhere, bigfile being a few hundreds MiBs teythoon: it doesn't change the problem threads are per task and the contention would remain the same hm since dead-name notifications are meant to release resources created by what would then be "regular" threads don't worry, there is a solution it's simple it's well known it's just hard to directly apply to the hurd and impossible to enforce on mach tschwinge: I am confuzed after I have look into S_proc_wait() [hurd/proc/wait.c], it has relate pthread_hurd_cond_wait_np. I can't find out when it will return. And the signal is report to the debuger by S_proc_wait. braunr: a pointer please ;) teythoon: basically, synchronous ipc then, enforcing one server thread per client thread and replace mach-generated notifications with messages sent from client threads the only kind of notification required by the hurd are no-senders notifications this happens when a client releases all references it has to a resource so it's easy to make that synchronous as well trying to design RPCs as closely as system calls on monolithic kernels helps in viewing how this works the only real additions are address space crossing, and capability invocation sounds reasonable, why is it hard to apply to the hurd? most rpcs are synchonous, no? mach ipc isn't braunr: When client C send a request to server S, but doesn't wait for the reply message right now, for a while, C call mach_msg to recieve reply. Can I think this is a synchronous RPC? 
a malicious client can still overflow message queues hacklu: no yes, I can see how this is impossible to enforce, but still we could all try to play nice :) teythoon: no :) async ipc is heavy, error-prone, less performant than sync ipc some async ipc is necessary to handle asynchronous events, but something like unix signals is actually a lot more appropriate we're diverging from the gsoc though don't waste too much time on that 15:13 < braunr> it's just hard to directly apply to the hurd I wont why is it hard almost everything is synchronous on the hurd except a few critical bits signals :) and select and pagecache writebacks fixing those parts require some work which isn't trivial for example, select should be rewritten not to use dead-name notifications adding a light weight signalling mechanism to mach and using that instead of async ipc? instead of destroying ports once an event has been received, it should (synchyronously) remove the requests installed at remote servers uh no well maybe but that would be even harder hacklu: This (proc/wait.c) is related to POSIX thread cancellation -- I don't think you need to be concerned about that. That function's "real" exit points are earlier above. teythoon: do you understand what i mean about select ? ^^ is that a no go area? for now it is we don't want to change the mach interface too much yes, I get the point about select, but I haven't looked at its implementation yet tschwinge: when I want to know the child task's state, I call proc_wait_request(), unless the child's state not change. the S_proc_wait() will not return? it creates ports, puts them in a port set, gives servers send rights so they can notify about events y not? it's not that hurd is portable to another mach, or is it? and is there another that we want to be compatible with? 
when an event occurs, all ports are scanned then destroyed on destruction, servers are notified by mach the problem is that the client is free to continue and make more requests while existing select requests are still being cancelled uh, yeah, that sounds like a costly way of notifying somewone the cost isn't the issue select must do something like that on a multiserver system, you can't do much about it but it should be synchronous, so a client can't make more requests to a server until the current select call is complete and it shouldn't use a server approach at the client side client -> server should be synchronous, and server -> client should be asynchronous (e.g. using a specific SIGSELECT signal like qnx does) this is a very clean way to avoid deadlocks and denials of service yes, I see qnx actually provides excellent documentation about these issues and their ipc interface is extremely simple and benefits from decades of experience on the subject hacklu: This function implements the POSIX wait call, and per »man 2 wait«: »The wait() system call suspends execution of the calling process until one of its children terminates.« hacklu: This is implemented in glibc in sysdeps/posix/wait.c, sysdeps/unix/bsd/bsd4.4/waitpid.c, sysdeps/mach/hurd/wait4.c, by invoking this RPC synchronously. hacklu: GDB on the other hand, uses this infrastructure (as I understand it) to detect (that is, to be informed) when a debuggee exits (that is, when the inferior process terminates). hacklu: Ah, so maybe I miss-poke earlier: the pthread_hurd_cond_wait_np implements the blocking. And depending on its return value the operation will be canceled or restarted (»start_over«). s%maybe%% hacklu: Does this information help? tschwinge: proc_wait_request is not only to detect the inferior exit. 
it also detect the child's state change as tschwinge said, it's wait(2) tschwinge: and I have see this, when kill a signal to inferior, the gdb will get the message id=24120 which come from S_proc_wait braunr: man 2 wait says: wait, waitpid, waitid - wait for process to change state. (in linux, in hurd there is no man wait) uh there is, it's the linux man page :) make sure you have manpages-dev installed I always think we are talk about linux's manpage :/ but regardless the manpage, gdb really call proc_wait_request() to detect whether inferior's changed states in any case, keep in mind the hurd is intended to be a posix system which means you can always refer to what wait is expected to do from the posix spec see http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html braunr: even in the manpags under hurd, man 2 wait also says: wait for process to change state. yes that's what it's for what's the problem ? the problem is what tschwinge has said I don't understand. like and per »man 2 wait«: »The wait() system call suspends execution of the calling process until one of its children terminates.« terminating is a form of state change historically, wait was intended to monitor process termination only so the thread become stoped wait also return afterwards, process tracing was added too what ? so when the child state become stopped, the wait() call will return? yes and I don't know this pthread_hurd_cond_wait_np. wait *blocks* until the process it references changes state pthread_hurd_cond_wait_np is the main blocking function in hurd servers well, pthread_hurd_cond_timedwait_np actually all blocking functions end up there (or in mach_msg) (well pthread_hurd_cond_timedwait_np calls mach_msg too) since I use proc_wait_request to get the state change, so the thread in proc_server will be blocked, not me. is that right? no both this is just a request, why should block me? 
because you're waiting for the reply afterwards or at least, you should be again, i'm not familiar with those parts after call proc_wait_request(), gdb does a lot stuffs, and then call mach_msg to recieve reply. ok I think it will be blocked only in mach_msg() if need. usually, xxx_request are the async send-only versions of RPCs Yes, that'S my understanding too. and xxx_reply the async receive-only so that makes sense so I have ask you is it a asyn RPC. yes 15:18 < hacklu> braunr: When client C send a request to server S, but doesn't wait for the reply message right now, for a while, C call mach_msg to recieve reply. Can I think this is a synchronous RPC? 15:19 < braunr> hacklu: no if it's not synchronous, it's asynchronous sorry, I spell wrong. missing a 'a' :/ S_proc_wait_reply will then be invoked once the procserver actually answers the "blocking" proc_wait call. Putting "blocking" in quotes, because (due to the asyncoronous RPC invocation), GDB has not actually blocked on this. well, it doesn't call proc_wait tschwinge: yes, the S_proc_wait_reply is called by process_reply_server(). tschwinge: so the "blocked" one is the thread in proc_server . braunr: Right. »It requests the proc_wait service.« gdb will also block on mach_msg 16:05 < braunr> both braunr: yes, if gdb doesn't call mach_msg to recieve reply it will not be blocked. i expect it will always call mach_msg right ? braunr: yes, but before it call mach_msg, it does a lot other things. but finally will call mach_msg that's ok that's the kind of things asynchronous IPC allows tschwinge: I have make a mistake in my week report. The signal recive by inferior is notified by the proc_server, not the send_signal. Because the send_singal send a SIGCHLD to gdb's msgport not gdbself. That make sense. # IRC, freenode, #hurd, 2013-07-30 braunr: before I go to sleep last night, this question pop into my mind. How do you find my hello_world is still alive on darnassus? 
The process is not a CPU-heavy or IO-heavy guy. You will not feel any performance penalization. I am so curious :) hacklu: have you looked into patching the proc server to allow reparenting of processes? teythoon:not yet hacklu: i've familiarized myself with proc in the last week, this should get you started nicely: http://paste.debian.net/19985/ diff --git a/proc/mgt.c b/proc/mgt.c index 7af9c1a..a11b406 100644 --- a/proc/mgt.c +++ b/proc/mgt.c @@ -159,9 +159,12 @@ S_proc_child (struct proc *parentp, if (!childp) return ESRCH; + /* XXX */ if (childp->p_parentset) return EBUSY; + /* XXX if we are reparenting, check permissions. */ + mach_port_deallocate (mach_task_self (), childt); /* Process identification. @@ -176,6 +179,7 @@ S_proc_child (struct proc *parentp, childp->p_owner = parentp->p_owner; childp->p_noowner = parentp->p_noowner; + /* XXX maybe need to fix refcounts if we are reparenting, not sure */ ids_rele (childp->p_id); ids_ref (parentp->p_id); childp->p_id = parentp->p_id; @@ -183,11 +187,14 @@ S_proc_child (struct proc *parentp, /* Process hierarchy. Remove from our current location and place us under our new parent. Sanity check to make sure parent is currently init. */ - assert (childp->p_parent == startup_proc); + assert (childp->p_parent == startup_proc); /* XXX */ if (childp->p_sib) childp->p_sib->p_prevsib = childp->p_prevsib; *childp->p_prevsib = childp->p_sib; + /* XXX we probably want to keep a reference to the old + childp->p_parent around so that if the debugger dies or detaches, + we can reparent the process to the old parent again */ childp->p_parent = parentp; childp->p_sib = parentp->p_ochild; childp->p_prevsib = &parentp->p_ochild; the code doing the reparenting is already there, but for now it is only allowed to happen once at process creation time teythoon: good job. This is in my todo list, when I implement attach feature to gdbserver I will need this hacklu: i use htop braunr: why is that process so disruptive? 
the big problem with those stale processes is that they're in a state that prevents one important script to complete there is a bug on the hurd with regard to terminals when you log out of an ssh session, the terminal remains open for some reason (bad reference counting somewhere, but it's quite tricky to identify) to work around the issue, i have a cron job that calls a script to kill unused terminals this works by listing processes your hello_world processes block that listing uh, how so? braunr: ok. I konw. teythoon: probably the denial of service we were talking about yesterday select flooding a server? no, a program refusing to answer on its msg port ps has an option -M : -M, --no-msg-port Don't show info that uses a process's msg port the problem is that my script requires those info ah, I see, right hacklu being working on gdb, it's not surprising he's messing with that yes indeed. couldn't ps use a timeout to detect that? braunr: yes, once I have found ps will hang when I has run hello_world in a breakpoint state. braunr: thanks for explaining the issue, i always wondered why that process is such big a deal ;) teythoon: how do you tell between processes being slow to answer and intentionnally refusing to answer ? a timeout is almost never the right solution sometimes it's the only solution though, like for networking but on a system running on a local machine, there is usually another way braunr: I don't of course ? ah ok it was rethorical :) yes I know, and I was implying that I wasn't expecting a timeout to be the clean solution and the current behaviour is hardly acceptable i agree it's ok for interactive cases you can use Ctrl-C, which uses a 3 seconds delay to interrupt the client RPC if nothing happens braunr: btw, what about *_reply.defs? Should I add a corresponding reply simpleroutine if I add a routine? normally yes right, forgot about that so that the procedure ids are kept in sync in case one wants to do this async at some point in the future? 
    yes
    this happened with select
    i had to fix the io interface
    ok, noted


# IRC, freenode, #hurd, 2013-07-31

    Do we need to write any other report for the mid-evaluation? I have only submitted a question-answer to google.


# IRC, freenode, #hurd, 2013-08-05

    hi, this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report7build-gdbserver-on-gnuhurd-164/
    youpi: can you show me some suggestions about how to design the interface and structure of gdbserver?
    hacklu: well, I've read your blog entry, I was wondering about tschwinge's opinion, that's why I asked whether he was here
    I would tend to start from an existing gdbserver, but as I haven't seen the code at all, I don't know how much that can help
    so you mean I should get a working gdbserver and then improve it?
    I'd say so, but again it's not a very strong opinion
    I'd rather let tschwinge comment on this
    youpi: ok :)


# IRC, freenode, #hurd, 2013-08-12

    hi, this is my weekly report http://hacklu.com/blog/gsoc-weekly-report8-168/ . sorry for being so late.
    hacklu: it seems we misunderstood ourselves last week, I meant to start from the existing gdbserver implementation
    but never mind :)
    starting from the lynxos version was a good idea
    youpi: em... yeah, the lynxos port is so clean and simple.
    youpi: aha, the "Remote connection closed" problem has been fixed after I added an init_registers_i386() and set the structure target_desc.
    but I don't understand the structure target_desc. I only know it is auto-generated, as configured by configure.srv.
    Hi!
    hacklu: In gdbserver, you should definitely re-use existing infrastructure, especially anything that deals with the protocol/communication with GDB (that is, server.c and its support files).
    hacklu: Then, for the x86 GNU Hurd port, it should be implemented in the same way as an existing port. The Linux port is the obvious choice, of course, but it is also fine to begin with something simpler (like the LynxOS port you've chosen), and then we can still add more features later on.
That is a very good approach actually.
hacklu: The x86 GNU Hurd support will basically consist of three pieces -- exactly as with GDB's native x86 GNU Hurd port: x86 processor specific (tge existing gdbserver/i386-low.c etc. -- shouldn't need any modifications (hopefully)), GNU Hurd specific (gdbserver/gnu-hurd-low.c (or similar)), and x86 GNU Hurd specific (gdbserver/gnu-hurd-x86-low.c (or similar)).
s%tge%the
tschwinge: now I have only added a file named gnu-low.c, I should move some parts to the file gnu-i386-low.c I think.
hacklu: That's fine for the moment. We can move the parts later (everything with 86 in its name, probably).
that's ok.
tschwinge: Can I copy code from gnu-nat.c to gdbserver/gnu-hurd-low.c? I think the two files will have a lot of the same code.
hacklu: That's correct. Ideally, the code should be shared (for example, in a file in common/), but that's an ongoing discussion in GDB, for other duplicated code. So, for the moment, it is fine to copy the parts you need.
hacklu: Oh, but it may be a good idea to add a comment to the source code, where it is copied from.
maybe I can do a common part just for the hurd gdb port.
That should make it easier later on, to consolidate the duplicated code into one place.
Or you can do that, of course. If it's not too difficult to do?
I think at the beginning it is not difficult. But when the gdbserver code grows, the difference with gdb grows too. That will be too many #if else.
I think we should check with the GDB maintainers, what they suggest.
hacklu: Please send an email To: Cc: , , and ask about this: you need to duplicate code that already exists in gnu-nat.c for new gdbserver port -- how to share code?
tschwinge: ok, I will send the email right now.
tschwinge: need I cc the hurd mailing list?
hacklu: Not really for that question, because that is a question only relevant to the GDB source code itself.
tschwinge: got it.

[[!message-id "CAB8fV=jzv_rPHP3-HQVBA-pCNZNat6PNbh+OJEU7tZgQdKX3+w@mail.gmail.com"]].
# IRC, freenode, #hurd, 2013-08-19

. when and where is the best time and place to get the register value in gdb?
well, I'm not sure to understand the question
you mean in the gdb source code, right?
isn't it already done in gdb? probably similarly to i386? (linux i386 I mean)
I can't find fetch_register or a related function implemented in gnu-nat.c
so I can't decide how to implement this in gdbserver.
it's in i386gnu-nat.c, isn't it?
yeah.
does that answer your issue?
thank you. I am so stupid

# IRC, freenode, #hurd, 2013-08-26

< hacklu> hello everyone, this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report10-174/
< hacklu> but now I face a new problem, when I typed the first continue command, gdb will continue all the breakpoint, and the inferior will run until it exits normally.

# IRC, freenode, #hurd, 2013-08-30

tschwinge: hi, does gdb's attach feature work correctly on Hurd?
on my hurd-box, gdb can't attach to a running process, after attaching, when I continue, gdb complained "can't find pid 12345"
hacklu: attaching works, not sure why gdb is complaining
teythoon: yeah, it can attach, but can't continue the process.
in this case, the debugger is useless if it can't resume execution
hacklu: well, gdb on Linux reacts a little differently, but for me attaching and then resuming works
teythoon: yes, gdb on linux works well.

    % gdb --pid 21506 /bin/sleep
    [...]
    (gdb) c
    Continuing.
    warning: Can't wait for pid 21506: No child processes
    # pkill -SIGILL sleep
    warning: Pid 21506 died with unknown exit status, using SIGKILL.

yes. I used a sleep program to test too.
I believe that the warning and deficiencies with the signal handling are b/c on Hurd the debuggee cannot be reparented to the debugger
oh, I remembered, I have asked this before.
Confirming that attaching to a process in __sleep -> __mach_msg -> mach_msg_trap works fine, but then after »continue«, I see »warning: Can't wait for pid 4038: No child processes« and three times »Can't fetch registers from thread bogus thread id 1: No such thread« and the sleep process exits (normally, I guess? -- interrupted "system call").
If detaching (exit GDB) instead, I see »warning: Can't modify tracing state for pid 4041: No such process« and the sleep process exits.
Attaching to and then issuing »continue« in a process that is not currently in a mach_msg_trap (tested a trivial »while (1);«) seems to work.
hacklu: ^
tschwinge: on my hurdbox, if I just attach to a while(1), the system is nearly down. nothing can happen, maybe my hardware is slow. so I can only test on the sleep one.
my gdbserver doesn't support the attach feature now. the other basic features have been implemented. I am doing tests and reviewing the code now.
Great! :-)
It is fine if attaching does not work currently -- can be added later.
btw, How can I submit my code? put the patch in email directly?
Did you already run the GDB testsuite using your gdbserver?
no, haven't yet
Either that, or a Git branch to pull from.
I think I should do more review and test, then I submit patches.
hacklu: See [GDB]/gdb/testsuite/boards/native-gdbserver.exp (and similar files) for how to run the GDB testsuite with gdbserver.
ok.
But don't be disappointed if there are still a lot of failures, etc. It'll already be great if some basic stuff works.
now it can set and remove breakpoints, show registers, access variables.
... which already is enough for a lot of debugging sessions. :-)
I will continue to make it more powerful. :)
Yes, but please first work on polishing the existing code, and get it integrated upstream. That will be a great milestone.
No doubt that GDB maintainers will have lots of comments about proper formatting of the source code, and such things. Trivial, but will take time to re-work and get right.
oh, I got it.
I will give my patch before this weekend.
Then once your basic gdbserver is included, you can continue to implement additional features, piece by piece.
And then we can run the GDB testsuite with gdbserver and compare that there are no regressions, etc.
Heh, »before the weekend« -- that's soon. ;-)
honestly, most of the code is copied from other files, I haven't written much code myself.
Good -- this is what I hoped. Often, more time in software development is spent on integrating existing things rather than writing new code.
but I have spent a lot of time getting to know the code and debugging it to work.
This is normal, and is good in fact: existing code has already been tested and documented (in theory, at least...).
Yes, that's expected too: when relying on/reusing existing code, you first have to understand it, or at least its interfaces. Doing that, you're sort of "mentally writing the existing code again".
So, this sounds all fine. :-)
your words make me happy. :)
Well, I am, because this seems to be going well.
thank you. I am going to code now~~

# IRC, freenode, #hurd, 2013-09-02

hi, this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report11-181/
please give me any advice on how to use mig to generate stub-files in gdbserver?
hacklu: http://darnassus.sceen.net/gitweb/rbraun/slabinfo.git/blob/HEAD:/Makefile
braunr: shouldn't I work like this https://github.com/hacklu/gdbserver/blob/gdbserver/gdb/config/i386/i386gnu.mh ?
hacklu: seems that you need server code
other than that i don't see the difference
gdb uses autoconf to generate the Makefile, and part comes from the *.mh file, but in gdbserver, there are no .mh like files.
hacklu: why can't you reuse /i386gnu.mh ?
braunr: the question is that there are some things not needed in /i386gnu.mh.
hacklu: like what ?
braunr: like fork-child.o msg_U.o core-regset.o
hacklu: well, adjust the dependencies as you need
hacklu: do you mean they become useless for gdbserver but are useful for gdb ?
braunr: yes, so I need another gnu.mh file.
braunr: but the gdbserver's configure doesn't have any *.mh file, can I add the first one?
or adjust the values of those variables depending on the building mode maybe
tschwinge is likely to better answer those questions
braunr: ok, I will wait for tschwinge's advice.
hacklu, The gdb/config/ dir is for files related to the native gdb builds, as opposed to a cross gdb that does not have any native bits in it. In the latter, gdbserver will be used to touch the native layer, and GDB will only guide gdbserver through the debugging session...
hacklu, In case you haven't figured that out already.
luisgpm: I am not very clear with you. According to your words, I shouldn't use gdb/config for gdbserver?
hacklu, Correct. You should use configure.srv for gdbserver.
hacklu, gdb/gdbserver/configure.srv that is.
hacklu, gdb/configure.tgt for non-native gdb files...
hacklu, and gdb/config for native gdb files.
hacklu, The native/non-native separation for gdb is due to the possibility of having a cross gdb.
what's the srv file's purpose?
hacklu, gdbserver, on the other hand, is always native.
Doing the target-to-object-files mapping.
how can I use configure.srv to configure MIG to generate stub-files?
What are stub-files in this context?
On Hurd, some rpc stub files are auto-generated by MIG from *.defs files
luisgpm: c source code handling low level ipc stuff
mig is the mach interface generator
luisgpm, hacklu: If that is still helpful by now, in I described the MIG usage in GDB. (Which also states that ptrace is a system call which it is not.)
hacklu: For the moment, it is fine to indeed copy the rules related to MIG/RPC stubs from gdb/config/i386/i386gnu.mh to a (possibly new) file in gdbserver. Then, later, we should work out how to properly share these, as with all the other code that is currently duplicated for GDB proper and gdbserver.
hacklu, tschwinge: If there is code gdbserver and native gdb can use, feel free to put them inside gdb/common for now.
hacklu, luisgpm: Right, that was the conclusion from .
tschwinge, luisgpm : ok, I got it.
tschwinge: sorry for not having submitted patches yet, I will try to submit my patch tomorrow.

[[!message-id "CAB8fV=iw783uGF8sWyqJNcWR0j_jaY5XO+FR3TyPatMGJ8Fdjw@mail.gmail.com"]].

# IRC, freenode, #hurd, 2013-09-06

If I want to compile a file which is not in the current directory, how should I change the Makefile?
I have tried obj:../foo.c, but then foo.o will be in ../, not in the current directory.
For example, when I build gdbserver, I want to use [gdb]/gdb/gnu-nat.c. How can I get gnu-nat.o under gdbserver's directory?
tschwinge: ^^
Hi!
hacklu: Heh, unexpected problem.
hacklu: How is this handled for the files that are already in gdb/common/?
I think these would have the very same problem?
tschwinge: ah.
I got it
I see, for example:

    ./gdb/Makefile.in:linux-btrace.o: ${srcdir}/common/linux-btrace.c
    ./gdb/gdbserver/Makefile.in:linux-btrace.o: ../common/linux-btrace.c $(linux_btrace_h) $(server_h)

If I had asked before, I wouldn't have used a soft link to solve this.
But isn't that what you've been trying?
when doing this, where does the .o file go to?
Yes, symlinks can't be used, because they're not available on every (file) system GDB can be built on.
I would assume the .o files to go into the current working directory.
Wonder why this didn't work for you.
in gdbserver/configure.srv, there is a srv_tgtobj="gnu_nat.c ..", if I change the Makefile.in, it isn't gdb's way.
So I can't use the variable srv_tgtobj?
That should be srv_tgtobj="gnu_nat.o [...]"? (Not .c.)
I have tried this, srv_tgtobj="../gnu_nat.c", then the gnu_nat.o is generated in the parent directory.
s/.c/.o (wrong input)
As I understand it now, I should set srv_tgtobj="", and then set gnu_nat.o:../gnu_nat.c in gdbserver/Makefile.in. right?
Hmm, I thought you'd need both. Have you tried that?
no, haven't yet.
I will try soon.
I have met a strange thing. I have this in Makefile,

    i386gnu-nat.o:../i386gnu-nat.c
        $(CC) -c $(CPPFLAGS) $(INTERNAL_CFLAGS) $<

When I run make, it will complain that: no rules for target i386gnu-nat.c
but I also have a line gnu-nat.o:../gnu-nat.c ../gnu-nat.h. this works well.
hacklu: Does it work if you use $(srcdir)/../i386gnu-nat.c instead of ../i386gnu-nat.c?
Or similar.
I have tried this, i386gnu-nat.c: echo "" ; then it works.
(try $(srcdir) ing..)

    make: *** No rule to make target `.../i386gnu-nat.c', needed by `i386gnu-nat.o'. Stop.

seems no use.
tschwinge: I have found another thing, if I rename the i386gnu-nat.o to another name, like i386gnu-nat2.o. It works!

# IRC, freenode, #hurd, 2013-09-07

hi, I have found many '^L' in gnu-nat.c, should I fix it or keep the original?
hacklu: fix in what sense?
remove the lines that contain ^L
hacklu: see bottom of http://www.gnu.org/prep/standards/standards.html#Formatting
hacklu: "Please use formfeed characters (control-L) to divide the program into pages at logical places (but not within a function)."
hacklu: so unless a reason has come up to deviate from the gnu coding standards, those ^L's are there by design
LarstiQ: Thank you! I always thought those were some format error. I am stupid.
hacklu: not stupid, you just weren't aware * LarstiQ thought the same when he first encountered them

# IRC, freenode, #hurd, 2013-09-09

hacklu_, hacklu__: I don't know what tschwinge thinks, but I guess you should work with upstream on integration of your existing work, this is part of the gsoc goal: submitting one's stuff to projects
youpi: Which is what we're doing (see the patches recently posted). :-)
ok
youpi: I am always doing what you have suggested. :)
I have asked in my new mail, I want to ask here again. Should I change gdb to use the lwp field instead of the tid field?
There are too many functions that use tid. Like named tid in the structure proc also.
make_proc(),inf_tid_to_thread(),ptid_build(), and there is a field
(sorry for the bad \n )
and this is my weekly report. http://hacklu.com/blog/gsoc-weekly-report12-186/
And in Pedro Alves's reply, he wants me to integrate only one back-end for gdb and gdbserver.
but the struct target_ops are just declared differently in the two. How can I integrate this? or did I misunderstand?
tschwinge: ^^
hacklu: I will take this to email, so that Pedro et al. can comment, too.
hacklu: I'm not sure about your struct target_ops question. Can you reply to Pedro's email to ask about this?
tschwinge: ok.
hacklu: I have sent an email about the LWP/TID question.
tschwinge: Thanks for your email, now I know how to fix the LWP/TID for this moment.
hacklu: Let's hope that Pedro also is fine with this. :-)
tschwinge: BTW, I have a question, if we just use a locally auto-generated number to distinguish threads in a process, How can we do that?
How can we know which thread threw the exception?
I haven't thought about this before.
hacklu: make_proc sets up a mapping from Mach threads to GDB's TIDs. And then, for example inf_tid_to_thread is used to look that up.
tschwinge: oh, yeah. that is.

# IRC, freenode, #hurd, 2013-09-16

hacklu: Even when waiting for Pedro (and me) to comment, I guess you're not out of work, but can continue in parallel with other things, or improve the patch?
tschwinge: honestly, these days I am out of work T_T after I updated the patch.
I am not sure how to improve the patch beyond your comment in the email. I have just run some testcases and nothing else.
hacklu: I have not yet seen any report on the GDB testsuite results using your gdbserver port (see gdb/testsuite/boards/native-gdbserver.exp). :-D
the question is, the result of that testcase is just how many pass and how many don't pass.
and I am not sure whether I need to give this information.
Just as a native run of GDB's testsuite, this will create *.sum and *.log files, and these you can diff to those of a native run of GDB's testsuite.
https://paste.debian.net/41066/ this is my result

    === gdb Summary ===
    # of expected passes 15573
    # of unexpected failures 609
    # of unexpected successes 1
    # of expected failures 31
    # of known failures 57
    # of unresolved testcases 6
    # of untested testcases 47
    # of unsupported tests 189
    /home/hacklu/code/gdb/gdb/testsuite/../../gdb/gdb version 7.6.50.20130619-cvs -nw -nx -data-directory /home/hacklu/code/gdb/gdb/testsuite/../data-directory
    make[3]: *** [check-single] Error 1
    make[3]: Leaving directory `/home/hacklu/code/gdb/gdb/testsuite'
    make[2]: *** [check] Error 2
    make[2]: Leaving directory `/home/hacklu/code/gdb/gdb'
    make[1]: *** [check-gdb] Error 2
    make[1]: Leaving directory `/home/hacklu/code/gdb'
    make: *** [do-check] Error 2

I got a make error so I don't get the *.sum and *.log file.
Well, that should be fixed then?
hacklu: When does university start again for you?
My university started a week ago.
but I will fix this,
Oh, OK. So you won't have too much time anymore for GDB/Hurd work?
it is my duty to finish my work.
time is not the main problem for me, I will schedule it myself.
hacklu: Thanks! Of course, we'd be very happy if you stay with us, and continue working on this project (or another one)! :-D
I also thank all of you who helped me and mentored me to improve myself.
then, what I can do next is fix the failed testcases?
hacklu: It's been our pleasure!
hacklu: A comparison of the GDB testsuite results for a native and gdbserver run would be good to get an understanding of the current status.
ok, I will give this comparison soon. BTW, should I compare the native gdb result with the one before my patch
You mean compare the native run before and after your patch? Yes, that also wouldn't hurt to do, to show that your patch doesn't introduce any regressions to the native GDB port.
ok, besides this I should compare the native gdb with gdbserver ?
Yes.
besides this, what more can I do?
No doubt, there will be differences between the native and gdbserver test runs -- the goal is to reduce these. (This will probably translate to: implement more stuff for the Hurd port of gdbserver.)
ok, I know it. Starting it now
As time permits. :-)
It's ok. :)

# IRC, freenode, #hurd, 2013-09-23

I have to go out in a few minutes, will be back at 8pm. I am sorry to miss the meeting this week, I will finish my report soon.
tschwinge, youpi ^^
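
Editorial note on the 2013-09-16 discussion: the native versus gdbserver comparison of `*.sum` files can be sketched roughly as follows. This is only an illustration, not part of GDB or its testsuite. It assumes the usual DejaGnu layout where each result line reads `STATUS: test name`, and the helper names `parse_sum`, `compare` and `report` are made up for this sketch.

```python
import re

# Result prefixes that DejaGnu writes at the start of each result line.
STATUSES = ("PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
            "UNRESOLVED", "UNTESTED", "UNSUPPORTED")
LINE_RE = re.compile(r"^(%s): (.*)$" % "|".join(STATUSES))


def parse_sum(text):
    """Map each test name to its status, given the text of a .sum file.

    Lines that are not results (headers, the summary counts) are skipped.
    If a test name occurs twice, the last occurrence wins.
    """
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            status, name = match.groups()
            results[name] = status
    return results


def compare(native, remote):
    """List (test name, native status, gdbserver status) where the runs differ."""
    diffs = []
    for name in sorted(set(native) | set(remote)):
        a = native.get(name, "MISSING")
        b = remote.get(name, "MISSING")
        if a != b:
            diffs.append((name, a, b))
    return diffs


def report(native_path, remote_path):
    """Print the differences between two .sum files (the paths are hypothetical)."""
    with open(native_path) as f:
        native = parse_sum(f.read())
    with open(remote_path) as f:
        remote = parse_sum(f.read())
    for name, a, b in compare(native, remote):
        print("%s -> %s: %s" % (a, b, name))
```

Calling `report` on the `gdb.sum` of a native run and of a gdbserver run would list only the tests whose outcome changed, which is easier to read than the raw pass/fail totals. The same function can compare a native run before and after a patch.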