[[!meta copyright="Copyright © 2010, 2011, 2013, 2014 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] [[!meta title="Profiling, Tracing"]] *Profiling* ([[!wikipedia Profiling_(computer_programming) desc="Wikipedia article"]]) is a tool for tracing where CPU time is spent. This is usually done for [[performance analysis|performance]] reasons. * [[hurd/debugging/rpctrace]] * [[gprof]] Should be working, but some issues have been reported, regarding GCC spec files. Should be possible to fix (if not yet done) easily. * [[glibc]]'s sotruss * [[ltrace]] * [[latrace]] * [[community/gsoc/project_ideas/dtrace]] Have a look at this, integrate it into the main trees. * [[LTTng]] * [[SystemTap]] * ... or some other Linux thing. # IRC, freenode, #hurd, 2013-06-17 is that possible we develop rpc msg analyse tool? make it clear view system at different level? hurd was dynamic system, how can we just read log line by line congzhang: well, you can use rpctrace and then analyze the logs, but rpctrace is quite intrusive and will slow down things (like strace or similar) congzhang: I don't know if a low-overhead solution could be made or not that's the problem when real system run, the msg cross different server, and then the debug action should not intrusive the process itself we observe the system and analyse os when rms choose microkernel, it's expect to accelerate the progress, but not microkernel make debug a litter hard well, it's not limited to microkernels, debugging/tracing is intrusive and slow things down, it's an universal law of compsci no, it makes debugging easier I don't think so you can gdb the various services (like ext2fs or pfinet) more easily and rpctrace isn't any worse than strace how easy when debug lpc lpc ? because cross context classic function call when find the bug source, I don't care performance, I wan't to know it's right or wrong by design, If it work as I expect I optimize it latter I have an idea, but don't know weather it's usefull or not rpctrace is a lot less instrusive than ptrace based tools congzhang: debugging is not made hard by the design choice, but by implementation details as a simple counter example, someone often cited usb development on l3 being made a lot easier than on a monolithic kernel Collect the trace information first, and then layout the msg by graph, when something wrong, I focus the trouble rpc, and found what happen around "by graph" ? yes braunr: directed graph or something similar and not caring about performance when debugging is actually stupid i've seen it on many occasions, people not being able to use debugging tools because they were far too inefficient and slow why a graph ? 
# IRC, freenode, #hurd, 2013-06-17

    is that possible we develop rpc msg analyse tool? make it clear view system at different level?
    hurd was dynamic system, how can we just read log line by line
    congzhang: well, you can use rpctrace and then analyze the logs, but rpctrace is quite intrusive and will slow down things (like strace or similar)
    congzhang: I don't know if a low-overhead solution could be made or not
    that's the problem
    when real system run, the msg cross different server, and then the debug action should not intrusive the process itself
    we observe the system and analyse os
    when rms choose microkernel, it's expect to accelerate the progress, but not microkernel make debug a litter hard
    well, it's not limited to microkernels, debugging/tracing is intrusive and slow things down, it's an universal law of compsci
    no, it makes debugging easier
    I don't think so
    you can gdb the various services (like ext2fs or pfinet) more easily
    and rpctrace isn't any worse than strace
    how easy when debug lpc
    lpc ?
    because cross context
    classic function call
    when find the bug source, I don't care performance, I wan't to know it's right or wrong by design, If it work as I expect I optimize it latter
    I have an idea, but don't know weather it's usefull or not
    rpctrace is a lot less instrusive than ptrace based tools
    congzhang: debugging is not made hard by the design choice, but by implementation details
    as a simple counter example, someone often cited usb development on l3 being made a lot easier than on a monolithic kernel
    Collect the trace information first, and then layout the msg by graph, when something wrong, I focus the trouble rpc, and found what happen around
    "by graph" ?
    yes
    braunr: directed graph or something similar
    and not caring about performance when debugging is actually stupid
    i've seen it on many occasions, people not being able to use debugging tools because they were far too inefficient and slow
    why a graph ?
    what you want is the complete trace, taking into account cross address space boundaries
    yes
    well it's linear
    switching server by independent process view it's linear
    it's linear on cpu's view too
    yes, I need complete trace, and dynamic control at microkernel level os, if server crash, and then I know what's other doing, from the graph
    graph needn't to be one, if the are not connect together, time sort them
    when hurd was complete ok, some tools may be help too
    i don't get what you want on that graph
    sorry, I need a context
    like uml sequence diagram, I need what happen one by one from server's view and from the function's view
    that's still linear
    so please stop using the word graph
    you want a trace
    a simple call trace
    yes, and a tool
    with some work gdb could do it
    you mean under some microkernel infrastructure help ?
    if needed
    braunr: will that be easy?
    not too hard
    i've had this idea for a long time actually
    another reason i insist on migrating threads (or rather, binding server and client threads)
    braunr: that's great
    the current problem we have when using gdb is that we don't know which server thread is handling the request of which client
    we can guess it
    but it's not always obvious
    I read the talk, know some of your idea
    make things happen like classic kernel, just from function ,sure:)
    that's it
    I think you and other do a lot of work to improve the mach and hurd, buT we lack the design document and the diagram, one diagram was great than one thousand words
    diagrams are made after the prototypes that prove they're doable
    i'm not a researcher and we have little time
    the prototype is the true spec
    that's why i wan't cllector the trace info and show, you can know what happen and how happen, maybe just suitable for newbie, hope more young hack like it
    once it's done, everything else is just sugar candy around it

# IRC, freenode, #hurd, 2014-01-05

    braunr: do you speak ocaml ?
    i had this awesome idea for a universal profiling framework for c
    universal as in not os dependent, so it can be easily used on hurd or in gnu mach
    it does a source transformation, instrumenting what you are interested in
    for this transformation, coccinelle is used
    i have a prototype to measure how often a field in a struct is accessed
    unfortunately, coccinelle hangs while processing kern/slab.c :/
    teythoon: I do speak ocaml
    awesome :)
    unfortunately, i do not :/
    i should probably get in touch with the coccinelle devs, most likely the problem is that coccinelle runs in circles somewhere
    it's not so complex actually
    possibly, yes
    do you know coccinelle ?
    the only really peculiar thing in ocaml is lambda calculus
    +c
    I know a bit, although I've never really written an semantic patch myself
    i'm okay with that
    but I can understand them
    then ocaml should be fine for you :)
    just ask the few bits that you don't understand :)
    yeah, i haven't really made an effort yet
    writing ocaml is a bit more difficult because you need to understand the syntax, but for putting printfs it should be easy enough
    if you get a backtrace with ocamldebug (it basically works like gdb), I can probably explain you what might be happening
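The framework sketched in this conversation works by source transformation: Coccinelle rewrites the C sources so that every access to an instrumented struct field also bumps a counter, and a little glue (in gnumach, a bit of code in the kernel debugger) reads the counters back out.  The following hand-written, self-contained illustration shows the general shape such instrumented code could take; the names (`struct foo`, `ACCESS`, `field_access_foo_hits`) are invented for this sketch and are not part of the actual tool.  Note that taking a field's address, as in `&f->hits`, escapes this kind of tracking; the log below counts such cases as "aliased".

    /* Hypothetical illustration of instrumented code: every read of
     * f->hits is rewritten so that it also bumps a per-field counter.
     * The value of the expression is unchanged. */
    #include <stdio.h>

    struct foo {
        int hits;
        int misses;
    };

    /* One counter per instrumented field. */
    static unsigned long field_access_foo_hits;

    /* Count an access, then yield the original expression's value. */
    #define ACCESS(counter, expr) ((counter)++, (expr))

    static int lookup(struct foo *f)
    {
        /* Original code:      return f->hits;
         * Instrumented code:  the access is counted, the result is the same. */
        return ACCESS(field_access_foo_hits, f->hits);
    }

    int main(void)
    {
        struct foo f = { .hits = 3, .misses = 1 };

        for (int i = 0; i < 5; i++)
            lookup(&f);

        printf("foo.hits accessed %lu times\n", field_access_foo_hits);
        return 0;
    }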
## IRC, freenode, #hurd, 2014-01-06

    braunr: i'm not doing microoptimizations, i'm developing a profiler :p
    teythoon: nice :)
    i thought you might like it
    teythoon: you may want to look at http://pdos.csail.mit.edu/multicore/dprof/
    from the same people who brought radixvm
    which data structure should i test it with next ?
    uh, no idea :)
    the ipc ones i suppose
    yeah, or the task related ones
    but be careful, there many "inline" versions of many ipc functions in the fast paths
    and when they say inline, they really mean they copied it
    +are
    but i have a microbenchmark for ipc performance
    you sure have been busy ;p
    it's funny you're working on a profiler at the same time a collegue of mine said he was interested in writing one in x15 :)
    i don't think inlining is a problem for my tool
    well, you can use my tool for x15
    i told him he could look at what you did so i expect he'll ask soon
    cool :)
    my tool uses coccinelle to instrument c code, so this works in any environment
    one just needs a little glue and a method to get the data
    seems reasonable
    for gnumach, i just stuff a tiny bit of code into the kdb
    hm
    debians bigmem patch with my code transformation makes gnumach hang early on
    i don't even get a single message from gnumach
    ouch
    or it is somethign else entirely
    it didn't even work without my patches o_O
    weird
    uh oh, the kmem_cache array is not properly aligned
    braunr: http://paste.debian.net/74588/
    teythoon: do you mean, with your patch ?
    i'm not sure i understand
    are you saying gnumach doesn't start because of an alignment issue ?
    no, that's unrelated
    i skipped the bigmem patch, have a running gnumach with instrumentation
    hum, what is that aliased column ?
    but, despite my efforts with __attribute__((align(64))), i see lot's of accesses to kmem_cache objects which are not properly aligned
    is that reported by the performance counters ?
    no
    http://paste.debian.net/74593/
    aer those the previous lines accessed by other unrelated code ?
    previous bytes in the same line*
    this is a patch generated to instrument the code
    so i instrument field access of the form i->a
    but if one does &i->a, my approach will no longer keep track of any access through that pointer
    so i do not count that as an access but as creating an alias for that field
    ok
    so if that aliased count is not zero, the tool might underestimate the access count
    hm
    static struct kmem_cache kalloc_caches[KALLOC_NR_CACHES] __attribute__((align(64)));
    but
    nm gnumach|grep kalloc_caches
    c0226e20 b kalloc_caches
    ah, that's fine
    yes
    nevr mind
    don't we have a macro for the cache line size ?
    ah, there are a great many more kmem_caches around
    and noone told me ...
    teythoon: eh :)
    aren't you familiar with type-specific caches ?
    no, i'm not familiar with anything in gnumach-land
    well, it's the regular slab allocator, carrying the same ideas since 1994
    it's pretty much the same in linux and other modern unices
    ok
    the main difference is likely that we allocate our caches statically
    because we have no kernel modules and know we'll never destroy them, only reap them
    is there a macro for the cache line size ?
    there is one burried in the linux source
    L1_CACHE_BYTES from linux/src/include/asm-i386/cache.h
    there is one in kern/slab.h but it is out of date
    there is ?
    but it's commented out
    only used when SLAB_USE_CPU_POOLS is defined
    but the build system should give you CPU_L1_SHIFT
    hm
    and we probably should define CPU_L1_SIZE from that
    unconditionnally
    in config.h
    or a general param.h file if there is one
    the architecture-specific one perhaps
    although it's exported to userland so maybe not
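The cache-line discussion above boils down to deriving a line size from `CPU_L1_SHIFT` (which the build system provides) and aligning the static declarations accordingly.  Below is a stand-alone sketch of the idea, not gnumach code; a 64-byte fallback line size is assumed so that the example compiles on its own, and the alignment check mirrors the assertion suggested in the next log.  Two details worth noting: the GNU C attribute is spelled `aligned` (GCC merely warns about an unrecognized `align` attribute and ignores it), and aligning an array as a whole only guarantees the alignment of its start; per-element alignment is what the type-level annotation discussed in the 2014-01-09 log below provides.

    /* Stand-alone sketch: derive a cache-line size from CPU_L1_SHIFT and
     * align a static array of caches on a line boundary. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #ifndef CPU_L1_SHIFT
    #define CPU_L1_SHIFT 6                  /* assumed fallback: 64-byte lines */
    #endif
    #define CPU_L1_SIZE (1 << CPU_L1_SHIFT)

    struct cache {
        unsigned long nr_objs;
        unsigned long nr_free;
    };

    /* Note: "aligned", not "align".  This aligns the array as a whole. */
    static struct cache caches[8] __attribute__((aligned(CPU_L1_SIZE)));

    int main(void)
    {
        /* The kind of check one could also put into an init function. */
        assert((uintptr_t)caches % CPU_L1_SIZE == 0);
        printf("caches[] starts on a %d-byte boundary\n", CPU_L1_SIZE);
        return 0;
    }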
## IRC, freenode, #hurd, 2014-01-07

    braunr: linux defines ____cacheline_aligned : http://lxr.free-electrons.com/source/include/linux/cache.h#L20
    where would i put a similar definition in gnumach ?
    .oO( four underscores ?!? )
    heh
    yes, four
    teythoon: yes :)
    are kmem_cache objects ever allocated dynamically in gnumach ?
    no
    hm
    i figured that, since there are no kernel modules, there is no need to allocate them dynamically, since they're never destroyed
    so i aligned all statically declarations with __attribute__((align(1 << CPU_L1_SHIFT)))
    but i still see 77% of all accesses being to objects that are not properly aligned o_O
    ah
    >,<
    you could add an assertion in kmem_cache_init to find out what's wrong
    *aligned
    eh :)
    right
    grr
    sweet, the kmem_caches are now all properly aligned :)
    :)
    hm
    i guess i should change what vmstat reports as "cache"
    from the cached objects to the external ones (which map files and not anonymous memory)
    braunr: http://paste.debian.net/74869/
    turned out that struct kmem_cache was actually an easy target
    no bitfields, no embedded structs that were addressed as such (and not aliased)
    :)

## IRC, freenode, #hurd, 2014-01-09

    braunr: i didn't quite get what you and youpi were talking about wrt to the alignment attribute
    define a type for struct kmem_cache with the alignment attribute ?
    is that possible ?
    ah, like it's done for kmem_cpu_pool
    teythoon: that's it :)
    note that aligning a struct doesn't change what sizeof returns
    heh, that save's one a whole lot of trouble
    indeed
    you have to align a member inside for that
    why would it change the size ?
    imagine an array of such structs
    ah right
    but it fits into two cachelines exactly
    that wouldn't be a problem with an array either
    so an array of those will still be aligned element-wise
    yes
    and it's often used like that, just as i did for the cpu pools
    but then one is tempted to think the size of each element has changed too
    and then use that technique for, say, reserving a whole cache line for one variable
    ah, now i get that remark ;)
    :)
    braunr: i annotated struct kmem_cache in slab.h with __cacheline_aligned and it did not have the desired effect
    can you show the diff please ?
    http://paste.debian.net/75192/
    i don't know why :/
    that's how it's done for kmem_cpu_pool
    i'll try it here
    wait
    i made a typo >,<
    __cachline_aligned
    bad one
    uh :)
    i don't see it
    ah yes
    missing e
    yep, works like a charme :)
    nice, good to know :)
    :)
    given the previous discussion, shall i send it to the list or commit it right away ?
    i'd say go ahead and commit
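To make the last exchange concrete, here is a stand-alone sketch (not the actual slab.h change) of putting the alignment attribute on the type itself, as was done for `kmem_cpu_pool` and finally for `struct kmem_cache`; the names `struct pool` and `CACHE_LINE` are invented for this example.  With GCC, a type-level `aligned` attribute rounds the type's size up to a multiple of the requested alignment where necessary, which is why every element of an array of such structs stays aligned; in the `struct kmem_cache` case above the struct already filled two cache lines exactly, so no padding was added.

    /* Stand-alone sketch: type-level cache-line alignment. */
    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_LINE 64                   /* assumed line size */
    #define __cacheline_aligned __attribute__((aligned(CACHE_LINE)))

    /* The attribute on the type makes every object of this type, including
     * array elements, start on a cache-line boundary. */
    struct pool {
        unsigned long transfer_size;
        unsigned long nr_objs;
    } __cacheline_aligned;

    static struct pool pools[4];

    int main(void)
    {
        printf("sizeof(struct pool) = %zu\n", sizeof(struct pool));

        for (int i = 0; i < 4; i++)
            printf("pools[%d] offset mod %d = %zu\n", i, CACHE_LINE,
                   (size_t)((uintptr_t)&pools[i] % CACHE_LINE));

        return 0;
    }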