[[!meta copyright="Copyright © 2009, 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!meta title=BPF]]

[[!tag open_issue_gnumach open_issue_hurd]]

This is a collection of resources concerning *Berkeley Packet Filter*s.


# Documentation

  * Wikipedia: [[!wikipedia "Berkeley Packet Filter"]]

  * [The Packet Filter: An Efficient Mechanism for User-level Network Code](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.8755), 1987, Jeffrey C. Mogul, Richard F. Rashid, Michael J. Accetta

  * [The BSD Packet Filter: A New Architecture for User-level Packet Capture](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.7849), 1992, Steven McCanne, Van Jacobson

  * [Protocol Service Decomposition for High-Performance Networking](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.8387), 1993, Chris Maeda, Brian N. Bershad

  * [Efficient Packet Demultiplexing for Multiple Endpoints and Large Messages](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.44), 1994, Masanobu Yuhara, Brian N. Bershad, Chris Maeda, J. Eliot B. Moss

  * ... and many more


# Implementation

  * [[community/HurdFr]]

      * Git repository:

        The patch for [[GNU Mach|microkernel/mach/gnumach]] is expected to be complete and functional, the [[hurd/translator]] less so -- amongst others, there are unresolved issues concerning support of [[hurd/glibc/IOCTL]]s.
  * [[zhengda]]

      * [[!GNU_Savannah_bug 25054]] -- Kernel panic with eth-multiplexer

      * [[!GNU_Savannah_patch 6619]] -- pfinet uses the virtual interface

      * [[!GNU_Savannah_patch 6620]] -- pfinet changes its filter rules with its IP address

      * [[!GNU_Savannah_patch 6621]] -- pfinet sets the mach device into the promiscuous mode

      * [[!GNU_Savannah_patch 6622]] -- pfinet uses the BPF filter

      * [[!GNU_Savannah_patch 6851]] -- fix a bug in BPF


# IRC

## IRC, freenode, #hurd, 2012-01-13

    hm, i think the bpf code needs a complete redesign :p
    unless it's actually a true hurdish way to do things
    antrik: i need your help :)
    antrik: I need advice on the bpf "architecture"
    the current implementation uses a translator installed at /dev/bpf
    which means packets from the kernel are copied to that translator and then to client applications
    does that seem ok to you ?
    couldn't the translator be used to set a direct link between the kernel and the client app ?
    which approach seems the more Hurdish to you ?
    (<= this is what I need your help on)
    braunr: so there would be a roundtrip like kernel → bpf translator → pfinet?
    braunr: TBH, I don't see why we need a BPF translator at all...
    antrik: it handles the ioctls
    pinotree: pfinet isn't involved (it was merely modified to use the "new" filter format to specify it used the old packet filter, and not bpf)
    braunr: do we really need to emulate the ioctl()s? can't we assume that all packages using BPF will just use libpcap?
    (and even if we *do* want to emulate ioctl()s, why can't we handle this is libc?)
    antrik: that's what i'm wondering actually
    even if assuming all packages use libpcap, i'd like our bpf interface to be close to what bsds have, and most importantly, what libpcap expects from a bpf interface
    well, why? if we already have a library handling the abstraction, I don't see much point in complicating the design and use by adding another layer :-)
    so you would advise adapting libpcap to include a hurd specific module ?
    there are two reasons for adding translators: more robustness or more flexibility... so far I don't see how a BPF translator would add either
    right
    yes
    so we'd end up with a bpf-like interface, the same instructions and format, with different control calls
    right
    note that I had more or less the same desicion to make for KGI (emulate Linux/BSD ioctl()s, or implement a backend in libggi for handling Hurd-specific RPC; and after much consideration, I decided on the latter)

## IRC, freenode, #hurd, 2012-01-16

    antrik: is there an existing facility to easily give a send right to the device master port to a task ?
    another function of the bpf translator is to handle the /dev/bpf node, and most importantly its permissions
    so that users which have read/write access to the node have access to the packet filter
    i guess the translator could limit itself to that functionality
    and then provide a device port on which libpcap operates directly by means of device_{g,s}et_status/device_set_filter
    braunr: I don't see the point in seperating permissions for filter from permissions from general network device access...
    as for device master port, all root tasks can obtain it from proc IIRC
    antrik: yes, but how do we allow non-root users to access that facility ?
    on a unix like system, it's a matter of changing the permissions of /dev/bpf
    with devnode, non-root users can get access to specific device nodes, including network devices
    i can't imagine the hurd being less flexible for that
    ah devnode
    good
    so we can for example make /dev/eth0 accessible by users of some group
    what's devnode exactly ?
    it's a very simple translator that implements an FS node that looks somewhat like a file, but the only operation it supports is obtaining a pseudo device master port, giving access to a specific Mach device
    is it already part of the hurd ?
    or hurdextras maybe ?
    it's only in zhengda's branch
    ah
    needed for both eth-multipexer and DDE
    and bpf soon i guess
    indeed :-)
    "obtaining a pseudo device master port", i believe you meant a pseudo device port
    I must admit that I don't remember exactly whether devnode proxies device_open(), so clients direct get a port to the device in question, or whether it implements a pseudo device master port...
    but definitely not a pseudo device port :-)
    i'm almost positive it gives the target device port, otherwise i don't see the point
    i don't understand the user of the "pseudo" word here either
    s/user/use/
    aiui, devnode should be started as root (or in any way which gives it the device master port)
    the point is that the client doesn't need to know the Mach device name, and also is not bound to actual kernel devices
    and when started, implement the required permissions before giving clients a device port to the specific device it was installed for
    right
    but it mustn't be a proxy
    yes, devnode needs access to either the real master device port (for kernel devices), or one provided by eth-multiplexer or the DDE network driver
    well, a very simple proxy for deviceopen
    ok
    that seems exactly what i wanted to do
    we now need to see if we can integrate it separately
    create a separate branch that works for the current gnumach code, and merge dde/other specific code later on
    you mean independent of eth-multiplexer or DDE?
    yes, it was generally agreed that devnode is a good idea in any case. I have no idea why there are no device nodes for network devices on other UNIX systems...
    i've been wondering that for years too :)
    zhengda's branch has a pfinet modified to a) use devnode, and b) use BPF
    why bpf ?
    for more specific filters maybe ?
    hm... don't remember whether there was any technical reason for going with BPF; I guess it just seemed more reasonable to invest new work in BPF rather than obsolete Mach-specific NPF...
    cspf could be removed altogether, i agree
    another plus side of his modified pfinet is that it actually sets an appropriate filter for TCP/IP and the IP in use, rather than just setting a dummy filter catching app packets (including those irrelevant to the specific pfinet instance)
    err... catching all packets
    that's what i meant by "for more specific filters maybe ?"
    he was probably more comfortable with the bpf interface to write his filter rules
    well, it would probably be doable with NPF too :-) so by itself it's not a reason for switching to BPF...
    it's rather the other way around: as it was necessary to implement filters in eth-multiplexer, and implementing BPF seemed more reasoable, pfinet had to be changed to use BPF...
    antrik: where is zhengda's branch btw ?
    (I guess using proper filters with eth-multiplexer is not strictly necessary; but it would be a major performance hit not to)
    it's in incubator.git
    but it's very messy
    ok
    at some point I asked him to provide cleaned up branches, and I'm pretty sure he said he did, but I totally fail to remember where he published them :-(
    hm, i don't like how devnode is "architectured" :/
    but it makes things a little more easy to get working i guess
    antrik: any idea what to grep the logs on for that?
    ok never mind, devnode is fine
    exactly what i need
    i wonder however if it shouldn't be improved to better handle permissions
    ok, never mind either, permission handling is fine
    so what are we waiting for ? :)
    I remember that there were some issues with permission handling, but I don't remember whether all were fixed :-(
    LarstiQ: hm... good question...
    ah ?
    hm actually, there could be issues for packet filters, yes
    i guess we want to allow e.g. read-only opens for capture only
    braunr: that would have to be handled by the actual BPF implementation I'd say
    it should already be the case
    what's the problem then?
    but when the actual device_open() is performed, the appropriate permissions must be provided
    and checking those is the responsibility of the proxy, devnode in this case
    and it doesn't do that?
    apparently not
    the only check is against the device name
    i'll begin playing with that first
    I vaguely remember that there has been discussion about the relation of underlying device open mode and devnode open mode... but I don't remember the outcome. in fact it was probably one of the discussions I never got around to follow up on... :-(
    before you begin playing, take a look at the relevant messages in the ML archive :-)
    must have been around two years ago
    ok
    some thread with me and scolobb (Sergiu Ivanov +- spelling) and probably zhengda
    there might also be some outstanding patch(es) from scolobb, not sure

## IRC, freenode, #hurd, 2012-01-17

    antrik: i think i found the thread you mentioned about devnode
    neither sergiu nor zhengda considered the use of a read-only device for packet filtering
    leading to assumptions such as "only receiving packets is not terribly useful, in view of the fact that you have to at least request them, which implies *sending* packets :-) "
    IMO, devnode should definitely check its node permissions to build the device open flags
    good news is that it doesn't depend on anything specific to other incubator projects
    making it almost readily mergeable in the hurd
    i'm not sure devnode is an appropriate name though
    maybe something like device, or devproxy
    proxy-devopen maybe
    braunr: well, I don't remember the details of the disucssion; but as I mentioned in some mail, I did actually want to write a followup, just didn't get around to it... so I was definitely not in agreement with some of the statements made by others. I just don't remember on which point :-)
    which thread was it?
    anyways, this should in no way be specific to network devices...
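Editorial illustration of the point made in this log, that the proxy should derive the device open mode from how the client opened the node instead of hard-coding read/write. This is a minimal Python sketch under stated assumptions, not devnode's actual code: `device_open_mode` is a hypothetical helper, POSIX access modes stand in for the Hurd's own open flags, and the `D_READ`/`D_WRITE` values are copied by hand from Mach's device interface and should be checked against the real header.

```python
import os

# Mach device open-mode bits, mirrored here as an assumption (D_READ and
# D_WRITE from Mach's device_types.h); verify against the real header.
D_READ = 0x1
D_WRITE = 0x2

# Mask selecting the access-mode part of the open flags.
_ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def device_open_mode(open_flags):
    """Hypothetical helper: map the access mode used on the FS node to a
    device open mode, so a read-only open yields a read-only device."""
    access = open_flags & _ACCESS_MASK
    if access == os.O_RDONLY:
        return D_READ
    if access == os.O_WRONLY:
        return D_WRITE
    if access == os.O_RDWR:
        return D_READ | D_WRITE
    raise ValueError("unsupported access mode: %#o" % access)
```

With such a mapping, a capture-only client that can merely read the node would get a device it can receive from but not send on; enforcing that afterwards is left to whatever implements the device.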
    the idea is simply that if the client has only read permissions on the device node, it should only get to open the underlying device for read. it's up to the kernel to handle the read-only status for the device once it's opened
    as for the naming, the idea is that devnode simply makes Mach devices accessible through FS nodes... so the name seemed appropriate
    you may be right though that just "device" might be more straightforward... I don't agree on the other variants
    antrik: http://lists.gnu.org/archive/html/bug-hurd/2009-12/msg00155.html
    antrik: i agree with the general idea behind permission handling, i was just referring to their thoughts about it, which probably led to the hard coded READ | WRITE flags
    braunr: unfortunately, I don't remember the context of the discussion... would take me a while to get into this again :-(
    the discussion seems to be about eth-multiplexer as much as about devnode (if not more), and I don't remember the exact interaction

## IRC, freenode, #hurd, 2012-01-18

    so, does anyone have an objection to getting devnode into the hurd and calling it something else like e.g. device ?
    braunr: it's Zhengda's work, right?
    yes
    I'm completely for it, it just perhaps needs some cleanup
    i have a few changes to add to what already exists
    ok
    well i'm assigning myself to the task
    braunr: I'm still not convinced just "device" is preferable
    perhaps machdevice ;-)
    but otherwise, I'd LOVE to see it in :-)
    i don't know ..
    what if the device is actually eth-multiplexer or a dde one ?
    it's not really "mach", is it ?
    or do we only refer to the interface ?
    that translator is only for mach devices
    so you consider dde devices as being mach devices too ?
    it's a simple proxy for device_open really
    will these devices use that translator?
    ah
    I thought it was using a mach-specific RPC
    so we can consider whatever we want
    braunr: yes, the translator is for Mach device interface only.
    it might be provided by other servers, but it's still Mach devices
    then drop the mach, yes
    i'd tend to agree with antrik
    antrik: I'd say the device interface is part of the hur dinterfaces
    then machdev :p
    no, it's really part of the mach interface
    it's part of the mach interface, yes
    but also of the Hurd, no?
    DDE network servers also use the Mach device interface
    no
    can't we say it's part of it?
    I mean
    even if we change the kernel
    dde is the only thing that implements it besides the kernel that i know of
    we will probably want to keep the same interface
    yes
    but that's a mach thing
    what we have now is not necessarily a reason
    as for other DDE drivers, I for my part believe they should export proper Hurd (UNIX) device nodes directly... but for some reason zhengda insisted on implementing it as Mach devices too :-(
    antrik: i agree with you on that too
    i was a bit surprised to see the same interface was reused
    youpi: we can, we just have to agree on what we'll do
    what do you mean by "even if we change the kernel" ?
    the problem with "machdev" is that it might suggest the translator actually implements the device... not sure whether this would cause serious confusion
    "devopen" might be another option
    or "machdevopen" to be entirely verbose ;-)
    an option i suggested earlier which you disagreed on :p
    but devopen is the one i'd choose
    youpi: as I already mentioned in the libburn thread, I don't actually think the Mach device interface is very nice; IMHO we should get rid of it as soon as we can, rather than port it to other architectures...
    but even *if* we decided to reuse it after all, it would still be the Mach device interface :-)
    actually, zheng da already suggested that name a long time ago
    http://lists.gnu.org/archive/html/bug-hurd/2008-08/msg00005.html
    no actually antrik did
    eh
    ok let's use devopen
    braunr: you suggested proxy-devopen, which I didn't like because of the "proxy" part :-)
    not only, but i don't have the logs any more :p
    oh, I already suggested devopen once? didn't expect myself to be that consistent... ;-)
    braunr: you suggested device, devproxy or proxy-devopen
    ah, ok
    devopen is better
    I wonder whether it's more important for clarity to have "mach" in there or "open"... or whether it's really too unweildy to have both

## IRC, freenode, #hurd, 2012-01-21

    oh btw, i made devopen run today, it shouldn't be hard getting it in properly
    patching libpcap will be somewhat trickier
    i don't even really need it, but it allows having user access to mach devices, which is nice for the libpcap patch and tcpdump tests
    permission checking is actually its only purpose
    well, no, not really, it also allows opening devices implemented by user space servers transparently

## IRC, freenode, #hurd, 2012-01-27

    hmm, bpf needs more work :(
    or we can use the userspace bpf filter in libpcap, so that it works with both gnumach and dde drivers
    braunr: there is a userspace BPF implementation in libpcap? I'm surprised that zhengda didn't notice it, and ported the one from gnumach instead...
    what is missing in the kernel implementation?
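Editorial illustration of what a userspace BPF filter amounts to: a filter is a short program of `(code, jt, jf, k)` instructions run against each packet, returning the number of bytes to accept (0 rejects the packet). The following is a minimal Python sketch of such an interpreter, not libpcap's code. It supports only the five instructions the example needs; `run_filter`, `ipv4_frame` and the hand-assembled `IP_DST_10_0_0_1` program are hypothetical names made up for this sketch, while the opcode numbers are those of classic BPF.

```python
import struct

# Classic BPF opcodes used in this sketch.
LD_W_ABS = 0x20  # A <- 32-bit big-endian word at packet[k]
LD_H_ABS = 0x28  # A <- 16-bit big-endian halfword at packet[k]
LD_B_ABS = 0x30  # A <- byte at packet[k]
JEQ_K = 0x15     # if A == k skip jt instructions, else skip jf
RET_K = 0x06     # stop and accept k bytes (0 rejects the packet)

_LOADS = {LD_W_ABS: ("!I", 4), LD_H_ABS: ("!H", 2), LD_B_ABS: ("!B", 1)}


def run_filter(program, packet):
    """Run a list of (code, jt, jf, k) tuples; return the bytes accepted."""
    acc = 0
    pc = 0
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1
        if code in _LOADS:
            fmt, size = _LOADS[code]
            if k + size > len(packet):
                return 0  # a load past the end of the packet rejects it
            (acc,) = struct.unpack_from(fmt, packet, k)
        elif code == JEQ_K:
            pc += jt if acc == k else jf
        elif code == RET_K:
            return k
        else:
            raise ValueError("unsupported opcode %#x" % code)
    return 0  # falling off the end is treated as a reject in this sketch


# Hand-assembled example: accept IPv4 frames whose destination is 10.0.0.1.
IP_DST_10_0_0_1 = [
    (LD_H_ABS, 0, 0, 12),       # load the EtherType
    (JEQ_K, 0, 3, 0x0800),      # not IPv4: jump to the reject instruction
    (LD_W_ABS, 0, 0, 30),       # IPv4 destination address (offset 14 + 16)
    (JEQ_K, 0, 1, 0x0A000001),  # is it 10.0.0.1?
    (RET_K, 0, 0, 0xFFFF),      # accept
    (RET_K, 0, 0, 0),           # reject
]


def ipv4_frame(dst):
    """Build a bare Ethernet + IPv4 header with a 4-byte destination address."""
    ip = bytearray(20)
    ip[0] = 0x45
    ip[16:20] = dst
    return b"\x00" * 12 + b"\x08\x00" + bytes(ip)
```

A filter of this shape is the kind of "appropriate filter for the IP in use" mentioned in the earlier log: `run_filter(IP_DST_10_0_0_1, ipv4_frame(bytes([10, 0, 0, 1])))` accepts the frame, while a frame addressed elsewhere, or a non-IPv4 frame, is rejected.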
    antrik: filling the bpf header
    frankly, i'm not sure we want to bother with the kernel implementation
    i'd like it to work with both gnumach and dde drivers
    and in the long run, we'll be using userspace drivers anyway
    the bpf header was one of the things the defunct translator did
    which involved ugly memcpy()s :p
    braunr: well, if you want to get rid of the kernel implementation, basically you would have to take up eth-multiplexer and get it into mainline (and make sure it's used by default in Debian)
    I frankly believe it's the better design anyways... but quite a major change :-)
    not that major to me
    in the meantime i'll use the libpcap embedded implementation
    we'll have something useful faster, with minimum work when eth-multiplexer is available
    eth-multiplexer is ready for use, it just needs to go upstream
    though it's probably desirable to switch it to the BPF implementation from libpcap
    using the libpcap implementation in libpcap and in eth-multiplexer are two different things
    the latter is preferrable
    (and yes, by available, i meant upstream ofc)
    eth-mulitplexer is already using libpcap anyways (for compiling the filters); I'm sure zhengda just didn't realize it has an actual BPF implementation too...
    we want the filter implementation as close to the packet source as possible
    I have been using eth-multiplexer for at least two years now
    hm, there is a "snoop" source type, using raw sockets
    too far from the packet source, but i'll try it anyway
    hm wrong, snoop was the solaris packet filter fyi

## IRC, freenode, #hurd, 2012-01-28

    nice, i have tcpdump working :)
    let's see if it's as simple with wireshark
    \o/
    pinotree: it was actually very simple
    heh, POV ;)
    yep, wireshark works too
    promiscuous mode is harder to test :/
    but that's a start

## IRC, freenode, #hurd, 2012-01-30

    ok so next step: get tcpreplay working
    braunr: BTW, when you checked the status of the kernel BPF code, did you take zhengda's enhancements/fixes into account?...
    no
    when did i check it ?
    braunr: well, you said the kernel BPF code has serious shortcomings. did you take zhengda's changes into account?
    antrik: ah, when i mention the issues, i considered the userspace translator only
    antrik: and stuff like non blocking io, exporting a selectable file descriptor
    antrik: deb http://ftp.sceen.net/debian-hurd experimental/
    antrik: this is my easy to use repository with a patched libpcap0.8
    and a small and unoptimized pcap-hurd.c module
    it doesn't use devopen yet
    i thought it would be better to have packet filtering working first as a debian patch, then get the new translator+final patch upstream
    braunr, tcpdump works great here (awesome!). I'm probably using exactly the same setup and "hardware" as you do, though :-P

## IRC, freenode, #hurd, 2012-01-31

    antrik: i tend to think we need a bpf translator, or anything between the kernel and libpcap to provide selectable file descriptors
    jkoenig: do you happen to know how mach_msg (as called in a hello.c file without special macros or options) deals with signals ?
    i mean, is it wrapped by the libc in a version that sets errno ?
    braunr: no idea.
    braunr: what's up with it? (not that i have an idea about your actual question, just curious)
    pinotree: i'm improving signal handling in my pcap-hurd module
    i guess checking for MACH_RCV_INTERRUPTED will dio
    -INFO is correctly handled :)
    ok new patch seems fine
    braunr: selectable file descriptors?
    antrik: see pcap_fileno() for example
    it returns a file descriptor matching the underlying object (usually a socket) that can be multiplexed in a select/poll call
    obviously a mach port alone can't do the job
    i've upgraded the libpcap0.8 package with improved signal handling for tests
    braunr: no idea what you are talking about :-(

## IRC, freenode, #hurd, 2012-02-01

    antrik: you do know about select/poll
    antrik: you know they work with multiple selectable/pollable file descriptors
    on most unix systems, packet capture sources are socket descriptors
    they're selectable/pollable
    braunr: what are packet capture sources?
    antrik: objects that provide applications with packets :)
    antrik: a PF_PACKET socket on Linux for example, or a Mach device, or a BPF file descriptor on BSD
    for a single network device? or all of them?
    AIUI the userspace BPF implementation in libpcap opens this device, waits for packets, and if any arrive, decides depending on the rules whether to pass them to the main program?
    antrik: that's it, but it's not the point
    antrik: the point is that, if programs need to include packet sources in select/poll calls, they need file descriptors
    without a translator, i can't provide that
    so we either decide to stick with the libpcap patch only, and keep this limitation
    or we write a translator that enables this feature
    braunr: are the two options exclusive?
    pinotree: unless we implement a complete bpf translator like i did years ago, we'll need a patch in libpcap
    pinotree: the problem with my early translator implementation is that it's buggy :(
    pinotree: and it's also slower, as packets are small enough to be passed through raw copies
    braunr: I'm not sure what you mean when talking about "programs including packet sources". programs only interact with packet sources through libpcap, right?
    braunr: or are you saying that programs somehow include file descriptors for packet sources (how do they obtain them?) in their main loop, and explicitly pass control to libpcap once something arrives on the respecitive descriptors?
    antrik: that's the idea, yes
    braunr: what is the idea?
    20:38 < antrik> braunr: or are you saying that programs somehow include file descriptors for packet sources (how do they obtain them?) in their main loop, and explicitly pass control to libpcap once something arrives on the respecitive descriptors?
    braunr: you didn't answer my question though :-)
    braunr: how do programs obtain these FDs?
    antrik: using pcap_fileno() for example

## IRC, freenode, #hurd, 2012-02-02

    braunr: oh right, you already mentioned that one...
    braunr: so you want some entity that exposes the device as something more POSIXy, so it can be used in standard FS calls, unlike the Mach devices used for pfinet
    this is probably a good sentiment in general... but I'm not in favour of a special solution only for BPF. rather I'd take this as an indication that we probably should expose network interfaces as something file-like in general after all, and adapt pfinet, eth-multiplexer, and DDE accordingly
    antrik: i agree
    antrik: eth-multiplexer would be the right place

## IRC, freenode, #hurd, 2012-04-24

    braunr: Is BPF fully supported by now? Can it be used for isc-dhcp?
    gnu_srs: bpf isn't supported at all
    gnu_srs: instead of emulating it, i added a hurd-specific module in libpcap
    if isc-dhcp can use libpcap, then fine
    (otherwise we could create a hurd-specific patch for dhcp that uses the in-kernel bpf filter implementation)
    gnu_srs: can't it use a raw socket ?
    it can
    it's just that the shape of the patch to do so wasn't exactly how they needed it
    so they have to rework it a bit and that takes time
    ok
    antrik: for now, we prefer encapsulating the system specific code in libpcap, and let users of that library benefit from it
    instead of implementing the low level bpf interface, which nonetheless has some system-specific variants ..
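Editorial illustration of the selectable-descriptor requirement raised in the 2012-02-01 and 2012-02-02 logs: a program that multiplexes a packet source with its other inputs needs something it can hand to `select`/`poll`, which is what `pcap_fileno()` provides on systems where the source is a descriptor. A minimal Python sketch follows; a pipe stands in for the packet source, and `wait_for_packets` is a hypothetical name. On the Hurd the open question in the logs is precisely which entity would provide such a descriptor.

```python
import os
import select


def wait_for_packets(sources, timeout_s):
    """Return the descriptors in `sources` that became readable within
    `timeout_s` seconds; an empty list means the wait timed out."""
    readable, _, _ = select.select(sources, [], [], timeout_s)
    return readable


if __name__ == "__main__":
    # A pipe plays the role of the packet source in this sketch.
    source_fd, injector_fd = os.pipe()
    print(wait_for_packets([source_fd], 0.1))   # nothing has arrived yet
    os.write(injector_fd, b"\x00" * 60)         # pretend a frame arrived
    print(wait_for_packets([source_fd], 0.1))   # the source is now readable
```

In a real main loop the program would pass control to the capture library only for the descriptors reported as readable. A bare Mach port cannot take part in such a call, which is why the logs conclude that something file-like has to expose the device.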
## IRC, freenode, #hurd, 2012-08-03

In context of the [[select]] issue.

    i understand now why my bpf translator was so buggy
    the condition_timedwait i wrote at the time was .. incomplete :)