[[!meta copyright="Copyright © 2009, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] [[!meta title=BPF]] [[!tag open_issue_gnumach open_issue_hurd]] This is a collection of resources concerning *Berkeley Packet Filter*s. # Documentation * Wikipedia: [[!wikipedia "Berkeley Packet Filter"]] * [The Packet Filter: An Efficient Mechanism for User-level Network Code](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.8755), 1987, Jeffrey C. Mogul, Richard F. Rashid, Michael J. Accetta * [The BSD Packet Filter: A New Architecture for User-level Packet Capture](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.7849), 1992, Steven Mccanne, Van Jacobson * [Protocol Service Decomposition for High-Performance Networking](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.8387), 1993, Chris Maeda, Brian N. Bershad * [Efficient Packet Demultiplexing for Multiple Endpoints and Large Messages](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.44), 1994, Masanobu Yuhara Fujitsu, Masanobu Yuhara, Brian N. Bershad, Chris Maeda, J. Eliot, B. Moss * ... and many more # Implementation * [[community/HurdFr]] * <http://wiki.hurdfr.org/index.php/BPF> * <http://wiki.hurdfr.org/index.php/Reseau_dans_gnumach> * Git repository: <http://rcs-git.duckcorp.org/hurdfr/bpf.git/> The patch for [[GNU Mach|microkernel/mach/gnumach]] is expected to be complete and functional, the [[hurd/translator]] less so -- amongst others, there are unresolved issues concerning support of [[hurd/glibc/IOCTL]]s. * <http://lists.gnu.org/archive/html/bug-hurd/2006-03/msg00025.html> * [[zhengda]] * [[!GNU_Savannah_bug 25054]] -- Kernel panic with eth-multiplexer * [[!GNU_Savannah_patch 6619]] -- pfinet uses the virtual interface * [[!GNU_Savannah_patch 6620]] -- pfinet changes its filter rules with its IP address * [[!GNU_Savannah_patch 6621]] -- pfinet sets the mach device into the promiscuous mode * [[!GNU_Savannah_patch 6622]] -- pfinet uses the BPF filter * [[!GNU_Savannah_patch 6851]] -- fix a bug in BPF # IRC ## IRC, freenode, #hurd, 2012-01-13 <braunr> hm, i think the bpf code needs a complete redesign :p <braunr> unless it's actually a true hurdish way to do things <braunr> antrik: i need your help :) <braunr> antrik: I need advice on the bpf "architecture" <braunr> the current implementation uses a translator installed at /dev/bpf <braunr> which means packets from the kernel are copied to that translator and then to client applications <braunr> does that seem ok to you ? <braunr> couldn't the translator be used to set a direct link between the kernel and the client app ? <braunr> which approach seems the more Hurdish to you ? (<= this is what I need your help on) <pinotree> braunr: so there would be a roundtrip like kernel → bpf translator → pfinet? <antrik> braunr: TBH, I don't see why we need a BPF translator at all... <braunr> antrik: it handles the ioctls <braunr> pinotree: pfinet isn't involved (it was merely modified to use the "new" filter format to specify it used the old packet filter, and not bpf) <antrik> braunr: do we really need to emulate the ioctl()s? can't we assume that all packages using BPF will just use libpcap? <antrik> (and even if we *do* want to emulate ioctl()s, why can't we handle this is libc?) <braunr> antrik: that's what i'm wondering actually <braunr> even if assuming all packages use libpcap, i'd like our bpf interface to be close to what bsds have, and most importantly, what libpcap expects from a bpf interface <antrik> well, why? if we already have a library handling the abstraction, I don't see much point in complicating the design and use by adding another layer :-) <braunr> so you would advise adapting libpcap to include a hurd specific module ? <antrik> there are two reasons for adding translators: more robustness or more flexibility... so far I don't see how a BPF translator would add either <braunr> right <antrik> yes <braunr> so we'd end up with a bpf-like interface, the same instructions and format, with different control calls <antrik> right <antrik> note that I had more or less the same desicion to make for KGI (emulate Linux/BSD ioctl()s, or implement a backend in libggi for handling Hurd-specific RPC; and after much consideration, I decided on the latter) ## IRC, freenode, #hurd, 2012-01-16 <braunr> antrik: is there an existing facility to easily give a send right to the device master port to a task ? <braunr> another function of the bpf translator is to handle the /dev/bpf node, and most importantly its permissions <braunr> so that users which have read/write access to the node have access to the packet filter <braunr> i guess the translator could limit itself to that functionality <braunr> and then provide a device port on which libpcap operates directly by means of device_{g,s}et_status/device_set_filter <antrik> braunr: I don't see the point in seperating permissions for filter from permissions from general network device access... <antrik> as for device master port, all root tasks can obtain it from proc IIRC <braunr> antrik: yes, but how do we allow non-root users to access that facility ? <braunr> on a unix like system, it's a matter of changing the permissions of /dev/bpf <antrik> with devnode, non-root users can get access to specific device nodes, including network devices <braunr> i can't imagine the hurd being less flexible for that <braunr> ah devnode <braunr> good <antrik> so we can for example make /dev/eth0 accessible by users of some group <braunr> what's devnode exactly ? <antrik> it's a very simple translator that implements an FS node that looks somewhat like a file, but the only operation it supports is obtaining a pseudo device master port, giving access to a specific Mach device <braunr> is it already part of the hurd ? <braunr> or hurdextras maybe ? <antrik> it's only in zhengda's branch <braunr> ah <antrik> needed for both eth-multipexer and DDE <braunr> and bpf soon i guess <antrik> indeed :-) <braunr> "obtaining a pseudo device master port", i believe you meant a pseudo device port <antrik> I must admit that I don't remember exactly whether devnode proxies device_open(), so clients direct get a port to the device in question, or whether it implements a pseudo device master port... <antrik> but definitely not a pseudo device port :-) <braunr> i'm almost positive it gives the target device port, otherwise i don't see the point <braunr> i don't understand the user of the "pseudo" word here either <braunr> s/user/use/ <braunr> aiui, devnode should be started as root (or in any way which gives it the device master port) <antrik> the point is that the client doesn't need to know the Mach device name, and also is not bound to actual kernel devices <braunr> and when started, implement the required permissions before giving clients a device port to the specific device it was installed for <braunr> right <braunr> but it mustn't be a proxy <antrik> yes, devnode needs access to either the real master device port (for kernel devices), or one provided by eth-multiplexer or the DDE network driver <braunr> well, a very simple proxy for deviceopen <braunr> ok <braunr> that seems exactly what i wanted to do <braunr> we now need to see if we can integrate it separately <braunr> create a separate branch that works for the current gnumach code, and merge dde/other specific code later on <antrik> you mean independent of eth-multiplexer or DDE? yes, it was generally agreed that devnode is a good idea in any case. I have no idea why there are no device nodes for network devices on other UNIX systems... <braunr> i've been wondering that for years too :) <antrik> zhengda's branch has a pfinet modified to a) use devnode, and b) use BPF <braunr> why bpf ? <braunr> for more specific filters maybe ? <antrik> hm... don't remember whether there was any technical reason for going with BPF; I guess it just seemed more reasonable to invest new work in BPF rather than obsolete Mach-specific NPF... <braunr> cspf could be removed altogether, i agree <antrik> another plus side of his modified pfinet is that it actually sets an appropriate filter for TCP/IP and the IP in use, rather than just setting a dummy filter catching app packets (including those irrelevant to the specific pfinet instance) <antrik> err... catching all packets <braunr> that's what i meant by "for more specific filters maybe ?" <braunr> he was probably more comfortable with the bpf interface to write his filter rules <antrik> well, it would probably be doable with NPF too :-) so by itself it's not a reason for switching to BPF... <antrik> it's rather the other way around: as it was necessary to implement filters in eth-multiplexer, and implementing BPF seemed more reasoable, pfinet had to be changed to use BPF... <braunr> antrik: where is zhengda's branch btw ? <antrik> (I guess using proper filters with eth-multiplexer is not strictly necessary; but it would be a major performance hit not to) <antrik> it's in incubator.git <antrik> but it's very messy <braunr> ok <antrik> at some point I asked him to provide cleaned up branches, and I'm pretty sure he said he did, but I totally fail to remember where he published them :-( <braunr> hm, i don't like how devnode is "architectured" :/ <braunr> but it makes things a little more easy to get working i guess <LarstiQ> antrik: any idea what to grep the logs on for that? <braunr> ok never mind, devnode is fine <braunr> exactly what i need <braunr> i wonder however if it shouldn't be improved to better handle permissions <braunr> ok, never mind either, permission handling is fine <braunr> so what are we waiting for ? :) <antrik> I remember that there were some issues with permission handling, but I don't remember whether all were fixed :-( <antrik> LarstiQ: hm... good question... <braunr> ah ? <braunr> hm actually, there could be issues for packet filters, yes <braunr> i guess we want to allow e.g. read-only opens for capture only <antrik> braunr: that would have to be handled by the actual BPF implementation I'd say <braunr> it should already be the case <antrik> what's the problem then? <braunr> but when the actual device_open() is performed, the appropriate permissions must be provided <braunr> and checking those is the responsibility of the proxy, devnode in this case <antrik> and it doesn't do that? <braunr> apparently not <braunr> the only check is against the device name <braunr> i'll begin playing with that first <antrik> I vaguely remember that there has been discussion about the relation of underlying device open mode and devnode open mode... but I don't remember the outcome. in fact it was probably one of the discussions I never got around to follow up on... :-( <antrik> before you begin playing, take a look at the relevant messages in the ML archive :-) <antrik> must have been around two years ago <braunr> ok <antrik> some thread with me and scolobb (Sergiu Ivanov +- spelling) and probably zhengda <antrik> there might also be some outstanding patch(es) from scolobb, not sure ## IRC, freenode, #hurd, 2012-01-17 <braunr> antrik: i think i found the thread you mentioned about devnode <braunr> neither sergiu nor zhengda considered the use of a read-only device for packet filtering <braunr> leading to assumptions such as "only receiving packets <braunr> is not terribly useful, in view of the fact that you have to at least <braunr> request them, which implies *sending* packets :-) <braunr> " <braunr> IMO, devnode should definitely check its node permissions to build the device open flags <braunr> good news is that it doesn't depend on anything specific to other incubator projects <braunr> making it almost readily mergeable in the hurd <braunr> i'm not sure devnode is an appropriate name though <braunr> maybe something like device, or devproxy <braunr> proxy-devopen maybe <antrik> braunr: well, I don't remember the details of the disucssion; but as I mentioned in some mail, I did actually want to write a followup, just didn't get around to it... so I was definitely not in agreement with some of the statements made by others. I just don't remember on which point :-) <antrik> which thread was it? <antrik> anyways, this should in no way be specific to network devices... the idea is simply that if the client has only read permissions on the device node, it should only get to open the underlying device for read. it's up to the kernel to handle the read-only status for the device once it's opened <antrik> as for the naming, the idea is that devnode simply makes Mach devices accessible through FS nodes... so the name seemed appropriate <antrik> you may be right though that just "device" might be more straightforward... I don't agree on the other variants <braunr> antrik: http://lists.gnu.org/archive/html/bug-hurd/2009-12/msg00155.html <braunr> antrik: i agree with the general idea behind permission handling, i was just referring to their thoughts about it, which probably led to the hard coded READ | WRITE flags <antrik> braunr: unfortunately, I don't remember the context of the discussion... would take me a while to get into this again :-( <antrik> the discussion seems to be about eth-multiplexer as much as about devnode (if not more), and I don't remember the exact interaction ## IRC, freenode, #hurd, 2012-01-18 <braunr> so, does anyone have an objection to getting devnode into the hurd and calling it something else like e.g. device ? <youpi> braunr: it's Zhengda's work, right? <braunr> yes <youpi> I'm completely for it, it just perhaps needs some cleanup <braunr> i have a few changes to add to what already exists <braunr> ok <braunr> well i'm assigning myself to the task <antrik> braunr: I'm still not convinced just "device" is preferable <antrik> perhaps machdevice ;-) <antrik> but otherwise, I'd LOVE to see it in :-) <braunr> i don't know .. what if the device is actually eth-multiplexer or a dde one ? <braunr> it's not really "mach", is it ? <braunr> or do we only refer to the interface ? <youpi> that translator is only for mach devices <braunr> so you consider dde devices as being mach devices too ? <braunr> it's a simple proxy for device_open really <youpi> will these devices use that translator? <youpi> ah <youpi> I thought it was using a mach-specific RPC <braunr> so we can consider whatever we want <antrik> braunr: yes, the translator is for Mach device interface only. it might be provided by other servers, but it's still Mach devices <youpi> then drop the mach, yes <braunr> i'd tend to agree with antrik <youpi> antrik: I'd say the device interface is part of the hur dinterfaces <braunr> then machdev :p <braunr> no, it's really part of the mach interface <youpi> it's part of the mach interface, yes <youpi> but also of the Hurd, no? <antrik> DDE network servers also use the Mach device interface <braunr> no <youpi> can't we say it's part of it? <youpi> I mean <youpi> even if we change the kernel <braunr> dde is the only thing that implements it besides the kernel that i know of <youpi> we will probably want to keep the same interface <braunr> yes but that's a mach thing <youpi> what we have now is not necessarily a reason <antrik> as for other DDE drivers, I for my part believe they should export proper Hurd (UNIX) device nodes directly... but for some reason zhengda insisted on implementing it as Mach devices too :-( <braunr> antrik: i agree with you on that too <braunr> i was a bit surprised to see the same interface was reused <braunr> youpi: we can, we just have to agree on what we'll do <braunr> what do you mean by "even if we change the kernel" ? <antrik> the problem with "machdev" is that it might suggest the translator actually implements the device... not sure whether this would cause serious confusion <antrik> "devopen" might be another option <antrik> or "machdevopen" to be entirely verbose ;-) <braunr> an option i suggested earlier which you disagreed on :p <braunr> but devopen is the one i'd choose <antrik> youpi: as I already mentioned in the libburn thread, I don't actually think the Mach device interface is very nice; IMHO we should get rid of it as soon as we can, rather than port it to other architectures... <antrik> but even *if* we decided to reuse it after all, it would still be the Mach device interface :-) <braunr> actually, zheng da already suggested that name a long time ago <braunr> http://lists.gnu.org/archive/html/bug-hurd/2008-08/msg00005.html <braunr> no actually antrik did eh <braunr> ok let's use devopen <antrik> braunr: you suggested proxy-devopen, which I didn't like because of the "proxy" part :-) <braunr> not only, but i don't have the logs any more :p <antrik> oh, I already suggested devopen once? didn't expect myself to be that consistent... ;-) <antrik> braunr: you suggested device, devproxy or proxy-devopen <braunr> ah, ok <braunr> devopen is better <antrik> I wonder whether it's more important for clarity to have "mach" in there or "open"... or whether it's really too unweildy to have both ## IRC, freenode, #hurd, 2012-01-21 <braunr> oh btw, i made devopen run today, it shouldn't be hard getting it in properly <braunr> patching libpcap will be somewhat trickier <braunr> i don't even really need it, but it allows having user access to mach devices, which is nice for the libpcap patch and tcpdump tests <braunr> permission checking is actually its only purpose <braunr> well, no, not really, it also allows opening devices implemented by user space servers transparently ## IRC, freenode, #hurd, 2012-01-27 <braunr> hmm, bpf needs more work :( <braunr> or we can use the userspace bpf filter in libpcap, so that it works with both gnumach and dde drivers <antrik> braunr: there is a userspace BPF implementation in libpcap? I'm surprised that zhengda didn't notice it, and ported the one from gnumach instead... <antrik> what is missing in the kernel implementation? <braunr> antrik: filling the bpf header <braunr> frankly, i'm not sure we want to bother with the kernel implementation <braunr> i'd like it to work with both gnumach and dde drivers <braunr> and in the long run, we'll be using userspace drivers anyway <braunr> the bpf header was one of the things the defunct translator did <braunr> which involved ugly memcpy()s :p <antrik> braunr: well, if you want to get rid of the kernel implementation, basically you would have to take up eth-multiplexer and get it into mainline <antrik> (and make sure it's used by default in Debian) <antrik> I frankly believe it's the better design anyways... but quite a major change :-) <braunr> not that major to me <braunr> in the meantime i'll use the libpcap embedded implementation <braunr> we'll have something useful faster, with minimum work when eth-multiplexer is available <antrik> eth-multiplexer is ready for use, it just needs to go upstream <antrik> though it's probably desirable to switch it to the BPF implementation from libpcap <braunr> using the libpcap implementation in libpcap and in eth-multiplexer are two different things <braunr> the latter is preferrable <braunr> (and yes, by available, i meant upstream ofc) <antrik> eth-mulitplexer is already using libpcap anyways (for compiling the filters); I'm sure zhengda just didn't realize it has an actual BPF implementation too... <braunr> we want the filter implementation as close to the packet source as possible <antrik> I have been using eth-multiplexer for at least two years now <braunr> hm, there is a "snoop" source type, using raw sockets <braunr> too far from the packet source, but i'll try it anyway <braunr> hm wrong, snoop was the solaris packet filter fyi