The GSoC Experience - Porting LwIP to the GNU/Hurd

Posted 2 July 2017 & filed under English.

Adding IPv6 support took most of my time this week, and part of the previous one, but it's done, and I can now say that the LwIP server supports IPv6 :). Specifically, these items are implemented:

Automatic assignment of link-local addresses.
Auto-configuration with SLAAC.
Manual assignment of addresses.
The MLD protocol.

Enabling IPv6 in LwIP is not hard; it's a matter of setting some macros. Creating link-local addresses is pretty straightforward too: it's enough to call one function, which uses EUI-64 to generate the interface identifier and appends it to the prefix fe80::/64. In fact, this process can be done locally, without any connectivity. However, the stack offers the option of using DAD (Duplicate Address Detection) to check whether the address is already in use. A collision isn't likely, since the interface identifier is based on the link address, which should be unique, but I left DAD enabled just in case.
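To make the EUI-64 step concrete, here is a minimal Python sketch of how a link-local address can be derived from a 48-bit link address, following the modified EUI-64 rules of RFC 4291. It is only an illustration, not LwIP's code: the function names and the example MAC address are made up for this post.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Build a 64-bit interface identifier from a 48-bit MAC (modified EUI-64)."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]


def link_local_address(mac: str) -> ipaddress.IPv6Address:
    """Append the interface identifier to the link-local prefix fe80::/64."""
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80:0000:0000:0000
    return ipaddress.IPv6Address(prefix + eui64_interface_id(mac))


# Hypothetical MAC address, used only as an example.
print(link_local_address("52:54:00:12:34:56"))  # fe80::5054:ff:fe12:3456
```

Nothing here needs the network, which is why the address can be created before the interface has any connectivity.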

Regarding SLAAC auto-configuration, LwIP makes it easy for us again. It's enough to set a macro and enable a flag on the interfaces that we want to auto-configure. Even so, of the three available SLAAC modes, I've only tested one: the "SLAAC alone" mode, in which the gateway provides the network prefix, its link-local address and the DNS server, whilst the node has to calculate its interface ID, either by using EUI-64 or randomly if the prefix length is not 64. There's another mode called "SLAAC with DHCPv6 stateless", in which the gateway provides its link-local address and the network prefix, but the node is expected to contact the DHCPv6 server to get the DNS server. Finally, there's a third mode called "SLAAC with DHCPv6 stateful", in which the gateway only provides its link-local address and the node is expected to contact the DHCPv6 server to get the whole address, including the prefix, and also the DNS server. I haven't tested these last two modes but, judging by some configuration macros in LwIP, they seem to be supported too. Another line in my TODO list.
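As a rough illustration of the "SLAAC alone" case, the sketch below joins an advertised prefix with a 64-bit interface identifier. It is not how LwIP implements it: the function name, the documentation prefix 2001:db8:1:2::/64 and the identifier values are assumptions chosen for the example, and the sketch only handles /64 prefixes.

```python
import ipaddress
import os


def slaac_address(prefix: str, interface_id: bytes) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with a 64-bit interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    if len(interface_id) != 8:
        raise ValueError("the interface identifier must be 64 bits long")
    # The prefix fills the upper 64 bits, the identifier the lower 64 bits.
    return ipaddress.IPv6Address(
        int(network.network_address) | int.from_bytes(interface_id, "big")
    )


advertised = "2001:db8:1:2::/64"  # hypothetical prefix from a router advertisement

# Identifier derived from a MAC address with EUI-64 (example value).
print(slaac_address(advertised, bytes.fromhex("505400fffe123456")))
# 2001:db8:1:2:5054:ff:fe12:3456

# Randomly chosen identifier; the result differs on every run.
print(slaac_address(advertised, os.urandom(8)))
```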

As for the remaining items on the list, as expected, they can be enabled through macros and function calls. However, I was stuck for a few days because I wasn't able to ping any address other than the link-local one. Everything seemed to be correct: I could check with the debugger that the addresses were indeed assigned to their interfaces, but for some reason the pings were still going unanswered. After some time testing, I found that the ping requests didn't even reach the stack, while all the other frames did. This error taught me a basic point: Ethernet multicast is required for IPv6 to work.

Let me give a brief explanation of what was happening. A key part of IPv6 is the ND (Neighbor Discovery) protocol, which works over ICMPv6 and is used by the above-mentioned DAD to find out whether an address is in use. It's also used by SLAAC to get the network prefix and, finally, to resolve a link address from a network address. That is, it's the replacement for ARP. The problem is that this protocol sends messages addressed to Ethernet multicast addresses, and not always to the same one: the destination link address is derived from the destination network address. As explained in the Wikipedia article linked above, IPv6 multicast uses destination MAC addresses that start with 33:33 and end with the last two hextets (the low 32 bits) of the destination IPv6 address. For instance, to send a message to ff02::2, the link address must be 33:33:00:00:00:02. Now, the default behavior of most NIC drivers is to accept the frames addressed to some common addresses such as 33:33:00:00:00:01, for ff02::1, but to discard all the other 33:33:X frames that aren't in an internal list managed by the OS. The aim is to drop frames that are known not to be addressed to our node as early as possible, so that the stack doesn't have to do this filtering, which saves resources.
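The mapping from an IPv6 multicast address to its Ethernet address is easy to sketch in Python. The function name is mine, and the "allowed" set is only a toy stand-in for the list kept by the OS, not a model of any real driver.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv6 multicast address to its Ethernet multicast address."""
    address = ipaddress.IPv6Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    # 33:33 followed by the low 32 bits (the last two hextets) of the address.
    octets = b"\x33\x33" + address.packed[-4:]
    return ":".join(f"{octet:02x}" for octet in octets)


print(multicast_mac("ff02::1"))  # 33:33:00:00:00:01
print(multicast_mac("ff02::2"))  # 33:33:00:00:00:02

# Toy filter list that only knows the all-nodes address.
allowed = {multicast_mac("ff02::1")}
print(multicast_mac("ff02::2") in allowed)  # False
```

With a list like this one, a frame sent to any other 33:33 address is dropped before the stack ever sees it.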

When it comes to resolving addresses, the ND protocol sends its frames to the so-called solicited-node multicast address. Therefore, if we want to resolve, say, fc00:124::178, frames will be sent to ff02::1:ff00:178, and thus the destination link address will be 33:33:ff:00:01:78. Note the difference with ARP, where requests are always sent to the broadcast address ff:ff:ff:ff:ff:ff. Now we can see why the pings weren't arriving: the destination link address wasn't even being resolved, because the NIC was discarding the frames sent to the solicited-node address. For now, the Hurd doesn't provide a way to manage the list of Ethernet multicast addresses allowed by the NIC, so I solved this by simply allowing all multicast packets and leaving the filtering up to the stack.
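The example above can be reproduced with a few lines of Python. This is just a sketch with names of my own choosing: the solicited-node address is the fixed prefix ff02::1:ff00:0/104 plus the low 24 bits of the address being resolved.

```python
import ipaddress

# Fixed part of every solicited-node multicast address (ff02::1:ff00:0/104).
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_address(target: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for a unicast address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


group = solicited_node_address("fc00:124::178")
print(group)  # ff02::1:ff00:178

# Destination link address: 33:33 plus the low 32 bits of the group address.
mac = "33:33:" + ":".join(f"{octet:02x}" for octet in group.packed[-4:])
print(mac)  # 33:33:ff:00:01:78
```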

However, there were still two unresolved questions. First, why doesn't LwIP let me set a prefix length when I add a new IPv6 address to an interface? The answer is that, for the moment, LwIP only supports 64-bit prefixes, so all addresses are assumed to be /64. Second, why is there no way to set a default IPv6 gateway in LwIP? I put the question to David van Moolenbroek, who ported LwIP to Minix, and his answer is worth sharing, as it focuses on a key difference between IPv4 and IPv6 that I was unaware of:

This is an area where IPv4 and IPv6 are vastly different. As I mentioned above, for the true RFC model of IPv6, there is the conceptual distinction between "hosts" and "routers". As per this true IPv6 model, a host never forwards packets, and is expected to configure itself based on router announcements from routers on that local network.

The routers are the gateways. More specifically: not all routers are gateways, but a router can announce that it is a gateway, making it one of the "default routers" (see RFC 4861 for all the details). In any case, *only* routers can be gateways, because a non-router is by definition a host and a host never forwards packets. Also, all functional routers are required to send out router advertisements. Combining all of this results in the conclusion that *if* there is a gateway (i.e. a default router) on the network, lwIP will learn about it through router advertisements, so it never needs to be configured with a gateway address. And thus, there is no (manual) way to do that either.

The main "victim" of this whole model is the wish for a host to implement a routing rule of the type "for particular address range X, use particular gateway Y". IPv6 routers follow an all-or-nothing concept here: either they can handle all possible destinations (possibly by redirecting to another local router) or they aren't a gateway at all. Therefore, every default router is good for all possible address ranges. This makes the true IPv6 model somewhat of a mismatch for the more traditional approach of (IPv4) routing tables, at least for hosts.

But there's still one remaining issue: how to deal with an IPv4/IPv6 dual stack in the Hurd. The problem is this: in the Hurd, the servers' namespace is the file system, so every translator must be attached to a particular file, and user programs use that file's path to find the translator. However, there is not one but two names for the TCP/IP stack: /servers/socket/2, for the IPv4 stack; and /servers/socket/26, for the IPv6 stack. This could suggest that we may just run two instances of the stack, each one working on a different protocol family. But in the real world, user programs expect an IPv6 socket to listen on all possible addresses, and that includes all IPv4 addresses. In fact, some programs explicitly listen using both families. For this reason, what we did with pfinet, and what we're doing now with the LwIP server, is to run it for one family under a particular name, say, IPv4 in /servers/socket/2, and then make it install itself on the other name for the other family, for instance /servers/socket/26 for IPv6, so that a single instance serves both families.

Once we have a dual stack like LwIP and a way to refer to it using two names, we only need a way to know which of the two names the user task used to find the server. Again, the solution is to copy what pfinet does: create two libports classes, one for each family, and configure libtrivfs to give each new protid the proper class depending on the underlying file used by the user. The only operation that actually needs to know which family it is working with is socket-create; all the other operations work transparently whether the socket is AF_INET or AF_INET6.
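The idea can be modelled in a few lines of Python. This is a toy model only: it does not use the real libports or libtrivfs interfaces, every name in it is hypothetical, and the family numbers are simply taken from the two node names.

```python
from dataclasses import dataclass

# Family numbers taken from the node names /servers/socket/2 and /servers/socket/26.
FAMILY_IPV4 = 2
FAMILY_IPV6 = 26


@dataclass(frozen=True)
class PortClass:
    """Hypothetical stand-in for a libports class; it only remembers its family."""
    family: int


# One class per node, both served by the same server instance.
CLASS_BY_NODE = {
    "/servers/socket/2": PortClass(FAMILY_IPV4),
    "/servers/socket/26": PortClass(FAMILY_IPV6),
}


def open_server(path: str) -> PortClass:
    """Hand out the class that belongs to the node the client opened."""
    return CLASS_BY_NODE[path]


def socket_create(port_class: PortClass, sock_type: str) -> dict:
    """The only call that looks at the class: it fixes the socket's family."""
    return {"family": port_class.family, "type": sock_type}


sock = socket_create(open_server("/servers/socket/26"), "stream")
print(sock["family"] == FAMILY_IPV6)  # True
```

Once the socket exists, its family travels with it, so no later operation needs to know which name the client used.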