
When we tried to configure the stack with ifupdown during boot, we found two problems. First, the system failed to find the loopback interface to give it an address, and second, the DHCP client wasn't able to get the MAC address of the interfaces it was trying to configure. To perform these actions, ifupdown uses the iioctl interface. As of today, the LwIP server implements every ioctl operation that pfinet implements, that is, all operations declared in iioctl except SIOCSIFMETRIC and SIOCDIFADDR. Once everything was working, it was time to try again and see how the LwIP server reacted to ifupdown's actions.

Let's start with the loopback interface. Basically, LwIP provides two ways to enable it: the first is to actually add a new interface configured to handle the traffic for 127.0.0.0/8, and the second is to tell the stack to behave as if this interface existed, without creating it, and to handle all loopback traffic through the other interfaces. At first I chose the second one, because adding a new interface would add complexity to the fsysopts operation. However, after seeing that ifupdown sends ioctls targeted at an interface called "lo", I had to switch to the first option and configure the stack to create a new, properly configured interface.
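
For reference, the two approaches appear to map onto two compile-time options in lwIP's lwipopts.h. The post doesn't name them, so treat this fragment as an assumption about how the choice could be expressed, not as the translator's actual configuration:

```c
/* lwipopts.h fragment (sketch; assumed mapping to the two approaches above) */

/* First approach: lwIP creates a real loopback netif, named "lo",
   with address 127.0.0.1. */
#define LWIP_HAVE_LOOPIF 1

/* Second approach (the one abandoned here): no loopback netif is created
   and loopback traffic is passed through the existing netifs. It would be
   selected with LWIP_HAVE_LOOPIF set to 0 and LWIP_NETIF_LOOPBACK set to 1. */
```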

Regarding the DHCP client, once all iioctl operations were working, the error getting the MAC address disappeared, but the client just kept waiting forever instead. Testing with gdb and Wireshark, I discovered that the client was actually sending the DHCPDISCOVER message, but that the message never reached the wire. After a little research, I found two problems: on the one hand, the stack couldn't send packets with a source IP address of 0.0.0.0; on the other, it couldn't send packets to 255.255.255.255 either.

About the first error, it's worth highlighting that the IP address assigned to an interface is not necessarily the same as the address bound to a socket. The user can bind the address 0.0.0.0 to a socket to indicate that it must be able to receive messages from all available interfaces but, of course, when that same socket is used for sending, the source address is replaced by the address of the outgoing interface. As for the second error, the packet couldn't be sent to 255.255.255.255 because the outgoing interface didn't have an assigned address and, since INADDR_BROADCAST == INADDR_NONE, the stack thought the destination was itself and pushed the message to the loopback queue.
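
The collision can be reproduced in a few lines of C. This is only a simplified sketch, not the stack's real code: struct iface and dest_is_self_naive() are hypothetical names, and it assumes that an unconfigured interface stores INADDR_NONE as its address, which is what the behaviour described above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>

/* Both constants are 255.255.255.255 (0xffffffff), so they can't be told apart. */
static_assert(INADDR_BROADCAST == INADDR_NONE, "broadcast and 'no address' coincide");

/* Hypothetical interface record: addr is INADDR_NONE while unconfigured (assumption). */
struct iface {
    in_addr_t addr;
};

/* Naive "is this packet addressed to ourselves?" check, as it might look
   before the fix: it only compares the destination with the interface address. */
static bool dest_is_self_naive(const struct iface *ifp, in_addr_t dest)
{
    return dest == ifp->addr;
}
```

With an unconfigured interface, dest_is_self_naive(&ifp, INADDR_BROADCAST) is true, so a DHCPDISCOVER sent to 255.255.255.255 would be treated as local traffic and would never reach the wire.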

In both cases, I fixed the problem by adding the proper exceptions to the relevant if statements so that the DHCPDISCOVER message can be sent. In particular, sending packets from 0.0.0.0 is now allowed, but only for UDP packets and only when the selected interface doesn't have a valid address. Similarly, broadcast messages are now never sent to the loopback queue. Based on my tests, everything seems to be working fine, but these exceptions could lead the stack to misbehave in particular scenarios, so I'll have to remain alert.
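
As a rough illustration of the two exceptions, here is a minimal sketch in C. It is not the actual patch: struct iface, iface_has_addr(), any_source_allowed() and should_loop_back() are hypothetical names, and an unconfigured interface is again assumed to hold INADDR_NONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>

/* The values coincide, so broadcast has to be excluded explicitly below. */
static_assert(INADDR_BROADCAST == INADDR_NONE, "broadcast and 'no address' coincide");

/* Hypothetical interface record: addr is INADDR_NONE while unconfigured (assumption). */
struct iface {
    in_addr_t addr;
};

/* An interface has a valid address when it is neither "none" nor 0.0.0.0. */
static bool iface_has_addr(const struct iface *ifp)
{
    return ifp->addr != INADDR_NONE && ifp->addr != INADDR_ANY;
}

/* Exception 1: a packet may leave with source 0.0.0.0 only if it is UDP
   and the outgoing interface has no valid address yet. */
static bool any_source_allowed(const struct iface *ifp, int proto)
{
    return proto == IPPROTO_UDP && !iface_has_addr(ifp);
}

/* Exception 2: broadcast packets are never pushed to the loopback queue,
   even if the destination happens to equal the stored interface address. */
static bool should_loop_back(const struct iface *ifp, in_addr_t dest)
{
    if (dest == INADDR_BROADCAST)
        return false;
    return dest == ifp->addr;
}
```

Under these assumptions, a DHCPDISCOVER (UDP, source 0.0.0.0, destination 255.255.255.255) passes both checks on an unconfigured interface, while a TCP segment from 0.0.0.0 is still rejected.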

One of the issues that took me the most time this week was trying to debug the translator while the system was booting. To this end, I tried to launch a subhurd. I spent more than two days on this, and finally managed it based on this guide. In particular, I followed these steps:

  1. Download a clean system image and add it as a new qemu drive.

    In my case, the relevant partition can be found at /dev/hd2s5.

  2. Boot the subhurd with the command:

    boot --kernel-command-line="fastboot root=pseudo-root" -f hd0=/dev/hd2 -f hd0s1=/dev/hd2s1 -f hd0s5=/dev/hd2s5 -f eth0=/dev/eth1 -T typed device:hd2s5

  3. Edit /etc/fstab in the subhurd, and replace the first line with the following:

    /dev/pseudo-root	      /               ext2    defaults        0       1

  4. From here, the subhurd can be started with the command:

    boot -f hd0=/dev/hd2 -f hd0s1=/dev/hd2s1 -f hd0s5=/dev/hd2s5 -f eth0=/dev/eth1 -T typed device:hd2s5

The most interesting option of the boot command is -d, which pauses the subhurd several times during boot so the user can attach a debugger. However, in my case this option only paused the subhurd once, before booting, and never again, so I didn't achieve my initial goal of debugging a translator during boot. I was, however, able to restart the networking service and debug the server then. Enough for me.

Moving on, last week I talked about a problem in device_get_status(), which always returned 0x00000041 for the flags value when called with the NET_STATUS parameter, regardless of the real flags set in the device. This week I took a look at the code and found that the problem seemed to be in the controller: I switched from e1000 to rtl8139 and it worked fine. That's why I finally decided to keep the NET_STATUS option, as it's also available in eth-multiplexer.