
After finishing with IPv6, I spent the last week fixing some bugs, mainly those related to multithreading. This is an issue I had never thought about until now, and it turned out to be a real mess. I'll give you a summary.

First, LwIP has a main thread called "tcpip_thread" where the stack actually runs, so the stack itself is single-threaded and not thread-safe. Its multithreading model consists of ensuring that the tcpip thread is the only one with access to the stack's resources, while the other threads communicate with it by message passing. A global semaphore makes sure the requests received by the main thread are serviced sequentially. You may find further information in the LwIP documentation. Now, the tcpip thread is initialized when the translator starts, but in our case the stack was also restarted on each call to fsysopts. As a consequence, there were as many tcpip threads as calls to fsysopts, plus one. For some reason the stack kept working, but many threads were being wasted. This bug is fixed now: the stack is no longer restarted on each call to fsysopts; instead, old interfaces are removed and new ones are created on the same main thread. However, the thread doesn't like having its interfaces changed at run time, so new bugs have arisen which I'll have to work on later.
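
To make the message-passing idea more concrete, here is a minimal sketch using LwIP's tcpip_callback() entry point; the add_new_interface() and request_interface_setup() helpers are made up for the example. Any thread may post a callback, and the tcpip thread executes it sequentially with all other requests:

```c
/* A minimal sketch of how a worker thread hands work to LwIP's
   tcpip_thread instead of touching the stack directly.  The helpers
   below are hypothetical; tcpip_callback() is the LwIP call that
   posts a message to the tcpip thread.  */
#include <lwip/tcpip.h>
#include <lwip/netif.h>

static void
add_new_interface (void *arg)
{
  struct netif *netif = arg;

  /* This runs inside tcpip_thread, so it may safely call raw stack
     functions such as netif_set_up().  */
  netif_set_up (netif);
}

void
request_interface_setup (struct netif *netif)
{
  /* Any other thread only posts a message; the tcpip thread runs
     the callback sequentially with every other request.  */
  tcpip_callback (add_new_interface, netif);
}
```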

Another issue was related to the Hurd's architecture. The server has a component called the "Ethernet module" that is responsible for taking the data generated by the stack and sending it to the driver, and vice versa. In the Hurd, communication with the driver is done by message passing through the device interface, and a thread is needed to listen for incoming messages from the driver and call the demuxer. In our case there was a design error: a new listening thread was created for each new interface added to the stack. Furthermore, each call to fsysopts restarted the stack and created a new thread for each new interface, without removing the previous ones. I fixed this problem by starting one single listening thread from main() when the translator starts.
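
The fix boils down to the following pattern. This is only a rough sketch: ethernet_demuxer() and etherport_bucket stand in for the translator's own demuxer and port bucket, while ports_manage_port_operations_one_thread() is the libports loop that receives messages and dispatches them to the demuxer:

```c
/* One listening thread for the whole translator, started once from
   main(), no matter how many interfaces come and go later.  */
#include <pthread.h>
#include <error.h>
#include <hurd/ports.h>

extern struct port_bucket *etherport_bucket;
extern int ethernet_demuxer (mach_msg_header_t *inp,
                             mach_msg_header_t *outp);

static void *
input_thread (void *arg)
{
  /* Block forever receiving messages from the device drivers and
     feeding them to the demuxer.  */
  ports_manage_port_operations_one_thread (etherport_bucket,
                                           ethernet_demuxer, 0);
  return NULL;
}

int
main (int argc, char **argv)
{
  pthread_t thread;
  int err;

  /* ... translator setup ... */

  err = pthread_create (&thread, NULL, input_thread, NULL);
  if (err)
    error (1, err, "pthread_create");
  pthread_detach (thread);

  /* ... run the translator ... */
  return 0;
}
```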

I discovered the last threading error when I tried to run SSH over LwIP. After a few seconds, the server would crash because of an overflow in a variable used to count the number of threads blocked on lwip_select(). The type of that variable is uint8_t, so there were... 255 waiting threads! I spent about three days trying to understand what was going on, but finally found the error and fixed it. It's worth examining the error carefully, because it's very useful for understanding how the Hurd works.

Let's take a look at the hurdselect.c file in Glibc, particularly at two sections: the one starting at line 280 and the two if statements at lines 494 and 498. At line 280 and the following lines, we can see how a call to select() from a user program may lead to many io_select() RPCs. Each RPC is responsible for one single socket, so if the user has set, say, three sockets among all the FD_SETs, then three RPCs will be performed simultaneously, one for each socket and each with its own thread. When an event occurs on one of the three sockets, its io_select() operation returns and its thread is destroyed, but the two remaining RPCs stay blocked until their timeout expires. If no timeout is given, those threads are blocked forever. The SSH server calls select() with no timeout over three sockets for each character it sends or receives, so it can generate hundreds of blocked threads in a matter of seconds.
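
To illustrate, here is a small, hypothetical example of a select() call like the ones described above; on the Hurd, Glibc turns this single call into one io_select() RPC per descriptor, each served by its own thread:

```c
/* Wait for activity on three sockets with no timeout.  The
   descriptors sock4, sock6 and sock_unix are placeholders.  */
#include <sys/select.h>

int
wait_for_any (int sock4, int sock6, int sock_unix)
{
  fd_set rfds;
  int maxfd = sock4;

  if (sock6 > maxfd)
    maxfd = sock6;
  if (sock_unix > maxfd)
    maxfd = sock_unix;

  FD_ZERO (&rfds);
  FD_SET (sock4, &rfds);
  FD_SET (sock6, &rfds);
  FD_SET (sock_unix, &rfds);

  /* NULL timeout: the two RPCs whose sockets see no event stay
     blocked until they are explicitly canceled.  */
  return select (maxfd + 1, &rfds, NULL, NULL, NULL);
}
```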

This design is actually pretty smart, because it allows the user to work transparently over multiple TCP/IP stacks. We can see this using SSHD as an example. As we can see in its code, the server doesn't assume it's working over a dual stack; in that case, it would be enough to create an IPv6 socket to receive messages addressed to IPv4 addresses as well. Instead, SSHD creates IPv4 and IPv6 sockets explicitly, and sets the IPV6_V6ONLY option on the latter to prevent it from listening on IPv4 addresses, just in case there's a dual stack below it. In the Hurd, the RPC to get the IPv4 socket is addressed to /servers/socket/2, while the IPv6 socket is obtained from /servers/socket/26. Therefore, if the user calls select() and includes the two sockets, as SSHD does, then one io_select() RPC will go to /servers/socket/2 and the other to /servers/socket/26.
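
A minimal sketch of that pattern (not SSHD's actual code) looks like this; the only assumption is the standard sockets API with IPV6_V6ONLY:

```c
/* One explicit IPv4 listener and one explicit IPv6 listener, with
   IPV6_V6ONLY set on the latter so it never picks up IPv4 traffic
   even on a dual stack.  On the Hurd, the first socket() call is
   served by /servers/socket/2 and the second by /servers/socket/26.  */
#include <sys/socket.h>
#include <netinet/in.h>

int
make_listeners (int *sock4, int *sock6)
{
  int on = 1;

  *sock4 = socket (AF_INET, SOCK_STREAM, 0);
  *sock6 = socket (AF_INET6, SOCK_STREAM, 0);
  if (*sock4 < 0 || *sock6 < 0)
    return -1;

  /* Keep the IPv6 socket away from IPv4 addresses.  */
  return setsockopt (*sock6, IPPROTO_IPV6, IPV6_V6ONLY,
                     &on, sizeof on);
}
```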

But how can we cancel the pending io_select() threads that have no timeout? And, more importantly, when there's an event on one socket, how can we know which threads were created at the same time and are no longer useful? The answer is at lines 494 and 498 of hurdselect.c. Each thread has a reply port that is destroyed when the thread is no longer useful, and the io_select() operation receives a copy of the port name, so it can use it to receive notifications. Libports has a function precisely for that: by calling ports_interrupt_self_on_notification(), we can have the current thread canceled when something happens to the given port, for instance when it's destroyed.
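
As a sketch of how a server-side io_select() implementation can use this: lwip_S_io_select() and struct sock_user are hypothetical names for the example, while ports_interrupt_self_on_notification() is the libports call mentioned above, here asked to watch for the reply port becoming a dead name:

```c
/* Tie the current RPC thread to the client's reply port.  */
#include <errno.h>
#include <hurd/ports.h>
#include <mach/notify.h>

struct sock_user;               /* The server's per-socket object.  */

error_t
lwip_S_io_select (struct sock_user *user, mach_port_t reply,
                  int *select_type)
{
  /* If the client destroys the reply port, because another socket in
     the same select() call already reported an event, this thread
     gets canceled instead of staying blocked forever.  */
  ports_interrupt_self_on_notification (user, reply,
                                        MACH_NOTIFY_DEAD_NAME);

  /* ... block here until an event occurs or the thread is canceled
     (see the next section) ...  */
  return 0;
}
```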

However, after all that, the pending threads were still not being canceled. The problem was that the standard function the threads were blocked on, pthread_cond_wait(), didn't respond to cancellation requests from hurd_thread_cancel(). That was strange, because pthread_cond_wait() is a valid cancellation point. But in Hurd servers we need to call our own non-standard version, pthread_hurd_cond_wait_np(), which does react to requests from hurd_thread_cancel() and stops blocking the thread.
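
A minimal sketch of the fixed blocking loop might look like this; the struct sock layout is invented for the example, and the return convention of pthread_hurd_cond_wait_np() (nonzero when the thread was canceled) is my understanding of the non-standard call rather than documented API:

```c
/* Block until a socket event arrives or the thread is canceled.  */
#define _GNU_SOURCE 1
#include <pthread.h>
#include <errno.h>

struct sock
{
  pthread_mutex_t lock;
  pthread_cond_t event;
  int pending_events;
};

/* Returns 0 when an event arrived, EINTR when the wait was canceled
   by hurd_thread_cancel(), e.g. after the reply-port notification
   registered earlier.  */
int
wait_for_socket_event (struct sock *sock)
{
  int canceled = 0;

  pthread_mutex_lock (&sock->lock);
  while (!sock->pending_events && !canceled)
    canceled = pthread_hurd_cond_wait_np (&sock->event, &sock->lock);
  pthread_mutex_unlock (&sock->lock);

  return canceled ? EINTR : 0;
}
```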