Several people have expressed interested in a port of GNU/Hurd for the ARM architecture. Luckily a userspace port of the Hurd servers and glibc is underway. As early as January 1, 2024 an AArch64 port is making some progress. Sergey did some hacking on glibc, binutils, GCC, and added some headers to GNU Mach. He was able to build the core Hurd servers: ext2fs, proc, exec, and auth.

One would think that he would need to port GNU Mach to run the binaries, but Sergey ran a statically linked hello world executable on GNU/Linux, under GDB, being careful to skip over and emulate syscalls and RPCs. The glibc port has the TLS setup, hwcaps / cpu-features, and ifuncs.

Now to some of the more technical things:

  • The TLS implementation is basically complete and working. We're using tpidr_el0 for the thread pointer (as can be seen in the listing above), like GNU/Linux and unlike Windows (which uses x18, apparently) and macOS (which uses tpidrro_el0). We're using "Variant I" layout, as described in "ELF Handling for Thread-Local Storage", again same as GNU/Linux, and unlike what we do on both x86 targets. This actually ends up being simpler than what we had for x86! The other cool thing is that we can do msr tpidr_el0, x0 from userspace without any gnumach involvement, so that part of the implementation is quite a bit simpler too.

  • Conversely, while on x86 it is possible to perform "cpuid" and identify CPU features entirely in user space, on AArch64 this requires access to some EL1-only registers. On Linux and the BSDs, the kernel exposes info about the CPU features via AT_HWCAP (and more recently, AT_HWCAP2) auxval entries. Moreover, Linux allows userland to read some otherwise EL1-only registers (notably for us, midr_el1) by catching the trap that results from the EL0 code trying to do that, and emulating its effect. Also, Linux exposes midr_el1 and revidr_el1 values through procfs.

  • The Hurd does not use auxval, nor is gnumach involved in execve anyway. So I thought the natural way to expose this info would be with an RPC, and so in mach_aarch64.defs I have an aarch64_get_hwcaps routine that returns the two hwcaps values (using the same bits as AT_HWCAP{,2}) and the values of midr_el1/revidr_el1. This is hooked to init_cpu_features in glibc, and used to initialize GLRO(dl_hwcap) / GLRO(dl_hwcap2) and eventually to pick the appropriate ifunc implementations.

  • The page size (or rather, paging granularity) is notoriously not necessarily 4096 on ARM, and the best practice is for userland not to assume any specific page size and always query it dynamically. GNU Mach will (probably) have to be built support for some specific page size, but I've cleaned up a few places in glibc where things were relying on a statically defined page size.

  • There are a number of hardware hardening features available on AArch64 (PAC, BTI, MTE — why do people keep adding more and more workarounds, including hardware ones, instead of rewriting software in a properly memory-safe language...). Those are not really supported right now; all of them would require some support form gnumach side; we'll probably need new protection flags (VM_PROT_BTI, VM_PROT_MTE), for one thing.

We would need to come up with a design for how we want these to work Hurd-wide. For example I imagine it's the userland that will be generating PAC keys (and settings them for a newly exec'ed task), since gnumach does not contain the functionality to generate random values (nor should it); but this leaves an open question of what should happen to the early bootstrap tasks and whether they can start using PAC after initial startup.

  • Unlike on x86, I believe it is not possible to fully restore execution context (the values of all registers, including pc and cpsr) purely in userland; one of the reasons for that being that we can apparently no longer do a load from memory straight into pc, like it was possible in previous ARM revisions. So the way sigreturn () works on Linux is of course they have it as a syscall that takes a struct sigcontext, and writes it over the saved thread state, which is similiar to thread_set_state () in Mach-speak. The difference being that thread_set_state () explicitly disallows you to set the calling thread's state, which makes it impossible to use for implementing sigreturn (). So I'm thinking we should lift that restriction; there's no reason why thread_set_state () cannot be made to work on the calling thread; it only requires some careful coding to make sure the return register (%eax/%rax/x0) is not rewritten with mach_msg_trap's return code, unlike normally.

But other than that, I do have an AArch64 versions of trampoline.c and intr-msg.h (complete with SYSCALL_EXAMINE & MSG_EXAMINE). Whether they work, we'll only learn once we have enough of the Hurd running to have the proc server.

MIG seems to just work (thanks to all the Flávio's work!). We are using the x86_64 ABI, and I have not seen any issues so far — neither compiler errors / failed static assertions (about struct sizes and such), nor hardware errors from misaligned accesses.

To bootstrap gnumach someone must fix the console, set up the virtual memory, thread states, context switches, irqs and userspace entry points, etc.

Also, there is a bunch of design work to do.

Will/can AArch64 use the same mechanism for letting userland handle interrupts? Do we have all the mechanisms required for userland to poke at specific addresses in memory (to replace I/O ports)? — I believe we do, but I haven't looked closely.

AFAIK there are no I/O ports in ARM, the usual way to configure things is with memory-mapped registers, so this might be easy. About IRQs, probably it needs to be arch-specific anyway.

What should the API for manipulating PAC keys look like? Perhaps it should be another flavor of thread state, but then it is really supposed to be per-task, not per-thread. Alternatively, we could add a few aarch64-specific RPCs in mach_arrch64.defs to read and write the PAC keys. But also AFAICS Mach currently has no notion of per-task arch-specific data (unlike for threads, and other than the VM map), so it'd be interesting to add one. Could it be useful for something else?

What are the debugging facilities available on ARM / AArch64? Should we expose them as more flavors of thread state, or something else? What would GDB need?

Should gnumach accept tagged addresses (like PR_SET_TAGGED_ADDR_CTRL on Linux)?

Can we make Linux code (in-Mach drivers, pfinet, netdde, ...) work on AArch64?

One can trivially port pfinet to AArch64. Eventually, we should fix any remaining issues with lwip. That way we can stop spending time maintaining pfinet, which is Linux's old abandoned networking stack.

Developers will have a difficult time porting the in-Mach drivers (arm64 was probably not even a thing at the time). We can perhaps port Netdde, but we should instead get our userspace drivers from a rumpkernel.

Starting the kernel itself should be easy, thanks to GRUB, but it shouldn't be too hard to add support for U-Boot either if needed.

I think more issues might come out setting up the various pieces of the system. For example, some chips have heterogeneous cores, (e.g. mine has two A72 cores and four A53 cores) so SMP will be more complicated.

Also, about the serial console, it might be useful at some point to use a driver from userspace, if we can reuse some drivers from netbsd or linux, to avoid embedding all of them in gnumach.

IRC, freenode, #hurd, 2012-11-15

<matty3269> Well, I have a big interest in the ARM architecture, I worked
  at ARM for a bit too, and I've written my own little OS that runs on
  qemu. Is there an interest in getting hurd running on ARM?
<braunr> matty3269: not really currently
<braunr> but if that's what you want to do, sure
<tschwinge> matty3269: Well, interest -- sure!, but we don't really have
  people savvy in low-level kernel implementation on ARM.  I do know some
  bits about it, but more about the instruction set than about its memory
  architecture, for example.
<tschwinge> matty3269: But if you're feeling adventurous, by all means work
  on it, and we'll try to help as we can.
<tschwinge> matty3269: There has been one previous attempt for an ARM port,
  but that person never published his code, and apparently moved to a
  different project.
<tschwinge> matty3269: I can help with toolchains (GCC, etc.) things for
  ARM, if there's need.
<matty3269> tschwinge: That sounds great, thanks! Where would you recommend
  I start (at the moment I've got Mach checked out and am trying to get it
  compiled for i386)
<matty3269> I'm guessing that the Mach micro-kernel is all that would need
  to be ported or are there arch-dependant bits of code in the server
  processes?
<tschwinge> matty3269:
  http://www.gnu.org/software/hurd/faq/system_port.html has some
  information.  Mach is the biggest part, yes.  Then some bits in glibc and
  libpthread, and even less in the Hurd libraries and servers.
<tschwinge> matty3269: Basically, you'd need equivalents for the i386 (and
  similar) directories, yep.
<tschwinge> Though, you may be able to avoid some cruft in there.
<tschwinge> Does building for x86 have any issues?
<tschwinge> matty3269: How is generally your understanding of the Hurd on
  Mach system architecture, and on microkernel-based systems generally, and
  on Mach in particular?
<matty3269> tschwinge: yes, it seems to be progressing... I've got mig
  installed and it's just compiling now
<matty3269> hmm, not too great if I'm honest, I've done mostly monolithic
  kernel development so having such low-level processes, such as
  scheduling, done in user-space seems a little strinage
<tschwinge> Ah, yes, MIG will need a little bit of porting, too.  I can
  help with that, but that's not a priority -- first you have to get Mach
  to boot at all; MIG will only be needed once you need to deal with RPCs,
  so user-land/kernel interaction, basically.  Before, you can hack around
  it.
<matty3269> tschwinge: I have been running a GNU/Hurd system for a while
  now though
<tschwinge> I'm happy to tell you that the schedules is still in the
  kernel.  ;-)
<tschwinge> OK, good, so you know about the basic ideas.
<braunr> matty3269: there has to be machine specific stuff in user space
<braunr> for initial thread contexts for example
<matty3269> tschwinge: Ok, just got gnumach built
<braunr> but there isn't much and you can easily base your work from the
  x86 implementation
<tschwinge> Yes.  Mach itself is the more difficult one.
<matty3269> braunr: Yeah, looking around at things, it doesn't seem that
  there will be too much work involoved in the user-space stuff
<tschwinge> braunr: Do you know off-hand whether there are some old Mach
  research papers describing architecture ports?
<tschwinge> I know there are some describing the memory system (obviously),
  and I/O system -- which may help matty3269 to understand the general
  design/structure.
<tschwinge> We might want to identify some documents, and make a list.
<braunr> all mach related documentation i have is available here:
  ftp://ftp.sceen.net/mach/
<braunr> (also through http://)
<tschwinge> matty3269: Oh, definitely I'd suggest the Mach 3 Kernel
  Principles book.  That gives a good description of the Mach architecture.
<matty3269> Great, that's my weekends reading then!
<braunr> you don't need all that for a port
<matty3269> Is it possible to run the gnumach binary standalone with qemu?
<braunr> you won't go far with it
<braunr> you really need at least one program
<braunr> but sure, for a port development, it can easily be done
<braunr> i'd suggest writing a basic static application for your tests once
  you reach an advanced state
<braunr> the critical parts of a port are memory and interrupts
<braunr> and memory can be particularly difficult to implement correctly
<tschwinge> matty3269: I once used QEMU's
  virtual-FAT-filesystem-from-a-directory-on-the-host, and configured GRUB
  to boot from that one, so it was easy to quickly reboot for kernel
  development.
<braunr> but the good news is that almost every bsd system still uses a
  similar interface
<tschwinge> matty3269: And, you may want to become familiar with QEMU's
  built-in gdbserver, and how to connect to and use that.
<braunr> so, for example, you could base your work from the netbsd/arm pmap
  module
<tschwinge> matty3269: I think that's better than starting on real
  hardware.
<braunr> tschwinge: you can use -kernel with a multiboot binary now

qemu.

<braunr> tschwinge: and even creating iso images is so fast it's not any
  slower

<braunr> ah, the gnumach executable is a correct elf image
<matty3269> Is there particular reason that mach is linked at 0xc0100000?
<matty3269> or is that where it is expected to be in VM>
<tschwinge> That's my understanding.
<braunr> kernels commmonly sti at high addresses
<braunr> that's the "standard" 3G/1G split for user/kernel space
<matty3269> I think Linux sits at a similar VA for 32-bit
<braunr> no
<matty3269> Oh, I thought it did, I know it does on ARM, the kernel is
  mapped to 0xc000000 
<braunr> i don't know arm, but are you sure about this number ?
<braunr> seems to lack a 0
<matty3269> Ah, yes sorry
<matty3269> so 0xC0000000
<braunr> 0xc0100000 is just 1 MiB above it
<braunr> the .text section of linux on x86 actually starts at c1000000
  (above 16 MiB, certainly to preserve as much dma-able memory since modern
  machines now have a lot more)
<matty3269> so with gnumach, does the boot-up sequence use PIC until VM is
  active and the kernel mapped to the linking address?
<braunr> no
<braunr> actually i'm not certain of the details
<braunr> but there is no PIC
<braunr> either special sections are linked at physical addresses
<braunr> or it relies on the fact that all executable code uses near jumps
<braunr> and uses offsets when accessing data
<braunr> (which is why the kernel text is at 3 GiB + 1 MiB, and not 3 GiB)
<matty3269> hmm,
<braunr> but you shouldn't worry about that i suppose, as the protocol
  between the boot loader and an arm kernel will certainly not be the saem
<braunr> same*
<matty3269> indeed, ARM is tricky because memory maps are vastly differnt
  on every platform

IRC, freenode, #hurd, 2012-11-21

<matty3269> Well, I have a ARM gnumach kernel compiled. It just doesn't
  run! :)
<braunr> matty3269: good luck :)

IRC, freenode, #hurd, 2013-01-30

<slpz> Hi, i've read there's an ongoing effort to port GNU Mach to ARM. How
  is it going?
<braunr> not sure where you read that
<braunr> but i'm pretty sure it's not started if it exists
<slpz> braunr: http://www.gnu.org/software/hurd/open_issues/arm_port.html
<braunr> i confirm what i said
<slpz> braunr: OK, thanks. I'm interested on it, and didn't want to
  duplicate efforts.
<braunr> little addition: it may have started, but we don't know about it

IRC, freenode, #hurd, 2013-09-18

<Hooligan0> as i understand ; on startup, vm_resident.c functions configure
  the whole available memory ; but at this point the system does not split
  space for kernel and space for future apps
<Hooligan0> when pages are tagged to be used by userspace ?
<braunr> Hooligan0: at page fault time
<braunr> the split is completely virtual, vm_resident deals with physical
  memory only
<Hooligan0> braunr: do you think it's possible to change (at least)
  pmap_steal_memory to mark somes pages as kernel-reserved ?
<braunr> why do you want to reserve memory ?
<braunr> and which memory ?
<Hooligan0> braunr: first because on my mmu i have two entry points ; so i
  want to set kernel pages into a dedicated space that never change on
  context switch (for best cache performance)
<Hooligan0> braunr: and second, because i want to use larger pages into
  kernel (1MB) to reduce mmu work
<braunr> vm_resident isn't well suited for large pages :(
<braunr> i don't see the effect of context switch on kernel pages
<Hooligan0> at many times, context switch flush caches
<braunr> ah you want something like global pages on x86 ?
<Hooligan0> yes, something like
<braunr> how is it done on arm ?
<Hooligan0> virtual memory is split into two parts depending on msb bits
<Hooligan0> for example 3G/1G
<Hooligan0> MMU will use two pages tables depending on vaddr (hi-side or
  low-side)
<braunr> hi is kernel, low is user ?
<Hooligan0> so, for the moment i've put mach at 0xC0000000 -> 0xFFFFFFFF  ;
  and want to use 0x00000000 -> 0xBFFFFFFF for userspace
<Hooligan0> yes
<braunr> ok, that's what is done for x86 too
<Hooligan0> 1MB pages for kernel ; and 4kB (or 64kB) pages for apps
<braunr> i suggest you give up the large page stuff
<braunr> well, you can use them for the direct physical mapping, but for
  kernel objects, it's a waste
<braunr> or you can rewrite vm_resident to use something like a buddy
  allocator but it's additional work
<Hooligan0> for the moment it's waste ; but with some littles changes this
  allow only one level of allocation mapping ;  -i think- it's better for
  performances
<braunr> Hooligan0: it is, but not worth it
<Hooligan0> will you allow changes into vm_resident if i update i386 too ?
<braunr> Hooligan0: sure, as long as these are relevant and don't introduce
  regressions
<Hooligan0> ok
<braunr> Hooligan0: i suggest you look at x15, since you may want to use it
  as a template for your own changes
<braunr> as it was done for the slab allocator for example
<braunr> e.g. x15 already uses a buddy allocator for physical memory