[[!meta copyright="Copyright © 2024 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

# ServerBootV2 RFC Draft

[[!inline pagenames=hurd/what_is_an_os_bootstrap raw=yes feeds=no]]

Sergey Bugaev proposed:

The Hurd's current bootstrap, [[Quiet-Boot|hurd/bootstrap]] (a biased and made-up name), is fragile, hard to debug, and complicated:

* `Quiet-Boot` chokes on misspelled or missing boot arguments. When this happens, the Hurd bootstrap will likely hang and display nothing. This is tricky to debug.

* `Quiet-Boot` is hard to change. For instance, when the Hurd developers added `acpi`, the `pci-arbiter`, and `rumpdisk`, they struggled to get `Quiet-Boot` working again.

* `Quiet-Boot` forces each bootstrap task to include special bootstrap logic to work. This limits what is possible during the bootstrap. For instance, it should be trivial for the Hurd to support netboot, but `Quiet-Boot` makes it hard to add `nfs`, `pfinet`, and `isofs` to the bootstrap.

* `Quiet-Boot` hurts other Hurd distributions too. When Guix developers updated their packaged version of the Hurd to one that included support for SATA drives, a single misspelled boot argument halted their progress for a few weeks.

The alternative `Serverboot V2` proposal (which was discussed on [IRC](https://logs.guix.gnu.org/hurd/2023-07-18.log) and is similar to the previously discussed [bootshell proposal](https://mail-archive.com/bug-hurd@gnu.org/msg26341.html)) aims to move all or most of the bootstrap-specific logic into a single task (`/hurd/serverboot`).
`Serverboot V2` has a number of enticing advantages:

* It simplifies the hierarchical dependency of translators during bootstrap. Developers should be able to reorder and add new bootstrap translators with minimal work.

* It gives early bootstrap translators like `auth` and `ext2fs` standard input and output, which lets them display boot errors. It also lets signals work.

* One can trivially use most Hurd translators during the bootstrap. You just have to link them statically.

* `libmachdev` could be simplified to only expose hardware to userspace; it might even be possible to remove it entirely. The `pci-arbiter`, `acpi`, and `rumpdisk` could also be simplified.

* Developers could remove the bootstrap logic from `libdiskfs` that detects the bootstrap filesystem, starts the `exec` server, and spawns `/hurd/startup`. Instead, `libdiskfs` would only focus on providing filesystem support.

* If an error happens during early boot, the user could be dropped into a REPL or mini-console, where they can try to debug the issue. We might call this `Bootshell V2`, in reference to the original proposal. This could be written in Lisp. Imagine having an extremely powerful programming language available during bootstrap that is only [436 bytes!](https://justine.lol/sectorlisp2)

* It would simplify the code for subhurds by removing the logic from each task that deals with the OS bootstrap.

Now that you know why we should use `Serverboot V2`, let's get more detailed. What is `Serverboot V2`?

`Serverboot V2` would be an empty filesystem that is dynamically populated during bootstrap. It would use a `netfs`-like filesystem that is populated as the various bootstrap tasks are started. For example, `/servers/socket/2` will be created once `pfinet` starts. It also temporarily pretends to be the Hurd process server, `exec`, and the `/` filesystem, while providing signals and `stdio`.

Let's explain how `Serverboot V2` will bootstrap the Hurd.
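To make "an empty filesystem dynamically populated during bootstrap" concrete, here is a minimal, self-contained C model of such a node table. It is a sketch under loud assumptions: ports are plain integers rather than `mach_port_t` send rights, paths are flat strings rather than a real directory tree, and `bootfs_settrans ()` / `bootfs_lookup ()` are hypothetical names, not part of any existing Hurd library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a "port" is an integer standing in for a Mach send
   right to a translator's fsys control port.  */
typedef unsigned int port_t;
#define PORT_NULL 0u

#define BOOTFS_MAX_NODES 32
#define BOOTFS_MAX_PATH 64

struct bootfs_node
{
  char path[BOOTFS_MAX_PATH];
  port_t translator;
};

static struct bootfs_node bootfs[BOOTFS_MAX_NODES];
static size_t bootfs_count;

static struct bootfs_node *
bootfs_find (const char *path)
{
  for (size_t i = 0; i < bootfs_count; i++)
    if (strcmp (bootfs[i].path, path) == 0)
      return &bootfs[i];
  return NULL;
}

/* Attach CONTROL (e.g. what pfinet sent back in fsys_startup ()) to PATH,
   creating the node on first use.  Returns 0, or -1 if the table is full
   or PATH is too long.  */
static int
bootfs_settrans (const char *path, port_t control)
{
  assert (control != PORT_NULL);
  struct bootfs_node *node = bootfs_find (path);
  if (node == NULL)
    {
      if (bootfs_count == BOOTFS_MAX_NODES
          || strlen (path) >= BOOTFS_MAX_PATH)
        return -1;
      node = &bootfs[bootfs_count++];
      strcpy (node->path, path);
    }
  node->translator = control;
  return 0;
}

/* PORT_NULL means the translator for PATH has not started (yet).  */
static port_t
bootfs_lookup (const char *path)
{
  const struct bootfs_node *node = bootfs_find (path);
  return node != NULL ? node->translator : PORT_NULL;
}
```

In this model a client such as `nfs` only finds `/servers/socket/2` after `pfinet` has replied; before that the lookup fails, which the real bootstrap task could report on the `stdio` it provides instead of hanging silently.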
**FIXME The rest of this needs work.**

Any bootstrap that the Hurd uses will probably be a little odd, because there is an awkward, circular startup dance between `exec`, `ext2fs`, `startup`, `proc`, `auth`, the `pci-arbiter`, `rumpdisk`, and `acpi`, in which each translator depends on the others during the bootstrap, as this ASCII art shows:

         pci-arbiter
              |
            acpi
              |
          rumpdisk
              |
           ext2fs -- storeio
          /      \
       exec     startup
                /     \
             auth     proc

This means that there is no *perfect* Hurd bootstrap design. Some designs are better in some ways and worse in others. `Serverboot V2` would simplify the other early bootstrap tasks, but all of that complicated logic would live in one binary. One valid criticism of `Serverboot V2` is that it may be a hassle to develop and maintain. In any case, trying to code the *best* Hurd bootstrap may be a waste of time. In fact, the Hurd bootstrap has already been rewritten several times, and our fearless leader, Samuel, feels that rewriting it every few years is not a good use of effort.

Now that you understand why Samuel discourages a Hurd bootstrap rewrite, let's consider why we should develop `Serverboot V2` anyway.

# How ServerBoot V2 will work

Bootstrap begins when Grub and GNU Mach start some tasks, and then GNU Mach resumes the not-yet-written `/hurd/serverboot`. `/hurd/serverboot` is the only task that accepts special ports from the kernel via command-line arguments like `--kernel-task`; it tries to implement or emulate as much of the normal Hurd environment as possible for the other bootstrap translators. In particular, it provides the other translators with `stdio`, which lets them read and write without having to open the Mach console device. This means that the various translators will be able to complain about bad arguments or other startup errors, which they cannot currently do. `/hurd/serverboot` will provide a basic filesystem with `netfs`, which gives the other translators working `/` directory and `cwd` ports.
For example, when `/hurd/netdde` starts, it will reply to its parent, `/hurd/serverboot`, with `fsys_startup ()` as normal, and `/hurd/serverboot` will store the control port it receives at `/dev/netdde`.

`/hurd/serverboot` will also emulate the native Hurd process server for early bootstrap tasks. This will allow early bootstrap tasks to get the privileged (host privileged and device master) ports via the normal glibc function `get_privileged_ports (&host_priv, &device_master)`. Tasks will also register their message ports with the emulated process server, which allows signals and messaging during the bootstrap. We can even use the existing mechanisms in glibc to set and get init ports. For example, when we start the `auth` server, we will give every task started thus far its new authentication port via glibc's `msg_set_init_port ()`. When we start the real proc server, we query it for proc ports for each of the tasks, and set them the same way. This lets us migrate from the emulated proc server to the real one.

**Fix me: Where do storeio (with `device:@/dev/rumpdisk:wd0`), rumpdisk, and the pci-arbiter come in?**

Next, we start `ext2fs`. We reattach all the running translators from our `netfs` bootstrap filesystem onto the new root. We then send those translators their new root and cwd ports. This should happen transparently to the translators themselves!

# Supporting Netboot

`Serverboot V2` could trivially support netboot by adding `netdde`, `pfinet` (or `lwip`), and `isofs` as bootstrap tasks. The bootstrap task will start the `pci-arbiter` and `acpi` (FIXME add some more detail to this sentence). The bootstrap task starts `netdde`, which looks up any `eth` devices (using the device master port, which it queries via the fake process server interface) and sends its fsys control port to the bootstrap task in the regular `fsys_startup ()`. The bootstrap task sets that fsys control port as the translator on the `/dev/netdde` node in its `netfs` bootstrap fs.
Then `/hurd/serverboot` resumes `pfinet`, which looks up `/dev/netdde`. Then `pfinet` returns its `fsys` control port to the bootstrap task, which sets it on `/servers/socket/2`. Then the bootstrap task resumes `nfs`, and `nfs` just creates a socket using the regular glibc `socket ()` call, which looks up `/servers/socket/2`, and it just works.

**FIXME where does isofs fit in here?**

Then `nfs` gives its `fsys` control port to `/hurd/serverboot`, which knows it's the real root filesystem, so it takes the `netdde` and `pfinet` fsys control ports and calls `file_set_translator ()` on the same paths in `nfs`. Now `/dev/netdde` and `/servers/socket/2` exist and are accessible both on our bootstrap fs and on the new root fs. The bootstrap task can then take the root fs and broadcast a root and cwd port to all other tasks via `msg_set_init_port ()`. Now every task is running on the real root fs, and our little bootstrap fs is no longer used.

`/hurd/serverboot` can then resume the exec server (which is the first dynamically-linked task) with the real root fs. Then we just call `file_set_translator ()` to put the exec server on `/servers/exec`, so that `nfs` doesn't have to care about this. The bootstrap task can now spawn tasks, instead of resuming ones loaded by Mach and Grub, so it next spawns the `auth` and `proc` servers and gives everyone their `auth` and `proc` ports. By that point, we have enough of a Unix environment to call `fork ()` and `exec ()`. Then the bootstrap task does the things that `/hurd/startup` used to do, and finally spawns (or execs) `init` (PID 1).

With this scheme you will be able to start your root fs with ext2fs as `/hurd/ext2fs.static /dev/wd0s1`. This eliminates boot arguments like `--magit-port` and `--next-task`.

This also simplifies `libmachdev`, which exposes devices to userspace via some Mach `device_*` RPC calls, which lets the Hurd contain device drivers instead of GNU Mach. Everything that connects to hardware can be a `machdev`.
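The hand-over step above (re-attaching every translator from the bootstrap fs onto `nfs` at the same path) is essentially a loop over the bootstrap fs nodes. The following is a minimal model, assuming integer ports and a hypothetical `root_set_translator ()` that only records what a real `file_set_translator ()` RPC would attach; no Hurd headers or RPCs are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int port_t;

struct bootfs_entry
{
  const char *path;
  port_t control;   /* fsys control port received via fsys_startup () */
};

/* Hypothetical model of the new root fs: it records what would be
   attached instead of sending file_set_translator () RPCs.  */
#define ROOT_MAX_ATTACH 16
struct new_root
{
  struct bootfs_entry attached[ROOT_MAX_ATTACH];
  size_t count;
};

static int
root_set_translator (struct new_root *root, const char *path, port_t control)
{
  if (root->count == ROOT_MAX_ATTACH)
    return -1;
  root->attached[root->count].path = path;
  root->attached[root->count].control = control;
  root->count++;
  return 0;
}

/* Re-attach every translator from the temporary bootstrap fs onto the
   real root at the same path (e.g. /dev/netdde and /servers/socket/2 onto
   nfs).  Stops at the first failure and reports it on stderr; in the real
   task that would be the stdio that serverboot provides.  */
static int
migrate_to_root (const struct bootfs_entry *entries, size_t n,
                 struct new_root *root)
{
  assert (entries != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (root_set_translator (root, entries[i].path, entries[i].control) != 0)
      {
        fprintf (stderr, "serverboot: cannot attach %s\n", entries[i].path);
        return -1;
      }
  return 0;
}
```

After this step the bootstrap task would broadcast the new root and cwd ports, as described above; the model deliberately stops at the attach step.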
Additionally, during the `Quiet-Boot` bootstrap, `libmachdev` awkwardly uses `libtrivfs` to create a transient `/` directory, so that the `pci-arbiter` can mount a netfs on top of it at bootstrap. To mount `/servers/bus/pci`, `libmachdev` needs the `/servers` and `/servers/bus` directories (as well as `/dev` and `/servers/socket`). That complexity could be moved to `Serverboot V2`, which will create directory nodes at those locations.

`libmachdev` provides a trivfs that intercepts the `device_open` RPC, which the `/dev` node uses. It also fakes a root filesystem node, so you can mount a `netfs` onto it. You still have to implement `device_read` and `device_write` yourself, but that code runs in userspace. An example of this can be found in `rumpdisk/block-rump.c`.

`libpciaccess` is a special case: it has two modes. The first time it runs, via the `pci-arbiter`, it acquires the PCI config I/O ports and runs in x86 mode. Every subsequent instance becomes a Hurdish client of the `pci-arbiter`.

`rumpdisk` exposes `/dev/rumpdisk`:

    $ showtrans /dev/rumpdisk
    /hurd/rumpdisk

# FAQ

## `Serverboot V2` looks like a ramdisk + a script...?

It's not quite a ramdisk; it's more a netfs translator that creates a temporary `/`. It's a statically linked binary. I don't think it differs from a multiboot module.

## How are the device nodes on the bootstrap netfs attached to each translator?

## How does the first non-bootstrap task get invoked?

## Does bootstrap resume it?

## Could we just use a ram disk instead? One could stick a unionfs on top of it to load the rest of the system after bootstrap.

It looks similar to a ramdisk in principle, i.e. it exposes a fs which lives only in RAM, but a ramdisk would not help with early bootstrap. Namely, during early bootstrap there are no signals or console. Passing control from one server to the next via a bootstrap port is a kludge at best. How many times have you seen the bootstrap process hang and just sit there? `Serverboot V2` would solve that.
Also, it would allow subhurds to be full Hurds without special-casing each task with bootstrap code. It would also clean up `libmachdev`, and Damien, its author, fully supports it.

## A ramdisk could implement signals and stdio. Isn't that more flexible?

But if it's a ramdisk, you essentially have to provide it with a tar image. Having it live only inside a bootstrap task is preferable. Also, the task could even exit when it's done. Whether you use an actual ramdisk or not, you still need to write the task that boots the system, and that is different from how it works currently. Also, a ramdisk would have to live in Mach, and we want to move things out of Mach. Additionally, the bootstrap task will be loaded as the first multiboot module by Grub.

It's not a ramdisk, because a ramdisk has to contain some fs image (with data), and we'd need to parse that format. It might make sense to steer it more in that direction (and Samuel seems to have preferred it), because there could potentially be some config files, or other files that the servers may need to run. I'm not super fond of that idea. I'd prefer the bootstrap fs to be just a place where ports (translators) can be placed and looked up. Actually, my current code doesn't even use `netfs`; it just implements the RPCs directly. I'll possibly switch to `netfs` later, or, if the implementation stays simple, I won't use `netfs` at all.

## Serverboot V2 just rewrites proc and exec. Why reimplement so much code?

I don't want to exactly reimplement full `proc` and `exec` servers in the bootstrap task; it's more about providing very minimal emulation of some of their functions. I want to implement two RPCs from the `proc` interface: one to give a task the privileged ports on request, and one to let the task give me its msg port. That seems fairly simple to me.
While we were talking of using netfs, my actual implementation doesn't even use that; it just implements the RPCs directly (not to suggest I have anything resembling a complete implementation). Here's some sample code to give you an idea of what it is like:

    error_t
    S_proc_getprivports (struct bootstrap_task *task,
                         mach_port_t *host_priv,
                         mach_port_t *device_master)
    {
      if (!task)
        return EOPNOTSUPP;

      if (bootstrap_verbose)
        fprintf (stderr, "S_proc_getprivports from %s\n", task->name);

      /* Hand out the privileged ports the bootstrap task itself holds.  */
      *host_priv = _hurd_host_priv;
      *device_master = _hurd_device_master;

      return 0;
    }

    error_t
    S_proc_setmsgport (struct bootstrap_task *task,
                       mach_port_t reply_port,
                       mach_msg_type_name_t reply_portPoly,
                       mach_port_t newmsgport,
                       mach_port_t *oldmsgport,
                       mach_msg_type_name_t *oldmsgportPoly)
    {
      if (!task)
        return EOPNOTSUPP;

      if (bootstrap_verbose)
        fprintf (stderr, "S_proc_setmsgport for %s\n", task->name);

      /* Return the previous message port and remember the new one.  */
      *oldmsgport = task->msgport;
      *oldmsgportPoly = MACH_MSG_TYPE_MOVE_SEND;

      task->msgport = newmsgport;

      return 0;
    }

Yes, it really is just letting tasks fetch the privileged ports (so `get_privileged_ports ()` in glibc works) and set their message ports. So much for a slippery slope of reimplementing the whole process server :)

## Let's bootstrap like this: initrd, proc, exec, acpi, pci, drivers, unionfs+fs with every server executable included in the initrd tarball?

I don't see how that's better, but you would be able to try something like that with my plan too. The OS bootstrap needs to start servers and integrate them into the eventual full Hurd system later, when the rest of the system is up. When early servers start, they're running on bare Mach with no processes, no `auth`, no files or file descriptors, etc. I plan to make files available immediately (if not the real fs), and make things progressively more "real" as servers start up. When we start the root fs, we send everyone their new root `dir` port. When we start `proc`, we send everyone their new `proc` port, and so on.
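"Send everyone their new port" is one loop over the tasks started so far, delivered through each task's registered message port (the one set with `S_proc_setmsgport` above). Here is a minimal, self-contained model, assuming integer ports and a hypothetical `send_set_init_port ()` standing in for the real `msg_set_init_port ()` RPC; the slot names are invented for the sketch (the Hurd headers define their own `INIT_PORT_*` constants, which are not reproduced here).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int port_t;

/* Hypothetical slot indices for this sketch only.  */
enum init_slot { SLOT_CRDIR, SLOT_CWDIR, SLOT_PROC, SLOT_AUTH, SLOT_MAX };

struct boot_task
{
  const char *name;
  port_t msgport;               /* 0 until the task calls proc_setmsgport */
  port_t init_ports[SLOT_MAX];  /* the ports the task currently uses */
};

/* Stand-in for msg_set_init_port () sent to TASK's message port.
   Without a registered message port there is nobody to talk to.  */
static int
send_set_init_port (struct boot_task *task, enum init_slot slot, port_t port)
{
  if (task->msgport == 0)
    return -1;
  task->init_ports[slot] = port;
  return 0;
}

/* Give every task started so far its new port for SLOT.  PORTS[i] is the
   port for TASKS[i]: the same value for every task (as a new root dir
   port would be), or a per-task value (such as the proc ports queried
   from the real proc server).  Returns how many tasks were not updated.  */
static size_t
broadcast_init_port (struct boot_task *tasks, size_t ntasks,
                     enum init_slot slot, const port_t *ports)
{
  size_t failures = 0;
  assert (slot < SLOT_MAX);
  for (size_t i = 0; i < ntasks; i++)
    if (send_set_init_port (&tasks[i], slot, ports[i]) != 0)
      failures++;
  return failures;
}
```

Passing one port per task keeps a single code path for both the "same port for everyone" and the "ask the real proc server for each task" cases described in the RFC.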
At the end, all those tasks we started in early boot are full, real Hurd processes that are no different from the ones you start later, except that they're statically linked, and not actually `io_map`'ed from the root fs, but loaded by Mach/Grub into wired memory.

# IRC Logs

showtrans /dev/wd0 and you can open () that node and it will act as a device master port, so you can then `device_open ()` devices (like wd0) inside of it, right?

oh it's a storeio, that's… cute. that's another translator we'd need in early boot if we want to boot off /hurd/ext2fs.static /dev/wd0

We implemented it as a storeio with device:@/dev/rumpdisk:wd0

so the `@` sign makes it use the named file as the device master, right?

the `@` symbol means it looks up the file as the device master, yes. Instead of mach, but the code falls back to looking up mach if it can't be found.

I see it's even implemented in libstore, not in storeio, so it just does `file_name_lookup ()`, then `device_open` on that.

pci-arbiter also needs acpi because the only way to know the IRQ of a pci device reliably is to use the ACPI parser, so it totally implements the Mach `device_*` functions. But instead of handling the RPCs directly, it sets the callbacks into the `machdev_device_emulations_ops` structure and then libmachdev calls those. Instead of implementing the RPCs themselves, it abstracts them, in case you wanted to merge drivers. This would help if you wanted multiple different devices in the same translator, which is of course the case inside Mach: the single kernel server does all the devices.

but that shouldn't be the case for the Hurd translators, right? we'd just have multiple different translators, like your thing with rumpdisk and rumpusb.

i dont know

ok, so other than that machdev emulation dispatch, libmachdev uses trivfs and does early bootstrap.

pci-arbiter uses it to centralize the early bootstrap so all the machdevs can use the same code. They chain together.
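The emulation structure mentioned here (a table of callbacks that libmachdev calls, instead of the driver handling the `device_*` RPCs itself) can be pictured with a small C model. The struct below is a hypothetical simplification with invented member names and plain C types; it is not libmachdev's real definition and involves no Mach RPCs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops table; the real libmachdev structure has
   different members and uses Mach types.  */
struct device_ops
{
  int (*open) (const char *name, void **handle);
  int (*read) (void *handle, size_t offset, void *buf, size_t len,
               size_t *nread);
  int (*write) (void *handle, size_t offset, const void *buf, size_t len,
                size_t *nwritten);
};

/* Generic layer: a NULL member means "not implemented by this driver".  */
static int
dispatch_open (const struct device_ops *ops, const char *name, void **handle)
{
  return ops->open != NULL ? ops->open (name, handle) : EOPNOTSUPP;
}

static int
dispatch_read (const struct device_ops *ops, void *handle, size_t offset,
               void *buf, size_t len, size_t *nread)
{
  return ops->read != NULL ? ops->read (handle, offset, buf, len, nread)
                           : EOPNOTSUPP;
}

static int
dispatch_write (const struct device_ops *ops, void *handle, size_t offset,
                const void *buf, size_t len, size_t *nwritten)
{
  return ops->write != NULL
         ? ops->write (handle, offset, buf, len, nwritten)
         : EOPNOTSUPP;
}

/* A read-only demo "disk" backed by memory.  */
static char demo_disk[512] = "hello";

static int
demo_open (const char *name, void **handle)
{
  if (strcmp (name, "wd0") != 0)
    return ENODEV;
  *handle = demo_disk;
  return 0;
}

static int
demo_read (void *handle, size_t offset, void *buf, size_t len, size_t *nread)
{
  const char *disk = handle;
  if (offset > sizeof demo_disk)
    return EINVAL;
  if (len > sizeof demo_disk - offset)
    len = sizeof demo_disk - offset;
  memcpy (buf, disk + offset, len);
  *nread = len;
  return 0;
}

/* No write member: the generic layer answers EOPNOTSUPP for writes.  */
static const struct device_ops demo_ops = { demo_open, demo_read, NULL };
```

Leaving a member NULL gives the "I can see what needs to be implemented" property discussed below, while unimplemented operations still get a clean error instead of a crash.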
pci-arbiter creates a netfs on top of the trivfs. How well does this work if it's not actually used in early bootstrap?

and rumpdisk opens device ("pci"); when each task is resumed, it inherits a bootstrap port

and what does it do with that? what kind of device is "pci"?

it's the device master for pci, so rumpdisk can call pci-arbiter rpcs on it

hm, so I see from the code that it returns the port to the root of its translator tree actually. Does pci-arbiter have its own rpcs? does it not just expose an fs tree?

it has rpcs that can be called on each fs node called "config" per device: hurd/pci.defs. libpciaccess uses these.

how does that compare to reading and writing the fs node with regular read and write?

so the second and subsequent instances of pciaccess end up calling into the fs tree of pci-arbiter.

you can't call read/write on pci memory, it's MMIO, and the io ports need `inb`, `inw`, etc. They need to be accessed using special accessors, not a bitstream.

but I can do

    $ hexdump /servers/bus/pci/0000/00/02/0/config

yes you can on the config file

how is that different from `pci_conf_read`?

it calls that. the `pci fs` is implemented to allow these things.

why is there a need for `pci_conf_read ()` as an RPC then, if you can instead use `io_read` on the "config" node?

i am not 100% sure. I think it wasn't fully implemented from the beginning, but you definitely cannot use `io_read ()` on IO ports. These have explicit x86 instructions to access them.

MMIO?

maybe, im not sure, but it has absolute physical addressing.

I don't see how you would do this via `pci.defs` either?

We expose all the device tree of pci as a netfs filesystem. It is a bus of devices. you may be right. It would be best to implement pciaccess to just read/write from the filesystem once it's exposed on the netfs.
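As the `hexdump` example suggests, the `config` node can be read like a file. Assuming it returns the raw configuration space, a client could decode the standard header with plain `stdio` reads (which become `io_read ()` underneath on the Hurd). The offsets come from the PCI specification (vendor ID at 0x00, device ID at 0x02, subclass at 0x0A, class code at 0x0B, all little-endian); `pci_decode_header ()` and `pci_read_ids ()` are hypothetical helpers, and this says nothing about I/O ports, which the log notes cannot be reached this way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct pci_ids
{
  uint16_t vendor;
  uint16_t device;
  uint8_t class_code;
  uint8_t subclass;
};

/* Decode the first bytes of a standard PCI configuration header.
   PCI configuration space is little-endian.  */
static struct pci_ids
pci_decode_header (const uint8_t cfg[16])
{
  struct pci_ids ids;
  ids.vendor = (uint16_t) (cfg[0x00] | (cfg[0x01] << 8));
  ids.device = (uint16_t) (cfg[0x02] | (cfg[0x03] << 8));
  ids.subclass = cfg[0x0a];
  ids.class_code = cfg[0x0b];
  return ids;
}

/* Read the header from a config node such as
   "/servers/bus/pci/0000/00/02/0/config".  Returns 0 or -1.  */
static int
pci_read_ids (const char *config_path, struct pci_ids *ids)
{
  uint8_t cfg[16];
  assert (config_path != NULL && ids != NULL);
  FILE *f = fopen (config_path, "rb");
  if (f == NULL)
    return -1;
  size_t got = fread (cfg, 1, sizeof cfg, f);
  fclose (f);
  if (got != sizeof cfg)
    return -1;
  *ids = pci_decode_header (cfg);
  return 0;
}
```

Whether plain reads like this can replace every RPC in `pci.defs` is exactly the open question in the discussion; the sketch only covers configuration-space fields.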
yes, the question is: 1. is there anything that you can do by using the special RPCs from pci.defs that you cannot do by using the regular read/write/ls/map on the exported filesystem tree, 2. if no, why is there even a need for `pci.defs`, why not always use the fs? But anyway, that's irrelevant for the question of bootstrap and libmachdev

There is a need for rpcs for IO ports.

Could you point me to where rumpdisk does `device_open ("pci")`? grep doesn't show anything.

which rpcs are for the IO ports?

They're not implemented yet; we are using raw access, I think.

The way it works, libmachdev uses the next port, so it all chains together: `libmachdev/trivfs_server.c`.

but where does it call `device_open ("pci")`?

when the pci task resumes, it has a bootstrap port, which is passed from the previous task. There is no `device_open ("pci")`.

or if it's the first task to be resumed, it grabs a bootstrap port from glibc? im not sure

ok, so if my plan is implemented, how much of `libmachdev` functionality will still be used / useful?

i dont know. The mach interface? device interface\*. maybe it will be useless.

I'd rather you implemented the Mach device RPCs directly, without the emulation structure, but that's an unrelated change, we can leave that in for now.

I kind of like the emulation structure as a list of function pointers, so i can see what needs to be implemented, but that's neither here nor there. `libmachdev` was a hack to make the bootstrap work, to be honest.

…and we'd no longer need that. I would be happy if it goes away. the new one would be so much better.

is there anything else I should know about this all? What else could break if there was no libmachdev and all that?

acpi, pci-arbiter, rumpdisk, rumpusbdisk

right, let's go through these

The pci-arbiter needs to start first to claim the x86 config io ports. Then gnumach locks these ports. No one else can use them.

so it starts and initializes **something**. what does it need? the device master port, clearly, right?
that it will get through the glibc function / the proc API

it needs a /servers/bus and the device master

right, so then it just does fsys_startup, and the bootstrap task places it onto `/servers/bus` (it's not expected to do `file_set_translator ()` itself, just as when running as a normal translator)

it exposes a netfs on `/servers/bus/pci`

so will pci-arbiter still expose mach devices? a mach device master? or will it only expose an fs tree + pci.defs?

i think just fs tree and pci.defs. should be enough

ok, so we drop mach dev stuff from pci-arbiter completely. then acpi starts up, right? what does it need?

It needs access to `pci.defs` and the pci tree. It accesses that via libpciaccess, which calls a new mode that accesses the fs tree. It looks up `/servers/bus/pci`.

ok, but how does that work now then?

It looks up the right nodes and calls pci.defs on them.

looks up the right node on what? there's no root filesystem at that point (in the current scheme)

It needs pci access

that's why I was wondering how it does `device_open ("pci")`

I think libmachdev from pci gives acpi the fsroot. there is a doc on this.

so does it set the root node of pci-arbiter as the root dir of acpi? as in, is acpi effectively chrooted to `/servers/bus/pci`?

i think acpi is chrooted to the parent of /servers. It shares the same root as pci's trivfs.

i still don't quite understand how netfs and trivfs within pci-arbiter interact.

you said there would be a fake /. Can't acpi use that?

yeah, in my plan / the new bootstrap scheme, there'll be a / from the very start.

ok so acpi can look up /servers/bus/pci, and it will exist.

and pci-arbiter can really sit on `/servers/bus/pci` (no need for trivfs there at all) and acpi will just look up `/servers/bus/pci`.

And we do not need to change anything in acpi to get it to do that. And how does it do it now?

maybe we'd need to remove some no-longer-required logic from acpi then?
it looks up device ("pci") if it exists, otherwise it falls back to `/servers/bus/pci`.

Ah hold on, maybe I do understand now. currently pci-arbiter exposes its mach dev master as acpi's mach dev master. So it looks up device ("pci") and finds it that way.

correct, but it doesnt need that if the `/` exists.

yeah, we could remove this in the new bootstrap scheme, and just always open the fs node (or leave it in for compatibility, we'll see about that).

acpi just sits on `/servers/acpi/tables`.

`rumpdisk` runs next and it needs `/servers/bus/pci`, `pci.defs`, `/servers/acpi/tables`, and `acpi.defs`. It exposes `/dev/rumpdisk`.

Would it make sense to make rumpdisk expose a tree/directory of Hurd files and not Mach devices? This is not necessary for anything, but just might be a nice little cleanup.

well, it could expose a tree of block devices, like `/dev/rumpdisk/ide/1`.

and then `ln -s /rumpdisk/ide/1 /dev/wd1`. and no need for an intermediary storeio. plus the Hurd file interface is much richer than Mach device, you can do fsync for instance.

the rump kernel is bsd under the hood, so it needs to be `/dev/rumpdisk/ide/wd0`

You can just convert "ide/0" to "/dev/wd0" when forwarding to the rump part. Not that I object to ide/wd0, but we can have something more hierarchical in the exposed tree than old-school unix device naming?

Let's not have /dev/sda1. Instead let's have /dev/sata/0/1, but then we'd still keep the bsd names as symlinks into the `/dev/rumpdisk/…` tree

sda sda1

good point

0 0/1

well, you can on the Hurd :D

and we won't be doing that either, rumpdisk only exposes the devices, not partitions

well you just implement a block device on the directory?

but that would be confusing for users. I'd expect rumpdisk to only expose device nodes, like /dev/rumpdisk/ide/0, and then we'd have /dev/wd0 being a symlink to that.
And /dev/wd0s1 being a storeio of type part:1:/dev/wd0

or instead of using that, you could pass that as an option to your fs, like ext2fs -T typed part:1:/dev/wd0

where are the current hurd bootstrap (QuietBoot) docs hosted?

here: https://git.savannah.gnu.org/cgit/hurd/web.git/plain/hurd/bootstrap.mdwn

so yeah, you could do the device tree thing I'm proposing in rumpdisk, or you could leave it exposing Mach devices and have a bunch of storeios pointing to that.

So anyway, let's say rumpdisk keeps exposing a single node that acts as a Mach device master and it sits on /dev/rumpdisk. Then we either need a storeio, or we could make ext2fs use that directly. So we start `/hurd/ext2fs.static -T typed part:1:@/dev/rumpdisk:wd0`.

I'll drop all the logic in libdiskfs for detecting if it's the bootstrap filesystem, starting the exec server, and spawning /hurd/startup. It'll just be a library to help create filesystems.

After that the bootstrap task migrates all those translator nodes from the temporary / onto the ext2fs, broadcasts the root and cwd ports to everyone, and off we go to starting auth and proc and unix.

sounds like it all would work indeed.

so we're just removing libmachdev completely, right? netdde links to it too. I think it has libmachdevdde

Also, how would you script this thing? Like, ideally we'd want the bootstrap task to follow some sort of script which would say, for example,

    mkdir /servers
    mkdir /servers/bus
    settrans /servers/bus/pci ${pci-task} --args-to-pci
    mkdir /dev
    settrans /dev/netdde ${netdde-task} --args-to-netdde
    setroot ${ext2fs-task} --args-to-ext2fs

and ideally the bootstrap task would implement a REPL where you'd be able to run these commands interactively (if the existing script fails, for instance).

It can be like grub, where it has a predefined script, and you can do something (press a key combo?) to instead run your own commands in a repl.

or if it fails, it bails out and drops you into the repl, yes.
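The script syntax floated above (`mkdir`, `settrans PATH ${task} args`, `setroot ${task} args`) is simple enough that a bootstrap task, or the REPL it drops into, could parse it line by line. Below is a hedged sketch of such a parser; the command set comes from the log, but `parse_boot_line ()`, the buffer sizes, and the rule that `${NAME}` names a boot module are assumptions, and executing the commands is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum boot_cmd { CMD_INVALID, CMD_MKDIR, CMD_SETTRANS, CMD_SETROOT };

struct boot_line
{
  enum boot_cmd cmd;
  char path[64];   /* target node for mkdir and settrans */
  char task[32];   /* NAME from ${NAME}: which boot module to use */
  char args[128];  /* rest of the line, passed to the translator */
};

static const char *
skip_blanks (const char *s)
{
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}

/* Copy one blank-delimited word from *P into OUT and advance *P.  */
static int
next_word (const char **p, char *out, size_t size)
{
  const char *s = skip_blanks (*p);
  size_t n = strcspn (s, " \t");
  if (n == 0 || n >= size)
    return -1;
  memcpy (out, s, n);
  out[n] = '\0';
  *p = s + n;
  return 0;
}

/* Parse "${NAME}" at *P into TASK and advance *P past the '}'.  */
static int
parse_task_ref (const char **p, char *task, size_t size)
{
  const char *s = skip_blanks (*p);
  if (strncmp (s, "${", 2) != 0)
    return -1;
  const char *end = strchr (s + 2, '}');
  if (end == NULL || end == s + 2 || (size_t) (end - (s + 2)) >= size)
    return -1;
  memcpy (task, s + 2, (size_t) (end - (s + 2)));
  task[end - (s + 2)] = '\0';
  *p = end + 1;
  return 0;
}

/* Parse one script line (without its trailing newline).  Returns 0 on
   success, -1 on a syntax error that a REPL would report.  */
static int
parse_boot_line (const char *line, struct boot_line *out)
{
  char word[16];
  const char *p = line;

  memset (out, 0, sizeof *out);   /* cmd starts as CMD_INVALID */
  if (next_word (&p, word, sizeof word) != 0)
    return -1;
  if (strcmp (word, "mkdir") == 0)
    out->cmd = CMD_MKDIR;
  else if (strcmp (word, "settrans") == 0)
    out->cmd = CMD_SETTRANS;
  else if (strcmp (word, "setroot") == 0)
    out->cmd = CMD_SETROOT;
  else
    return -1;

  if (out->cmd != CMD_SETROOT
      && next_word (&p, out->path, sizeof out->path) != 0)
    return -1;
  if (out->cmd == CMD_MKDIR)
    return *skip_blanks (p) == '\0' ? 0 : -1;

  if (parse_task_ref (&p, out->task, sizeof out->task) != 0)
    return -1;
  p = skip_blanks (p);
  if (strlen (p) >= sizeof out->args)
    return -1;
  strcpy (out->args, p);
  return 0;
}
```

Keeping parsing separate from execution means the same function could serve both the predefined script and the interactive fallback described above.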
this gives you **so much more** visibility into the boot process, because currently it's all scattered across grub, libdiskfs (resuming exec, spawning /hurd/startup), /hurd/startup, and various tricky pieces of logic in all of these servers.

We could call the mini-repl hurdhelper?

If something fails, you're on your own; at best it prints an error message (if the failing task manages to open the mach console at that point)

Perhaps we call the new bootstrap proposal Bootstrap.

When/if this is ready, we'll have to remove libmachdev and port everything else to work without it.

yes it's a great idea. I'm not a fan of lisp either. If i keep in mind that `/` is available early, then I can just clean up the other stuff.

and assume i have `/`, and the device master can be accessed with the regular glibc function, and you can printf freely (no need to open the console).

Do i need to run `fsys_startup`?

yes, exactly like all translators always do. Well, you probably run netfs_startup or whatever, and it calls that. you're not supposed to call fsys_getpriv or fsys_init

i think my early attempts at writing translators did not use these, because i assumed i had `/`. Then i realised i didn't. And libmachdev was born.

Yes, you should assume you have /, and just do all the regular things you would do. and if something that you would usually do doesn't work, we should think of a way to make it work by adding more stuff in the bootstrap task, when it's reasonable to, of course.

and please consider exposing the file tree from rumpdisk, though that's orthogonal.

you mean a tree of block devices?

Yes, but each device node would be just a Hurd (device) file, not a Mach device, i.e. it'd support io_read and io_write, not device_read and device_write. well, I guess you could make it support both.

isnt that storeio's job?

if a node only implements the device RPCs, we need a storeio to turn it into a Hurd file, yes.
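The `@` device-name syntax from earlier in the log (`device:@/dev/rumpdisk:wd0`, which makes libstore look up a file and use it as the device master, falling back to Mach's) is what lets storeio or ext2fs sit on top of a node that only speaks the Mach device RPCs. Here is a hypothetical helper that splits such a name into its two parts; it is not libstore's code or API, and its exact splitting rule (at the last `:`) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct device_spec
{
  bool use_mach_master;  /* no '@': open the name on Mach's device master */
  char master_path[64];  /* file to look up and use as device master */
  char device[32];       /* name passed to device_open () */
};

/* Split "@/dev/rumpdisk:wd0" into "/dev/rumpdisk" and "wd0".  This sketch
   splits at the last ':', so the device name cannot contain one.
   Returns false for malformed or over-long names.  */
static bool
parse_device_spec (const char *name, struct device_spec *spec)
{
  memset (spec, 0, sizeof *spec);
  if (name[0] == '\0')
    return false;

  if (name[0] != '@')
    {
      if (strlen (name) >= sizeof spec->device)
        return false;
      spec->use_mach_master = true;
      strcpy (spec->device, name);
      return true;
    }

  const char *colon = strrchr (name, ':');
  if (colon == NULL || colon == name + 1 || colon[1] == '\0')
    return false;

  size_t path_len = (size_t) (colon - (name + 1));
  if (path_len >= sizeof spec->master_path
      || strlen (colon + 1) >= sizeof spec->device)
    return false;

  memcpy (spec->master_path, name + 1, path_len);
  spec->master_path[path_len] = '\0';
  strcpy (spec->device, colon + 1);
  return true;
}
```

A caller would then look up `master_path` with `file_name_lookup ()` and call `device_open ()` on the result; those calls are omitted so the sketch compiles anywhere.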
but if you would implement the file RPCs directly, there wouldn't be a need for the intermediary storeio, not that it's important.

but that's writing storeio again. thing is, i dont know at runtime which devices are exposed by rump. It auto-probes them and prints them out, but i cant tell programmatically which ones were detected, because rump knows which devices exist but doesn't expose that over its API in any way. Because it runs as a kernel would, with just one driver set.

Rump is a decent set of drivers. It does not have better hardware support than Linux drivers (of modern Linux)?

Instead, Rump is netbsd in a can, and it's essentially unmaintained upstream too. However, it is still used to test kernel modules, but it lacks makefiles to separate all drivers into modules. BUT using rump is better than updating / redoing the linux drivers port of DDE, because the netbsd internal kernel API is much, much more stable than linux's. We would fall behind in a week with linux. No one would maintain the linux driver -> hurd port. Also, there is a framework that lets you compile the netbsd drivers as userspace unikernels: rump. Such a thing does not exist for modern Linux. Rump is already good enough for some things. It could replace netdde. It already works for ide/sata.

Rump has its own /dev nodes on a rumpfs, so you can do something like `rump_ls` on it. Rump is a minimal netbsd kernel. It is just the device drivers and a bit of pthreading, and it has only the drivers that you link. So rumpdisk only has the ahci and ide drivers and nothing else. Additionally, rump can detect them off the pci bus.

I will create a branch with cleaned translators.

solid_black: i almost cleaned up acpi and pci-arbiter but realised they are missing the shutdown notification when i strip out libmachdev.

"how are the device nodes on the bootstrap netfs attached to each translator?" I don't think I understand the question, please clarify.
I was wondering if the new bootstrap process can resume a fs task and have all the previous translators wake up and serve their rpcs, without needing to resume them.

we have a problem with the current design: if you implement what we discussed yesterday, the IO ports wont work because they are not exposed by pci-arbiter yet. I am working on it, but it's not ready.

I still don't understand the problem. the bootstrap task resumes others in order. the root fs task too, eventually, but not before everything that has to come up before the root fs task is ready.

I don't think it needs to be a disk. Literally a trivfs is enough.

why are I/O ports not exposed by pci-arbiter? why isn't that an issue with how it works currently then?

solid_black: we are using ioperm() in userspace, but i want to refactor the io port usage to be granularly accessed. so one day gnumach can store a bitmap of all io ports and reject any range that overlaps ports that are in use, since only one user of any port at any time is allowed.

i dont know if that will allow users to share the same io ports, but at least it will prevent users from clobbering each other's hw access.

damo22: (again, sorry for not understanding the hardware details), so what would be the issue? when the pci arbiter starts, doesn't it do all the things it has to do with the I/O ports?

io ports are only accessed in raw method now. Any user can do ioperm(0, 0xffff, 1) and get access to all of them

doesn't that require host priv or something like that?

yeh probably. maybe only root can. But i want to allow unprivileged users to access io ports by requesting exclusive access to a range.

I see that ioperm () in glibc uses the device master port, so yeah, root-only (good)

first in locks the port range

but you're saying that there's something about these I/O ports that works today, but would break if we implemented what we discussed yesterday? what is it, and why?

well it might still work.
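The "bitmap of all io ports" idea is easy to picture. Below is a minimal user-space model of first-come, exclusive range claims over the 16-bit x86 I/O port space (65536 ports, one bit each). It is only a sketch of the bookkeeping: the function names are hypothetical, and it is not how gnumach or the pci-arbiter actually implement it. The example ranges use 0xCF8-0xCFF, the legacy PCI configuration address and data ports the arbiter claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_PORT_COUNT 0x10000u   /* x86 has a 16-bit I/O port space */

/* One bit per I/O port: set means "owned by someone".  */
static uint8_t io_port_map[IO_PORT_COUNT / 8];

static bool
io_port_in_use (uint32_t port)
{
  return io_port_map[port / 8] & (1u << (port % 8));
}

/* Claim [FROM, FROM + COUNT) exclusively: first in locks the range.
   Fails without changing anything if the range is empty, out of bounds,
   or overlaps a port that is already owned.  */
static bool
io_port_claim (uint32_t from, uint32_t count)
{
  if (count == 0 || from >= IO_PORT_COUNT || count > IO_PORT_COUNT - from)
    return false;
  for (uint32_t p = from; p < from + count; p++)
    if (io_port_in_use (p))
      return false;
  for (uint32_t p = from; p < from + count; p++)
    io_port_map[p / 8] |= (uint8_t) (1u << (p % 8));
  return true;
}

static void
io_port_release (uint32_t from, uint32_t count)
{
  assert (from < IO_PORT_COUNT && count <= IO_PORT_COUNT - from);
  for (uint32_t p = from; p < from + count; p++)
    io_port_map[p / 8] &= (uint8_t) ~(1u << (p % 8));
}
```

Checking the whole range before setting any bit keeps a failed claim from leaving a half-locked range behind; whether shared access should ever be allowed is left open, as in the log.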
But there are a lot of changes to be done in general.

Let me try to ask it in a different way, then.

I just know a few of the specifics because I worked on them.

As I understand it, you're saying that 1: currently any root process can request access to any range of I/O ports, and you also want to allow **unprivileged** processes to get access to ranges of I/O ports via a new API of the PCI arbiter (but this is not implemented yet, right?)

Yes.

2: you're saying that something about this would break / be different in the new scheme, compared to the current scheme. I don't understand 2, or the relation between 1 and 2.

Not really 2; I may have been mistaken. It probably will continue working fine until I try to implement 1. ioperm calls `i386_io_perm_create` and `i386_io_perm_modify` in the same call. I want to separate these into the arbiter, so the request goes into pci-arbiter and, if it succeeds, the port is returned to the caller and the caller can change the port access.

Yes, so what about 2 will break 1 when you try to implement it?

With your new bootstrap, we need `i386_io_perm_*` to be accessible. I'm not sure how. Is that a Mach RPC?

These are Mach RPCs. `i386_io_perm_create` is an RPC that you do on the device master port.

Should be OK, then.

`i386_io_perm_modify` you do on your task port.

Yes, I don't see how this would be problematic.

You might find this branch useful, although: 1. I'm not sure whether the task itself should be wiring its memory, or if the bootstrap task should do it. 2. Why do you request startup notifications if you then never do anything in `S_startup_dosync`? Same for essential tasks, actually; that should probably be done by the bootstrap task and not the translator itself (but we'll see). 1. Don't `mach_print`, just `fprintf (stderr, "")`. 2.
Please always verify the return result of `mach_port_deallocate` (and similar functions), typically like this: `err = mach_port_deallocate (…); assert_perror_backtrace (err);`. This helps catch nasty bugs. 3. I wonder why acpi and pci each have their own `acpifs_startup` and `pcifs_startup`; can't they use `netfs_startup ()`?

In reply: 1. No idea. 2. rumpdisk needed it, but these might not. 3. ACK. 4. ACK. 5. I think they couldn't use `netfs_startup ()` before, but they might be able to now. Anyway, this should get you booting with your bootstrap translator (without rumpdisk). Rumpdisk seems to use the `device_*` RPCs from `libmachdev` to expose its device, whereas pci and acpi don't use them for anything except `device_open`, to pass their port to the next translator.

I think my latest patch for I/O ports will work, but I need to rebuild glibc, libpciaccess, and gnumach. Why does libhurduser need to be in glibc? It's quite annoying to add an RPC. I think I have done the gnumach I/O port locking and pciaccess, but the Hurd part needs work, and merging it then needs a rebuild of glibc because of libhurduser. Why can't libhurduser be part of the Hurd package?

I don't think I understand enough of this to do a review, but I'd still like to see the patch if it's available anywhere.

OK, I can push to my repos.

glibc needs to use the Hurd RPCs (and implement some, too), and glibc cannot depend on the Hurd package because the Hurd package depends on glibc.

lol ok

As things currently stand, glibc depends on the Hurd **headers** (including MIG defs), but not on any Hurd binaries. Still, the cross-build process is quite convoluted. I posted about it somewhere: https://floss.social/@bugaevc/109383703992754691

The manual patching of the build system that's needed to bootstrap everything is a bit suboptimal.

What if you guys submit patches upstream to glibc to add a build target that copies the headers, or whatever is needed?
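The earlier advice to always check the result of `mach_port_deallocate` can be illustrated without Mach. In the sketch below, `port_allocate`, `port_deallocate`, the toy reference-count table, and the `CHECK_ERR` macro are all hypothetical stand-ins. `CHECK_ERR` plays the role of `assert_perror_backtrace`, which, as its name suggests, also prints a backtrace. The point is that checking every result turns a double deallocation into an immediate, loud failure instead of an error code that is silently dropped.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for assert_perror_backtrace (): if ERR is
   nonzero, report where it happened and abort. */
#define CHECK_ERR(err)                                                \
    do {                                                              \
        int check_err_ = (err);                                       \
        if (check_err_ != 0) {                                        \
            fprintf(stderr, "%s:%d: unexpected error: %s\n",          \
                    __FILE__, __LINE__, strerror(check_err_));        \
            abort();                                                  \
        }                                                             \
    } while (0)

/* Hypothetical toy "port name space": one reference count per name. */
#define NPORTS 8
static int port_refs[NPORTS];

static int port_allocate(int *name)
{
    for (int i = 0; i < NPORTS; i++) {
        if (port_refs[i] == 0) {
            port_refs[i] = 1;
            *name = i;
            return 0;
        }
    }
    return ENOMEM;
}

/* Drop one reference.  A name holding no reference yields EINVAL,
   which is how a double deallocation shows up here. */
static int port_deallocate(int name)
{
    if (name < 0 || name >= NPORTS || port_refs[name] == 0)
        return EINVAL;
    port_refs[name]--;
    return 0;
}
```

Each call site then has the same shape as the Hurd idiom: `err = port_deallocate (name); CHECK_ERR (err);`. A second deallocation of the same name would abort at that exact line rather than going unnoticed.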
solid_black: see the fix-ioperm branches in http://git.zammit.org/libpciaccess.git and http://git.zammit.org/gnumach.git
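The fix-ioperm work follows the split described earlier in this discussion. Instead of `ioperm` performing both `i386_io_perm_create` (on the device master port) and `i386_io_perm_modify` (on the caller's task port), the PCI arbiter would grant an exclusive range and return a permission, and the caller would then enable access on its own task. The C sketch below only models that flow in userspace. `struct io_perm`, `arbiter_request_io_range`, `io_perm_modify`, and `arbiter_release_io_range` are hypothetical names, not the API on those branches, and ranges are inclusive in this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission handle, standing in for the port that
   i386_io_perm_create returns. */
struct io_perm {
    unsigned from, to;   /* inclusive port range */
    bool granted;        /* range currently reserved by the arbiter */
    bool enabled;        /* models the effect of i386_io_perm_modify */
};

#define MAX_GRANTS 16
static struct io_perm grants[MAX_GRANTS];

/* Step 1 (hypothetical arbiter RPC): reserve [from, to] exclusively.
   Stands in for the arbiter creating the permission on the client's
   behalf; first in wins, overlapping requests get EBUSY. */
static int arbiter_request_io_range(unsigned from, unsigned to,
                                    struct io_perm **perm)
{
    int free_slot = -1;

    if (from > to || to > 0xffff || perm == NULL)
        return EINVAL;
    for (int i = 0; i < MAX_GRANTS; i++) {
        if (!grants[i].granted) {
            if (free_slot < 0)
                free_slot = i;
        } else if (from <= grants[i].to && grants[i].from <= to) {
            return EBUSY;
        }
    }
    if (free_slot < 0)
        return ENOMEM;
    grants[free_slot] = (struct io_perm){ .from = from, .to = to,
                                          .granted = true, .enabled = false };
    *perm = &grants[free_slot];
    return 0;
}

/* Step 2: the client toggles access for its own task, as it would
   with i386_io_perm_modify on its task port. */
static int io_perm_modify(struct io_perm *perm, bool enable)
{
    if (perm == NULL || !perm->granted)
        return EINVAL;
    perm->enabled = enable;
    return 0;
}

/* Give the range back so that another client can reserve it. */
static int arbiter_release_io_range(struct io_perm *perm)
{
    if (perm == NULL || !perm->granted)
        return EINVAL;
    perm->granted = false;
    perm->enabled = false;
    return 0;
}
```

Keeping the grant (step 1) in the arbiter and the enable (step 2) in the client means no privileged port has to be handed to the client. The arbiter remains the single place that decides who owns which ports.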