ServerBootV2 RFC Draft
What is an OS bootstrap?
An operating system's bootstrap is the process that happens shortly after you press the power button, as shown below:
Power-on -> BIOS -> Bootloader -> OS bootstrap -> Service manager
Note that in this context the OS bootstrap does not mean building a distribution and its packages from source code; it has nothing to do with reproducible builds.
Sergey Bugaev proposed the following. The Hurd's current bootstrap, Quiet-Boot (a biased and made-up name), is fragile, hard to debug, and complicated:

- `Quiet-Boot` chokes on misspelled or missing boot arguments. When this happens, the Hurd bootstrap will likely hang and display nothing, which is tricky to debug.
- `Quiet-Boot` is hard to change. For instance, when the Hurd developers added `acpi`, the `pci-arbiter`, and `rumpdisk`, they struggled to get `Quiet-Boot` working again.
- `Quiet-Boot` forces each bootstrap task to include special bootstrap logic to work. This limits what is possible during the bootstrap. For instance, it should be trivial for the Hurd to support netboot, but `Quiet-Boot` makes it hard to add `nfs`, `pfinet`, and `isofs` to the bootstrap.
- `Quiet-Boot` hurts other Hurd distributions too. When Guix developers updated their packaged version of the Hurd to include support for SATA drives, a single misspelled boot argument halted their progress for a few weeks.
The alternative ServerBoot V2 proposal (which was discussed on IRC and is similar to the previously discussed bootshell proposal) aims to put all or most of the bootstrap-specific logic into a single task (`/hurd/serverboot`). ServerBoot V2 has a number of enticing advantages:
- It simplifies the hierarchical dependency of translators during bootstrap. Developers should be able to re-order and add new bootstrap translators with minimal work.
- It gives early bootstrap translators like `auth` and `ext2fs` standard input and output, which lets them display boot errors. It also lets signals work.
- One can trivially use most Hurd translators during the bootstrap; you just have to link them statically.
- `libmachdev` could be simplified to only expose hardware to userspace; it might even be possible to remove it entirely. The `pci-arbiter`, `acpi`, and `rumpdisk` could also be simplified.
- Developers could remove the bootstrap logic from `libdiskfs`, which currently detects the bootstrap filesystem, starts the `exec` server, and spawns `/hurd/startup`. Instead, `libdiskfs` would focus only on providing filesystem support.
- If an error happens during early boot, the user could be dropped into a REPL or mini-console to try to debug the issue. We might call this Bootshell V2, in reference to the original proposal. It could be written in Lisp; imagine having an extremely powerful programming language, only 436 bytes in size, available during bootstrap!
- It would simplify the code for subhurds by removing the OS bootstrap logic from each task.
Now that you know why we should use ServerBoot V2, let's get more detailed. What is ServerBoot V2?
ServerBoot V2 would be an empty filesystem dynamically populated during bootstrap. It would use a `netfs`-like filesystem that is populated as the various bootstrap tasks start. For example, `/servers/socket/2` will be created once `pfinet` starts. It also temporarily pretends to be the Hurd process server, `exec`, and the `/` filesystem, while providing signals and `stdio`. Let's explain how ServerBoot V2 will bootstrap the Hurd.
FIXME The rest of this needs work.
Any bootstrap that the Hurd uses will probably be a little odd, because there is an awkward and circular startup dance between `exec`, `ext2fs`, `startup`, `proc`, `auth`, the `pci-arbiter`, `rumpdisk`, and `acpi`, in which each translator depends on the others during the bootstrap, as this ASCII art shows:
pci-arbiter
|
acpi
|
rumpdisk
|
ext2fs -- storeio
/ \
exec startup
/ \
auth proc
This means that there is no perfect Hurd bootstrap design. Some designs are better in some ways and worse in others. ServerBoot V2 would simplify other early bootstrap tasks, but all that complicated logic would live in one binary. One valid criticism of ServerBoot V2 is that it may be a hassle to develop and maintain. In any case, trying to code the best possible Hurd bootstrap may be a waste of time; in fact, the Hurd bootstrap has been rewritten several times already. Our fearless leader, Samuel, feels that rewriting the Hurd bootstrap every few years may be a waste of time. Now that you understand why Samuel discourages a Hurd bootstrap rewrite, let's consider why we should develop ServerBoot V2 anyway.
How ServerBoot V2 will work
Bootstrap begins when GRUB and GNU Mach start some tasks, and then GNU Mach resumes the not-yet-written `/hurd/serverboot`. `/hurd/serverboot` is the only task to accept special ports from the kernel via command-line arguments like `--kernel-task`; `/hurd/serverboot` tries to implement/emulate as much of the normal Hurd environment as possible for the other bootstrap translators. In particular, it provides the other translators with `stdio`, which lets them read and write without having to open the Mach console device. This means that the various translators will be able to complain about their bad arguments or other startup errors, which they cannot currently do.
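To make the argument handoff concrete, here is a tiny hypothetical sketch of parsing a `--kernel-task`-style flag. Only the flag name comes from the proposal; the `port_t` stand-in type, the numeric encoding, and `parse_port_arg` itself are invented for illustration (the real serverboot would receive actual Mach send rights, not numbers in text):

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for mach_port_t; illustrative only. */
typedef unsigned int port_t;

/* Parse a "--kernel-task=N" style argument.  Returns 1 and stores the
   port name on success, 0 if the argument does not match the flag.
   The "flag=number" encoding is an assumption made for this sketch. */
int
parse_port_arg (const char *arg, const char *flag, port_t *out)
{
  size_t len = strlen (flag);
  if (strncmp (arg, flag, len) != 0 || arg[len] != '=')
    return 0;
  *out = (port_t) strtoul (arg + len + 1, NULL, 0);
  return 1;
}
```

The point is only that serverboot is the single place where such special arguments are understood; no other bootstrap task would need to parse them.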
`/hurd/serverboot` will provide a basic filesystem with `netfs`, which gives the other translators a working `/` directory and cwd ports. For example, `/hurd/serverboot` would store `netdde`'s port at `/dev/netdde`. When `/hurd/netdde` starts, it will reply to its parent with `fsys_startup ()` as normal.
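The bookkeeping behind such a bootstrap filesystem can be pictured as a simple path-to-port table. This is an illustrative sketch only: the node path `/dev/netdde` comes from the text above, but `port_t`, `bootfs_set_translator`, and `bootfs_lookup` are made-up names, and the real code would store Mach send rights and answer `dir_lookup` RPCs instead of returning plainly:

```c
#include <string.h>

typedef unsigned int port_t;    /* stand-in for mach_port_t */

#define MAX_NODES 32

struct boot_node { const char *path; port_t control; };
static struct boot_node nodes[MAX_NODES];
static int n_nodes;

/* Record the fsys control port a translator handed us in fsys_startup (). */
int
bootfs_set_translator (const char *path, port_t control)
{
  if (n_nodes >= MAX_NODES)
    return -1;
  nodes[n_nodes].path = path;
  nodes[n_nodes].control = control;
  n_nodes++;
  return 0;
}

/* Find the port registered for a path, as a lookup handler would. */
int
bootfs_lookup (const char *path, port_t *control)
{
  for (int i = 0; i < n_nodes; i++)
    if (strcmp (nodes[i].path, path) == 0)
      {
        *control = nodes[i].control;
        return 0;
      }
  return -1;
}
```

A later bootstrap task that opens `/dev/netdde` would then simply be handed the control port `netdde` registered earlier.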
`/hurd/serverboot` will also emulate the native Hurd process server for early bootstrap tasks. This will allow early bootstrap tasks to get the privileged (device master and kernel task) ports via the normal glibc function `get_privileged_ports (&host_priv, &device_master)`. Other tasks will register their message ports with the emulated process server. This will allow signals and messaging during the bootstrap. We can even use the existing mechanisms in glibc to set and get init ports. For example, when we start the `auth` server, we will give every task started thus far its new authentication port via glibc's `msg_set_init_port ()`. When we start the real proc server, we will query it for proc ports for each of the tasks and set them the same way. This lets us migrate from the emulated proc server to the real one.
FIXME: Where do storeio (storeio with `device:@/dev/rumpdisk:wd0`), rumpdisk, and the pci-arbiter come in?
Next, we start `ext2fs`. We reattach all the running translators from our `netfs` bootstrap filesystem onto the new root, then send those translators their new root and cwd ports. This should happen transparently to the translators themselves!
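The reattachment step could be sketched as a loop over the recorded bootstrap nodes. Everything here is hypothetical: `set_translator_on_new_root` is a stub standing in for the real `file_set_translator ()` RPC against the new root filesystem, and the types are illustrative:

```c
typedef unsigned int port_t;    /* stand-in for mach_port_t */

struct boot_node { const char *path; port_t control; };

/* Stub for calling file_set_translator () on the new root fs.
   A real implementation would perform the RPC and report its error. */
static int
set_translator_on_new_root (port_t newroot, const char *path, port_t control)
{
  (void) newroot; (void) path; (void) control;
  return 0;                     /* pretend the RPC succeeded */
}

/* Replant every active translator from the bootstrap fs onto the
   freshly started root filesystem, stopping at the first failure. */
int
migrate_translators (port_t newroot, struct boot_node *nodes, int n)
{
  for (int i = 0; i < n; i++)
    if (set_translator_on_new_root (newroot, nodes[i].path,
                                    nodes[i].control) != 0)
      return -1;
  return 0;
}
```

Since the translators keep serving the same control ports throughout, they never notice that their nodes moved from the bootstrap fs to the real root.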
Supporting Netboot
ServerBoot V2 could trivially support netboot by adding `netdde`, `pfinet` (or `lwip`), and `isofs` as bootstrap tasks. The bootstrap task will start the `pci-arbiter` and `acpi` (FIXME add some more detail to this sentence). The bootstrap task starts `netdde`, which will look up any `eth` devices (using the device master port, which it queries via the fake process server interface) and send its fsys control port to the bootstrap task in the regular `fsys_startup ()`. The bootstrap task sets the fsys control port as the translator on the `/dev/netdde` node in its `netfs` bootstrap fs. Then `/hurd/serverboot` resumes `pfinet`, which looks up `/dev/netdde`. Then `pfinet` returns its fsys control port to the bootstrap task, which sets it on `/servers/socket/2`. Then the bootstrap resumes `nfs`, and `nfs` just creates a socket using the regular glibc `socket ()` call, which looks up `/servers/socket/2`, and it just works. FIXME: where does `isofs` fit in here?
Then `nfs` gives its fsys control port to `/hurd/serverboot`, which knows it is the real root filesystem, so it takes `netdde`'s and `pfinet`'s fsys control ports and calls `file_set_translator ()` on the `nfs` at the same paths. Now `/dev/netdde` and `/servers/socket/2` exist and are accessible both on our bootstrap fs and on the new root fs. The bootstrap can then take the root fs and broadcast a root and cwd port to all other tasks via `msg_set_init_port ()`. Now every task is running on the real root fs, and our little bootstrap fs is no longer used.
`/hurd/serverboot` can resume the exec server (which is the first dynamically-linked task) with the real root fs. Then we just call `file_set_translator ()` to put the exec server on `/servers/exec`, so that `nfs` doesn't have to care about this. The bootstrap can now spawn tasks, instead of resuming ones loaded by Mach and GRUB, so it next spawns the `auth` and `proc` servers and gives everyone their `auth` and `proc` ports. By that point, we have enough of a Unix environment to call `fork ()` and `exec ()`. Then the bootstrap task does the things that `/hurd/startup` used to do, and finally spawns (or execs) init / PID 1.
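At that point, spawning init is plain POSIX process handling. The sketch below uses only standard `fork`/`execv`/`waitpid`; the helper name and structure are illustrative, not the proposal's actual code:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn a program and wait for it, roughly the way the bootstrap task
   would eventually start PID 1.  Returns the child's exit status, or
   -1 on error. */
int
spawn_and_wait (const char *path, char *const argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execv (path, argv);
      _exit (127);              /* exec failed */
    }
  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The bootstrap task would call something like this with the path of whatever init / PID 1 the system uses.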
With this scheme you will be able to start your root fs simply via `/hurd/ext2fs.static /dev/wd0s1`. This eliminates boot arguments like `--magit-port` and `--next-task`.
This also simplifies `libmachdev`, which exposes devices to userspace via the Mach `device_*` RPC calls, letting the Hurd contain device drivers instead of GNU Mach. Everything that connects to hardware can be a `machdev`.
Additionally, during the Quiet-Boot bootstrap, `libmachdev` awkwardly uses `libtrivfs` to create a transient `/` directory, so that the `pci-arbiter` can mount a `netfs` on top of it at bootstrap. `libmachdev` needs `/servers/bus` to mount `/pci`, and it also needs `/servers` and `/servers/bus` (and `/dev`, and `/servers/socket`). That complexity could be moved to ServerBoot V2, which will create directory nodes at those locations.
`libmachdev` provides a trivfs that intercepts the `device_open` RPC, which the `/dev` node uses. It also fakes a root filesystem node, so you can mount a `netfs` onto it. You still have to implement `device_read` and `device_write` yourself, but that code runs in userspace. An example of this can be found in `rumpdisk/block-rump.c`.
`libpciaccess` is a special case: it has two modes. The first time it runs, via the `pci-arbiter`, it acquires the PCI config I/O ports and runs in x86 mode. Every subsequent user of PCI becomes a Hurdish client of the `pci-arbiter`.
`rumpdisk` exposes `/dev/rumpdisk`:

$ showtrans /dev/rumpdisk
/hurd/rumpdisk
FAQ
ServerBoot V2 looks like a ramdisk + a script...?

It's not quite a ramdisk; it's more a `netfs` translator that creates a temporary `/`. It's a statically linked binary. I don't think it differs from a multiboot module.
How are the device nodes on the bootstrap netfs attached to each translator?
How does the first non-bootstrap task get invoked? Does the bootstrap resume it?
Could we just use a ram disk instead?
One could stick a unionfs on top of it to load the rest of the system after bootstrap. It looks similar to a ramdisk in principle, i.e. it exposes a fs that lives only in RAM, but a ramdisk would not help with early bootstrap: during early bootstrap there are no signals and no console.
Passing control from one server to the next via a bootstrap port is a kludge at best. How many times have you seen the bootstrap process hang and just sit there? ServerBoot V2 would solve that. Also, it would allow subhurds to be full hurds without special-casing each task with bootstrap code. It would also clean up `libmachdev`, and Damien, its author, is in full support.
A ramdisk could implement signals and stdio. Isn't that more flexible?
But if it's a ramdisk, you essentially have to provide it with a tar image; having it live inside a bootstrap task only is preferable. Also, the task could even exit when it's done, whether you use an actual ramdisk or not. You still need to write the task that boots the system, which is different from how it works currently. Also, a ramdisk would have to live in Mach, and we want to move things out of Mach.
Additionally, the bootstrap task will be loaded as the first multiboot module by GRUB. It's not a ramdisk, because a ramdisk has to contain some fs image (with data), and we'd need to parse that format. It might make sense to steer it more in that direction (and Samuel seems to prefer it), because there could potentially be some config files, or other files that the servers may need to run. I'm not super fond of that idea; I'd prefer the bootstrap fs to be just a place where ports (translators) can be placed and looked up. Actually, my current code doesn't even use `netfs`; it just implements the RPCs directly. I'll possibly switch to `netfs` later, or, if the implementation stays simple, I won't use `netfs` at all.
Serverboot V2 just rewrites proc and exec. Why reimplement so much code?
I don't want to reimplement full `proc` and `exec` servers in the bootstrap task; it's more a matter of providing very minimal emulation of some of their functions. I want to implement two RPCs from the `proc` interface: one to give a task the privileged ports on request, and one to let a task give me its msg port. That seems fairly simple to me.

While we were talking of using `netfs`, my actual implementation doesn't even use that; it just implements the RPCs directly (not to suggest I have anything resembling a complete implementation). Here's some sample code to give you an idea of what it is like:
```c
error_t
S_proc_getprivports (struct bootstrap_task *task,
                     mach_port_t *host_priv,
                     mach_port_t *device_master)
{
  if (!task)
    return EOPNOTSUPP;

  if (bootstrap_verbose)
    fprintf (stderr, "S_proc_getprivports from %s\n", task->name);

  *host_priv = _hurd_host_priv;
  *device_master = _hurd_device_master;
  return 0;
}

error_t
S_proc_setmsgport (struct bootstrap_task *task,
                   mach_port_t reply_port,
                   mach_msg_type_name_t reply_portPoly,
                   mach_port_t newmsgport,
                   mach_port_t *oldmsgport,
                   mach_msg_type_name_t *oldmsgportPoly)
{
  if (!task)
    return EOPNOTSUPP;

  if (bootstrap_verbose)
    fprintf (stderr, "S_proc_setmsgport for %s\n", task->name);

  *oldmsgport = task->msgport;
  *oldmsgportPoly = MACH_MSG_TYPE_MOVE_SEND;
  task->msgport = newmsgport;
  return 0;
}
```
Yes, it really is just letting tasks fetch the priv ports (so `get_privileged_ports ()` in glibc works) and set their message ports. So much for a slippery slope of reimplementing the whole process server.
Let's bootstrap like this: initrd, proc, exec, acpi, pci, drivers,
unionfs+fs with every server executable included in the initrd tarball?
I don't see how that's better, but you would be able to try something like that with my plan too. The OS bootstrap needs to start servers and integrate them into the eventual full Hurd system later, when the rest of the system is up. When early servers start, they're running on bare Mach with no processes, no `auth`, no files or file descriptors, etc. I plan to make files available immediately (if not the real fs), and make things progressively more "real" as servers start up. When we start the root fs, we send everyone their new root dir port. When we start `proc`, we send everyone their new `proc` port, and so on. At the end, all the tasks we started in early boot are full, real Hurd processes that are no different from the ones you start later, except that they're statically linked, and not actually `io_map`'ed from the root fs, but loaded by Mach/GRUB into wired memory.
IRC Logs
<damo22> showtrans /dev/wd0 and you can open() that node and it will act as a device master port, so you can then `device_open ()` devices (like wd0) inside of it, right?
oh it's a storeio, that's… cute. that's another translator we'd need
in early boot if we want to boot off /hurd/ext2fs.static /dev/wd0
<damo22> We implemented it as a storeio with
device:@/dev/rumpdisk:wd0
so the `@` sign makes it use the named file as the device master, right?
<damo22> the `@` symbol means it looks up the file as the device master, yes, instead of Mach, but the code falls back to looking up Mach if it can't be found.
I see it's even implemented in libstore, not in storeio, so it just
does `file_name_lookup ()`, then `device_open` on that.
<damo22> pci-arbiter also needs acpi because the only way to know the IRQ of a pci device reliably is to use the ACPI parser. It totally implements the Mach `device_*` functions, but instead of handling the RPCs directly, it sets the callbacks into the `machdev_device_emulations_ops` structure and then libmachdev calls those. Instead of implementing the RPCs themselves, it abstracts them, in case you wanted to merge drivers. This would help if you wanted multiple different devices in the same translator, which is of course the case inside Mach: the single kernel server does all the devices.
but that shouldn't be the case for the Hurd translators, right? we'd
just have multiple different translators like your thing with rumpdisk
and rumpusb.
<damo22> i dont know
ok, so other than those machdev emulation dispatch, libmachdev uses
trivfs and does early bootstrap. pci-arbiter uses it to centralize the
early bootstrap so all the machdevs can use the same code. They chain
together. pci-arbiter creates a netfs on top of the trivfs. How
well does this work if it's not actually used in early bootstrap?
<damo22> and rumpdisk opens device ("pci"), when each task is resumed,
it inherits a bootstrap port
and what does it do with that? what kind of device "pci" is?
<damo22> its the device master for pci, so rumpdisk can call
pci-arbiter rpcs on it
hm, so I see from the code that it returns the port to the root of its
translator tree actually. Does pci-arbiter have its own rpcs? does it
not just expose an fs tree?
<damo22> it has rpcs that can be called on each fs node called
"config" per device: hurd/pci.defs. libpciaccess uses these.
how does that compare to reading and writing the fs node with regular read and write?
<damo22> so the second and subsequent instances of pciaccess end up
calling into the fs tree of pci-arbiter. you can't call read/write on
pci memory its MMIO, and the io ports need `inb`, `inw`, etc. They
need to be accessed using special accessors, not a bitstream.
but I can do $ hexdump /servers/bus/pci/0000/00/02/0/config
<damo22> yes you can on the config file
how is that different from `pci_conf_read` ? it calls that.
<damo22> the `pci fs` is implemented to allow these things.
why is there a need for `pci_conf_read ()` as an RPC then, if you can
instead use `io_read` on the "config" node?
<damo22> i am not 100% sure. I think it wasn't fully implemented from the beginning, but you definitely cannot use `io_read ()` on IO ports; these have explicit x86 instructions to access them. MMIO maybe, im not sure, but it has absolute physical addressing.
I don't see how you would do this via `pci.defs` either?
<damo22> We expose all the device tree of pci as a netfs
filesystem. It is a bus of devices. you may be right. It would be best
to implement pciaccess to just read/write from the filesystem once its
exposed on the netfs.
yes, the question is:
1. is there anything that you can do by using the special RPCs from pci.defs that you cannot do by using the regular read/write/ls/map on the exported filesystem tree,
2. if no, why is there even a need for `pci.defs`, why not always use the fs? But anyway, that's irrelevant for the question of bootstrap and libmachdev
<damo22> There is a need for rpcs for IO ports.
Could you point me to where rumpdisk does `device_open ("pci")`? grep
doesn't show anything. which rpcs are for the IO ports?
<damo22> They're not implemented yet we are using raw access I
think. The way it works, libmachdev uses the next port, so it all
chains together: `libmachdev/trivfs_server.c`.
but where does it call `device_open ("pci")` ?
<damo22> when the pci task resumes, it has a bootstrap port, which is passed from the previous task. There is no `device_open ("pci")`. Or if it's the first task to be resumed, it grabs a bootstrap port from glibc? im not sure
ok, so if my plan is implemented how much of `libmachdev` functionality
will still be used / useful?
<damo22> i dont know. The mach interface? device interface*. maybe it will be useless.
I'd rather you implemented the Mach device RPCs directly, without the
emulation structure, but that's an unrelated change, we can leave that
in for now.
<damo22> I kind of like the emulation structure as a list of function pointers, so i can see what needs to be implemented, but that's neither here nor there. `libmachdev` was a hack to make the bootstrap work, to be honest. …and we'd no longer need that. I would be happy if it goes away. The new one would be so much better.
is there anything else I should know about this all? What else could
break if there was no libmachdev and all that?
<damo22> acpi, pci-arbiter, rumpdisk, rumpusbdisk
right, let's go through these
<damo22> The pci-arbiter needs to start first to claim the x86 config
io ports. Then gnumach locks these ports. No one else can use them.
so it starts and initializes **something** what does it need? the
device master port, clearly, right? that it will get through the
glibc function / the proc API
<damo22> it needs a /servers/bus and the device master
<solid_black> right, so then it just does fsys_startup, and the bootstrap task places it onto `/servers/bus` (it's not expected to do `file_set_translator ()` itself, just as when running as a normal translator)
<damo22> it exposes a netfs on `/servers/bus/pci`
<solid_black> so will pci-arbiter still expose mach devices? a mach
device master? or will it only expose an fs tree + pci.defs?
<damo22> i think just fs tree and pci.defs. should be enough
<solid_black> ok, so we drop mach dev stuff from pci-arbiter
completely. then acpi starts up, right? what does it need?
<damo22> It needs access to `pci.defs` and the pci tree. It
accesses that via libpciaccess, which calls a new mode that
accesses the fstree. It looks up `servers/bus/pci`.
ok, but how does that work now then?
<damo22> It looks up the right nodes and calls pci.defs on them.
<solid_black> looks up the right node on what? there's no root
filesystem at that point (in the current scheme)
<damo22> It needs pci access
that's why I was wondering how it does `device_open ("pci")`
<damo22> I think libmachdev from pci gives acpi the fsroot. there is a
doc on this.
so does it set the root node of pci-arbiter as the root dir of acpi?
as in, is acpi effectively chrooted to `/servers/bus/pci`?
<damo22> i think acpi is chrooted to the parent of /servers. It shares
the same root as pci's trivfs.
i still don't quite understand how netfs and trivfs within pci-arbiter interact.
<damo22> you said there would be a fake /. Can't acpi use that?
<solid_black> yeah, in my plan / the new bootstrap scheme, there'll be
a / from the very start.
<damo22> ok so acpi can look up /servers/bus/pci, and it will exist.
and pci-arbiter can really sit on `/servers/bus/pci` (no need for
trivfs there at all) and acpi will just look up
`/servers/bus/pci`. And we do not need to change anything in acpi to
get it to do that.
And how does it do it now? maybe we'd need to remove some
no-longer-required logic from acpi then?
<damo22> it looks up device ("pci") if it exists, otherwise it falls
back to `/servers/bus/pci`.
Ah hold on, maybe I do understand now. currently pci-arbiter exposes its mach dev master as acpi's mach dev master. So it looks up device("pci") and finds it that way.
<damo22> correct, but it doesnt need that if the `/` exists.
yeah, we could remove this in the new bootstrap scheme, and just
always open the fs node (or leave it in for compatibility, we'll see
about that). acpi just sits on `/servers/acpi/tables`.
`rumpdisk` runs next and it needs `/servers/bus/pci`, `pci.defs`, and
`/servers/acpi/tables`, and `acpi.defs`. It exposes `/dev/rumpdisk`.
Would it make sense to make rumpdisk expose a tree/directory of Hurd
files and not Mach devices? This is not necessary for anything, but
just might be a nice little cleanup.
<damo22> well, it could expose a tree of block devices, like
`/dev/rumpdisk/ide/1`.
<solid_black> and then `ln -s /rumpdisk/ide/1 /dev/wd1`. and no need
for an intermediary storeio. plus the Hurd file interface is much
richer than Mach device, you can do fsync for instance.
<damo22> the rump kernel is bsd under the hood, so needs to be
`/dev/rumpdisk/ide/wd0`
<solid_black> You can just convert "ide/0" to "/dev/wd0" when forwarding to the rump part. Not that I object to ide/wd0, but we can have something more hierarchical in the exposed tree than old-school unix device naming? Let's not have /dev/sda1; instead let's have /dev/sata/0/1, but then we'd still keep the bsd names as symlinks into the /dev/rumpdisk/… tree
<damo22> sda sda1
<solid_black> good point
<damo22> 0 0/1
<solid_black> well, you can on the Hurd :D and we won't be doing that
either, rumpdisk only exposes the devices, not partitions
<damo22> well you just implement a block device on the directory? but
that would be confusing for users.
<solid_black> I'd expect rumpdisk to only expose device nodes, like
/dev/rumpdisk/ide/0, and then we'd have /dev/wd0 being a symlink to
that. And /dev/wd0s1 being a storeio of type part:1:/dev/wd0 or
instead of using that, you could pass that as an option to your fs,
like ext2fs -T typed part:1/dev/wd0
<damo22> where is the current hurd bootstrap (QuietBoot) docs hosted?
here:
https://git.savannah.gnu.org/cgit/hurd/web.git/plain/hurd/bootstrap.mdwn
<solid_black> so yeah, you could do the device tree thing I'm
proposing in rumpdisk, or you could leave it exposing Mach devices and
have a bunch of storeios pointing to that. So anyway, let's say
rumpdisk keeps exposing a single node that acts as a Mach device
master and it sits on /dev/rumpdisk.
<solid_black> Then we either need a storeio, or we could make ext2fs
use that directly. So we start `/hurd/ext2fs.static -T typed
part:1:@/dev/rumpdisk:wd0`.
<solid_black> I'll drop all the logic in libdiskfs for detecting if
it's the bootstrap filesystem, and starting the exec server, and
spawning /hurd/startup. It'll just be a library to help create
filesystems.
<solid_black> After that the bootstrap task migrates all those
translator nodes from the temporary / onto the ext2fs, broadcasts the
root and cwd ports to everyone, and off we go to starting auth and
proc and unix. sounds like it all would work indeed. so we're just
removing libmachdev completely, right?
<damo22> netdde links to it too. I think it has libmachdevdde
<solid_black> Also how would you script this thing. Like ideally we'd
want the bootstrap task to follow some sort of script which would say,
for example,
mkdir /servers
mkdir /servers/bus
settrans /servers/bus/pci ${pci-task} --args-to-pci
mkdir /dev
settrans /dev/netdde ${netdde-task} --args-to-netdde
setroot ${ext2fs-task} --args-to-ext2fs
<solid_black> and ideally the bootstrap task would implement a REPL
where you'd be able to run these commands interactively (if the
existing script fails for instance). It can be like grub, where it has
a predefined script, and you can do something (press a key combo?) to
instead run your own commands in a repl. or if it fails, it bails out
and drops you into the repl, yes. this gives you **so much more**
visibility into the boot process, because currently it's all scattered
across grub, libdiskfs (resuming exec, spawning /hurd/startup),
/hurd/startup, and various tricky pieces of logic in all of these
servers.
<solid_black> We could call the mini-repl hurdhelper? If something fails, you're on your own; at best it prints an error message (if the failing task manages to open the mach console at that point). Perhaps we call the new bootstrap proposal Bootstrap.
<solid_black> When/if this is ready, we'll have to remove libmachdev
and port everything else to work without it.
<damo22> yes, it's a great idea. I'm not a fan of lisp either. If i keep in mind that `/` is available early, then I can just clean up the other stuff, and assume i have `/`, and that the device master can be accessed with the regular glibc function, and that i can printf freely (no need to open the console). Do i need to run `fsys_startup`?
yes, exactly like all translators always do. Well you probably run
netfs_startup or whatever, and it calls that. you're not supposed to
call fsys_getpriv or fsys_init
<damo22> i think my early attempts at writing translators did not use these, because i assumed i had `/`. Then i realised i didn't, and libmachdev was born.
<solid_black> Yes, you should assume you have /, and just do all the regular things you would do. And if something that you would usually do doesn't work, we should think of a way to make it work by adding more stuff to the bootstrap task, when it's reasonable to, of course. And please consider exposing the file tree from rumpdisk, though that's orthogonal.
<damo22> you mean a tree of block devices?
<solid_black> Yes, but each device node would be just a Hurd (device)
file, not a Mach device. i.e. it'd support io_read and io_write, not
device_read and device_write. well I guess you could make it support
both.
<damo22> isnt that storeio's job?
<solid_black> if a node only implements the device RPCs, we need a
storeio to turn it into a Hurd file, yes. but if you would implement
the file RPCs directly, there wouldn't be a need for the intermediary
storeio, not that it's important.
<damo22> but thats writing storeio again. thing is, i dont know at runtime which devices are exposed by rump. It auto-probes them and prints them out, but i cant tell programmatically which ones were detected, because rump knows which devices exist but doesn't expose that over an API in any way, because it runs as a kernel would, with just one driver set.
<damo22> Rump is a decent set of drivers. It does not have better hardware support than the drivers of modern Linux; instead, Rump is netbsd in a can, and it's essentially unmaintained upstream too. However, it is still used to test kernel modules, though it lacks makefiles to separate all drivers into modules. BUT using rump is better than updating / redoing the linux drivers port of DDE, because the netbsd internal kernel API is much, much more stable than linux's. We would fall behind in a week with linux; no one would maintain the linux driver -> hurd port. Also, there is a framework that lets you compile the netbsd drivers as userspace unikernels: rump. Such a thing just does not exist for modern Linux. Rump is already good enough for some things. It could replace netdde. It already works for ide/sata.
<damo22> Rump has its own /dev nodes on a rumpfs, so you can do something like `rump_ls` on it.
<damo22> Rump is a minimal netbsd kernel. It is just the device
drivers, and a bit of pthreading, and has only the drivers that you
link. So rumpdisk only has the ahci and ide drivers and nothing
else. Additionally rump can detect them off the pci bus.
<damo22> I will create a branch on
<http://git.zammit.org/hurd-sv.git> with cleaned translators.
<damo22> solid_black: i almost cleaned up acpi and pci-arbiter but
realised they are missing the shutdown notification when i strip out
libmachdev.
<solid_black> "how are the device nodes on the bootstrap netfs attached to each translator?" – I don't think I understand the question, please clarify.
<damo22> I was wondering if the new bootstrap process can resume a fs task and have all the previous translators wake up and serve their rpcs, without needing to resume them. We have a problem with the current design: if you implement what we discussed yesterday, the IO ports won't work, because they are not exposed by pci-arbiter yet. I am working on it, but it's not ready.
<solid_black> I still don't understand the problem. the bootstrap task resumes others in order. the root fs task too, eventually, but not before everything that has to come up before the root fs task is ready.
<damo22> I don't think it needs to be a disk. Literally a trivfs is enough.
<solid_black> why are I/O ports not exposed by pci-arbiter? why isn't that an issue with how it works currently, then?
<damo22> solid_black: we are using ioperm() in userspace, but I want
to refactor the I/O port usage so access is granted at a granular
level. Then one day gnumach can store a bitmap of all I/O ports and
reject any request for a range that overlaps ports already in use,
since only one user of any port is allowed at any time. I don't know
if that will allow users to share the same I/O ports, but at least it
will prevent users from clobbering each other's hardware access.
<solid_black> damo22: (again, sorry for not understanding the hardware
details), so what would be the issue? when the pci arbiter starts,
doesn't it do all the things it has to do with the I/O ports?
<damo22> I/O ports are only accessed in a raw fashion now. Any user
can do ioperm(0, 0xffff, 1) and get access to all of them.
<solid_black> doesn't that require host priv or something like that?
<damo22> yeah, probably only root can. But I want to allow
unprivileged users to access I/O ports by requesting exclusive access
to a range.
<solid_black> I see that ioperm () in glibc uses the device master
port, so yeah, root-only (good)
<damo22> first it locks the port range
<solid_black> but you're saying that there's something about these I/O
ports that works today, but would break if we implemented what we
discussed yesterday? what is it, and why?
<damo22> well it might still work. but there's a lot of changes to
be done in general
<solid_black> let me try to ask it in a different way then
<damo22> i just know a few of the specifics because i worked on them.
<solid_black> As I understand it, you're saying that 1: currently any
root process can request access to any range of I/O ports, and you
also want to allow **unprivileged** processes to get access to ranges
of I/O ports, via a new API of the PCI arbiter (but this is not
implemented yet, right?)
<damo22> yes
<solid_black> 2: you're saying that something about this would break or
be different in the new scheme, compared to the current scheme. I
don't understand 2, or the relation between 1 and 2.
<damo22> 2: not really, I may have been mistaken; it probably will
continue working fine until I try to implement 1. ioperm() calls
`i386_io_perm_create` and `i386_io_perm_modify` in the same system
call. I want to separate these so the request goes through
pci-arbiter, and if it succeeds, the port is returned to the
caller and the caller can change the port access.
<solid_black> yes, so what about 2 will break 1 when you try to implement it?
<damo22> with your new bootstrap, we need `i386_io_perm_*` to be
accessible. I'm not sure how. is that a Mach RPC?
<solid_black> these are mach rpcs. i386_io_perm_create is an rpc that
you do on device master.
<damo22> should be ok then
<solid_black> i386_io_perm_modify you do on your task port. yes, I
don't see how this would be problematic.
<damo22> you might find this branch useful:
<http://git.zammit.org/hurd-sv.git/log/?h=feat-simplify-bootstrap>
<solid_black> although:
1. I'm not sure whether the task itself should be wiring its memory,
or if the bootstrap task should do it.
2. why do you request startup notifications if you then never do
anything in `S_startup_dosync`?
<solid_black> same for essential tasks actually, that should probably
be done by the bootstrap task and not the translator itself (but we'll
see)
<solid_black> 1. don't `mach_print`, just `fprintf (stderr, "")`
<solid_black> 2. please always verify the return result of
`mach_port_deallocate` (and similar functions),
typically like this:
err = mach_port_deallocate (…);
assert_perror_backtrace (err);
this helps catch nasty bugs.
<solid_black> 3. I wonder why both acpi and pci have their own
`pcifs_startup` and `acpifs_startup`; can't they use `netfs_startup
()`?
<damo22> 1. no idea, 2. rumpdisk needed it, but these might not,
3. ACK, 4. ACK, 5. I think they couldn't use `netfs_startup ()`
before, but might be able to now. Anyway, this should get you booting
with your bootstrap translator (without rumpdisk). Rumpdisk seems to
use the `device_*` RPCs from `libmachdev` to expose its device,
whereas pci and acpi don't use them for anything except `device_open`
to pass their port to the next translator. I think my latest patch
for I/O ports will work, but I need to rebuild glibc, libpciaccess,
and gnumach. Why does libhurduser need to be in glibc? It's quite
annoying to add an RPC.
I think I have done the gnumach I/O port locking, and the pciaccess
part, but the hurd part needs work, and then merging it needs a
rebuild of glibc because of hurduser.
<damo22> Why can't libhurduser be part of the hurd package?
<solid_black> I don't think I understand enough of this to do a
review, but I'd still like to see the patch if it's available
anywhere.
<damo22> ok i can push to my repos
<solid_black> glibc needs to use the Hurd RPCs (and implement some,
too), and glibc cannot depend on the Hurd package because the Hurd
package depends on glibc.
<damo22> lol ok
<solid_black> As things currently stand, glibc depends on the Hurd
**headers** (including mig defs), but not any Hurd binaries. still,
the cross build process is quite convoluted. I posted about it
somewhere: https://floss.social/@bugaevc/109383703992754691
<jpoiret> the manual patching of the build system that's needed to
bootstrap everything is a bit suboptimal.
<damo22> what if you guys submit patches upstream to glibc to add a
build target to copy the headers or whatever is needed? solid_black:
see <http://git.zammit.org/libpciaccess.git> and
<http://git.zammit.org/gnumach.git> on the fix-ioperm branches