From 8cc3d6477bbb6799e96fe8be25271e49f4b76c46 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 13:36:17 +0000 Subject: --- user/Etenil.mdwn | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 user/Etenil.mdwn (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn new file mode 100644 index 00000000..a9bc47ff --- /dev/null +++ b/user/Etenil.mdwn @@ -0,0 +1,14 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + + +## Current task + +Write a pagein (prefetching) mechanism in Mach. -- cgit v1.2.3 From d19b0b9c8a0f5fb97e310bf3aab21a42c49ec30c Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 13:41:27 +0000 Subject: --- user/Etenil.mdwn | 2 ++ 1 file changed, 2 insertions(+) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index a9bc47ff..0d78b042 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -12,3 +12,5 @@ License|/fdl]]."]]"""]] ## Current task Write a pagein (prefetching) mechanism in Mach. + +-- More to come. 
-- cgit v1.2.3 From b4b7e692b1e6395698814ad3bb68f51706307120 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 19:21:14 +0000 Subject: --- user/Etenil.mdwn | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index 0d78b042..9eff7ea7 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -8,9 +8,23 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] +[[!toc]] ## Current task -Write a pagein (prefetching) mechanism in Mach. +Write a clusterized pagein (prefetching) mechanism in Mach. + +## General information on system architecture + +In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach. To accomplish this, I've investigated top-down from the [[ext2fs]] translator to Mach. This section contains the main nodes that data passes through. + +This is based on my understanding of the system and is probably imprecise. Refer to the manuals of both Hurd and Mach for more detailed information. + +### Pagers +Pagers are implemented in libpager and provide abstracted access to Mach's [[VM]]. A pager is a struct that contains callback function references. These are used to actually access the storage. In the case of FS translators, like ext2fs, the pager uses libstore to access the underlying hardware. + +### Libstore +Libstore provides abstracted access to Mach's storage access. + +I am currently looking at the way the stores call Mach, especially for memory allocation. My intuition is that memory is allocated in Mach when the function *store_create()* is called. I am currently investigating this to see where in Mach the prefetcher would fit. --- More to come.
-- cgit v1.2.3 From 5c53b3d3c90d5da55a7859f1a44e7eaeaa2f12f9 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 19:31:51 +0000 Subject: --- user/Etenil.mdwn | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index 9eff7ea7..e96ac699 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -18,6 +18,8 @@ Write a clusterized pagein (prefetching) mechanism in Mach. In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach. To accomplish this, I've investigated top-down from the [[ext2fs]] translator to Mach. This section contains the main nodes that data passes through. +This section is probably unnecessary for implementing the prefetcher in Mach; however, it is always interesting to understand how things work so we can notice when they get broken. + This is based on my understanding of the system and is probably imprecise. Refer to the manuals of both Hurd and Mach for more detailed information. ### Pagers @@ -26,5 +28,17 @@ Pagers are implemented in libpager and provide abstracted access to Mach's [[VM] ### Libstore Libstore provides abstracted access to Mach's storage access. -I am currently looking at the way the stores call Mach, especially for memory allocation. My intuition is that memory is allocated in Mach when the function *store_create()* is called. I am currently investigating this to see where in Mach the prefetcher would fit. +I am currently looking at the way the stores call Mach, especially for memory allocation. My intuition is that memory is allocated in Mach when the function *store_create()* is called. I am currently investigating this to see how the memory allocation process happens in practice.
+ +### Mach +VM allocation happens with a call to: + kern_return_t vm_allocate (vm_task_t target_task, vm_address_t *address, vm_size_t size, boolean_t anywhere) + +## Implementation idea +To start off with, I will toy with the VM (even if it breaks stuff). My initial intent is to systematically allocate more memory than requested in the hope that the excess will be manipulated by the task in the near future, thus saving on future I/O requests. + +I'd also need to keep track of the pre-allocated memory so that I can pass it on to the task on demand and prefetch even more. I could also possibly time the prefetched data and deallocate it if it's not requested after a while, but that's just an idea. +The tricky part is to understand how the memory allocation works in Mach and to create an additional struct for the prefetched data. -- cgit v1.2.3 From d47700de2813d7de38b332b65c9249a15f8bea01 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 19:38:23 +0000 Subject: --- user/Etenil.mdwn | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index e96ac699..a19aacd8 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -14,6 +14,8 @@ License|/fdl]]."]]"""]] Write a clusterized pagein (prefetching) mechanism in Mach. +- - - + ## General information on system architecture In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach. To accomplish this, I've investigated top-down from the [[ext2fs]] translator to Mach. This section contains the main nodes that data passes through. @@ -35,8 +37,9 @@ VM allocation happens with a call to: kern_return_t vm_allocate (vm_task_t target_task, vm_address_t *address, vm_size_t size, boolean_t anywhere) +- - - -## Implementation idea +## Implementation plan To start off with, I will toy with the VM (even if it breaks stuff).
My initial intent is to systematically allocate more memory than requested in the hope that the excess will be manipulated by the task in the near future, thus saving on future I/O requests. I'd also need to keep track of the pre-allocated memory so that I can pass it on to the task on demand and prefetch even more. I could also possibly time the prefetched data and deallocate it if it's not requested after a while, but that's just an idea. -- cgit v1.2.3 From 28ceff654b028b2ae816a4131b7e6da46548cf83 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 20:08:15 +0000 Subject: --- user/Etenil.mdwn | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index a19aacd8..f19a5697 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -18,14 +18,14 @@ Write a clusterized pagein (prefetching) mechanism in Mach. ## General information on system architecture -In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach. To accomplish this, I've investigated top-down from the [[ext2fs]] translator to Mach. This section contains the main nodes that data passes through. +In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach. To accomplish this, I've investigated top-down from the [[hurd/translator/ext2fs]] translator to Mach. This section contains the main nodes that data passes through. This section is probably unnecessary for implementing the prefetcher in Mach; however, it is always interesting to understand how things work so we can notice when they get broken. This is based on my understanding of the system and is probably imprecise. Refer to the manuals of both Hurd and Mach for more detailed information. ### Pagers -Pagers are implemented in libpager and provide abstracted access to Mach's [[VM]].
A pager is a struct that contains callback function references. These are used to actually access the storage. In the case of FS translators, like ext2fs, the pager uses libstore to access the underlying hardware. ### Libstore Libstore provides abstracted access to Mach's storage access. -- cgit v1.2.3 From 166b6203876f583308fcfbd17df3123f90271ecf Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Tue, 15 Feb 2011 20:19:47 +0000 Subject: --- user/Etenil.mdwn | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index f19a5697..3dd41071 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -40,8 +40,16 @@ VM allocation happens with a call to: - - - ## Implementation plan + +### Ideas To start off with, I will toy with the VM (even if it breaks stuff). My initial intent is to systematically allocate more memory than requested in the hope that the excess will be manipulated by the task in the near future, thus saving on future I/O requests. I'd also need to keep track of the pre-allocated memory so that I can pass it on to the task on demand and prefetch even more. I could also possibly time the prefetched data and deallocate it if it's not requested after a while, but that's just an idea. The tricky part is to understand how the memory allocation works in Mach and to create an additional struct for the prefetched data. + +### Foreseeable difficulties +* Tracking the prefetched memory +* Deallocating prefetched memory along with the requested memory +* Shared prefetched memory (i.e.
a task requested memory, some more was prefetched and a second task used the prefetched memory) +* Page faults -- cgit v1.2.3 From 5aef0778f741625959e4d474cac3e6c783c78175 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Wed, 16 Feb 2011 10:12:04 +0000 Subject: --- user/Etenil.mdwn | 3 +++ 1 file changed, 3 insertions(+) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index 3dd41071..6f559154 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -37,6 +37,9 @@ VM allocation happens with a call to: kern_return_t vm_allocate (vm_task_t target_task, vm_address_t *address, vm_size_t size, boolean_t anywhere) + +*vm_allocate()* looks more and more like a red herring. What I'm trying to prefetch is data on hard drives. I'll look at the devices in Mach instead. + - - - ## Implementation plan -- cgit v1.2.3 From 10f09a840a214787e1d8d39807866849e88aeada Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Fri, 18 Feb 2011 19:11:15 +0000 Subject: --- user/Etenil.mdwn | 48 +++++++++++++++--------------------------------- 1 file changed, 15 insertions(+), 33 deletions(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index 6f559154..603bbdec 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -12,47 +12,29 @@ License|/fdl]]."]]"""]] ## Current task -Write a clusterized pagein (prefetching) mechanism in Mach. +Implement clustered paging in GNU Mach - - - -## General information on system architecture +## What the problem is +In Mach, memory access is handled by the VM, an abstraction in the kernel. The VM is mapped by pages, whose size is arbitrary and defined based on hardware specs. A single block of memory can then span many pages, e.g. a file on a file system can represent a lot of pages. -In order to implement the pagein properly, it was necessary for me to get a general idea of the I/O path that data follows in the Hurd/Mach.
To accomplish this, I've investigated top-down from the [[hurd/translator/ext2fs]] translator to Mach. This section contains the main nodes that data passes through. +When a process attempts to access pages that don't reside in physical memory (RAM), the MMU detects this and triggers a page fault. Page faults are then handled, and the kernel calls down to the process associated with the memory pages on a *one by one* basis. -This section is probably unnecessary for implementing the prefetcher in Mach; however, it is always interesting to understand how things work so we can notice when they get broken. +This is where the problem lies. Hard disks are inherently efficient at sequentially writing large chunks of data, whereas they cope badly with random access; in addition, the kernel wastes time between writing/reading one page and handling the next. All of this makes for slow I/O in Mach. -This is based on my understanding of the system and is probably imprecise. Refer to the manuals of both Hurd and Mach for more detailed information. +## Solutions +There are a couple of ways I could think of to solve this problem. Pages could be enlarged, but that would cause a lot more problems. Alternatively, pages could be handled in groups instead of one by one. This means the changes will also need to be applied in the way user-space processes talk to Mach. -### Pagers -Pagers are implemented in libpager and provide abstracted access to Mach's [[microkernel/mach/virtual address space]]. A pager is a struct that contains callback function references. These are used to actually access the storage. In the case of FS translators, like ext2fs, the pager uses libstore to access the underlying hardware. +## What's already been done +[[hurd/user/KAM]] has already made a patch that provides basic page clustering. I have yet to understand it completely, but there are troubling changes in the patch, most notably the removal of continuations in *vm_fault* and *vm_fault_page*.
- -### Libstore -Libstore provides abstracted access to Mach's storage access. +So far, what I can tell is that KAM seems to have modified the memory objects in Mach so that they handle clusters of pages. -I am currently looking at the way the stores call Mach, especially for memory allocation. My intuition is that memory is allocated in Mach when the function *store_create()* is called. I am currently investigating this to see how the memory allocation process happens in practice. +## What I intend to do +Starting from KAM's work, I'll try to at least proxy the current behaviour in the kernel so as to keep backwards compatibility, at least until all user-space processes are converted (maybe some sort of deprecation warning would help porting). I'll also need to modify ext2fs to make it use the clustered paging feature; hopefully it'll improve performance quite a bit. -### Mach -VM allocation happens with a call to: +## Problems +As *braunr* and *antrik* pointed out on IRC, I seriously lack knowledge about kernel programming, and this is quite a big task. I also don't fully understand the inner workings of the kernel yet, even though *braunr* helped me a lot to understand the VM and page handling. - kern_return_t vm_allocate (vm_task_t target_task, vm_address_t *address, vm_size_t size, boolean_t anywhere) - - -*vm_allocate()* looks more and more like a red herring. What I'm trying to prefetch is data on hard drives. I'll look at the devices in Mach instead. - -- - - - -## Implementation plan - -### Ideas -To start off with, I will toy with the VM (even if it breaks stuff). My initial intent is to systematically allocate more memory than requested in the hope that the excess will be manipulated by the task in the near future, thus saving on future I/O requests. - -I'd also need to keep track of the pre-allocated memory so that I can pass it on to the task on demand and prefetch even more.
I could also possibly time the prefetched data and deallocate it if it's not requested after a while, but that's just an idea. - -The tricky part is to understand how the memory allocation works in Mach and to create an additional struct for the prefetched data. - -### Foreseeable difficulties -* Tracking the prefetched memory -* Deallocating prefetched memory along with the requested memory -* Shared prefetched memory (i.e. a task requested memory, some more was prefetched and a second task used the prefetched memory) -* Page faults +I'll do what I can and keep maintaining this page so others may pick up where I left off if I were to give up. -- cgit v1.2.3 From d22a3b299d00ce757237f9aee9794d0d4f2758e2 Mon Sep 17 00:00:00 2001 From: "http://etenil.myopenid.com/" Date: Fri, 18 Feb 2011 19:50:45 +0000 Subject: --- user/Etenil.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'user') diff --git a/user/Etenil.mdwn b/user/Etenil.mdwn index 603bbdec..a1a3373b 100644 --- a/user/Etenil.mdwn +++ b/user/Etenil.mdwn @@ -27,7 +27,7 @@ There are a couple of ways I could think of to solve this problem. Pages could be enlarged, but that would cause a lot more problems. Alternatively, pages could be handled in groups instead of one by one. This means the changes will also need to be applied in the way user-space processes talk to Mach. ## What's already been done -[[hurd/user/KAM]] has already made a patch that provides basic page clustering. I have yet to understand it completely, but there are troubling changes in the patch, most notably the removal of continuations in *vm_fault* and *vm_fault_page*. +[[user/KAM]] has already made a patch that provides basic page clustering. I have yet to understand it completely, but there are troubling changes in the patch, most notably the removal of continuations in *vm_fault* and *vm_fault_page*.
So far, what I can tell is that KAM seems to have modified the memory objects in Mach so that they handle clusters of pages. -- cgit v1.2.3