-rw-r--r-- | hurd/translator/httpfs.mdwn | 73 |
-rw-r--r-- | hurd/translator/xmlfs.mdwn | 74 |
2 files changed, 147 insertions, 0 deletions
diff --git a/hurd/translator/httpfs.mdwn b/hurd/translator/httpfs.mdwn
index 8b02aa06..0fc6fbbd 100644
--- a/hurd/translator/httpfs.mdwn
+++ b/hurd/translator/httpfs.mdwn
@@ -12,6 +12,79 @@ License|/fdl]]."]]"""]]
 While the httpfs translator works, it is only suitable for very simple use cases: it just provides the actual file contents downloaded from the URL, but no additional status information that are necessary for interactive use. (Progress indication, error codes, HTTP redirects etc.)
+# Intro
+INTRODUCTION:
+
+Here we describe the structure of the /http filesystem for the Hurd.
+Under the Hurd, we provide a translator called 'httpfs' which is intended
+to provide the filesystem structure.
+
+The httpfs translator accepts an "http:// URL" as an argument. The underlying
+node of the translator can be a file or a directory. This is controlled by the
+--mode command line option. The default is a directory.
+
+If it's a file, only file system read requests are supported on that node. If
+it's a directory, we can cd into that directory and ls would list the files on
+the web server. A web server may or may not provide a directory listing;
+whatever the case, the web server always returns an HTML stream
+for a user request (GET command). So to get the files residing on the web
+server, we have to parse the incoming HTML stream to find the anchor
+tags. These anchor tags point to different pages or files on the web
+server. These file names are extracted and filled into the node of the
+translator. An anchor tag can also point to an external URL; in such a
+case we just show that URL as a regular file so that the user can make file
+system read requests on that URL. In case the file is a URL, we change the name
+of the URL by replacing every / with a . so that it can be displayed in the
+file system.
+
+Only the root node is filled when the translator is set; subdirectories inside
+it are filled on demand, i.e. when a cd or ls occurs on that particular
+subdirectory.
+
+The file size is currently displayed as 0. One way of getting individual file
+sizes is to send a GET request for each file and cull the size from the
+Content-Length field of the HTTP response. But this may put a very heavy burden
+on the network, so as of now we have not incorporated this method into httpfs.
+
+The translator uses the libxml2 library to parse the HTML
+stream. libxml2 provides SAX interfaces for the parser, which are used for
+finding the beginning of anchor tags such as <A href="i.html">. So the translator
+has a dependency on the libxml2 library.
+
+If the connection to the Internet goes through a proxy, the user must explicitly
+give the IP address and port of the proxy server by using the command line
+options --proxy and --port.
+
+
+# How to Use httpfs
+
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html
+
+(Remember to give the / at the end of the URL, unless you are specifying a specific file like www.hurd-project.com/httpfs.html.)
+
+    # cd tmp/
+
+    # ls -l
+
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html --proxy=192.168.1.103 \
+    --port=3126
+
+Use the above command if access to the Internet is through a proxy
+server; substitute your proxy's IP address and port number.
+
+# TODO
+
+- https:// support
+- scheme-relative URL support (e.g. "//example.com/")
+- query-string and fragment support
+- HTTP/1.1 support
+- HTTP/2 support
+- HTTP/3 support
+- Teach httpfs to understand HTTP status codes like redirects, 404 not found,
+  etc.
+- Teach httpfs to look for "sitemaps". Many sites offer a sitemap, and this
+  would be a nifty way for httpfs to allow grep-ing the entire site's contents.
+
 # Source
 <http://www.nongnu.org/hurdextras/#httpfs>
diff --git a/hurd/translator/xmlfs.mdwn b/hurd/translator/xmlfs.mdwn
index a4de1668..bde5960b 100644
--- a/hurd/translator/xmlfs.mdwn
+++ b/hurd/translator/xmlfs.mdwn
@@ -11,6 +11,80 @@ License|/fdl]]."]]"""]]
 `xmlfs` is a translator that provides access to XML documents through the filesystem.
+# How to Use xmlfs
+
+    xmlfs - a translator for accessing XML documents
+
+This is only an alpha version. It is read-only. It supports
+text nodes and attributes. It doesn't do anything fancy like size
+computing, though. Here is an example of how to use it:
+
+    $ wget -O example.xml 'http://cvs.savannah.nongnu.org/viewvc/*checkout*/hurdextras/xmlfs/example.xml?content-type=text%2Fplain'
+    $ settrans -ca xml /hurd/xmlfs example.xml # the website says to use ./xmlfs
+    $ cd xml; ls
+    library0 library1
+    $ cd library0; ls -A
+    .text1 .text2 @name book0 book1 book2 sub-library0 sub-library1
+    $ cat .text2
+
+    CDATA, again !
+
+    $ cat book0
+    <book>
+      <author>Mark Twain</author>
+      <title>La case de l'oncle Tom</title>
+      <isbn>4242</isbn>
+    </book>
+    $ cat book0/author/.text
+    Mark Twain
+
+As you can see, text nodes are named .textN, with N an integer
+starting from 0. Sorting is supposed to be stable, so you get the same
+N every time you access the same file. If there is only one text node
+at this level, N is omitted. Attributes are prefixed with @.
+
+An example file, example.xml, is provided. Of course, it does not
+contain anything useful. xmlfs has been tested on XML documents of
+several megabytes, though.
+
+Comments are welcome.
+
+    -- Manuel Menal <mmenal@hurdfr.org>
+
+# TODO
+- Handle memory usage in a clever way:
+    - do not dump the nodes at each read; try to guess if read()
+      is called in a sequence of read() operations (e.g. cat reads
+      8192 bytes by 8192 bytes) and if it is, cache the node
+      contents. That'd need a very small ftpfs-like GC.
+    - perhaps we shouldn't store the node information from
+      first access to the end, and have a pool of them instead. That might come
+      with the next entries though.
+- Handle changes of the backing store (XML document) while running.
+  (Idea: we should probably attach to the XML node and handle
+  read()/write() operations ourselves, with libxml primitives.)
+- Write support. Making things like echo >, sed and so on work is
+  quite obvious. Editing is not -that- simple, 'cause we might
+  want to save a document that is not well-formed XML, and libxml will just
+  return an error. Perhaps we should use something like 'sync'.
+- Handle error cases in a cleverer way; there are many error
+  conditions that will just cause xmlfs to crash or do strange
+  things. We should review them.
+- Make sorting *really* stable.
+
+# TODO WISHLIST
+--------
+
+- Kilobug suggested a --xslt option that would make xmlfs provide
+  a tree matching the XSLT-modified document.
+  (Problem: In this case we cannot attach easily to the .xml 'cause
+  the user would lose access to their original document. Perhaps
+  we should allow an optional "file.xml" argument and check if it
+  is not the same as the file we are attaching to when --xslt is
+  specified.)
+- DTD support; perhaps XML Schema/RELAX NG when I'm sure I understand
+  them ;-)
+
 # Source
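
The naming rules described in the xmlfs text above (attributes prefixed with `@`, text nodes named `.textN` with `N` omitted when there is only one) can be sketched in Python. This is only an illustration and not xmlfs code: `entry_names` is a hypothetical helper, numbering child elements per tag from 0 is inferred from the sample listing (`book0`, `book1`, ...), and skipping whitespace-only text is an assumption. Text numbering follows the prose (from 0); the sample listing shows `.text1` and `.text2`, so the real translator may number differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def entry_names(element):
    """List the entry names an xmlfs-like view might show for one XML element.

    Hypothetical helper: it mirrors the rules stated in the xmlfs notes,
    not the translator's actual implementation.
    """
    # Attributes are prefixed with '@'.
    names = ["@" + key for key in element.attrib]

    # Text nodes: the text before the first child, plus the text that
    # follows each child. Whitespace-only text is skipped (assumption).
    texts = [element.text] + [child.tail for child in element]
    texts = [t for t in texts if t and t.strip()]
    if len(texts) == 1:
        # A single text node drops the number.
        names.append(".text")
    else:
        names += [f".text{i}" for i in range(len(texts))]

    # Child elements: tag name plus a per-tag counter starting at 0
    # (inferred from the sample listing).
    seen = Counter()
    for child in element:
        names.append(f"{child.tag}{seen[child.tag]}")
        seen[child.tag] += 1
    return names


if __name__ == "__main__":
    doc = ET.fromstring(
        '<library name="main">intro<book/>middle<book/><sub-library/></library>'
    )
    print(entry_names(doc))
    # prints ['@name', '.text0', '.text1', 'book0', 'book1', 'sub-library0']
```

Keeping the counters per tag name is what makes the names stable as long as the document order does not change, which is the property the notes ask for.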