[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

`httpfs` is a virtual filesystem allowing you to access web pages and files.

While the httpfs translator works, it is only suitable for very simple use cases: it provides the actual file contents downloaded from the URL, but none of the additional status information that is necessary for interactive use (progress indication, error codes, HTTP redirects, etc.).

# Introduction

Here we describe the structure of the /http filesystem for the Hurd. Under the Hurd, we provide a translator called 'httpfs' which provides the filesystem structure.

The httpfs translator accepts an "http:// URL" as an argument. The underlying node of the translator can be a file or a directory; this is controlled by the --mode command line option, and the default is a directory. If it is a file, only filesystem read requests are supported on that node. If it is a directory, we can cd into that directory, and ls will list the files on the web server.

A web server may or may not provide a directory listing; whatever the case, the web server always returns an HTML stream for a user request (a GET command). So, to get at the files residing on the web server, we have to parse the incoming HTML stream to find the anchor tags. These anchor tags point to different pages or files on the web server. The file names are extracted and filled into the nodes of the translator. An anchor tag can also point to an external URL; in such a case we just show that URL as a regular file so that the user can make filesystem read requests on that URL. When the file is a URL, we change the name of the URL by replacing all the /'s with .'s so that it can be displayed in the file system (see the second sketch below).

Only the root node is filled when the translator is set; subdirectories inside it are filled on demand, i.e. when a cd or ls occurs on that particular subdirectory.

The file size is currently displayed as 0. One way of getting individual file sizes is to send a GET request for each file and cull the file size from the Content-Length field of the HTTP response (see the third sketch below). But this may put a very heavy burden on the network, so as of now we have not incorporated this method into the http translator.

The translator uses the libxml2 library for parsing the HTML stream. libxml2 provides SAX interfaces for the parser, which are used for finding the beginning of anchor (`<a>`) tags; the first sketch below illustrates this approach. So the translator has a dependency on the libxml2 library.

If the connection to the Internet is through a proxy, then the user must explicitly give the IP address and port of the proxy server by using the command line options --proxy and --port.
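The following minimal sketch (not the actual httpfs code) shows how libxml2's HTML SAX interface can be used to pull the href targets out of anchor tags; the sample HTML string and the `start_element` callback are invented for the illustration. Build with something like `gcc example.c $(xml2-config --cflags --libs)`.

    #include <stdio.h>
    #include <string.h>
    #include <libxml/HTMLparser.h>

    /* Called by the SAX parser at the start of every element; for
       anchor tags, print the value of the href attribute.  `attrs' is
       a NULL-terminated array of attribute name/value pairs.  */
    static void
    start_element (void *ctx, const xmlChar *name, const xmlChar **attrs)
    {
      (void) ctx;
      if (xmlStrcasecmp (name, (const xmlChar *) "a") != 0 || attrs == NULL)
        return;
      for (int i = 0; attrs[i] != NULL && attrs[i + 1] != NULL; i += 2)
        if (xmlStrcasecmp (attrs[i], (const xmlChar *) "href") == 0)
          printf ("link: %s\n", (const char *) attrs[i + 1]);
    }

    int
    main (void)
    {
      char html[] =
        "<html><body><a href=\"sub/dir/\">a page</a>"
        "<a href=\"http://example.org/x\">external</a></body></html>";
      htmlSAXHandler sax;

      memset (&sax, 0, sizeof sax);
      sax.startElement = start_element;
      /* Parse the buffer; start_element fires for each element seen.  */
      htmlSAXParseDoc ((xmlChar *) html, NULL, &sax, NULL);
      return 0;
    }

In httpfs itself the buffer would be the HTML stream read from the server, and the callback would fill translator nodes instead of printing.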
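The /-to-. renaming of external URLs can be sketched as follows; `url_to_node_name` is a hypothetical helper written for this illustration, not the routine httpfs actually uses.

    #include <stdlib.h>
    #include <string.h>

    /* Turn a URL into a name that can appear as a directory entry by
       replacing every '/' with a '.'.  The caller frees the result.  */
    static char *
    url_to_node_name (const char *url)
    {
      char *name = strdup (url);
      if (name != NULL)
        for (char *p = name; *p != '\0'; p++)
          if (*p == '/')
            *p = '.';
      return name;
    }

With this, "http://example.org/a/b" would be listed as "http:..example.org.a.b".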
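For the per-file size lookup described (and rejected) above, culling the size would amount to scanning the response headers for Content-Length. A rough sketch, assuming `response` holds the raw header block of an HTTP reply:

    #include <stdlib.h>
    #include <string.h>

    /* Return the value of the Content-Length header in `response',
       or -1 if the header is absent.  A real implementation would
       match the header name case-insensitively and confine the
       search to the header block.  */
    static long
    content_length (const char *response)
    {
      const char *field = strstr (response, "Content-Length:");
      if (field == NULL)
        return -1;
      return strtol (field + strlen ("Content-Length:"), NULL, 10);
    }

As noted above, issuing one such request per file is heavy on the network, which is why httpfs does not currently do this.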
# How to Use httpfs

    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html
    # cd tmp/
    # ls -l

    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html --proxy=192.168.1.103 --port=3126

The latter command should be used in case the access to the Internet is through a proxy server; substitute your proxy's IP address and port number.

# TODO

  - https:// support
  - scheme-relative URL support (e.g. "//example.com/")
  - query-string and fragment support
  - HTTP/1.1 support
  - HTTP/2 support
  - HTTP/3 support (there may exist a C library that provides HTTP/[123] support).
  - Teach httpfs to understand HTTP status codes like redirects, 404 Not Found, etc.
  - Teach httpfs to look for "sitemaps". Many sites offer a sitemap, and this would be a nifty way for httpfs to allow grep-ing the entire site's contents. [[sitemaps.org|https://www.sitemaps.org]] is a great resource for this.
  - Teach httpfs to check if the computer has an internet connection at startup and during operation. The translator causes 30 second pauses on commands like "ls" when the internet is down.

# Source

# Documentation

## IRC, freenode, #hurd, 2013-09-03

[[!tag open_issue_documentation]]

    hi, why I can't cd to /http:/richtlijn.be/~larstiq/hurd/ to do grep? this is not ftp
    it works for other file ?
    I can't cd to ~larstiq, I don't know why
    http is not a filesystem protocol
    while httpfs could try in representing it as such, it is not something reliable
    ok, it's not reliable
    I expect it can expose dir like browser
    so, the translator just know href from home page, and one by one
    uh?
    if ...:80/a/b/c.png exits, but not has a href in homepage, so I can't cd to a, right?
    you are looking things from the wrong point of view
    a web server can do anything with URLs, including redirecting, URL rewriting and whatever else
    so, how to understand httpfs's idea? how httpfs list dir?
    check its code
    en, no need
    it's not reliable
    it's not work, it's enough
    I have an idea, for the file system, we explore dir level by level, but for http, we change full path one once time
    maybe can allow user to cd any directory, and just mark as some special color to make user know the translator was not sure, file exist or not
    once the file exits, mark all the parent directory as normal color?
    congzhang: you can find more info about httpfs here: http://nongnu.org/hurdextras/
    congzhang: you're still looking at http from the wrong point of view
    there are no directories nor files
    you start a request for a URL, and you get some content back (in case there's no error)
    you mean httpfs just for kidding?
    that the content is a web page listing a directory on the filesystem of the web server machine, or a file sent back via the connection, or a complex web page, it's the same
    congzhang: you can only get a list of files if the web server responds with an index of files
    "files"
    The readme explains how httpfs does its thing: http://cvs.savannah.gnu.org/viewvc/*checkout*/httpfs/README?root=hurdextras
    if I can't cd to /http:/host/a/b how to get /http:/host/a/b/c.html, even the file exist?
    you don't cd in http
    cd is for changing directory, which does not apply to a protocol like http which is not fs-oriented
    yes, I agree with you, http was not fs-oriented
    so httpfs was not so useful
    You can access the document directly, though, can't you?
    rekado: I try once more
    I can't
    so, the httpfs need some extend, http protocol was not fs oriented, so need some extend to make it work with file system
    http is not designed for file system usage, so extending it is useless
    or, httpfs was not so useful
    there are many other protocols for file systems
    I don't think so
    i do
    if we can't make it more useful, remove it from hurd rep, or extend it more useful
    add some more rule, to make it work with file system
    no
    some paradox in it
    which paradox?
    for http vs file system
    ???
    tree oriented and star topology oriented?
    you don't make any sense