From d149a746478ae0178e16983ac61bb255dd3d7205 Mon Sep 17 00:00:00 2001
From: Joshua Branson <jbranso@dismail.de>
Date: Thu, 10 Sep 2020 09:50:28 -0400
Subject: Better introduce httpfs and xmlfs

hurd/translator/httpfs.mdwn: I added an Intro, How to Use, and TODO section.
hurd/translator/xmlfs.mdwn: I added a How to use and TODO wishlist section.

I copied most of the text from the Hurd extras repos.
Message-Id: <20200910135028.27288-1-jbranso@dismail.de>
---
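The anchor-tag listing that the new Intro describes can be sketched roughly
as follows. This is only an illustration in Python using the standard
html.parser module; httpfs itself relies on libxml2's SAX interface, and the
names AnchorCollector, entry_name, and list_entries are hypothetical. The
sketch assumes that the slash-to-dot conversion applies to every '/' in an
external URL, and it leaves relative hrefs untouched.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML stream."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def entry_name(href):
    """Hypothetical mapping from an href to a directory entry name."""
    if urlparse(href).scheme:
        # Assumption: an external URL has every '/' replaced with '.'.
        return href.replace("/", ".")
    # Relative links are kept as they are in this sketch.
    return href


def list_entries(html):
    """Return the entry names found in one HTML page."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [entry_name(h) for h in parser.hrefs]


sample = '<A href="i.html">index</A> <a href="http://example.org/docs/a.html">ext</a>'
print(list_entries(sample))
```

For the sample page this prints ['i.html', 'http:..example.org.docs.a.html'],
which shows why an external link can appear as a single regular file.
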
 hurd/translator/httpfs.mdwn | 73 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

(limited to 'hurd/translator/httpfs.mdwn')
diff --git a/hurd/translator/httpfs.mdwn b/hurd/translator/httpfs.mdwn
index 8b02aa06..0fc6fbbd 100644
--- a/hurd/translator/httpfs.mdwn
+++ b/hurd/translator/httpfs.mdwn
@@ -12,6 +12,79 @@ License|/fdl]]."]]"""]]
 
 While the httpfs translator works, it is only suitable for very simple use cases: it just provides the actual file contents downloaded from the URL, but no additional status information that are necessary for interactive use. (Progress indication, error codes, HTTP redirects etc.)
 
+# Intro
+
+
+This section describes the structure of the /http filesystem for the Hurd.
+The Hurd provides a translator called 'httpfs', which is intended to provide
+this filesystem structure.
+
+The httpfs translator accepts an "http://" URL as an argument. The underlying
+node of the translator can be a file or a directory. This is selected by the
+--mode command-line option. The default is a directory.
+
+If it is a file, only file system read requests are supported on that
+node. If it is a directory, we can cd into that directory, and ls lists
+the files on the web server. A web server may or may not provide a
+directory listing, but in either case it returns an HTML stream for a
+user request (GET command). So to get the files residing on the web
+server, we have to parse the incoming HTML stream to find the anchor
+tags. These anchor tags point to different pages or files on the web
+server. These file names are extracted and filled into the node of the
+translator. An anchor tag can also point to an external URL; in that
+case we just show that URL as a regular file so that the user can make
+file system read requests on that URL. When the file is a URL, we change
+its name by replacing all the /'s with .'s so that it can be displayed
+in the file system.
+
+Only the root node is filled when the translator is set; subdirectories inside
+it are filled on demand, i.e. when a cd or ls occurs on that particular
+subdirectory.
+
+The file size is currently displayed as 0. One way of getting individual file
+sizes is to send a GET request for each file and cull the file size from the
+Content-Length field of the HTTP response. But this may put a very heavy burden
+on the network, so as of now we have not incorporated this method into httpfs.
+
+The translator uses the libxml2 library to parse the HTML stream. libxml2
+provides SAX interfaces for the parser, which are used to find the beginning
+of anchor tags such as `<A href="i.html">`. So the translator depends on the
+libxml2 library.
+
+If the connection to the Internet is through a proxy, the user must explicitly
+give the IP address and port of the proxy server by using the command-line
+options --proxy and --port.
+
+
+# How to Use httpfs
+
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html
+
+Remember to give the / at the end of the URL, unless you are specifying a specific file like www.hurd-project.com/httpfs.html.
+
+    # cd tmp/
+
+    # ls -l
+
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html --proxy=192.168.1.103 \
+        --port=3126
+
+Use the above command if access to the Internet is through a proxy server;
+substitute your proxy's IP address and port number.
+
+# TODO
+
+- https:// support
+- scheme-relative URL support (e.g. "//example.com/")
+- query-string and fragment support
+- HTTP/1.1 support
+- HTTP/2 support
+- HTTP/3 support
+- Teach httpfs to understand HTTP status codes such as redirects, 404 Not Found,
+  etc.
+- Teach httpfs to look for "sitemaps".  Many sites offer a sitemap, and this
+  would be a nifty way for httpfs to allow grep-ing the entire site's contents.
+
 # Source
 
 <http://www.nongnu.org/hurdextras/#httpfs>
-- 
cgit v1.2.3