Automatically Generating an HTML5-style Cache Manifest from the Command Line

HTML5 introduces the ability to cache content client-side so that often-used resources can be used without re-downloading them. This also enables a site to be viewed from the client when no network connection is available (i.e., offline viewing of the site).

In order for this to work, there are a few things one must do:

  1. Create a plain text file listing all of the resources that should be cached by the user agent (e.g., a web browser). This file is the cache manifest.
  2. Refer to that file in the opening html tag of every page that will use cached resources.
  3. Configure the web server so that the file is sent to the user agent with a specific MIME type: text/cache-manifest
  4. Regenerate the cache manifest any time you change the files in your site.

Once everything is set up properly, you can visit the site using your favorite web browser. Then, to test whether the caching has worked, you can turn off the network connection on the computer running your web browser and try reloading the page.


Step 1: Create the Cache Manifest

The cache manifest is a plain text file that lists each resource to cache. This file can be created with any plain text editor. A basic manifest contains a required header line indicating that it’s a cache manifest, followed by one resource per line.

CACHE MANIFEST
index.html
main.css
scripts/main.js
images/logo.png
images/banner.png
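
A hand-written manifest is easy to get subtly wrong, for example by listing a file that no longer exists. The following is a minimal sketch of a checker, assuming a basic manifest like the one above (one relative path per line, no section headers or URLs). The function name check_manifest and the throwaway demo directory are my own inventions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check a basic manifest (one relative path per line).
check_manifest() {
    manifest=$1
    root=$(dirname "$manifest")
    status=0
    line_no=0
    while IFS= read -r entry || [ -n "$entry" ]; do
        line_no=$((line_no + 1))
        if [ "$line_no" -eq 1 ]; then
            # The first line must be the required header.
            if [ "$entry" != "CACHE MANIFEST" ]; then
                echo "line 1: expected CACHE MANIFEST" >&2
                status=1
            fi
            continue
        fi
        case $entry in
            '' | '#'*) continue ;;   # skip blank lines and comments
        esac
        if [ ! -f "$root/$entry" ]; then
            echo "line $line_no: no such file: $entry" >&2
            status=1
        fi
    done < "$manifest"
    return "$status"
}

# Demo in a throwaway directory: main.css is listed but does not exist.
demo=$(mktemp -d)
touch "$demo/index.html"
printf '%s\n' 'CACHE MANIFEST' 'index.html' 'main.css' > "$demo/cache.manifest"
check_manifest "$demo/cache.manifest" || echo "manifest needs attention"
```

The function returns a non-zero status when anything is wrong, so it could be chained in front of an upload step.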

For sites using JavaScript libraries, such as jQuery, that can be a long list of files.

For simple manifests, where all of the files in a given subdirectory need to be cached, first change directory (cd) to the root folder of your site (where the index.html file is located). Then, the manifest can be created from the OS X terminal/Unix/Linux command prompt with this one-line command:

echo "CACHE MANIFEST" > cache.manifest; find . -type f | sed "s#^\./##" | grep -vi "ds_store" >> cache.manifest

Taking each part in turn:

echo "CACHE MANIFEST" > cache.manifest;

Output the string literal CACHE MANIFEST and direct the output to a new file named cache.manifest

find . -type f

Use the find command to identify all files (-type f), starting from the current directory (.)

| sed "s#^\./##"

Pipe (that is, “direct the output from the previous command”) the result into the stream editor command: sed. Using sed, replace a literal period (\.) followed by a forward slash (/) at the beginning of a line (^) with the empty string. NB: the empty string is denoted by the lack of content between the last two octothorpes (##); the octothorpes act here as delimiters that separate the command (s), the search pattern (^\./), and the replacement pattern (which is the empty string).
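
To see the substitution in isolation, here is a small sketch that feeds a few sample paths through the same sed expression. The wrapper name strip_dot_slash and the sample paths are made up for illustration. It also shows why the period is escaped.

```shell
# Hypothetical wrapper around the sed expression from the one-liner.
strip_dot_slash() {
    sed 's#^\./##'
}

# Paths as find prints them: the leading "./" is removed.
printf '%s\n' './index.html' './scripts/main.js' | strip_dot_slash

# The backslash matters. Unescaped, "." matches any character,
# so "^./" would also eat the "a/" of a path that has no "./" prefix.
printf '%s\n' 'a/main.js' | sed 's#^./##'    # prints main.js
printf '%s\n' 'a/main.js' | strip_dot_slash  # prints a/main.js (unchanged)
```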

| grep -vi "ds_store"

Pipe the result into grep (the name is short for “global regular expression print”). Using grep, keep only the lines that do NOT contain (-v) the case-insensitive (-i) string literal “ds_store”. Mac OS X stores metadata about files and folders in a hidden file named .DS_Store. This step removes those files from the manifest list.

>> cache.manifest

and append (>>) the results to the file named cache.manifest
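
One thing to watch: because the echo creates cache.manifest before find runs, the one-liner lists cache.manifest inside itself. Below is a sketch of the same pipeline wrapped in a function that filters that entry out. The function name generate_manifest, the extra grep, the sort, and the throwaway demo directory are my additions, not part of the command above.

```shell
#!/bin/sh
# Hypothetical helper: the one-liner as a function, minus the manifest's own entry.
generate_manifest() (
    cd "$1" || exit 1
    manifest=${2:-cache.manifest}
    # Build the list first, so the manifest is written only after find is done.
    files=$(find . -type f |
        sed 's#^\./##' |
        grep -vi 'ds_store' |
        grep -vxF "$manifest" |
        sort)
    printf 'CACHE MANIFEST\n%s\n' "$files" > "$manifest"
)

# Demo in a throwaway directory with a .DS_Store file thrown in.
demo=$(mktemp -d)
mkdir -p "$demo/images"
touch "$demo/index.html" "$demo/main.css" "$demo/images/logo.png" "$demo/.DS_Store"
generate_manifest "$demo"
cat "$demo/cache.manifest"
```

For a real site, the call would be something like generate_manifest /path/to/site, run again whenever files are added or removed.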


Step 2: Refer to the Cache Manifest in your html opening tag

To do this, add a manifest attribute to the opening html tag of each HTML page that should use cached resources. Its value is the path to the manifest file, including the filename:

<html manifest="cache.manifest">
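
On a site with many pages it is easy to miss one. This sketch lists the HTML files whose opening html tag has no manifest attribute. It assumes the tag sits on a single line, and the function name pages_without_manifest and the demo files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every .html file lacking a manifest attribute.
pages_without_manifest() {
    find "$1" -type f -name '*.html' | while IFS= read -r page; do
        # -q: no output, just an exit status; -i: match <HTML ...> as well.
        grep -qi '<html[^>]*manifest=' "$page" || printf '%s\n' "$page"
    done
}

# Demo in a throwaway directory: one page has the attribute, one does not.
demo=$(mktemp -d)
printf '%s\n' '<!DOCTYPE html>' '<html manifest="cache.manifest">' > "$demo/index.html"
printf '%s\n' '<!DOCTYPE html>' '<html>' > "$demo/about.html"
pages_without_manifest "$demo"   # lists only about.html
```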

Step 3: Configure the web server so that the file is sent to the user agent with a specific MIME type: text/cache-manifest

This requires either writing a script on the server side in a language of your choosing or configuring the web server to handle the new MIME type. If you have control over the hidden (or possibly not yet created) .htaccess file in your site’s root folder (where the index.html file is located), then try adding the following line to that plain text file:

AddType text/cache-manifest appcache manifest

You can test what MIME type is being returned using a simple curl command from the terminal:

curl --head http://yourhost/cache.manifest

HTTP/1.1 200 OK
Date: Sun, 17 Jun 2012 17:06:22 GMT
Server: Apache/2.2.21 (Unix) DAV/2
Last-Modified: Sat, 16 Jun 2012 22:26:36 GMT
ETag: "a48bbe-2a-4c29e6cfd3f00"
Accept-Ranges: bytes
Content-Length: 42
Content-Type: text/cache-manifest

Note the “Content-Type” entry. You want to see “text/cache-manifest” here. If you see any other content type, the caching won’t work as planned.
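
If you would rather not read the headers by eye, the check can be scripted. This is a rough sketch: the function name manifest_content_type is made up, and the headers are a hard-coded sample standing in for curl output so that it runs without a server.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin,
# print the value of the Content-Type header.
manifest_content_type() {
    # HTTP headers end in CRLF; drop the carriage returns first.
    tr -d '\r' | sed -n 's/^[Cc]ontent-[Tt]ype: *//p'
}

# Sample headers standing in for curl output, so the sketch runs offline.
headers='HTTP/1.1 200 OK
Content-Length: 42
Content-Type: text/cache-manifest'

ctype=$(printf '%s\n' "$headers" | manifest_content_type)
case $ctype in
    text/cache-manifest*) echo "OK: $ctype" ;;
    *) echo "Unexpected Content-Type: ${ctype:-none}" ;;
esac
```

Against a real server, the function would sit at the end of the curl command: curl --silent --head http://yourhost/cache.manifest | manifest_content_type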

Step 4: Regenerate the cache manifest any time you change the files in your site

You must regenerate the cache manifest and reload the site in your web browser any time you change the content of the site: new graphics, typo corrections, new content, etc. In fact, once a file is cached locally using these HTML5 techniques, the only ways to get it to refresh in your web browser are to empty your browser’s cache or to change the contents of the cache manifest. Simply changing the content of a file on your site (say, index.html) won’t cause the new content to be delivered to the web browser, because the browser re-downloads cached files only when the manifest itself has changed. Note that regenerating the manifest after editing an existing file produces an identical list, so the manifest also needs a line that changes, such as a comment (a line starting with #) carrying a date or version.
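
One way to make the manifest change whenever any file's contents change is to write a fingerprint of the site into a comment line. The sketch below does that with cksum. The function name regenerate_manifest, the "# fingerprint:" comment, and the demo directory are my own assumptions, not an established convention.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the manifest with a fingerprint comment.
regenerate_manifest() (
    cd "$1" || exit 1
    # Checksum every file except the manifest and .DS_Store,
    # then checksum that list to get one number for the whole site.
    fingerprint=$(find . -type f ! -name 'cache.manifest' ! -name '.DS_Store' \
        -exec cksum {} + | sort | cksum | cut -d ' ' -f 1)
    files=$(find . -type f | sed 's#^\./##' | grep -vi 'ds_store' |
        grep -vxF 'cache.manifest' | sort)
    printf 'CACHE MANIFEST\n# fingerprint: %s\n%s\n' "$fingerprint" "$files" \
        > cache.manifest
)

# Demo in a throwaway directory: edit a file, regenerate, compare manifests.
demo=$(mktemp -d)
echo 'first draft' > "$demo/index.html"
regenerate_manifest "$demo"
cp "$demo/cache.manifest" "$demo.before"
echo 'typo fixed' > "$demo/index.html"
regenerate_manifest "$demo"
diff "$demo.before" "$demo/cache.manifest" || echo "manifest changed"
```

The file list is the same in both runs; only the comment line differs, which is enough for the browser to treat the manifest as new.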

One thought on “Automatically Generating an HTML5-style Cache Manifest from the Command Line”

  1. Don’t forget that iOS devices don’t like spaces in the file names listed in a cache manifest.

    The command below extends what you’ve proposed to replace each space with a %20

    echo "CACHE MANIFEST" > cache.manifest; echo "# generated: $(date)" >> cache.manifest; find . -type f | sed "s#^\./##" | sed "s/ /%20/g" | grep -vi "ds_store" >> cache.manifest

    I appreciate this is an old post, but it’s also the first hit if you’re searching for how to automatically generate an HTML5 cache manifest (which I was).
