Closed Bug 367447 Opened 17 years ago Closed 17 years ago

Add an API for putting resources in an offline cache

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dcamp, Assigned: dcamp)

References

()

Details

Attachments

(1 file, 8 obsolete files)

 
This patch does three things:

a) It adds a new cache device, CACHE_OFFLINE.  This cache is only searched if explicitly asked for.  It's currently implemented as a second disk cache, but would eventually be implemented with a new (probably sqlite-based) device

b) It adds a cacheOffline attribute to nsICachingChannel, and implements it in nsHttpChannel.  Setting cacheOffline on the channel after creation will cause the channel to update the offline cache.

c) It adds a PrefetchURIOffline method to nsIPrefetchService, and calls it for <link rel="offline-resource">.
Assignee: nobody → dcamp
Status: NEW → ASSIGNED
Doesn't seem like this patch sets a special header for "offline-resource" fetches.   Servers should be able to differentiate the requests via a header.
Indeed, fixed in my working copy.

One problem I'm facing with converting an application to enable offline usage is dealing with pages that process query parameters. For example, I have an 'edit' page that displays a form for editing data. When online this is retrieved with:

  http://site.com/edit?key=foo

The page contains javascript which, if online, does an ajax request to populate the form. If offline it retrieves the data from globalStorage instead.

In my main page for the application I have a link for the offline resource:

  <link rel="offline-resource" href="/edit"/>

With this patch, this is prefetched and put in the offline cache fine. But any links in the application for this page with query parameters fail to retrieve it from the cache as the cache item is keyed by the URL including the query parameters. For example, once offline, this works:

  http://site.com/edit

But this doesn't:

  http://site.com/edit?key=foo

If the URL with the exact query parameters was visited by the user while online, and it exists in the normal cache, then it is successfully retrieved while offline.

Is there a way to do this sort of thing with the offline-resource method? Should the offline cache ignore query parameters when finding the cached resource?

A workaround is to put the data that the page looks for to do the ajax request or globalStorage retrieval encoded in the fragment identifier:

  http://site.com/edit#?key=foo

This works perfectly in that the fragment identifier is not part of the key for identifying the cached resource. It would be nice though if such a workaround wasn't required.
Another way of stating the problem: if you have an application where the user navigates from page to page, when you're online you can pass data from one page to the next using query parameters in the URL. This doesn't work for offline apps, but we still need some mechanism to pass that data. Two alernatives that we've thought of so far:
1) Pass it in the fragment identifier, like Chris said
2) Modify the offline cache lookup to ignore query parameters

#1 is ugly. #2 sounds risky and it will possibly confuse people to have the caches using different policies.

Any better ideas? :-)
Add an optional relation that provides the script with the equivalent of the path-related CGI variables (PATH_INFO, QUERY_STRING, etc).

Observe how env.cgi behaves here:
<http://www.franklinmint.fm/cgi-bin/env.cgi/foo/doesnt-exist.html?bar=baz>

Use 

<link rel="offline-resource path-info" href="/edit"/>

to get this effect.
Seems to me that won't help unless we also go with option #2 (with its potential downsides), and if we do that, the script can already get that information via the DOM 'location' object.
http://lxr.mozilla.org/seamonkey/source/dom/public/idl/base/nsIDOMLocation.idl
it can get its path, but not its path_info. Links of that form would not be very resilient if distributed as a php script, for instance.

Suggestion #2 fixates on one syntactic property of URIs, but more than just the query string is of interest. Most systems that do this sort of thing evolve towards rule systems, and combine application-specific knowledge of the URI layout with matching algorithms. Complicated, I know.

Ignoring query parameters in general has so far turned out to be impractical for HTTP intermediaries (Squid et al. used to do this).
OK then, I'm not really sure what you're proposing. Anyway, this is slightly off topic for this bug so I'll start a thread in mozilla.dev.platform.
(In reply to comment #8)
> 
> Ignoring query parameters in general has so far turned out to be impractical
> for HTTP intermediaries (Squid et al. used to do this).

Correction: they used to treat anything with a query string as a cache miss, not a hit for something else. I still maintain we need a separate indication that the cached resource can administrate the paths below it, if we want to do this at all. Using #1 breaks the intended use of fragment identifiers.
Attached patch version 2 (obsolete) — Splinter Review
This adds the offline-resource header and moves around the offline cache entry initialization a bit.
Attachment #251999 - Attachment is obsolete: true
I think we should try to land this sooner rather than later.

One comment on the patch: I don't think we need to support this as a configure option with #ifdefs. It can be turned off with prefs and small mobile devices are where this could be very useful...
I had to make a change to get this (and the original patch) to build on Windows. 

nsPrefetchService::Prefetch is declared as having a return type of NS_IMETHODIMP in the .cpp file but nsresult in the header. Changing to nsresult enabled it to build. This is in file uriloader/prefetch/nsPrefetchService.cpp
Attached patch v3 (obsolete) — Splinter Review
Attachment #252680 - Attachment is obsolete: true
Attachment #252787 - Flags: review?(darin.moz)
Comment on attachment 252787 [details] [diff] [review]
v3

>Index: content/base/src/nsContentSink.cpp

>+  // fetch href into the offline cache if relation is "offline-resource"
>+  if (linkTypes.IndexOf(NS_LITERAL_STRING("offline-resource")) != -1) {
>+    FetchOfflineHref(aHref, PR_TRUE);
>+  }

I presume there is some kind of spec for this someplace?  Sorry, I haven't
been keeping up on this topic.



>Index: modules/libpref/src/init/all.js

>+pref("browser.cache.offline.enable",           true);
>+pref("browser.cache.offline.capacity",         51200);

Add a comment explaining the units of capacity.  Kilobytes?


>Index: netwerk/base/public/nsICachingChannel.idl

>     /**
>+     * Specifies whether or not the data should be placed in the offline cache.
>+     * This may fail if the disk cache is not present.  The value of this
>+     * attribute should be set before opening the channel.
>+     */
>+    attribute boolean cacheOffline;

Whenever you change an interface, you _must_ change the UUID
property or else you will cause extensions and plug-ins to 
crash Firefox because they will be using the old version of
the interface.


>Index: netwerk/cache/public/nsICache.idl

>+     * STORE_ANYWHERE        - Allows the cache entry to be stored in any device
>+     *                         but the offline store.
>      *                         The cache service decides which cache device to use
>      *                         based on "some resource management calculation."

STORE_ANYWHERE becomes a confusing name, doesn't it?


>Index: netwerk/cache/src/nsCacheService.cpp

>+#if 0
>+        } else if (!strcmp(DISK_OFFLINE_DIR_PREF, data.get())) {
>+            // XXX We probaby don't want to respond to this pref except after
>+            // XXX profile changes.  Ideally, there should be somekind of user
>+            // XXX notification that the pref change won't take effect until
>+            // XXX the next time the profile changes (browser launch)
>+#endif

s/somekind/some kind/


>+    mOfflineDevice = new nsDiskCacheDevice;

hmm... the nsDiskCacheDevice is lossy.  it does not handle hash
conflicts well.  are you sure you want to use it for offline
store?  seems like the wrong choice.  also, this device does not
have a way to serialize SSL certificates (nsIChannel::securityInfo),
so if you want to use this system to make HTTPS content available
offline, then much more work needs to be done.


>+#if DEBUG
>+        printf("###\n");
>+        printf("### mOfflineDevice->Init() failed (0x%.8x)\n", rv);
>+        printf("###    - disabling offline cache for this session.\n");
>+        printf("###\n");
>+#endif

You should prefer using PR_LOG instead of printf.


>Index: netwerk/cache/src/nsCacheSession.h

>     enum SessionInfo {
>-        eStoragePolicyMask        = 0x000000FF,
>-        eStreamBasedMask          = 0x00000100,
>-        eDoomEntriesIfExpiredMask = 0x00001000
>+        eStoragePolicyMask        = 0x00000FFF,
>+        eStreamBasedMask          = 0x00001000,
>+        eDoomEntriesIfExpiredMask = 0x00010000
>     };

Why this change?


>Index: netwerk/protocol/http/src/nsHttpChannel.cpp
>Index: uriloader/prefetch/nsIPrefetchService.idl

>+    
>+    /**
>+     * Enqueue a request to prefetch the specified URI, making sure it's in the
>+     * offline cache.
>+     *
>+     * @param aURI the URI of the document to prefetch
>+     * @param aReferrerURI the URI of the referring page
>+     * @param aExplicit the link element has an explicit offline link type
>+     */
>+    void prefetchOfflineURI(in nsIURI aURI, in nsIURI aReferrerURI, in boolean aExplicit);

ding!  you need to change the interface's UUID property.


>Index: uriloader/prefetch/nsPrefetchService.cpp

>+                if (NS_FAILED(cachingChannel->SetCacheOffline(PR_TRUE))) {

By the way, maybe this method should be called "SetCacheForOfflineUse" or
something like that.
Attachment #252787 - Flags: review?(darin.moz) → review-
This is only an interim solution. There isn't a real nailed down spec and it's premature to write one without some implementation and usage experience; the psuedo-spec we have is described in http://wiki.mozilla.org/OfflineAppsSummit2006 which is basically the same as the proposal in the thread starting here: http://groups.google.com/group/mozilla.dev.platform/msg/a298294c27b9380a. Seems to us that we probably won't need to change it much.

Similarly, the disk cache device is not a viable long term solution but it's good enough to develop offline apps with. Furthermore changing to a better cache device won't impact apps or the spec.

We could have everyone who develops an offline app patch their Firefox build, but it seems easier to just put the patch on trunk.
Attached patch v4 (obsolete) — Splinter Review
This version changes STORE_ANYWHERE to check the offline cache if the client is offline.  This makes STORE_ANYWHERE consistent.

The HTTP channel still uses a different cache session name to place items in the offline cache, so in practice the STORE_ANYWHERE check will never turn up anything useful.

The other issues should be fixed.
Attachment #252787 - Attachment is obsolete: true
Attachment #253272 - Flags: review?(darin.moz)
If a resource already exists in the normal cache, and I have an "offline-resource" request for it, it doesn't appear to place the resource in the offline cache. The situation I have is the following in a page:

<link rel="offline-resource" href="/ajax.jar">
<script type="text/javascript" src="jar:/ajax.jar!/file.js"></script> 

The script request is executed first which places the ajax.jar in the normal cache. The offline-resource request is then later processed but doesn't place it in the offline cache I'm assuming because it exists in the normal cache.

If I remove the <script> line then ajax.jar is put in the offline cache.

Should the existence of the 'offline-resource' request always put the resource in the offline cache?
Attached patch v5 (obsolete) — Splinter Review
Fixed the bug with already-cached resources.
Attachment #253272 - Attachment is obsolete: true
Attachment #253695 - Flags: review?(darin.moz)
Attachment #253272 - Flags: review?(darin.moz)
Attached patch v6 (obsolete) — Splinter Review
This version fixes a problem when the offline cache was checked while disabled, and updates some naming in the prefetch service/content sink.
Attachment #253695 - Attachment is obsolete: true
Attachment #254845 - Flags: review?(darin.moz)
Attachment #253695 - Flags: review?(darin.moz)
Blocks: 370195
Attachment #254845 - Flags: review?(darin.moz) → review?(cbiesinger)
What about using this feature in distant XUL pages ? There is no <link> tag in XUL, but it could be very useful to use this feature. why not adding a processing instruction like <?xul-cache?> for it ? What do you think about it ?

Note : the v6 patch doesn't content any more, patch on nsContentSink.cpp, nsContentSink.h,nsHTMLContentSink.cpp (so nothing about the <link> element). Is it normal ?
Attached patch v6, for real (obsolete) — Splinter Review
urk, mixed up the sql device patch from another bug;  this is the correct v6.
Attachment #254845 - Attachment is obsolete: true
Attachment #255367 - Flags: review?(cbiesinger)
Attachment #254845 - Flags: review?(cbiesinger)
Blocks: 372969
Blocks: 372970
Blocks: 373231
Comment on attachment 255367 [details] [diff] [review]
v6, for real

In nsContentSink, have you considered just adding a third parameter to
PrefetchHref? The two functions seem exactly identical except for which of the
two functions they call on the prefetchService, seems like a waste to have to
duplicate the rest of the code.

netwerk/base/public/nsICachingChannel.idl

Would be nice to document if cacheForOfflineUse places the file in the offline
cache in addition to the normal cache or instead of it.

netwerk/cache/src/nsCacheService.cpp
         prefs->RemoveObserver(DISK_CACHE_DIR_PREF, this);
+        
+        // remove Offline cache pref observers

please don't add trailing whitespace (second line here, and other places too)

+    mIOService = do_GetService("@mozilla.org/network/io-service;1", &rv);

could use do_GetIOService() from nsNetUtil.h here

netwerk/cache/src/nsDiskCacheDevice.cpp

Couldn't you now change SetCacheParentDirectory to use
SetCacheParentDirectoryAndName to avoid some code duplication? Or, did you
want to avoid that because the offline cache will soon use a different
implementation?

netwerk/protocol/http/src/nsHttpChannel.cpp
         }
- 
+        
         if (NS_SUCCEEDED(rv) && delayed)

Please don't add trailing whitespace :) (in a few other places in this file
too)

+    // if cacheForOfflineUse has been set, open up an offline cache entry to update
+    if (mCacheForOfflineUse) {
+        rv = OpenOfflineCacheEntry();
+        if (NS_FAILED(rv)) return rv;
+    }

should that really be outside the firstTime if?

I think you should rename some of the offline cache functions, they aren't
quite the same as the "normal" cache functions. Something like:
  CheckOfflineCacheEntry -> ShouldUpdateOfflineCacheEntry
  OpenOfflineCacheEntry  -> OpenOfflineCacheEntryForWriting

+    nsCAutoString cacheKey;
+    
+    GenerateCacheKey(cacheKey);

please remove that empty line

+    if (mOfflineCacheEntry && !(mCacheAccess & nsICache::ACCESS_READ)) {

Shouldn't the second part really refer to mOfflineCacheAccess? Also, shouldn't
it compare to == ACCESS_WRITE? (since it's not enough to check for lack of
read access, you also need to have write access...)

+    nsresult rv = mCacheEntry->GetLastModified(&cacheLastModifiedTime);

Is there anything that guarantees a non-null mCacheEntry at this point?
It might be better to check the server-sent last modified date here
(mResponseHead->GetLastModifiedValue)

+    if (mResponseHead->NoStore()) {
+        CloseOfflineCacheEntry();

What about NoCache()? Also... perhaps this should use MustValidate(), the
argument being that if the author asks for the page not to be cached/always
validated, we should honor that? (I'm open to arguments against this)

+nsHttpChannel::AddCacheEntryHeaders(nsICacheEntryDescriptor *entry)
+{
+    nsresult rv;
+    
+    if (!entry)
+        return NS_OK;

Can entry ever be null here?

+    LOG(("Preparing to write data into the offline cache [uri=%2]\n",
+         mSpec.get()));

%2?


uriloader/prefetch/nsIPrefetchService.idl

Please use spaces rather than tabs

uriloader/prefetch/nsPrefetchService.cpp
+    nsCOMPtr<nsICachingChannel> oldCachingChannel(do_QueryInterface(aOldChannel));
+    if (NS_SUCCEEDED(oldCachingChannel->GetCacheForOfflineUse(&offline)) && offline) {
+        nsCOMPtr<nsICachingChannel> newCachingChannel(do_QueryInterface(aOldChannel));
+        newCachingChannel->SetCacheForOfflineUse(PR_TRUE);

I don't think you should assume that newCachingChannel QIs to
nsICachingChannel before the scheme check. Seems like a crash waiting to
happen.

+                    nsCOMPtr<nsICachingChannel> cachingChannel(do_QueryInterface(mCurrentChannel, &rv));

please limit your lines to 80 characters, e.g.:
                    nsCOMPtr<nsICachingChannel> cachingChannel =
                        do_QueryInterface(mCurrentChannel, &rv);

-    return EnqueueURI(aURI, aReferrerURI);
+    
+    return EnqueueURI(aURI, aReferrerURI, aOffline);

trailing whitespace again


As mentioned on IRC: Perhaps, instead of getting the IO service in various
places, you could use the gIOService variable from nsIOService.h and add a
PRBool IsOffline() accessor.
Attachment #255367 - Flags: review?(cbiesinger) → review-
OS: Mac OS X → All
Hardware: PC → All
(In reply to comment #24)

> Couldn't you now change SetCacheParentDirectory to use
> SetCacheParentDirectoryAndName to avoid some code duplication? Or, did you
> want to avoid that because the offline cache will soon use a different
> implementation?

Yeah, I figured I'd just drop SetCacheParentDirectoryAndName after adding
the new device.

> What about NoCache()? Also... perhaps this should use MustValidate(), the
> argument being that if the author asks for the page not to be cached/always
> validated, we should honor that? (I'm open to arguments against this)

As I understand it, MustValidate() and NoCache() don't affect the cache entry going in, they're checked when the entry is being read from the cache.  Both are ignored when the browser is offline.

MustValidate() does affects the stored expiration time, but offline entries don't expire (the expiration is only checked when the browser is online).
Attached patch v7 (obsolete) — Splinter Review
Fixes for the other comments.
Attachment #255367 - Attachment is obsolete: true
Attachment #258386 - Flags: review?(cbiesinger)
Comment on attachment 258386 [details] [diff] [review]
v7

+    rv = mOfflineCacheEntry->GetLastModified(&offlineLastModifiedTime);
+    NS_ENSURE_SUCCESS(rv, rv);

Hm... this will fail when the server didn't send a last-modified header. Isn't the correct behaviour for that case to update the cache entry, since we don't know if it did change?
Attachment #258386 - Flags: review?(cbiesinger) → review+
Attached patch v8Splinter Review
Replaced with 

+    nsresult rv = mResponseHead->GetLastModifiedValue(&docLastModifiedTime);
+    if (NS_FAILED(rv)) {
+        *shouldCacheForOfflineUse = PR_TRUE;
+        return NS_OK;
+    }
Attachment #258386 - Attachment is obsolete: true
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Let me know about any additional functionality you need to be able to test this using the HTTP server, and file bugs as needed:

http://developer.mozilla.org/en/docs/HTTP_server_for_unit_tests

Bug 370245 is current state-of-the-art for future directions for the server, but the latest patch there exposes some fundamental problems in threads and XPConnect that mean that it's conceivable that what's there may not be committed in anything approaching its current form.
Flags: in-testsuite?
Blocks: 379590
this functionality is normally used from 'file:///' uris or web server on localhost when offline?
Blocks: 398161
Blocks: 727579
No longer blocks: 727579
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: