option to make Request cached

RESOLVED WONTFIX

Status

P3
enhancement
RESOLVED WONTFIX
8 years ago
4 years ago

People

(Reporter: mcepl, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

8 years ago
mcepl: what do you think about "cached" Boolean parameter to Request, which would (using window.applicationCache or localStorage or what) made the resource always available (even when the remote server is down, client computer is offline, etc.)?
myk: mcepl: hmm, how would that work if the network is down the first time you issue the request?
mcepl: then you are out of luck of course
mcepl: I just experienced last week that my (supposedly) very very reliable remote server (fedorapeople.org) was down, and I have my main configuration in JSON hosted on the remote server.
myk: mcepl: hmm, so the idea is that "cached" would be much simpler than writing the data into simple storage and then falling back to simple storage if the request fails
mcepl: and caring whether the cache expired, and whether we are currently actually online or offline, and ...
OS: Linux → All
Priority: -- → P3
Hardware: x86_64 → All
Target Milestone: --- → Future
(Reporter)

Comment 1

7 years ago
Created attachment 538889 [details] [diff] [review]
first draft of one possible way how to resolve the issue

There are two possible ways how to go about this bug and I don't like one, but I am not able to make it with other way:

1) Something like what's drafted in this patch: using localStorage do my own caching. I don't like it much, because I am afraid of all bugs which were successfully resolved with caching and which I would have to effectively backport to my script, work which was already doen down in XULRunner code.

2) Somehow connecting myself to the prefetchURIForOfflineUse or nsIOfflineCacheUpdate (see for example https://groups.google.com/d/msg/mozilla.dev.tech.network/PAgz7vHueOo/-djEenBTvIAJ for more), but it seems that this interface is still way below XMLHttpRequest and to get from that all the way to Jetpack's request would be quite a way to go. Also these methods seems not be documented anywhere much.

Any ideas?
Attachment #538889 - Flags: feedback?
Hmm, sounds like the discussion would benefit from some Gecko engineering input.  Cc:ing bz and sicking, who have helped me with XMLHttpRequest issues in the past.

Boris, Jonas: any thoughts on the best way to implement this functionality?
What functionality are we implementing?  What's "Request"?  What kind of cache semantics are we looking for here?
(Reporter)

Comment 4

7 years ago
Just the record of todays chat on #jetpack on this theme which may explain what is this bug all about:

[16:05:07] mcepl: hmm ... anybody any comments on https://bugzilla.mozilla.org/show_bug.cgi?id=655749#c1, please?
[16:14:52] alexp_: mcepl: I'd prefer this feature to be on top of Request instead of being inside of it. i.e CachedRequest that delegate to Request
[16:15:33] mcepl: OK, that's possible ... and what about my dilema between localStorage and nsIOfflineCacheUpdate (or prefetchURIForOfflineUse)?
[16:15:51] mcepl: (neither of which has any documentation it seems)
[16:17:26] mcepl: meaning the offline cache doesn't ... localStorage is obviously very well documented.
[16:17:49] alexp_: I didn't knew about nsIOfflineCacheUpdate
[16:18:08] mcepl: :)
[16:18:36] mcepl: I am really afraid that I would have to reimplement all the horrors of caching myself.
[16:19:02] mcepl: also, I spent two days googling for it ;)
[16:24:03] alexp_: mcepl: by reading source code, nsIOfflineCacheUpdate seems quite complicated and heavily linked to a manifest, may be cache manifest ? http://diveintohtml5.org/offline.html
[16:24:36] mcepl: yes, that's the problem ... how to add programatically something to the cache.manifest
[16:24:54] alexp_: your manifest is a file on your remote server
[16:25:52] mcepl: yes, but the point of this package is that I can add there a file programtically ... e.g., I have a configuration JSON which every user can put somewhere herself.
[16:26:38] mcepl: and the script downloads it with Request. How to tell the script that this resource should be treated as an offline resource (and BTW how does same-origin policy applies to offline resources)?
[16:27:25] alexp_: if you start using nsIOfflineCacheUpdateServicen you need to follow cache manifest stuff: http://www.w3.org/TR/html5/offline.html
[16:28:10] alexp_: this component seems to be just some internal API to control this new HTML feature
[16:28:20] dcamp: that's pretty accurate
[16:28:57] mcepl: yes, but all information I found are talking only about statically written cache.manifest file, not about a way how to add/remove resources to this cache.
[16:29:53] alexp_: mcepl: all things happens around cache manifest file
[16:30:03] alexp_: if you update your json file, you have to edit the manifest file
[16:30:23] mcepl: moreover, there is no remote server in case of my (basically page-mod) script.
[16:30:42] dcamp: right, so nsIOfflineCacheUpdate is not likely to be your solution, sadly
[16:32:06] mcepl: dcamp: I cannot believe nobody thought about doing something like this before me .... /confused/
[16:32:52] mcepl: it seems so obvious, or I have somewhere huge hole in my thinking.
(Reporter)

Comment 5

7 years ago
(In reply to comment #3)
> What functionality are we implementing?  What's "Request"?  What kind of
> cache semantics are we looking for here?

By Request I mean obviously this object from Add-on SDK https://jetpack.mozillalabs.com/sdk/1.0b4/docs/packages/addon-kit/docs/request.html

Also the resulting API should work like here (just add cached Boolean option to Request):

const request = require("request").Request;
const myURL = "http://matej.ceplovi.cz/progs/data/RH_Data-packages.json";

request({
  url: myURL,
  cached: true,
  onComplete: function(response) {
    if (response.status == 200) {
      console.log("Length of the response is " + response.text.length);
    } else {
      console.error("Cannot download " + URL + "!");
    }
  }
}).get();

console.log("The add-on is running.");

------------------------------------------------------
Of course, alexp_ might be right and it might be better instead of an option to the existing Request to add special CachedRequest object built on the top of the current Request (or whatever else we found necessary to use).
bz: Request is an Add-on SDK API that wraps XMLHttpRequest with a simpler, easier-to-use interface.  Its docs are here:

https://jetpack.mozillalabs.com/sdk/latest/docs/packages/addon-kit/docs/request.html

As I understand it, Matej wants a simple way to specify that a given request's response should be cached and then transparently retrieved from the cache if a subsequent request for the same resource fails (f.e. because the machine is offline).

Here's a simple test case:

  new Request({
    url: "http://example.com",
    cache: true,
    onResponse: function(response) {
      // Retrieve the resource again.
      new Request({
        url: "http://example.com",
        cache: true,
        onResponse: function(response) {
          // If the second request failed, this function would still be called,
          // and it would provide the response cached by the first request.
        }
      });
    }
  );

In practice, an addon might use this to retrieve a resource on startup or at some regular interval whether or not the network or providing server is available.
OK.  So it may be possible to rejigger the HTML5 offline cache to do this, but it would take some work.  It's really not designed for caching of one-off resources, and the interactions with manifest updates would be complicated.

If you need guaranteed caching, such that you can't rely on our existing HTTP cache (which offers no guarantees), then you sort of have to roll your own.... :(
Though it might be good to suggest a necko change to have a way to do this more easily...

That said, anything in necko will need an eviction policy, so still won't offer guarantees.  Of course I really hope you're doing an eviction policy as well.
(Reporter)

Comment 9

7 years ago
(In reply to comment #8)
> Though it might be good to suggest a necko change to have a way to do this
> more easily...
> 
> That said, anything in necko will need an eviction policy, so still won't
> offer guarantees.  Of course I really hope you're doing an eviction policy
> as well.

Auch. I have no idea what you are talking about :( I guess there is more reading on my horizont.
(Reporter)

Comment 10

7 years ago
(In reply to comment #8)
> Though it might be good to suggest a necko change to have a way to do this
> more easily...
> 
> That said, anything in necko will need an eviction policy, so still won't
> offer guarantees.  Of course I really hope you're doing an eviction policy
> as well.

OK, I googled up on eviction policies. Basically my idea was never to evict and let programmer take care of managing his quota.

However, if it would be too difficult to make separate quotas for Add-on SDK (or individual add-ons) then probably LFU http://en.wikipedia.org/wiki/Cache_algorithms#Least-Frequently_Used.

Given that my expected use case is downloading some additional resources for addons running on many tabs (e.g., for me it would be this http://matej.ceplovi.cz/progs/data/RH_Data-packages.json configuration file which is downloaded on almost every bugzilla URL I open), I guess it can beat in frequency almost anything, and so with such policy it would stay in the cache almost forever (of course, I expect some periodic checking for updates of more recent resources on the remote site).
> Basically my idea was never to evict and let programmer take care of managing
> his quota.

That seems like a serious footgun to me....  Assuming there's a quota at all.  Is there?  That is, what keeps an extension from accidentally completely filling the user's disk with this?
(Reporter)

Comment 12

7 years ago
(In reply to comment #11)
> > Basically my idea was never to evict and let programmer take care of managing
> > his quota.
> 
> That seems like a serious footgun to me....  Assuming there's a quota at
> all.  Is there?  That is, what keeps an extension from accidentally
> completely filling the user's disk with this?

Isn't it the same thing which stops random website from doing the same via cache.manifest? (i.e., either there is something, or we are in much deeper trouble).
For normal websites we put up a prompt before allowing the site to store *any* data in the manifest cache. Once the user answers "yes" to that prompt I don't know how much we give the website though (for indexedDB we give the site 50MB and then put up another prompt, IMHO we should do the same for the manifest cache).

I suspect we don't want to have a similar prompt for addons?
(Reporter)

Comment 14

7 years ago
(In reply to comment #13)
> For normal websites we put up a prompt before allowing the site to store
> *any* data in the manifest cache. Once the user answers "yes" to that prompt
> I don't know how much we give the website though (for indexedDB we give the
> site 50MB and then put up another prompt, IMHO we should do the same for the
> manifest cache).
> 
> I suspect we don't want to have a similar prompt for addons?

Well, I have no opinion about that, but whole that dialog (as many other dialogs in other programs) feels to me just like a thin-veriled cheap excuse of a programmer who wants to blame user for his inability to find a better solution (http://joelonsoftware.com/uibook/chapters/fog0000000062.html and following). Wouldn't it be better to introduce some kind of quotas and better protection of Firefox from the users instead of doing that?
The dialog for the manifest cache happened without my involvement so I can't really speak to it's design. However this bug seems like the wrong forum to debate it.

Please cc me on any bugs filed on it or news group threads started on it though.
Marking anything that potentially looks like a feature request as "enhancement", sorry for the bugspam. :)
Severity: normal → enhancement

Updated

7 years ago
Attachment #538889 - Flags: feedback? → feedback?(myk)
Attachment #538889 - Flags: feedback? → feedback?(myk)
(Pushing all open bugs to the --- milestone for the new triage system)
Target Milestone: Future → ---
Comment on attachment 538889 [details] [diff] [review]
first draft of one possible way how to resolve the issue

This seems like a reasonable-enough approach.  I like the use of simple-storage as the backend store for the cache.  And note that simple-storage has a built-in quota, so it addresses the quota considerations we have discussed.

At the same time, this feels like the kind of API that would benefit from some time as a third-party API, perhaps a CachedRequest built on top of Request, as Alex suggested, before it potentially migrates to the core.  So my suggestion for the next steps would be to add it to the list of modules:

https://wiki.mozilla.org/Jetpack/Modules

And also upload it to Builder:

https://builder.addons.mozilla.org/

And then blog/tweet/otherwise point folks at it and get folks using it, so we can get a sense of how many people would benefit from it and what their needs are.
Attachment #538889 - Flags: feedback?(myk) → feedback+
(Reporter)

Comment 19

7 years ago
Created attachment 559488 [details]
standalone library with CachedRequest object

Can I ask for the review of this attempt of mine to create a standalone CachedRequest object?
Attachment #538889 - Attachment is obsolete: true
Attachment #559488 - Flags: feedback?(myk)
Comment on attachment 559488 [details]
standalone library with CachedRequest object

  /**
   * Decorating cached response object with a bit of additional
   * functionality.
   */
  function CachedResponse (url) {
    if (!myStorage.cachedRequest[url]) {
      myStorage.cachedRequest[url] = {};
      this.url = url;
      this.text = null;
      this.status = "";
      this.statusText = "";
      this.headers = {};
    }
    else {
      var storedResponse = myStorage.cachedRequest[url];
      this.url = url;
      this.text = storedResponse.text;
      this.status = storedResponse.status;
      this.statusText = storedResponse.statusText;
      this.headers = storedResponse.headers;
    }
  }

This creates a record in the datastore for a URL if one doesn't exist, but it doesn't initialize the record.  So if some code calls CachedResponse["http://example.com/"] twice without calling CachedResponse.saveCached() in between, it'll get different results the second time.  Better to avoid creating the record until CachedResponse.savedCached() is called (alternately: initialize the record when creating it).


   * Limitations: does only GET requests, so it doesn't even have
   *   .post(), .get() or any other methods.

I would still require consumers to call .get(), since the constructor's name suggests it is a subclass of Request, and hewing to the original API (except for the methods that aren't implemented) seems like the most understandable implementation of such a subclass.


      if (req.readyState == 4) {
        if(req.status == 200) {
          ...
        }
      }

Since there are no else clauses here, and the conditional code block is very long, this control structure would be easier to read if the function returned early when the conditions are *not* true, i.e.:

    if (request.readyState != 4 || req.status != 200)
      return;
    
    ...


        if (crStorage && crStorage.headers &&
             ((curETag == crStorage.headers.eTag) ||
             (curLastModified <= crStorage.lastModified))) {

In bug 643156, bz discouraged use of HEAD requests due to server-side bugs and suggested using nsIURIChecker instead, which itself does a HEAD request for HTTP URLs but apparently works around those bugs.  I'm not sure if you can use that interface here, but it would be worth looking into.


Other than those issues, this seems reasonable, although I don't know much about caching, so I can't speak to issues of cache invalidation algorithms, storage quota policies, and the like.
Attachment #559488 - Flags: feedback?(myk) → feedback+
One more thing: requests sometimes change URLs due to canonicalization, redirects, etc.  Since the cache uses URLs as unique IDs, consider whether it makes sense to use the original URL or the ultimate one as that ID (at first glance, the original seems preferable, since that is the URL by which the consumer identifies the resource).

And if you decide use the original URL, make sure you explicitly retrieve it if you ever retrieve the URL from the channel (at first glance, you don't seem to do this in the current code, but it might be worth making a note about it in case that ever changes):

http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIChannel.idl#63
Whiteboard: [good first bug]
"There are only two hard things in Computer Science"
Whiteboard: [good first bug]
I think we can wontfix and leave it to a community module, what do you think Irakli?
Flags: needinfo?(rFobic)
(Reporter)

Comment 24

4 years ago
(In reply to Erik Vold [:erikvold] (please needinfo? me) from comment #23)
> I think we can wontfix and leave it to a community module, what do you think
> Irakli?

I agree. Let's kill this now.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX
Sounds good to me.
Flags: needinfo?(rFobic)
You need to log in before you can comment on or make changes to this bug.