Closed Bug 177329 Opened 22 years ago Closed 8 years ago

nsIWebBrowserPersist can't persist things whose cache entries have expired

Categories

(Core Graveyard :: Embedding: APIs, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE
mozilla1.5alpha

People

(Reporter: bzbarsky, Assigned: adamlock)

References

(Depends on 1 open bug)

Details

(Keywords: embed, helpwanted, topembed+)

Attachments

(1 file, 5 obsolete files)

When a page _has_ a cache entry but the entry is expired (eg no-cache has been
set) the persistence object will attempt to refetch it from the network instead
of reading it from cache.  This means that in the usual case (trying to save an
invoice page) one of two things happens:

1)  Nothing is saved
or
2)  The order is placed twice.

Neither is really acceptable.  The API needs to be fixed to accept not just a
URL but also a cache key or cache token or nsIWebPageDescriptor (preferred) or
something.
Blocks: 99642, 115174, 120809
Keywords: embed, topembed
QA Contact: depstein → dsirnapalli
Keywords: topembedtopembed+
Is this bug reproducible? My understanding is this pages in Mozilla are saved
using SaveDocument, not SaveURI. This means that the document will be saved from
the in-memory DOM and not from the network. Some of the URIs in the page may be
refetched however but since no post data goes with them, it should not cause the
problems described.

nsIWebPageDescriptor is just a crappy abstraction of calling loadUri with
LOAD_HISTORY from a session entry in the docshell. It's not really caching anything.

The patch in bug 170722 adds referrer, extra headers & a cache key arg to the
saveURI method. The cache key is the nsISupports that something has previously
grabbed from the last time the channel was loaded. Unless this bug is describing
something different I am tempted to dupe it.
> My understanding is this pages in Mozilla are saved using SaveDocument, not
> SaveURI. 

"web page, complete" uses SaveDocument; "web page, HTML only" uses SaveURI.  If 
you save "web page, HTML only" this bug is eminently reproducible.

I doubt a cache key will help the particular problem described in this bug
(expired cache entry), since cache keys do not pin entries in the cache; cache 
tokens do that, iirc.  Of course nsIWebPageDescriptor won't pin anything in the 
cache either, for obvious reasons...  A cache key _will_ allow us to retrieve 
the correct cache entry when it exists, which is certainly a step up from what 
we do now.

The real problem is that we need a _coherent_ strategy for dealing with issues 
like this (and we should decide, for example, whether saving should work even 
if cache is disabled).

ccing some necko/cache people in the hope that they will have something to 
say...
I'm not sure what to do in this situation either. I guess in the first instance
if the page is https or contains postdata we should display a prompt dialog to
give the user the chance to change their mind.

Is there a way to test if a given URI has expired or is not in the cache in
advance of trying to save it?
Target Milestone: --- → mozilla1.4beta
adam: no not exactly.  i mean you could use the cache APIs to check, but that is
not reliable since the cache is multithreaded and may be forced to evict
something on a background thread.  instead you should open the channel with the
nsICachingChannel::LOAD_ONLY_FROM_CACHE load flag set.  this will cause the load
to fail with NS_ERROR_DOCUMENT_NOT_CACHED if the document does not exist in the
cache.  then you can prompt the user in response to this error.
So doing what Darin suggested, I can think of three places this logic can be put:

1. In the progress / download javascript that calls the persist object
2. In the persist object with a flag to turn on this behaviour.
3. In necko with a flag on the nsIRequest / nsIChannel to turn on this behaviour.

The second and third would better satisfy the topembed+ against this bug. Darin,
do you have any preference? I'll see if I can come up with a patch either way.

The flag in both cases would be something that prompts the user if they wish to
fetch from the network if the error code NS_ERROR_DOCUMENT_NOT_CACHED comes back
the first time around.

What happens in Mozilla if the user hits reload on a page in this situation?
Attached patch Work in progress (obsolete) — Splinter Review
Still work in progress and untested. This demonstrates how the functionality
might be made available through the persist object and a new
PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED flag.
i don't think it makes sense for this prompting to live in necko.  in general,
user interactions should be left out of necko.  in some cases, we obviously
can't help it, but whenever we put user interactions in necko it makes it harder
to customize the UI.  (e.g., prompting vs. error pages.)
I'll proceed with the web browser persist approach. I'll probably need two new
flags not one - one to enable prompting and another to enable LOAD_ONLY_FROM_CACHE.

I'm having trouble producing a test page though. I've tried several ways to
generate a page that should be refetched each time from the network:

1. A CGI script which contains an Expires header pointing only a few seconds
into the future.
2. An html file containing <meta http-equiv="Pragma" content="no-cache"> and
loaded via http.
3. Changing Mozilla cache settings to check the page every time it is loaded.

It still wants to save the cache entry! The only way I discovered to force it to
not be cached is to flush the cache after loading the page for the first time.

Any ideas?
"pragma: no-cache" and loading via https should work....
Attached patch Work in progress 2 (obsolete) — Splinter Review
Second work in progress. This one actually works although there are some big
XXX comments where I've had to hack some flags to force the behaviour I want to
test. I also need to think where the prompt text goes; probably into
appstrings.properties along with the existing POST retry prompt.

Another issue to be considered is that the caller is expecting saveURI to save
just one URI, not two, implying one start state notification and one stop state
notification. So I'll have to suppress the state notification in OnStopRequest
from the first saveURI and in the OnStartRequest from the second to make it all
seem like a single request. I hope they don't cache and compare the nsIRequests
passed with the state changes for any reason...
Attachment #120498 - Attachment is obsolete: true
Attached patch Work in progress 2 (obsolete) — Splinter Review
Almost there, just have to test out js side of things and see about suppressing
the extra start / stop notifications.
Attachment #120639 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
This is the patch with one thing to clear up.

What to do about shift click on links. These would normally bring up a save as
dialog. At present with the patch applied, shift clicking will pose a "This is
not in the cache message, do you want to proceed" message. I would like to
suppress this but I am confused about how the JS is being called. All of the
various "Save Page", "Save Link As...", shift+click all go through a single JS
function SaveURL() in contentAreaUtils.js. The bit that confuses me is that the
various places that call it do not supply all the arguments it wants.

http://lxr.mozilla.org/seamonkey/search?string=saveURL

What are the default values for they args they do not supply? For example,
shift+click only supplies the first 3 args but never says what the aBypassCache
arg value should be. It seems from the path it takes that aBypassCache is
false, but is this right for shift click? If not, I can leave the code as is
and just explicitly pass true instead. The same goes for the "Save Link As..."

Anyway, the patch is complete apart from this issue. It implements the
PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED and PERSIST_FLAGS_ONLY_FROM_CACHE
and makes contentAreaUtils.js and promptDlg.js supply these flags when calling
saveURI. I did not bother suppressing the extra nsIRequests and instead
documented PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED to warn that it could
result in multiple requests occurring.
Attachment #120994 - Attachment is obsolete: true
Args that are not passed to a JS function are "undefined", which tests as "false".

There is an existing bug, iirc, about whether shitf-click should indeed bypass
cache; it's tied up in discussion of what we should be doing.

What should I do about it for now? I will have to make it explicitly pass true
or false for this value. If false is the current behaviour I can preserve that
but either way I will have to add another bool to saveURL to supress the
prompting behaviour for Save Link & shift+click.

The current wording of the prompt doesn't make sense if you're never visited
what you're trying to save anyway.
So what happens if neither PERSIST_FLAGS_FROM_CACHE nor
PERSIST_FLAGS_BYPASS_CACHE is set?  Perhaps the "save link as" case should
simply not be setting either of those flags and allowing the normal cache
policies to be used?  That's the suggestion that seemed most reasonable from the
other bug...
If neither BYPASS_CACHE or FROM_CACHE is specified, you get whatever the default
behaviour from necko is. I don't know what that might be but presumably it is
one or the other.

I think it's better to be explicit here. Either shift+click should pass true, if
it wants to force a refetch or false if it doesn't. Personally I think it should
send false and let persist try the cache first, but we should have another arg
on SaveURL that suppresses the prompt behaviour.
Or you could add another persist flag... "CACHE_PREFERRED" which is intermediate
between the existing "CACHE_REQUIRED" and "CACHE_FORBIDDEN" flags, but a bit
stronger than the default "do whatever Necko wants to" behavior...

I think this would make a lot more sense as a flag than an arg to saveURI, but
that's just me.  And I'm not sure which is better for embedders.
FROM_CACHE is the equivalent of preferred since it will grab from the cache and
then try the network if it has to. BYPASS_CACHE skips the check altogether.

And the latest patch adds ONLY_FROM_CACHE which makes necko return an error code
if the item is not in the cache.

So, all the options are covered and map onto your choices like this:

ONLY_FROM_CACHE == CACHE_REQUIRED
FROM_CACHE == CACHE_PREFERRED
BYPASS_CACHE == CACHE_FORBIDDEN

> And the latest patch adds ONLY_FROM_CACHE which makes necko return an error code
> if the item is not in the cache.

Ah. So the comment about things not being fetched should move from FROM_CACHE to
ONLY_FROM_CACHE in the idl, and save link as should use FROM_CACHE.  Right?
Attached patch Patch (obsolete) — Splinter Review
Updated patch changes the documentation for PERSIST_FLAGS_FROM_CACHE to make it
more obvious and the wording of the warning. More significantly, I searched for
all calls to SaveURL and made the callers explicitly supply all the args the
function expects.

Since I pass in false, the behaviour is the same as now (i.e. bypassCache is
false). 

The one notable change to behaviour I encountered was that if you shift+click
on an unvisited link you get the warning message. This could probably be
surpressed by adding another arg to saveURL to control the prompt behaviour. I
can see this error might annoy some people :)
Attachment #121541 - Attachment is obsolete: true
Yep.  I thought the idea was that we would only prompt in the "only from cache"
case and would just silently fall back to getting off the network in the
"preferred from cache" case?
Yes. The thing here, is that the persist object is being told to save some 
random URI, but needs to ensure that if the item is not there in the cache that 
it does not go off and get it without checking first. Hence, when the saveURL() 
js method supplies a bypassCache value of false, I assume they want to grab 
from the cache and pose the dialog before trying from the network.

Most of the time, this is exactly what it probably means, how I wonder if 'grab 
from network' is implied by shift+clicking a link and whether the warning is 
irrelevant. If this other bug decides that shift+click should always refetch, 
it makes this problem go away since it will supply true as a bypassCache value 
and this issue won't exist any more.

Boris, if you have a chance you might like to apply the patch and see if it 
does what you hope it does.
Missed the 1.4b train, will this be targetted for final 1.4 or 1.5 alpha? 
Please re-set the milestone accordingly.
Target Milestone: mozilla1.4beta → ---
Hmm, no sign of 1.4 final on the target milestones, so I'll put 1.5a for now.
Target Milestone: --- → mozilla1.5alpha
Attached patch Patch mk2Splinter Review
New patch attempts to remove the annoying prompting by adding a new
aPromptIfNotInCache to saveURL() so callers can control this behaviour. I have
gone through the callers, supplying true or false for this value.
Attachment #122619 - Attachment is obsolete: true
Comment on attachment 124574 [details] [diff] [review]
Patch mk2

Can I have an r/sr on this patch to prompt for expired cache entries? This
patch contains both changes for the persist object and the xpfe. As the
comments will show, there are probably some places where prompting should
happen and others where it should not. The patch takes a best guess at these,
so if you feel uncomfortable about these by all means just r/sr the persist
part and we can deal with the other ones on a case by case basis.
Attachment #124574 - Flags: superreview?(bz-bugspam)
Attachment #124574 - Flags: review?(brade)
Kathy & Boris, can you take a look at the patch please and lend your opinion on
the best way for this to go in please? As mentioned before, the patch contains
both the client and persist object side changes. If you feel more comfortable
about putting the persist changes in first and reviewing the other parts we can
do that.
Adam, I'll try to get to this patch today or tomorrow.  Sorry about the delay.
Comment on attachment 124574 [details] [diff] [review]
Patch mk2

>Index: mozilla/embedding/components/ui/progressDlg/nsProgressDlg.js

>+        if (!persistArgs.bypassCache && 
persistArgs.aPromptIfNotInCache) {

Hmm... how is the persistArgs getting here?  The only place where I see this
window opened, window.arguments[1] is not even passed...

In any case, there are no objects that have a "aPromptIfNotInCache" property
with your changes (details below).

>+            webBrowserPersist.persistFlags |= nsIWBP.PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED +
>+                                              nsIWBP.PERSIST_FLAGS_ONLY_FROM_CACHE;

please use '|' instead of '+' here.

>Index: mozilla/docshell/resources/locale/en-US/appstrings.properties

Hmm.. Why is the persistence object using this particular stringbundle? 
Shouldn't it use its own stringbundle instead of hijacking the docshell's?

You probably want an em dash or something after "exercise caution".

>Index: mozilla/embedding/components/webbrowserpersist/public/nsIWebBrowserPersist.idl

So now we have the following options:
  NONE	       -- no flags  
  BYPASS_CACHE -- no cache used
  FROM_CACHE   -- try cache; if that fails, go to the network
  PROMPT_FOR_RETRY_IF_NOT_CACHED -- try cache; if that fails, prompt
				    before retrying.
  ONLY_FROM_CACHE -- like FROM_CACHE but does not go to the network on
		     a miss

The question I have is how FROM_CACHE differs from NONE.  Is there a
difference?  We should document it clearly, if so.

Is there a difference between 
(PROMPT_FOR_RETRY_IF_NOT_CACHED | ONLY_FROM_CACHE) and
(PROMPT_FOR_RETRY_IF_NOT_CACHED | FROM_CACHE) ?

>+   * flag is only valid from saveURI when combined with
>+   * PERSIST_FLAGS_FROM_CACHE and PERSIST_FLAGS_ONLY_FROM_CACHE.

This makes it sound like I have to set all three flags at once.  Perhaps the
"and" should be an "or"?

>+  /** Only fetch from the cache */
>+  const unsigned long PERSIST_FLAGS_ONLY_FROM_CACHE = 32768;

This needs to be better documented, if only by explicitly contrasting the
FROM_CACHE flag.

On a side note, it may make the interface more readable to use things like (1
<< 15) in place of "32768"....

>Index: mozilla/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp

>+    mHeaders.Append(aHeader);
>+    mHeaders.Append(": ");
>+    mHeaders.Append(aValue);
>+    mHeaders.Append("\r\n");

mHeaders.Append(aHeader + NS_LITERAL_CSTRING(": ") + aValue + 
		NS_LITERAL_CSTRING("\r\n"));

>+    if (data &&
>+        status == NS_ERROR_DOCUMENT_NOT_CACHED &&
>+        mPersistFlags & PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED &&
>+        mPersistFlags & PERSIST_FLAGS_FROM_CACHE)

What happens if ONLY_FROM_CACHE is set?

>+                if (mProgressListener)
>+                    prompt = do_QueryInterface(mProgressListener);

It sounded to me based on the IDL comments that the progress listener should be
prepared to return an nsIPrompt via the nsIInterfaceRequestor's GetInterface...
is that not the case?

In any case, do_QueryInterface is null-safe, so you don't need the null-check.

>+                HeaderVisitor *visitor = new HeaderVisitor;
>+                nsCOMPtr<nsIHttpHeaderVisitor> headerVisitor = do_QueryInterface(visitor);

Why not just: 

nsCOMPtr<nsIHttpHeaderVisitor> = new HeaderVisitor(); ?

In any case, check for OOM before passing it to VisitRequestHeaders...

>+    // Test for odd combinations of cache flags, e.g. asking to prompt if
>+    // not cached, but bypassing the cache altogether.

This should be documented in the IDL (it's not clear from the interface that
setting RETRY_IF_NOT_CACHED and BYPASS_CACHE will just make saveURI throw).

>Index: mozilla/xpfe/communicator/resources/content/contentAreaUtils.js
>+    if (!useSaveDocument && aData.aPromptIfNotInCache) {

The member in aData is called "promptIfNotInCache", no?

>+      persist.persistFlags |= nsIWBP.PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED +
>+                              nsIWBP.PERSIST_FLAGS_ONLY_FROM_CACHE;

Why do you need to set ONLY_FROM_CACHE given that you have the PROMPT flag on?

This looks like a fairly good aproach, now that I've looked at it in detail. 
It should certainly satisfy our internal "save as" consumers if they can just
decide for themselves what behavior they want....
Attachment #124574 - Flags: superreview?(bzbarsky) → superreview-
Can this please be reassigned to someone who's got time for it? At present it's
a dangerous bug for anyone who does online shopping or banking as forms can be
silently reposted (see bug 115174).
Keywords: helpwanted
Adding myself to CC:, this critically affects my ability to use Firefox for anything serious.
Bug 288462 provides the full summary of all related issues, complete with RFC violations.  Should be linked to this one.
(In reply to comment #0)
> The API needs to be fixed to accept not just a
> URL but also a cache key or cache token or nsIWebPageDescriptor (preferred) or
> something.

This was fixed in bug 84106; the remaining issue is probably the subject of
bug 479296.

The patch here is quite old, however it might be useful as a reference.
Actually, the question about what to do when the page cannot be retrieved
from the cache is still open, and is being addressed in bug 479296.

Setting a dependency between this bug and bug 479296 may be useful.
Depends on: 486921
QA Contact: dsirnapalli → apis
Attachment #124574 - Flags: review?(brade)
Marking a bunch of bugs in the "Embedding: APIs" component INCOMPLETE in preparation to archive that component. If I have done this incorrectly, please reopen the bugs and move them to a more correct component as we don't have "embedding" APIs any more.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: