Closed Bug 579846 Opened 9 years ago Closed 9 years ago

nsIHttpChannel::SetResponseHeader should work after the stream has ended (HTML5 parser regresses support for cache-control pragmas)

Categories

(Core :: Networking: Cache, defect)


Tracking


RESOLVED WONTFIX
mozilla2.0b10
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: hsivonen, Assigned: mayhemer)

Details

(Keywords: regression, Whiteboard: [softblocker])

Attachments

(2 files, 6 obsolete files)

Steps to reproduce:
 1) Follow the instructions at http://landfill.mozilla.org/ryl/meta-cache-control.cgi with and without the HTML5 parser.

Actual results:
cache-control pragma is ignored when the HTML5 parser is enabled but honored when the old parser is enabled.

Expected results:
Expected the pragma to be honored in both cases.
Priority: -- → P2
Does this need an HTML5 spec change too?
blocking2.0: --- → ?
(In reply to comment #1)
> Does this need an HTML5 spec change too?

Yes.

(My currently 100% unsubstantiated guess is that, due to the asynchronicity of the tree op stuff, the document drops its reference to the cache entry before the sink tries to write to the cache entry. At least that was the source of an HTML5 cache manifest bug earlier.)
The no-cacheness is propagated all the way to the HTTP channel at least.
And indeed, the cache entry is closed before the no-cacheness is propagated to the channel.

What would be the most appropriate way to defer the cache entry closing until the parser is done? (Without breaking other users of HTTP channels...)
Is there a reason why the channel tries to drop the cache entry reference early? (As opposed to holding onto it for the channel object's lifetime?) Are channel objects expected to be particularly long-lived after the stream end has been seen, in cases other than the HTML5 parser?
Channel objects for the document have the same lifetime as the document, at least.  nsDocument holds a strong ref to the channel it was loaded from, and uses this to get at various information as well as to synthesize certain events.
> What would be the most appropriate way to defer the cache entry closing until
> the parser is done? (Without breaking other users of HTTP channels...)

Add a method that allows clients to keep the cache entry around until they explicitly close it?

Note that under e10s you don't have direct access to the cache entry, you just get an nsICacheInfoChannel, which currently only supports setting cacheTokenCachedCharset.  What interface(s) are you using on the channel to handle the cache-control pragma?
(In reply to comment #7)
> Note that under e10s you don't have direct access to the cache entry, you just
> get an nsICacheInfoChannel, which currently only supports setting
> cacheTokenCachedCharset.

Can the e10s mechanism already be used on trunk / Gecko 2.0? Any chance of extending the interface to support making the entry less cacheable?

Do we have data on which cache-related http-equivs (cache-control, expires, pragma) are required for Web compat?

According to the test case http://landfill.mozilla.org/ryl/meta-cache-control.cgi Firefox (with the old parser), Opera, Safari and Chrome support cache-control: no-cache in meta, but neither IE8 nor IE9 does. Is any of this cacheability-related meta stuff really required for Web compat?

http://support.microsoft.com/kb/234067 documents the lack of support for IE 4 through 6.

> What interface(s) are you using on the channel to
> handle the cache-control pragma?

The HTML5 parser calls nsContentSink::ProcessMETATag which ends up calling nsIHttpChannel::SetResponseHeader. Which leads to the question: Are there non-cacheability-related http-equivs that are required to be written onto the cache entry for Web compat?
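
(For illustration, a minimal sketch of that path; the signatures are approximate and the real nsContentSink code handles many more cases, so treat this as a sketch rather than the actual implementation:)

    // Sketch only: approximate Gecko signatures, error handling omitted.
    nsresult
    nsContentSink::ProcessHeaderData(nsIAtom* aHeader, const nsAString& aValue)
    {
      // Forward the <meta http-equiv> pseudo-header to the HTTP channel.
      nsCOMPtr<nsIHttpChannel> httpChannel =
        do_QueryInterface(mDocument->GetChannel());
      if (httpChannel) {
        // For caching-related headers this updates the cache entry's
        // metadata -- but only while the entry is still open, which the
        // HTML5 parser's asynchronous tree ops no longer guarantee.
        return httpChannel->SetResponseHeader(nsAtomCString(aHeader),
                                              NS_ConvertUTF16toUTF8(aValue),
                                              PR_TRUE /* merge */);
      }
      return NS_OK;
    }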
See about 20 lines from
http://krijnhoetmer.nl/irc-logs/whatwg/20100722#l-466
onwards. The test results for Chrome and Safari were wrong.

If neither IE nor WebKit supports this and our trunk support is broken, perhaps we should take the support out explicitly (it's not racy).
Given comment 9 I don't think we need to worry about this for Firefox 4.
blocking2.0: ? → -
If IE and Webkit actually don't support this, I'm fine with ripping it out.  Henri, do you want to patch, or should I?
(In reply to comment #11)
> If IE and Webkit actually don't support this, I'm fine with ripping it out. 
> Henri, do you want to patch, or should I?

IE does not support <meta http-equiv="cache-control" value="no-cache"> but they do support <meta http-equiv="pragma" value="no-cache"> or at least IE 6, 7, and 8 do. I haven't tested IE 9.
Ah, interesting.  We certainly used to support that too.
Henri, do we have any data on whether pages use one or the other of those (or both?)
(In reply to comment #11)
> If IE and Webkit actually don't support this, I'm fine with ripping it out. 
> Henri, do you want to patch, or should I?

I won't get to it until late August, so I'd be OK with you doing it.
Well, comment 12 needs to be addressed first.
(In reply to comment #16)
> Well, comment 12 needs to be addressed first.

First, the tag in comment 12 is wrong. The second attribute should be content, not value.

I tested http://hsivonen.iki.fi/test/moz/pragma-no-cache-baseline.php and http://hsivonen.iki.fi/test/moz/pragma-no-cache.php in Firefox 3.6.9, IE8, Opera 10.61 and a Chromium nightly. Both PHP scripts set Expires one minute into the future and show the seconds since the epoch in the page content.

AFAICT, only Firefox lets the meta tag have an effect.
I forgot to test IE8 in the IE 5.5 mode. The pragma seems to prevent caching in the IE 5.5 mode:
http://hsivonen.iki.fi/test/moz/pragma-no-cache-quirks.php
Sadness: Opera lets the pragma prevent caching in its quirks mode, too.
I'm assuming we want to get rid of this generic code anyway. Even if we feel we must support pragma no-cache (do we?), it would probably be more appropriate to add an electrolysis-friendly API just for dooming a cache entry (as opposed to writing arbitrary headers to the cache).
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #473506 - Flags: review?(bzbarsky)
(In reply to comment #14)
> Henri, do we have any data on whether pages use one or the other of those (or
> both?)

I don't have data.
Well... the idea was that http-equiv was in fact supposed to be treated like other HTTP headers.  So I'm not sure whether we actually want to remove this block.  Why do we?
(In reply to comment #22)
> Well... the idea was that http-equiv was in fact supposed to be treated like
> other HTTP headers.  So I'm not sure whether we actually want to remove this
> block.  Why do we?

It's a complication to the Web platform (and one that's currently broken on trunk) and there's a lack of evidence of it being a necessary complication for compatibility with existing content. 

Hixie says only the four values in the spec are required for compat with existing content. (Though default-style is theoretical, IMO. Browsers are so deeply committed to CSS that it's unlikely anyone manages to launch a replacement for the styling layer.) #whatwg IRC remarks suggest Set-Cookie is needed, too. For those pseudo-headers that we very likely need to support, we need special handling, not generic 'write back to Necko' handling. Furthermore, WebKit has gotten away without supporting arbitrary HTTP headers as metas, and WebKit is doing pretty well. I think that's the strongest piece of evidence against the necessity of generic support for generic http-equiv in the sense of reporting them back to the HTTP layer.
> It's a complication to the Web platform 

How so?  Having <meta http-equiv> always act like the HTTP header seems like a strictly simpler setup than what hixie is proposing (which is that sometimes it does and sometimes it doesn't).  Though I suppose content-type can't act like the header anyway....

I agree that in general this isn't needed for web compat except for a few headers.  The question I have is whether we have a good list of such headers and a plan to make them work in 2.0.  If we do, on both counts, then removing the generic stuff is ok.  If not, then it seems like it would break pages, and we should instead add similar generic code to the html5 parser.

(I should note that Chrome is "doing pretty well" even though you can't sanely do basic things like print preview, and that Webkit is "doing pretty well" even though its CSS selector handling is broken all sorts of ways; people just gripe and work around it.  That doesn't mean we would even consider duplicating them on these counts.)
(In reply to comment #24)
> > It's a complication to the Web platform 
> 
> How so?

Injecting headers back to the cache is more complex than not doing it.

> Having <meta http-equiv> always act like the HTTP header seems like a
> strictly simpler setup than what hixie is proposing (which is that sometimes it
> does and sometimes it doesn't).  Though I suppose content-type can't act like
> the header anyway....

 * Refresh is special. It doesn't work from HTTP--just meta.
 * Content-Type is special. When it appears in meta, the HTML parser does elaborate and specific processing.
 * Set-Cookie already has special handling, so it's not handled generically, either.
 * Link has special handling, and the existing handling is wrong. (<meta> linking a sheet doesn't cause the sheet to become a style sheet that blocks scripts.)
 * Default-Style and Content-Language don't need writing back to Necko--they need reporting to nsIDocument.

I don't have a strong opinion about reporting metas to the document object, but it seems useless to write pseudo-headers back to Necko unless they affect caching, since caching is the only thing Necko can do with the resource at that point in time. So far, it seems that generic Cache-Control doesn't need to be supported.

So as far as the question of communicating back to Necko goes, I think Pragma: no-cache for pages that trigger the quirks mode in IE and Opera is the main compat concern.

> I agree that in general this isn't needed for web compat except for a few
> headers.  The question I have is whether we have a good list of such headers
> and a plan to make them work in 2.0.

At least I am lacking a solid plan, since I don't know the networking code well enough.

> If we do, on both counts, then removing
> the generic stuff is ok.  If not, then it seems like it would break pages, and
> we should instead add similar generic code to the html5 parser.

It's not really the HTML5 parser that needs code; it's nsIChannel that needs code that can talk back to the corresponding cache entry after the stream has ended.
My attempt to elicit information about compat experiences from other vendors via the HTML WG failed.

It is rather useless to try to get data by analyzing e.g. dotbot data, because presumably pages that'd want to control caching would not be document-oriented pages but mildly app-y (but so mildly their authors can't get their HTTP act together), so they probably wouldn't show up in a Web crawl.

Apart from seeing what other browsers do, the best data point we have is that this bug report has no duplicates. That is, in the Firefox 4 beta phase so far, no one has reported a site compat bug that would have been analyzed to be caused by not honoring HTTP caching-related pseudo-headers in HTML <meta>s.

Based on this, I suggest taking attachment 473506 [details] [diff] [review], effectively WONTFIXing this bug. (Alternatively, this should be an nsIChannel bug and shouldn't be sitting in the HTML: Parser component.)

CCing sicking, since he has previously advocated simplifying things when in doubt.
Priority: P2 → P3
(In reply to comment #24)
> > It's a complication to the Web platform 
> 
> How so?  Having <meta http-equiv> always act like the HTTP header seems like a
> strictly simpler setup than what hixie is proposing (which is that sometimes it
> does and sometimes it doesn't).  Though I suppose content-type can't act like
> the header anyway....

Note that there are a lot of headers that don't work in <meta http-equiv>. For example, anything CORS-related, things like HTTP ranges, and anything Necko looks at before sending OnDataAvailable.

In short, there are a whole lot of headers that simply don't make sense in <meta http-equiv>, either because we don't have an HTML parser hooked up, or because the header is used before the parser adds it.

I think Hixie's solution is the more sensible one. That said, we definitely need to figure out a list of headers that should be supported.
> * Refresh is special. It doesn't work from HTTP--just meta.

I don't believe this is true.  See nsDocument::RetrieveRelevantHeaders

> * Content-Type is special. When it appears in meta, the HTML parser does
> elaborate and specific processing.

Agreed.

> * Set-Cookie already has special handling, so it's not handled generically,
>   either.

OK.

> * Link has special handling, and the existing handling is wrong.

We use the same codepath for <meta http-equiv="Link"> and Link: headers on the HTTP level, as I recall.

> * Default-Style and Content-Language don't need writing back to Necko--they
>   need reporting to nsIDocument.

The document reads them from the channel, is the point.

> but it seems useless to write pseudo-headers back to Necko unless they affect
> caching

Well, not if it simplifies code elsewhere.

I'm ok with not putting these headers back on the channel as long as the relevant <meta>s that need to work continue to work and as long as their interaction with actual HTTP headers is well-defined.
Look, we really do need to fix this.  This HTML (from bug 601591):

  <meta http-equiv="expires" content="0">
  <meta http-equiv="pragma" content="no-cache">
  <meta http-equiv="cache-control" content="no-cache">

reliably prevents caching across browsers prior to fx 4 as far as I can see (different browsers respond to different parts of it, but all respond to _some_ part of it).  Now we ignore it, and that _will_ break pages.  Renominating for blocking based on that...
blocking2.0: - → ?
Agreed, we need to unregress this for b8, so that it doesn't mask other content compat issues.  Thanks, bz.
blocking2.0: ? → beta8+
(In reply to comment #29)
> Look, we really do need to fix this.  This HTML (from bug 601591):
> 
>   <meta http-equiv="expires" content="0">
>   <meta http-equiv="pragma" content="no-cache">
>   <meta http-equiv="cache-control" content="no-cache">
> 
> Reliably prevents caching across browsers prior to fx 4 as far as I can see
> (different browsers respond to different parts of it, but all respond to _some_
> part of it).  Now we ignore it, and that _will_ break pages.

What part does IE in a non-quirks mode respond to? Opera in its non-quirks modes? Safari? Chrome?

(Note that bug 601591 doesn't cite specific site breakage.)

Anyway, there's nothing to be done on the parser side. This needs code behind nsIHttpChannel, so it would be good to get an owner who knows how that stuff works.
Assignee: hsivonen → nobody
Status: ASSIGNED → NEW
Component: HTML: Parser → Networking: Cache
Priority: P3 → --
QA Contact: parser → networking.cache
Summary: HTML5 parser regresses cache-control support in meta pragmas → nsIHttpChannel::SetResponseHeader should work after the stream has ended (HTML5 parser regresses support for cache-control pragmas)
Attachment #473506 - Attachment is obsolete: true
Attachment #473506 - Flags: review?(bzbarsky)
> What part does IE in a non-quirks mode respond to?

Hmm.  I see comment 12 here wasn't quite right.  But we don't want to break quirks sites either; that's why we have quirks....  Unless you want to make this behavior a quirk?

Henri, what needs to be doable on the necko side here?  Handling SetResponseHeader calls after OnStopRequest?
And it sounds like we need to get a useful support matrix for the three pragmas from comment 29.

Henri, if you can set up some testcases that send those three metas, I'm happy to test at least on IE8 and various Opera/Safari/Chrome versions...
(In reply to comment #33)
> Bug we don't want to break quirks sites either; that's why we have quirks.... 

At present, we don't have data about quirks sites breaking. The data we do have to the contrary is the absence of duplicates of this bug that'd name specific sites as breaking. (Bug 601591 sure looks like a Web author requesting a feature to be present rather than a user reporting a broken site.)

> Unless you want to make this behavior a quirk?

I don't. I want to make things as simple as we can get away with. I think not writing back to Necko is simpler than writing back to Necko and doing the same thing in all modes is simpler than varying by mode.

> Henri, what needs to be doable on the necko side here?  Handling
> SetResponseHeader calls after OnStopRequest?

Right.

(In reply to comment #34)
> Henri, if you can set up some testcases that send those three metas, 

I'll set up test cases.
http://hsivonen.iki.fi/test/moz/meta-caching.html

It appears that my previous testing was somehow bogus, because now I didn't see a quirks/non-quirks difference with Opera or IE. However, the claim made in comment 29 seems incorrect, too.
OK, I can confirm your results on those testcases...  The listed HTML works in Opera/IE here, but not in the webkit-based browsers.  Can we check with the webkit folks whether that's correct?

If so, I still think that aligning with Opera/IE/Fx3 is less likely to break sites here...
(In reply to comment #37)
> Can we check with the
> webkit folks whether that's correct?

Earlier, Maciej didn't find code implementing this stuff in WebKit, so I believe the test result is correct.
http://krijnhoetmer.nl/irc-logs/whatwg/20100722#l-479

As I understand things from #whatwg IRC remarks (though I can't find the right bit in the logs now), WebKit has implemented meta http-equiv stuff as needed for compat, and caching-sensitive stuff hasn't been needed (at least not enough).
I don't think it's true that, just because WebKit doesn't implement something yet, it isn't needed for compat.

I would however love to see research on how many pages out there use these metas.
Will this require interface changes?  Right now HTTP channels get rid of their cache handles after calling OnStopRequest.  We can't simply hold onto the cache handle for the lifetime of the channel (a write handle will block other channels from reading the cache entry until it's released).  So it seems to me we'd need a pair of new IDL functions that 1) tell nsIHttpChannel to not release the cache at OnStopRequest; and 2) tell it when to release.  (#2 would also be a convenient place to tell the channel to write the header changes to the cache entry: this doesn't happen during SetResponseHeader() itself).

Or we could fix the cache-write-blocks-reads issue, but that may be ambitious for the FF 4 timeframe, and there might be other issues with holding cache entries open.
And what about just reopening the cache entry, updating the expiration time (the only thing that happens when we modify a response header after OnStartRequest), and closing it again?  This may be done asynchronously.
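
(For illustration, a rough sketch of that asynchronous reopen against the pre-2.0 cache API; the ExpirationUpdater class and its wiring are assumptions, not code from any attached patch:)

    // Sketch only: remaining nsICacheListener members, threading and
    // error handling omitted.
    class ExpirationUpdater : public nsICacheListener
    {
    public:
      NS_DECL_ISUPPORTS

      ExpirationUpdater(PRUint32 aExpirationTime)
        : mExpirationTime(aExpirationTime) {}

      NS_IMETHOD OnCacheEntryAvailable(nsICacheEntryDescriptor* aDescriptor,
                                       nsCacheAccessMode aAccessGranted,
                                       nsresult aStatus)
      {
        if (NS_SUCCEEDED(aStatus)) {
          // The expiration time is the only metadata a late
          // SetResponseHeader call actually changes.
          aDescriptor->SetExpirationTime(mExpirationTime);
          aDescriptor->Close();  // entry becomes available to others again
        }
        return NS_OK;
      }

    private:
      PRUint32 mExpirationTime;
    };

    NS_IMPL_ISUPPORTS1(ExpirationUpdater, nsICacheListener)

    // The channel would then do, roughly:
    //   session->AsyncOpenCacheEntry(cacheKey, nsICache::ACCESS_WRITE,
    //                                new ExpirationUpdater(newExpiration));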
Assignee: nobody → dwitte
> just reopen the cache entry & update the expiration time [ed: or doom the entry, or fail silently if a previous call already doomed the entry and it's gone.]   

Yes, that could work.  It'd be less efficient than having a new interface (the HTML in comment 29 would reopen the cache entry 3 separate times), but that's probably not a big deal since this HTML is not very common.

We'd be creating a "window" where the cache entry could be successfully read by another channel in between the OnStop cache entry close and the async reopen/update--is that ok?

Under e10s the chrome nsHttpChannel might be long gone, but we could just open the cache entry using the PNecko protocol.

I don't have a strong opinion between the new interface vs reopen design choice.
Perhaps someone can come up with a reason to prefer one.  I guess all things being equal it's best to avoid interface changes at this point.
If there is no strong reason to modify the API, it's IMHO better (and less work) to have control over the cache entry.  The only risk I can see at this moment is that someone may forget to release the entry after all work is done.
> If there is no strong reason to modify the API, it's IMHO better 
> to have control over the cache entry.  The only risk I can see at this
> moment is that someone may forget to release the entry after all work is done.

Did you mean "no strong reason *not* to modify the API"?  Your comment makes it sound like you're in favor of modifying the API to control the cache entry.   If so, I agree with you (the "window" problem with the reopen approach has been troubling me since I last commented).

Assuming we want an interface change, I assume we can either 1) get it in before beta7 (but time is running out:  maybe we could just change the interface, but not actually use it yet); or 2) add a new interface just for this API, which I assume (?) is ok even after the API freeze (since we wouldn't be changing any existing interfaces).  Let's try for #1.
Getting an interface change into b7 won't happen--it's shipping too soon.  But bz confirms we can add a new interface after b7 (just not change existing ones).
Honza, would you like to write a patch here? It sounds like you've got a good idea of what's involved. As jduell says I'd just add an extra interface that we can fold into nsIHttpChannelInternal, if appropriate, after we branch for 2.0.
Yep, will do that.

-> me
Assignee: dwitte → honzab.moz
Status: NEW → ASSIGNED
blocking2.0: beta8+ → betaN+
Attached patch v1 w/o an automated test (obsolete) — Splinter Review
This is the final working version of the patch, just missing an automated test for it.

- I added a new interface nsICacheInfoChannel_GECKO_2_0
- there is an attribute that returns an nsISupports object that, while held, prevents the cache entry from being released (see the sketch after this list)
- this approach is safer than using a pair of Hold()/Release()-style methods to control it
- the interface should merge into nsICacheInfoChannel
- I have chosen nsICacheInfoChannel because I need an interface that works with cache entries on both chrome and content with different implementations; this interface was created for that purpose
- for e10s we'll let HttpChannelChild also implement this new interface and remote the hold lock creation/deletion to chrome
- this will probably need some more event syncing between content and chrome, since we must have the hold lock on content before chrome encounters OnStopRequest
- I am NOT updating nsIHttpChannelInternal because that interface is implemented by HttpChannelBase and we need different implementations for chrome and content
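
(For illustration, a sketch of how a consumer like nsHtml5StreamParser might use the new attribute; the interface and getter names follow this patch -- the review below suggests renaming them -- and the exact code may differ:)

    // Sketch only: simplified from the patch; error handling omitted.
    nsresult
    nsHtml5StreamParser::OnStartRequest(nsIRequest* aRequest, nsISupports*)
    {
      nsCOMPtr<nsICacheInfoChannel_GECKO_2_0> cacheInfo =
        do_QueryInterface(aRequest);
      if (cacheInfo) {
        // While mCacheEntryHoldLock is alive, the channel keeps its cache
        // entry open past OnStopRequest; dropping the reference (e.g. in
        // the parser's destructor) lets the channel close the entry.
        cacheInfo->GetCacheHoldLockObject(getter_AddRefs(mCacheEntryHoldLock));
      }
      // ...regular stream setup continues here...
      return NS_OK;
    }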

I need a little help with a test for this: the manual test case from comment 0 says to focus the address bar and press enter, not to reload the page (reloading doesn't reproduce the bug).  How can a mochitest emulate that behavior? (location.reload() doesn't work for me.)  Should I rather have a browser test for this?
Attachment #489981 - Flags: feedback?(jduell.mcbugs)
Attached patch v1 + test (obsolete) — Splinter Review
Added an automated test.
Attachment #489981 - Attachment is obsolete: true
Attachment #489991 - Flags: review?(jduell.mcbugs)
Attachment #489981 - Flags: feedback?(jduell.mcbugs)
Hixie didn't change the spec to support caching-related <meta> tags:
http://lists.w3.org/Archives/Public/public-html/2010Dec/0106.html
Yes, I saw the mail.  He also said the only reason he didn't is because of the webkit weirdness, right?
If we think it'll negatively impact websites then we should still go ahead and maintain support. The spec will likely change once implementations start to converge.
Comment on attachment 489991 [details] [diff] [review]
v1 + test

Boris, Jason is in the process of moving and probably won't get to this any time real soon, can you review?
Attachment #489991 - Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Whiteboard: softblocker
Comment on attachment 489991 [details] [diff] [review]
v1 + test

mCacheEntryPassClose should probably be called mDeferredCacheEntryClose.

The new interface should just go in nsICacheInfoChannel.idl.

I'd call the new property |cacheEntryClosePreventer|.

>+++ b/netwerk/protocol/http/nsHttpChannel.cpp
>@@ -116,30 +116,33 @@ nsHttpChannel::nsHttpChannel()
> nsHttpChannel::CloseCacheEntry(PRBool doomOnFailure)
>+    mDoomCacheEntryOnClose = doomOnFailure;

This doesn't match the mDoomCacheEntryOnClose documentation.  I think the documentation should change to not mention the hold counter; mDoomCacheEntryOnClose is true just if CloseCacheEntry was called with doomOnFailure set to true.

>+class HttpChannelCacheEntryHoldLock : public nsISupports

Maybe call this HttpChannelCacheEntryClosePreventer?

And call mCacheEntryHoldCount something like mCacheEntryClosePreventionCount?

I don't understand the FinalizeCacheEntry logic in CloseCacheEntryInternal.  We're not making the call in OnStopRequest conditional on !mCacheEntryPassClose, right?  So it would happen no matter what, and then in the mCacheEntryPassClose case we'd finalize here too.  This part looks wrong to me.

>+++ b/parser/html/nsHtml5StreamParser.cpp
> nsHtml5StreamParser::~nsHtml5StreamParser()
>+  mCacheEntryHoldLock = nsnull;

Why?

>@@ -553,16 +555,26 @@ nsHtml5StreamParser::OnStartRequest(nsIR
>+  if (cacheInfoChannel)
>+    cacheInfoChannel->GetCacheHoldLockObject(getter_AddRefs(
>+                                             mCacheEntryHoldLock));

Local style is to brace all if bodies.  I'd also prefer this to be line-broken as:

  cacheInfoChannel->
    GetCacheHoldLockObject(getter_AddRefs(mCacheEntryHoldLock);

And we should rename mCacheEntryHoldLock if we rename the idl getter.

r- pending the above changes, especially the FinalizeCacheEntry issue being sorted out.

It would be good to have Henri review the parser bits to make sure that the object you're changing has the right lifetime (goes away when we're done parsing).
Attachment #489991 - Flags: review?(hsivonen)
Attachment #489991 - Flags: review?(bzbarsky)
Attachment #489991 - Flags: review+
Comment on attachment 489991 [details] [diff] [review]
v1 + test

er, really r-.
Attachment #489991 - Flags: review+ → review-
Talked to Henri--the lifecycle of the parser is ok (OnStart/destruction happens on main thread, and the parser gets destroyed within a reasonable period of time).

The other thing I remember from looking at this is that the patch didn't handle the e10s case.  We should fix that, or this will still be broken for fennec.  Right now the HttpChannelParent holds its own  mCacheDescriptor to ensure that the cache entry stays live.  I'm guessing it would be cleaner to drop that and instead have HttpChannelParent grab a cacheHoldLockObject.

I'm sleep-deprived now, but here are the notes I took when I was looking at it:

We should also fix this for e10s now for fennec.  I think this is fairly simple.
We have no guarantee that the child will receive anything, even OnStart, before
the parent channel hits OnStop.  So have all HttpChannelParents grab a
CacheHoldLockObject from the nsHttpChannel early on.  HttpChannelChild will
provide its own implementation of nsIHttpChannelInternal_GECKO_2_0, and its own
mCacheEntryHoldCount (move mCacheEntryHoldCount and shared logic into
HttpBaseChannel).  In HttpChannelChild::OnStopRequest, we currently either 1)
delete the PHttpChannel, or 2) keep it around if it's a document channel.
Supplement #1 to not delete the channel if mCacheEntryHoldCount is nonzero (or
hold on to just the cache ref), and #2 to null out HttpChannelParent's
CacheHoldLockObject if it is zero.  Then add an IPDL message for decrementing
the parent's mCacheEntryHoldCount when OnStopRequest has already been called
(I'm guessing you can just send the msg when the count hits zero, rather than
for each decrement).
Comment on attachment 489991 [details] [diff] [review]
v1 + test

(In reply to comment #56)
> Talked to Henri--the lifecycle of the parser is ok (OnStart/destruction happens
> on main thread, and the parser gets destroyed within a reasonable period of
> time).

Yeah, r=hsivonen on the nsHtml5StreamParser changes.
Attachment #489991 - Flags: review?(hsivonen) → review+
Whiteboard: softblocker → [softblocker]
(In reply to comment #54)
> The new interface should just go in nsICacheInfoChannel.idl.

Aren't we past API freeze?  I would expect it is forbidden to modify any interface.

> I don't understand the FinalizeCacheEntry logic in CloseCacheEntryInternal. 
> We're not making the call in OnStopRequest conditional on
> !mCacheEntryPassClose, right?  So it would happen no matter what, and then in
> the mCacheEntryPassClose case we'd finalize here too.  This part looks wrong to
> me.

I made it that way to finalize the cache entry before calling OnStopRequest.  Not sure why exactly, but I wanted to leave the code intact as much as possible.  

However, in the new patch, moving the call to FinalizeCacheEntry, under the same condition, to nsHttpChannel::CloseCacheEntryInternal seems to work well, preserving the same logic.  xpcshell tests are passing, as are some mochitests that could be affected.

> 
> >+++ b/parser/html/nsHtml5StreamParser.cpp
> > nsHtml5StreamParser::~nsHtml5StreamParser()
> >+  mCacheEntryHoldLock = nsnull;
> 
> Why?

Probably more questions for Henri Sivonen:
- Henri, why are you nullifying all the members, given that some are nsCOMPtrs and nullify themselves?
- If there is a reason to, should this new member also be freed the same way?
- The object refers to the channel, so there might potentially be a cycle.  Shouldn't it also be handled in cycle collection?
(In reply to comment #56)
> The other thing I remember from looking at this is that the patch didn't handle [...] add an IPDL message for decrementing
> the parent's mCacheEntryHoldCount when OnStopRequest has already been called
> (I'm guessing you can just send the msg when the count hits zero, rather than
> for each decrement).

You read my mind, man? :)
> I would expect it is forbidden to modify any interface.

Right.  But you can put the new interface in an existing IDL file, which doesn't modify any existing interfaces.

> I made it that way to finalize the cache entry before calling OnStopRequest. 

My point is that the code doesn't match the comments as far as I can tell.  What am I missing?
(In reply to comment #60)
> My point is that the code doesn't match the comments as far as I can tell. 

The comment really was wrong.  I'll fix it in the next version to not mention the counter.

> What a I missing?

In comment 54 you said you didn't understand the logic.  It sounded like it was not just about the comment.  And there actually was a mistake: I was calling FinalizeCacheEntry always, not based on the conditions present in OnStopRequest; not sure whether that was what you were referring to.

Now I'd say the question is: do we want to call FinalizeCacheEntry from OnStopRequest before delegating OnStopRequest to upper levels (as we do before this patch) _AND_ also during deferred close, or just once, when the cache entry is actually being closed (=during deferred or immediate close) but, in both immediate and deferred cases, after OnStopRequest delegation?

I'll push the new patch using the latter approach to the try server to check that we may go that way.
> And there actually was a mistake

OK, so the logic was just wrong.  Fine.

> I was calling FinalizeCacheEntry always

Right, that's what I said in comment 54....

Finalizing the cache entry when we actually close it would make the most sense to me.  What are the reasons for not doing that?
(In reply to comment #62)
> Finalizing the cache entry when we actually close it would make the most sense
> to me.  What are the reasons for not doing that?

If you don't know of any, then I don't know either.  Also, finalizing the cache entry during the actual close is passing on the try server.

After Henri answers my questions at the bottom of comment 58, I'll add a new patch.

Henri, ping?
(In reply to comment #58)
> Probably more questions for Henry Sivonen:
> - Henry, why are you nullifying all members that some are nsCOMPtrs and nullify
> them self?

I've set some things to null in the destructor explicitly to make the memory of destroyed objects look distinct in a debugger (maybe when I do that, I should use #ifdef DEBUG). I thought nsCOMPtrs only null themselves in their constructor but not in destructor. A quick code inspection suggests that nsCOMPtr only derefs in the destructor but doesn't set its pointer to null.

> - If there is a reason to, should this new member also be freed the same way?

Either way should work.

> - The object refers to the channel, so there might potentially be a cycle.
> Shouldn't it also be handled in cycle collection?

If there might be a cycle, sure, but currently, nsHttpChannel doesn't look like a CC participant. Is it one?
(In reply to comment #64)
> If there might be a cycle, sure, but currently, nsHttpChannel doesn't look like
> a CC participant. Is it one?

First, the cycle is established in OnStartRequest.  A call to OnStartRequest ensures, by the nsIRequestObserver contract, that we must get a call to OnStopRequest as well.  For instance, nsHttpChannel throws the listener away right after the call to OnStopRequest.  So the cycle will always be broken.  Also, nsHttpChannel is not a CC participant, and neither is HttpChannelChild.

So I'll not add cycle collection for this new member.
- FinalizeCacheEntry is now called during the actual CloseCacheEntry and under the same conditions as before this patch
- no cycle collection added
- I still clear the member in the dtor to keep in sync with Henri's code; if anything is to be changed in that destructor, let's do it in a different bug
Attachment #489991 - Attachment is obsolete: true
Attachment #502625 - Flags: review?(bzbarsky)
Attached patch v1 (e10s) (obsolete) — Splinter Review
As Jason is quite busy these days, if anyone else is willing to review this, please take it.

The patch does:
- mCacheEntryClosePreventionCount and the object declaration/implementation moved to HttpBaseChannel
- the object now calls two virtual methods (in its ctor and dtor), implemented differently by nsHttpChannel and HttpChannelChild
- HttpChannelParent now simply keeps the hold lock instead of the cache entry directly
- it is held until either the child channel is deleted or the "document channel cleanup" event is received
- HttpChannelChild::mKeptAlive renamed to mDeferredDeleteSelf; the meaning remains more or less unchanged
- new private method HttpChannelChild::MaybeDeleteSelf added, responsible for immediate channel deletion if: 1) the cache hold lock is not held and 2) the channel is not a document load channel; if either of these conditions is not met, the channel is left alive and mDeferredDeleteSelf is set to true (see the sketch after this list)
- after all hold locks on the child channel are released, MaybeDeleteSelf is called; this may release the child channel and trigger ActorDestroy on the parent side, which releases the hold lock on the real channel
- based on that, no need for any new protocol events for this logic
- HttpChannelChild::SetResponseHeader sends the key/value to the parent now
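
(For illustration, a sketch of the MaybeDeleteSelf logic described above; member names follow this comment, and the document-channel check (mIsDocumentChannel here) is an assumption -- the attached patch may differ:)

    // Sketch only: the real patch may structure this differently.
    void
    HttpChannelChild::MaybeDeleteSelf(bool forceDocumentLoadDeletion)
    {
      if (mCacheEntryClosePreventionCount == 0 &&
          (forceDocumentLoadDeletion || !mIsDocumentChannel)) {
        // Safe to tear down the IPDL channel: on the parent side this ends
        // up in ActorDestroy, which drops HttpChannelParent's hold lock and
        // lets the real channel close its cache entry.
        PHttpChannelChild::Send__delete__(this);
      } else {
        // Deletion is blocked; retry when the last hold lock is released
        // or the document channel cleanup event arrives.
        mDeferredDeleteSelf = true;
      }
    }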
Attachment #502638 - Flags: review?(jduell.mcbugs)
Comment on attachment 502625 [details] [diff] [review]
v1.1 + test [Check in comment 70 / Back out comment 71]

>+    // True if CloseCacheEntry was called while cache entry hold counter was

s/hold counter/close prevention count/

Also s/GECKO_2_0/MOZILLA_2_0_BRANCH/

r=me with those nits.  Thank you for fixing this!
Attachment #502625 - Flags: review?(bzbarsky) → review+
Comment on attachment 502638 [details] [diff] [review]
v1 (e10s)

Maybe dwitte has time to review this?
Attachment #502638 - Flags: review?(dwitte)
Comment on attachment 502625 [details] [diff] [review]
v1.1 + test [Check in comment 70 / Back out comment 71]

http://hg.mozilla.org/mozilla-central/rev/2d7561bc0cb0

E10s fix still not even reviewed, leaving open.
Attachment #502625 - Attachment description: v1.1 + test → v1.1 + test [Check in comment 70]
Flags: in-testsuite+
The test failure seems to be reproducible.  Going to investigate.
Confirming this is causing bug 626244.

Cause:
- this is about the timing of releasing the cache close prevention lock
- if it is released (and the cache entry closed) BEFORE a subsequent request for the same cache entry gets processed, we get read+write access
- if it is released AFTER a new request has been processed, we get only read access
- based on that, in nsHttpChannel::CheckCache we bypass validation for read-only cache entries and do ReadFromCache for the load
- the network console in the test doesn't log the network request, because it has been satisfied from the cache
- => the "network message" is not found in the console log
- then the test tries to emulate a "click" on the network message, which is clickable
- => as there is no such message, the test gets stuck and times out

Suggestions:
1. also log cache requests to the console (joking, but on the other hand, it wouldn't be that bad to do anyway)
2. change the test to reload the window after a very short timeout to ensure the cache entry has been released; that doesn't fix possible failures of this test in the future if we hold a cache entry for a longer time for whatever reason, but for now it should be sufficient
Attached patch test regression fix, v1 (obsolete) — Splinter Review
Mihai, please see comment 73 for details.
Attachment #504575 - Flags: review?(mihai.sucan)
Comment on attachment 504575 [details] [diff] [review]
test regression fix, v1

The change looks fine. Thanks for the fix!
Attachment #504575 - Flags: review?(mihai.sucan) → review+
Attachment #502625 - Attachment description: v1.1 + test [Check in comment 70] → v1.1 + test [Check in comment 70 / Back out comment 71]
Comment on attachment 502638 [details] [diff] [review]
v1 (e10s)

I'm looking at this. (There's only so much time one can spend sitting on the beach :)
Attachment #502638 - Flags: review?(dwitte)
Comment on attachment 502638 [details] [diff] [review]
v1 (e10s)

># HG changeset patch
># Parent 25fbb3790b3cc52821644b6061177f24b3a63560

+[scriptable, uuid(746064fc-8783-4d19-9c5d-6361ed807b39)]
+interface nsICacheInfoChannel_GECKO_2_0 : nsISupports
+{
+  /**
+   * Return an object that while not released prevents the channel from
+   * releasing the cache entry after all work on it has been done.  Used by
+   * asynchronous consumers that needs to work with the cache entry long after
+   * onStopRequest has been called.
+   */
+  readonly attribute nsISupports cacheEntryClosePreventer;

Add to comment: "Must be acquired no later than during OnStopRequest."

>diff --git a/netwerk/protocol/http/HttpBaseChannel.cpp b/netwerk/protocol/http/HttpBaseChannel.cpp
> 
>+  // These two methods are called on the preventer creation and deletion
>+  // respectively.  Their implementation is responsible for preventing
>+  // the cache entry being closed after the channel finishes its job,
>+  // i.e. after OnStopRequest has been called, if the number of calls to
>+  // OnIncrease method is larger then number of calls to OnDecrease method.

How about something shorter, say

    // Increment/decrement counter that keeps channel's cache entry open after
    // OnStopRequest if needed.

>+
>+  virtual void OnIncreaseCacheEntryClosePreventCount() = 0;
>+  virtual void OnDecreaseCacheEntryClosePreventCount() = 0;

>+  // Object craeted and returned on every call to cacheEntryClosePreventer
>+  // attribute.  Calls the two methods right above in its constructor and
>+  // destructor respectively.

s/craeted/created/

>diff --git a/netwerk/protocol/http/HttpChannelChild.h b/netwerk/protocol/http/HttpChannelChild.h
>--- a/netwerk/protocol/http/HttpChannelChild.h
>+++ b/netwerk/protocol/http/HttpChannelChild.h
>@@ -48,16 +48,17 @@
> #include "mozilla/net/ChannelEventQueue.h"
> 
> #include "nsICacheInfoChannel.h"
>+#include "nsICacheInfoChannel.h"

Any reason you want to #include nsICacheInfoChannel.h twice in a row here? :)

>+  // Indicates the channel was not deleted based on various conditions that
>+  // requires further communication to the parent channel. When these
>+  // conditions change to allow this channel be deleted and this flag is
>+  // set, we shall immediately proceed with deletion.
>+  //
>+  // The conditions are:
>+  // - this is a channel loading a document
>+  // - the cache entry is prevented from close

    // Indicates IPDL channel was not deleted during OnStopRequest, because
    // 1) this is a document-level channel (IPDL channel will be deleted during
    // destructor); or 2) mCacheEntryClosePreventionCount is non-zero (IPDL
    // channel will be deleted when count hits 0).

>+  bool mDeferredDeleteSelf;
> 
>   void OnStartRequest(const nsHttpResponseHead& responseHead,
>                           const PRBool& useResponseHead,
>                           const RequestHeaderTuples& requestHeaders,
>                           const PRBool& isFromCache,
>                           const PRBool& cacheEntryAvailable,
>                           const PRUint32& cacheExpirationTime,
>                           const nsCString& cachedCharset,
>@@ -185,16 +199,21 @@ private:
>   void OnCancel(const nsresult& status);
>   void Redirect1Begin(const PRUint32& newChannelId,
>                       const URI& newUri,
>                       const PRUint32& redirectFlags,
>                       const nsHttpResponseHead& responseHead);
>   void Redirect3Complete();
>   void DeleteSelf();
> 
>+  void MaybeDeleteSelf(bool forceDocumentLoadDeletion = false);

I think this would probably be better named "MaybeCloseIPDL", since in most of the use cases we're not talking about anything that leads to "delete this" (at most we do a Release: only in the RecvDeleteSelf case is that likely to actually cause this object to get deleted).


>diff --git a/netwerk/protocol/http/HttpChannelParent.cpp b/netwerk/protocol/http/HttpChannelParent.cpp
>+  // Prevent cache entry from being closed for various reasons:
>+  // - we need the cache entry in RecvSetCacheTokenCachedCharset()
>+  // - the child channel may also demand the cache entry be alive during its
>+  //   job, but as we do not guarantee call to HttpChannelParent::OnStopRequest
>+  //   after the child channel even got RecvOnStartRequest, we prevent close
>+  //   for lifetime of the child channel

    // Prevent cache entry from being closed during HttpChannel::OnStopRequest:
    // - We need the cache entry for RecvSetCacheTokenCachedCharset()
    // - The child channel may call GetCacheEntryClosePreventer, so we have to
    //   call it now (otherwise we could hit OnStopRequest and close entry
    //   before child gets a chance to keep it open).
    // We close the entry either when RecvDocumentChannelCleanup is called, or the IPDL
    // channel is deleted. 

>+  chan->GetCacheEntryClosePreventer(getter_AddRefs(mCacheClosePreventer));

>+void
>+nsHttpChannel::OnIncreaseCacheEntryClosePreventCount()
>+{
>+    ++mCacheEntryClosePreventionCount;
>+    LOG(("mCacheEntryClosePreventionCount increased to %d, [this=%x]",
>+         mCacheEntryClosePreventionCount, this));
>+}

Worth prefixing LOG msg with 'nsHttpChannel::' and also using LOG in HttpChannelChild versions of these functions?  Not a big deal--your decision.


>+void
>+nsHttpChannel::OnDecreaseCacheEntryClosePreventCount()
>+{
>+    --mCacheEntryClosePreventionCount;
>+    LOG(("mCacheEntryClosePreventionCount decreased to %d, [this=%x]",
>+         mCacheEntryClosePreventionCount, this));

Ditto
Attachment #502638 - Flags: review?(jduell.mcbugs) → review+
So if I read the comments correctly, all three of the patches here need to (re)land.
Ready to land.
Attachment #502625 - Attachment is obsolete: true
Attachment #504575 - Attachment is obsolete: true
Attachment #506260 - Flags: review+
Updated per Jason's comments.  Ready to land.
Attachment #506261 - Flags: review+
Attachment #502638 - Attachment is obsolete: true
Comment on attachment 506260 [details] [diff] [review]
v1.2 (merge of 1.1 and the test update) [Check in comment 81]

http://hg.mozilla.org/mozilla-central/rev/9acd51502a65
Attachment #506260 - Attachment description: v1.2 (merge of 1.1 and the test update) → v1.2 (merge of 1.1 and the test update) [Check in comment 81]
Comment on attachment 506261 [details] [diff] [review]
v1.1 (e10s) [Check in comment 82]

http://hg.mozilla.org/mozilla-central/rev/38d1b321c98b
Attachment #506261 - Attachment description: v1.1 (e10s) → v1.1 (e10s) [Check in comment 82]
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b10
I've backed this out since it appears to be the cause of bug 559932 going near perma-orange. All of the failures apart from the first one in there were either when this code was in tree or from try server runs.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 628422
Cause of bug 628422:

When an HTML page parsed with the HTML5 parser refers to a resource whose URL resolves to the referring page itself and that is necessary to finish parsing (it blocks the parser), we end up with the second URL load waiting for the cache entry held by the first channel, the one loading the HTML page.  The parser keeps the cache entry held after OnStopRequest.

This cannot simply be fixed without either allowing a cache entry to be read while we write into it, or changing the code to re-open the cache entry when we need to modify it with SetResponseHeader.  No ideas for a fix at the moment.
Hmm.  So how, if at all, did this work before the HTML5 parser?  Or did we simply have this bug in that situation before?
(In reply to comment #86)
> Hmm.  So how, if at all, did this work before the HTML5 parser?  Or did we
> simply have this bug in that situation before?

Not sure I understand your objection well.  Before this bug fix the cache entry was released after OnStopRequest, always, independently of whether the HTML5 parser was used.  So requests to the same page simply started to load after the main (document) request finished loading.

With the patch for this bug, the parser waits for a CSS load and cannot continue until the content of the CSS file is known.  The parser holds the cache entry of the document.  The request for the CSS file is for the same URL as the document.  So the CSS request waits for the cache entry to be released.  But it is held by the parser, which won't release it until it finishes its job.  Each waits for the other.
> Before this bug fix the cache entry was released after OnStopRequest

Yes, but the point is that OnStopRequest could happen before the document finished parsing, right (e.g. if a <script> pointed at the same URI)?  Which would land us in the situation of comment 4 if the <script> comes before the <meta>?  So we had this bug (as described in comment 0) in that situation?

Or did something else happen?

> With patch for this bug

Yes, I understand the behavior with the patch for this bug.  I'm asking what the 3.6 behavior was.
I talked to Honza, and 3.6 does in fact have this bug, if a <script> that takes a while to load comes before the <meta>.

On trunk, can we null out mCacheEntryClosePreventer when we block on a script?  That should get the common case of <meta> early in the document without running into problems due to other requests waiting on our cache entry...
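
(For illustration, a sketch of that suggestion; the hook point -- some "parser is now blocked on a script" notification -- is a hypothetical, not an existing method:)

    // Sketch only: hypothetical hook, called when a <script> blocks the parser.
    void
    nsHtml5StreamParser::OnBlockedOnScript()
    {
      // Everything before the blocking <script>, including any caching
      // <meta http-equiv>, has already been processed, so stop pinning the
      // cache entry open.  This keeps the common early-<meta> case working
      // while avoiding deadlocks with same-URL subresource loads (bug 628422).
      mCacheEntryClosePreventer = nsnull;
    }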
This is a pretty big change to pretty organic code -- are we CERTAIN that we need to take the risk and effort on this before FF4, given that it's a regression in frequency but not in kind, and the failure mode isn't that bad?

I would really like to unblock this both to free up resources and to eliminate a source of late-game risk.  Can I have a second?
Henri, it seems to me that this is mostly a problem if there is no <script src> after the <meta>, right?  If that's so, I would argue that real-life incidence of this is probably low and that we should fix it in the next release.
(In reply to comment #90)
> This is a pretty big change to pretty organic code -- are we CERTAIN that we
> need to take the risk and effort on this before FF4, given that it's a
> regression in frequency but not in kind, and the failure mode isn't that bad?

Comment 26 still represents my opinion on this.

 * We still have *zero* reports of real sites breaking due to this. (I filed this bug based on a test case.)
 * Chrome and Safari don't make http-equiv talk back to the HTTP layer, because they haven't seen the need to do so. (So if we don't fix this, the level of potential badness is bounded to "no worse than WebKit". I'm not saying we shouldn't aim to be better than WebKit in general. My point is that there isn't a dire disaster looming.)
 * IE hasn't revised their support in ages. They haven't made the HTTP 1.1 header names work.

So no, I am not certain that we need to take the risk and effort here for Firefox 4.

I suggest unblocking.

(In reply to comment #91)
> Henri, it seems to me that this is mostly a problem if there is 
> no <script src> after the <meta>, right?

The tree op executor never blocks the stream consumption, so OnStopRequest can be reached before any tree ops have been run. So no, I think you can't assume any relationship between <script src> and still having a reference to the cache entry.

> If that's so, I would argue that real-life incidence
> of this is probably low and that we should fix it in the next release.

Unless substantial new evidence shows up, I suggest WONTFIXing and landing attachment 473506 [details] [diff] [review] to remove the remaining race condition. (Landing attachment 473506 [details] [diff] [review] is something we could do for Firefox 4 even, in my opinion.)
Agreed.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → WONTFIX
OK, fine.  Please file a followup (to land post-2.0) for removing the races?
(In reply to comment #94)
> Please file a followp (to land post-2.0) for removing the races?

Filed bug 629621.
Jason, Boris, it might be worth landing the infrastructure for holding the cache entry, both the base and e10s patches, even while it's not used by the parser, as it might be useful in the future.

Agreed?  (There's a lot of time to answer, as this would happen after the 2.0 release.)
Given bug 628422, I would be worried about doing that while our cache doesn't support parallel access.  Too easy to screw up.
Duplicate of this bug: 643710
" * We still have *zero* reports of real sites breaking due to this. (I filed
this bug based on a test case.)"
FYI, Bug 643710, which is a duplicate, is such a report.
Duplicate of this bug: 654357
> We still have *zero* reports of real sites breaking due to this.

+1 :(

This has broken one of the largest banking sites in Germany:

> we got a complaint from the Postbank (Postbank Group is one of Germany’s major
> financial services providers with 14 million active domestic customers, 20,000
> employees and total assets of €215 billion) and they complained about Firefox 5
> and are thinking of recommending to users a different browser, other than
> Firefox.
> 
> The Issue seems to be:
> 
> They have a kind of outage/overloading/service down page -
> https://banking.postbank.de/app/blogged.html - This page has in the source
> code -> <meta http-equiv="cache-control" content="no-cache" /> and  
> <meta http-equiv="pragma" content="no-cache" /> 
> 
> So the Problem/Regression they think we have with Firefox 5 is that they
> complain that every user - who has seen this outage page before (because of a
> real outage or for whatever reason) and because its saved in the cache -
> cannot use the online banking service because (even when the online banking
> service is running) they get the outage page. 
> 
> According to Postbank they got a lot of complaints/press inquiries etc. why
> their online banking service is down because of hitting this outage page every
> time due to the cache issue.
> 
> Also it seems that other browsers do not have the Problem.
> 
> Do you know if we have changed something in cache control in Firefox 5? Do we
> get also got reports in sumo about such cache issues?

I'm surprised they didn't see this same error in Firefox 4 (but it's a race
condition between the parser calling SetHeader and the necko channel hitting
OnStopRequest, so who knows what might have changed to make it happen more often
in FF 5).  I'm also surprised that Chrome/Safari haven't hit this issue: comment
9 asserts they don't support cache-control pragmas.  I just tested chrome on my
linux box with Henri's test from comment 8, and it indeed doesn't work:  but one
thing it *does* do differently (along with Safari, apparently) that might
mitigate this a little is that it does a reload/no-cache load whenever a user
types a URI in the address bar (should we do that too?).  (Also, curiously, it
loads from cache if you open a bookmark, unless you're already at the bookmarked
URI, in which case it does a reload: perhaps also worth taking?).

The solution for Postbank is clearly that they should immediately start sending
the HTTP header "Cache-Control: no-cache" in their outage HTML responses: that's
a canonical way to ensure a document isn't cached (and also works with proxies,
where the <meta> tags usually don't: see http://www.mnot.net/cache_docs/).

There is sadly little that can be done to get users who have cached outage pages
to get to the live bank site, other than to have them refresh the page (i.e. hit
control-R, or the reload button), and/or to wait for the cache expiration logic
to expire the cache entry (by default, if no "Expires" header was set, we use
10% of the time between the "Last-Modified" value and the time the page was
received: i.e. if the page was modified 10 days ago, we cache it for a day).
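
(For reference, the heuristic in code; it follows RFC 2616's heuristic freshness rule, and Gecko's real logic, which handles more cases, lives in nsHttpResponseHead::ComputeFreshnessLifetime:)

    // Sketch of the default freshness heuristic when neither Expires nor
    // Cache-Control: max-age is present (cf. RFC 2616 section 13.2.4).
    PRUint32
    HeuristicFreshnessLifetime(PRUint32 aDateReceived, PRUint32 aLastModified)
    {
      if (aDateReceived <= aLastModified)
        return 0;  // no usable interval; treat as immediately stale
      // Fresh for 10% of the Date - Last-Modified interval:
      // modified 10 days ago => cached for one day.
      return (aDateReceived - aLastModified) / 10;
    }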

Should we re-implement cache-control pragmas?  We've now shipped 2 versions
without it (with the proviso that it's a race that might be becoming more
visible?), Chrome and Safari don't support it, and this is the first major issue
we've run into.  So I'd guess no.  But this was a painful hit for Postbank, and
there are still "recipe" sites on the net that advise using no-cache
<meta> tags.  A solution along the lines of comment 89 should still be possible
if we think it's worth re-enabling support.  It doesn't look like that much
extra work on top of what we've already implemented in the earlier broken
patches.
> but it's a race condition

No, it's not.  In Firefox 4, those <meta> tags just never have an effect, precisely to avoid it being a race condition....
(In reply to comment #102)
> > but it's a race condition
> 
> No, it's not.  In firefox 4, those <meta> tags just never have an effect,
> precisely to avoid it being a race condition....

It's racy in Firefox 4. The race removal landed for Firefox 5: bug 629621.


(In reply to comment #101)
> Should we re-implement cache-control pragmas?

I think we shouldn't re-implement cache-control metas based on this data. The lack thereof wasn't breaking numerous apps in their normal form. What broke was an ad hoc outage page which I bet didn't have the right response status (503). (People creating outage pages *really* ought to get their response status right...)