717350 - Doom Cache Entries with last-modified inconsistency

Assignee

Description

•

12 years ago

If I do 

GET /foo
If-Modified-Since: Jan-10

and I get

HTTP/1.1 304 not modified
Last-Modified: Jan-7

that's pretty suspicious. Because the IMS header is derived from a previous l-m entry and now the server l-m is going backwards in time. One of those things is probably wrong (I suspect the cache entry) or at least out of date.

This is a failure mode of the server pipeline implementation from bug 716840.

I suggest we take a pretty conservative approach here and just doom the cache entry when we see it. i.e. honor the 304 but clean up the cache so that future reloads will get a version from scratch.

Patrick McManus [:mcmanus]

Assignee

Comment 1

•

12 years ago

Attached patch patch 0 (obsolete) — Details — Splinter Review

honza, this is in my pipeline queue after restart-in-progress

Assignee: nobody → mcmanus

Status: NEW → ASSIGNED

Attachment #587783 - Flags: review?(honzab.moz)

Attachment #587783 - Flags: feedback?(michal.novotny)

Honza Bambas (:mayhemer)

Comment 2

•

12 years ago

Comment on attachment 587783 [details] [diff] [review]
patch 0

Changes like these needs a test.  xpc-shell probably.

Michal Novotny [:michal]

Comment 3

•

12 years ago

Comment on attachment 587783 [details] [diff] [review]
patch 0

(In reply to Patrick McManus [:mcmanus] from comment #0)
> cache entry when we see it. i.e. honor the 304 but clean up the cache so
> that future reloads will get a version from scratch.

Why should we honor 304 for the current request when it is suspicious?


> +    rv = mResponseHead->GetHeader(nsHttp::Last_Modified,
> +                                  lastModified);
> +    if (NS_SUCCEEDED(rv))
> +        rv = mResponseHead->GetHeader(nsHttp::Last_Modified, 
> +                                      lastModified304);

The first one should be mCachedResponseHead, right?


> +        if (mCacheEntry)
> +            mCacheEntry->Doom();

We always have mCacheEntry here. There is a check NS_ENSURE_TRUE(mCacheEntry,...) few lines above.

Attachment #587783 - Flags: feedback?(michal.novotny)

Patrick McManus [:mcmanus]

Assignee

Comment 4

•

12 years ago

(In reply to Michal Novotny (:michal) from comment #3)
> Comment on attachment 587783 [details] [diff] [review]
> patch 0
> 
> (In reply to Patrick McManus [:mcmanus] from comment #0)
> > cache entry when we see it. i.e. honor the 304 but clean up the cache so
> > that future reloads will get a version from scratch.
> 
> Why should we honor 304 for the current request when it is suspicious?

I'm trying to take the least invasive route here. the 304 is suspicious, but I haven't 100% concluded that it is an error - so I don't want to start tossing errors.. but if we can "right the ship" in the background but evicting that cache entry, I think that's a good conservative first step.


> The first one should be mCachedResponseHead, right?
 

one of them should be, yes. thanks.

Honza Bambas (:mayhemer)

Comment 5

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #0)
One of those
> things is probably wrong (I suspect the cache entry) or at least out of date.

But you prefer to load it, a bit controversial.  When you detect this, it could potentially be also some kind of protection (not sure if not double-blade right now) against cache poisoning.

> 
> This is a failure mode of the server pipeline implementation from bug 716840.

Can you please rephrase this?  I don't quit understand the relation.

Patrick McManus [:mcmanus]

Assignee

Comment 6

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #5)
> (In reply to Patrick McManus [:mcmanus] from comment #0)
> One of those
> > things is probably wrong (I suspect the cache entry) or at least out of date.
> 
> But you prefer to load it, a bit controversial.  

we can add more, even in another patch. I'd like to get a little experience with it first to validate (pun!) the idea with a penalty that won't stop anything from operating.

> > 
> > This is a failure mode of the server pipeline implementation from bug 716840.
> 
> Can you please rephrase this?  I don't quit understand the relation.

sure - first a bit about this .no "news site".. it (or one of its partner sites) is the only one I've had cited to me by another browser as being known to cause img mixups since IIS 5 and a busted squid version have left the scene many years ago now. And I've done a lot of asking on the topic. (netscaler hardware has other failure modes - generally rsts which are much more detectable as broken images). I can't find any other ones myself and fwiw it is extremely hard to get a reproduction on the vg/db.no site.

if you weren't aware iphone ios 5 does pipelining by default. That's awesome for us - it really helps validate the general suitability of the algorithm on the web. Because of this I have much greater concerns about head of line blocking performance problems than I do about serious safety issues.

Nonetheless, there is this .no site referenced in 716840... a variation on it shows up in a bunch of ancient bugzilla entries too. If you follow the link in that bug to a blog post you'll see the server operators have published a lot of information and even 2 pcaps.

basically at some point a request for "B" was given a valid response to "C" instead. I personally suspect something along the lines of a pipeline request of A,B,C was served as A,C,EOF - which would have the effect of using the C response for the B request (and caching it if the headers said so under B's URI). You can see how that would happen if something on the server side did an over-read when parsing A (analagous to our pushback code).

we can tell something like that happened because you can see in the packet traces a conditional-validation of the "B" URI using a If-Modified-Since of C's timestamp. The actual timestamp of B is older than C so it gets a 304 even though they don't match. (now you see why etags were invented!)

That brings us full circle to this patch - which basically says if the Last-Modified's of our cache entry (which is where we got the IMS value from) and the 304 revalidation don't match then we've got a problem.

It doesn't solve the initial problem (at the 200 stage) - I don't think you really can solve that. But it at least appears fringe enough at this point to not worry about it more than we would worry about any other broken server behavior. (which is to say - do our best but don't disable it for the vast majority of the web because of a break by something non compliant at the very long end of the tail - there will always be broken implementations.)

Patrick McManus [:mcmanus]

Assignee

Comment 7

•

12 years ago

Attached patch patch 1 (obsolete) — Details — Splinter Review

* adds an xpcshell test

* changes to use an appropriate existing 'red' feedback code instead of minting a new one.

* fixes the bug michal noted

Attachment #587783 - Attachment is obsolete: true

Attachment #587783 - Flags: review?(honzab.moz)

Attachment #588454 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 8

•

12 years ago

Comment on attachment 588454 [details] [diff] [review]
patch 1

Review of attachment 588454 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab however controversial the patch is..

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +1977,5 @@
> +    // our cache with a bad response. Take the minor step here of just dooming
> +    // that cache entry so there is a fighting chance of getting things on the
> +    // right track as well as disabling pipelining for that host.
> +
> +    nsCAutoString lastModified;

lastModifiedCached ?

@@ +1984,5 @@
> +    rv = mCachedResponseHead->GetHeader(nsHttp::Last_Modified,
> +                                        lastModified);
> +    if (NS_SUCCEEDED(rv))
> +        rv = mResponseHead->GetHeader(nsHttp::Last_Modified, 
> +                                      lastModified304);

In braces please.

@@ +1987,5 @@
> +        rv = mResponseHead->GetHeader(nsHttp::Last_Modified, 
> +                                      lastModified304);
> +    if (NS_SUCCEEDED(rv) && !lastModified304.Equals(lastModified)) {
> +        LOG(("Cache Entry and 304 Last-Modified Headers Do Not Match "
> +             "%s and %s\n", lastModified.get(), lastModified304.get()));

Nit: "'%s' and '%s'\n" to see any spaces and special chars.

::: netwerk/test/unit/test_mismatch_lm.js
@@ +25,5 @@
> +                                       offset, count) {
> +	var data = new BinaryInputStream(inputStream).readByteArray(count);
> +      
> +	// This is 'B'
> +	do_check_eq(data, 66);

Shouldn't there be data[0] ?  And maybe instead of 66 do "B".charCodeAt(0);

@@ +59,5 @@
> +	httpChan.asyncOpen(listener_3, null);
> +    }
> +};
> +
> +var listener_1 = {

You should add 

  QueryInterface: function(iid) {
    if (iid.equals(Components.interfaces.nsIStreamListener) ||
        iid.equals(Components.interfaces.nsIRequestObserver) ||
        iid.equals(Components.interfaces.nsISupports))
      return this;
    throw Components.results.NS_ERROR_NO_INTERFACE;
  },
 
to your listeners.


Btw, why not to use ChannelListener from head_channels.js here?

@@ +95,5 @@
> +    httpserver.start(4444);
> +
> +    var chan = ios.newChannel("http://localhost:4444/test1", "", null);
> +    var httpChan = chan.QueryInterface(Ci.nsIHttpChannel);
> +    httpChan.requestMethod = "GET";

Do you really need to do this?

@@ +103,5 @@
> +}
> +
> +var iter=0;
> +function handler(metadata, response) {
> +    iter++;

Nit: it might be a bit better to drive this iterator from the test code (just before asyncOpen with a direct assign) and not here.

::: netwerk/test/unit/xpcshell.ini
@@ +130,5 @@
>  [test_httpcancel.js]
>  [test_httpsuspend.js]
>  [test_idnservice.js]
>  [test_localstreams.js]
> +[test_mismatch_lm.js]

test_mismatch_last-modified.js

Attachment #588454 - Flags: review?(honzab.moz) → review+

Honza Bambas (:mayhemer)

Comment 9

•

12 years ago

> I personally suspect something along the lines of a pipeline request of A,B,C was
> served as A,C,EOF ... something on the server side did an over-read when parsing A

In other words, A must then be received broken.  We could detect this, restart its load (should be triggered from the load context with properties not to pipeline - let's call it "reload in safe mode" that may retry w/o pipelining when it was engaged before) and along with it restart the rest of the pipeline sooner we start distributing the potentially broken content to upper level consumers.

I think we should have a mechanism like this anyway one day for cases of failed loads because of pipelining.


Re: the patch: I suspect we just end up with 200 OK/304 -> doom/200 OK/304 -> doom/etc looping on reloads.  However, from user's point of view, there will be a bad load on the first try from cache, and a good load on F5, which is good...

Telemetry?

Patrick McManus [:mcmanus]

Assignee

Comment 10

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #9)
> > I personally suspect something along the lines of a pipeline request of A,B,C was
> > served as A,C,EOF ... something on the server side did an over-read when parsing A
> 
> In other words, A must then be received broken.  

A is just fine.. B gets C's response body (but in a valid HTTP way), and C is detected as a hangup and probly restarted and obtained fine. That matches the data of what is seen off 716840 - two cache items with the same L-M header (i.e. C's), where upon revalidation it is obviously wrong for one of them.

> 
> Re: the patch: I suspect we just end up with 200 OK/304 -> doom/200 OK/304
> -> doom/etc looping on reloads.  However, from user's point of view, there
> will be a bad load on the first try from cache, and a good load on F5, which
> is good...


I don't think we'll end up in the loop for a couple of reasons.. one, this is exceptionally rare and a race condition. two, this patch generates a negative pipeline feedback event which will probably break the cycle.

Patrick McManus [:mcmanus]

Assignee

Comment 11

•

12 years ago

Attached patch v2 — Details — Splinter Review

updated rev

Attachment #588454 - Attachment is obsolete: true

Attachment #597424 - Flags: review+

Patrick McManus [:mcmanus]

Assignee

Comment 12

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9d37b359199f

Matt Brubeck (:mbrubeck)

Comment 13

•

12 years ago

Sorry, I backed this out on inbound because something in the push appeared to cause a leak in mochitest-browser-chrome:
https://hg.mozilla.org/integration/mozilla-inbound/rev/577da08878cb

Patrick McManus [:mcmanus]

Assignee

Comment 14

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f94cf04e8d73

Marco Bonardo [:mak]

Comment 15

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/f94cf04e8d73

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla14

patch 0 12 years ago Patrick McManus [:mcmanus] 5.31 KB, patch		Details \| Diff \| Splinter Review
patch 1 12 years ago Patrick McManus [:mcmanus] 8.78 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review
v2 12 years ago Patrick McManus [:mcmanus] 11.43 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review

Bugzilla

Quick Search

Doom Cache Entries with last-modified inconsistency

Categories

(Core :: Networking: HTTP, defect)

Tracking

()

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 2 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Attachment

General

Description

File Name

Content Type