Last Comment Bug 715198 - browser is loading compressed data directly to the page
: browser is loading compressed data directly to the page
Status: VERIFIED FIXED
[qa!]
:
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: 12 Branch
: x86_64 Mac OS X
: P1 normal with 1 vote (vote)
: mozilla12
Assigned To: Bjarne (:bjarne)
:
Mentors:
Depends on:
Blocks: 648429 712032 712277
  Show dependency treegraph
 
Reported: 2012-01-04 10:18 PST by Josh Aas
Modified: 2015-03-05 17:13 PST (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
affected
-
affected
+
verified
+
fixed


Attachments
screenshot (282.09 KB, image/png)
2012-01-05 15:10 PST, Josh Aas
no flags Details
Disables compression (2.95 KB, patch)
2012-01-15 12:15 PST, Bjarne (:bjarne)
jaas: review+
Details | Diff | Review
patch to disable compression v1.1 (3.74 KB, patch)
2012-01-16 11:19 PST, Geoff Brown [:gbrown] (pto May 28-June 13)
bjarne: review+
Details | Diff | Review
patch to disable compression v1.1 (3.90 KB, patch)
2012-01-16 14:36 PST, Geoff Brown [:gbrown] (pto May 28-June 13)
gbrown: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review

Description Josh Aas 2012-01-04 10:18:43 PST
Over the past couple of weeks I've started noticing what looks like compressed data loaded as text on web pages. This happens with pages that are (or should be) cached. I'll try to get a screenshot next time I see it.

I frequently switch back and forth between nightly builds and the latest official Firefox. The nightly has cache compression, Firefox 9 does not. My theory is that Firefox 9 is trying to load compressed cache entries.
Comment 1 Bjarne (:bjarne) 2012-01-04 16:04:45 PST
Didn't the cache-compression patch bump the cache-version? I would expect this to be the only reasonable way to allow the usage-pattern described.
Comment 2 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-04 16:21:27 PST
(In reply to Bjarne (:bjarne) from comment #1)
> Didn't the cache-compression patch bump the cache-version? I would expect
> this to be the only reasonable way to allow the usage-pattern described.

Cache compression did not change the cache version (nsDiskCache::kCurrentVersion). My thinking was that the only pseudo-structural change was the addition of the "uncompressed-len" header, which signals that the entry is compressed. Such a header would not exist in any "old" cache, and would not trigger decompression; I did not think that we would use a "new" cache with a previous version of software. 

If you run a new build (with cache compression) and create compressed cache entries, then revert to an older build (without cache compression), the older build will read the compressed data, take no notice of the "uncompressed-len" header, and display the compressed data -- that's almost certainly what's happening here.

As I understand it, a version bump will cause existing caches to be deleted on both upgrade and downgrade. I suppose that's appropriate here -- to avoid this problem.
Comment 3 Bjarne (:bjarne) 2012-01-05 03:20:39 PST
So, how much switching back-and-forth do we want to allow? Josh?

We have IMO two options: 1) We bump the cache-version, making caches from pre- and post-compression-builds incompatible (which means we lose the cache when switching between pre- and post-compression-builds) or 2) We invent some naming-scheme which makes pre-compression-builds ignore (i.e. not find) cache-entries created by post-compression-builds.

(And there is of-course the trivial option: Regard this as wrong usage and do nothing.)
Comment 4 Josh Aas 2012-01-05 07:46:09 PST
(In reply to Bjarne (:bjarne) from comment #3)
> So, how much switching back-and-forth do we want to allow? Josh?

I'm not sure what the answer is, but this question is going to come up again and again as we continue to change the cache in significant ways. If bumping the cache version is such a big problem that we'll never do it then we're going to be pretty limited going forward.

Maybe the answer is to turn compression off on desktop until the next time we want to bump the cache version, so we get more in exchange for the trouble. We can keep compression on for mobile, where we've never shipped a disk cache and compression seems to help more anyway. Not sure how difficult that would be.

Being able to switch to an older browser without getting junk loaded in your content seems like a behavior that should be allowed. Right now it's a problem for developer build usage, but eventually it will be a problem for switching between official release builds. And when this is happening to you, it's not a small problem.
Comment 5 Michal Novotny (:michal) 2012-01-05 07:55:34 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #4)
> (In reply to Bjarne (:bjarne) from comment #3)
> > So, how much switching back-and-forth do we want to allow? Josh?
> 
> I'm not sure what the answer is, but this question is going to come up again
> and again as we continue to change the cache in significant ways. If bumping
> the cache version is such a big problem that we'll never do it then we're
> going to be pretty limited going forward.

IMO bumping the cache version isn't a problem. We just forgot to do it...
Comment 6 Jason Duell [:jduell] (needinfo? me) 2012-01-05 11:57:25 PST
> IMO bumping the cache version isn't a problem. We just forgot to do it...

This would be the first time we've bumped the cache version since we shipped FF 4, i.e. since we have a very large disk cache, so we need to be absolutely sure we've gotten rid of any possible hangs from the DeleteDir logic before we unleash this on every single FF user.

The telemetry for NETWORK_DISK_CACHE_TRASHRENAME actually looks pretty good since our recent bugfixes (that landed in aurora).  Right now the numbers at

   https://metrics.mozilla.com/pentaho/content/pentaho-cdf-dd/Render?solution=metrics2&path=telemetry&file=telemetryHistogram.wcdf

show the vast majority of users taking less than 300ms for the initial rename to Cache.trash (which is the blocking part of DeleteDir), and no significant tail.  (I selected data for the last week for FF 11.0a2 (aurora).  

But we still have to delete up to 1 GB on the background thread (each bump of the cache version flushes a lot of data--200 million users * 1 GB = 200 Petabytes upper bound), and I'd personally feel more comfortable if we had some user testing (like on old slow Windows boxes) to make sure the way we do that now is minimally disruptive.  If there's any chance that we've got a tradeoff between 1) showing compressed data for users who install an older FF on top of a new one, vs 2) greeting some significant % of our total user base with a Firefox N+1 that locks up their system for 10 sec or a minute (the telemetry #s max at
10 seconds--see bug 709286), I'd rather pick #1.
Comment 7 Bjarne (:bjarne) 2012-01-05 13:47:05 PST
I say we bump. This mechanism exists to ensure the code and cache are in sync. Bug #713480 will benefit from this as well.
Comment 8 Josh Aas 2012-01-05 15:10:19 PST
Created attachment 586240 [details]
screenshot
Comment 9 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-05 17:07:21 PST
I agree with bumping too (but I got us into this mess!).
Comment 10 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-05 17:39:53 PST
We must either disable disk compression or bump the version number now.

I filed bug 715752 describing a potential improvement for the versioning problem. Please see the comments in that bug.
Comment 11 Josh Aas 2012-01-06 10:32:02 PST
This is happening to me constantly. We need to do something now, either turn off compression or bump the cache version. Once we do that we can think about what we really want to ship.

Bjarne - can you put together a patch to disable compression or bump the cache version? It looks like we may disable cache on mobile, and if compression is the biggest win there then we may want to just disable compression for now rather than blow away lots of caches.
Comment 12 Jason Duell [:jduell] (needinfo? me) 2012-01-06 13:58:32 PST
One other idea would be for us to change cache versioning so that we support backward-compatibility when possible (i.e. we have cache version "bumps", which allow older caches to be upgraded, and "breaks", which require that we trash old caches).  

Here's a sketch of how this could work:

Change the cache code so that we freeze the current version number, also freeze the existing "dirty" bit to true, then add a another, "minor" version number and a "newdirty" bit.  Code going forward would read { currentVersion, dirty, newVersion, newDirty } and only blow the cache away if currentVersion is a mismatch, or newVersion is greater than the version it knows about, or if newDirty is set.  Old code would see the old dirty bit and blow the cache away.  

I know this seems ugly, but IMO it's less ugly than blowing away all our users' cache for minor bugs like this, or for improvements that could logically be implemented w/o destroying the existing cache.

I don't know if this could also be enough for bug 713480 to be fixed.  If that's going to require a cache version break, this is moot (for now at least).
Comment 13 Jason Duell [:jduell] (needinfo? me) 2012-01-06 14:03:51 PST
My last comment makes more sense perhaps if you s/currentVersion/majorVersion/ and s/newVersion/minorVersion/.

It's fairly similar to the GNU convention for system C library naming, in case that's a useful reference point.
Comment 14 Bjarne (:bjarne) 2012-01-07 15:19:39 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #11)
> This is happening to me constantly. We need to do something now, either turn
> off compression or bump the cache version. Once we do that we can think
> about what we really want to ship.

If turning off compression is an option I vote for that. It's simplest to do by backing out the patch which enabled it, but we could also add a pref with default value off. I can do that with some investigation although Geoff could probably do it faster since he knows the patch better.

Bumping the cache-version is trivial - see first line of comment #2.

Let me know what you prefer. I recommend adding the pref immediately, then get an overview of the various bugs which require or will benefit from a bump (see comments in this bug and in bug #715752), then schedule all these for some release together with a bump.
Comment 15 (dormant account) 2012-01-07 17:11:45 PST
Compression provides a decent perf boost on mobile(possibly elsewhere). I would rather bump the cache version, our users are already pretty used to suffering the wrath of cache invalidation :)
Comment 16 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-07 18:29:56 PST
The only way we can keep this feature enabled on Aurora is if we bump the cache version number. But, as I mentioned in bug 715752, we shouldn't bump the cache version number if we can avoid it. If this is an important feature for mobile Firefox, then I propose that we:

1. Do the version bump and keep the feature enabled on mobile.
2. Disable the feature on desktop, as suggested by Bjarne in comment 14, until we have resolved bug 715752 (regardless of whether that bug is fixed or WONTFIXed).
3. Write a patch for the beta channel that causes it to ignore entries with the "uncompressed-len" property, and nominate it for the release channel, in case we have any future point releases.

Release drivers: I marked this as tracking all the Firefox versions I could, because this bug in Nightly and Aurora basically corrupts the cache for the Beta and Release versions, when the user switches back and forth between Nightly/Aurora and Beta/Release. Even when we fix this bug for Nightly/Aurora, the patch I suggest in item #3 in this comment would be useful to get Beta/Release to skip past this corruption.
Comment 17 Bjarne (:bjarne) 2012-01-08 11:11:26 PST
Observe that we must bump the cache version at some point in any case if we want to enable compression while allowing users to switch back and forth between releases. The question is whether we do it now or wait - caches will be blown away at some point in any case. Fixing bug #715752 will give fewer of these discussions in the future, preffing compression off gives flexibility wrt trying this feature and desktop vs mobile, but a bump has to come at some point. (Fixing bug #715752 should be coordinated this with a bump, btw, so please keep this in sync.)
Comment 18 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-08 14:27:12 PST
I think we should turn compression off and resolve bug 715752 and the other compression-related bugs before we turn it back on, at least on desktop.

There are other cache entry format changes that I would like to make for Firefox 12, so especially if we decide not to fix bug 715752, then we should at least try to make all of these format changes at once in that version. But, I think we really should fix bug 715752.
Comment 19 Josh Aas 2012-01-08 16:41:31 PST
(In reply to Brian Smith (:bsmith) from comment #18)
> I think we should turn compression off and resolve bug 715752 and the other
> compression-related bugs before we turn it back on, at least on desktop.

Agreed. Bjarne - can you post a patch for this? I suggest leaving compression in but turning it off at compile time (maybe an ifdef)?

Like I said before, there is a good chance we will turn the disk cache off on mobile soon in order to resolve more issues before we ship. If we do that then turning off compression doesn't hurt us on mobile.
Comment 20 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-08 22:46:36 PST
> Agreed. Bjarne - can you post a patch for this? I suggest leaving
> compression in but turning it off at compile time (maybe an ifdef)?

A simple way to disable compression is to change the value of pref browser.cache.compression_level to 0:
https://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#99
https://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheEntryDescriptor.cpp#333
Comment 21 Bjarne (:bjarne) 2012-01-09 04:17:33 PST
(In reply to Geoff Brown [:gbrown] from comment #20)
> > Agreed. Bjarne - can you post a patch for this? I suggest leaving
> > compression in but turning it off at compile time (maybe an ifdef)?
> 
> A simple way to disable compression is to change the value of pref
> browser.cache.compression_level to 0:

This ok with you, Josh? Disable it for desktop and leave it on for mobile?
Comment 22 Josh Aas 2012-01-09 08:55:49 PST
(In reply to Bjarne (:bjarne) from comment #21)

> This ok with you, Josh? Disable it for desktop and leave it on for mobile?

I'm not a relevant owner or peer, so ultimately this is not my decision to make, but I think we should just disable it all around until we're more comfortable with it. Doing this via the pref sounds good to me. If you you really want to leave it on for mobile you'll have to find a more fine-grained mechanism than that pref. I think we've seen enough issues pop up with the mobile cache in general that we're not ready to ship it with or without compression.
Comment 23 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-09 09:53:40 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #22)
> If you you
> really want to leave it on for mobile you'll have to find a more
> fine-grained mechanism than that pref. 

There are separate prefs for Xul Fennec and Native Fennec. 

https://hg.mozilla.org/mozilla-central/file/9a230265bad5/mobile/xul/app/mobile.js#l110

https://hg.mozilla.org/mozilla-central/file/9a230265bad5/mobile/android/app/mobile.js

It seems that the pref was not copied over to Native Fennec but I'm pretty sure this was just an oversight. (Native Fennec is using the default (enabled, level=1)).

> I think we've seen enough issues pop
> up with the mobile cache in general that we're not ready to ship it with or
> without compression.

I hadn't heard about this (lately)...which bugs are tracking the problem(s)?
Comment 24 Alex Keybl [:akeybl] 2012-01-09 12:33:20 PST
(In reply to Brian Smith (:bsmith) from comment #16)
> Release drivers: I marked this as tracking all the Firefox versions I could,
> because this bug in Nightly and Aurora basically corrupts the cache for the
> Beta and Release versions, when the user switches back and forth between
> Nightly/Aurora and Beta/Release. Even when we fix this bug for
> Nightly/Aurora, the patch I suggest in item #3 in this comment would be
> useful to get Beta/Release to skip past this corruption.

Tracking for FF11/12 where the regression lives. Switching between channels is an uncommon user scenario for most of our users, so we won't be tracking a separate fix for Beta/Release.
Comment 25 Bjarne (:bjarne) 2012-01-09 13:35:24 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #22)
> (In reply to Bjarne (:bjarne) from comment #21)
> 
> > This ok with you, Josh? Disable it for desktop and leave it on for mobile?
> 
> I'm not a relevant owner or peer, so ultimately this is not my decision to
> make

Sorry - just me being confused about the decision-process and who takes part in it. I make the decision, then, to disable this all over the place using the pref. If this in inappropriate, feel free to correct me.
Comment 26 Josh Aas 2012-01-09 13:36:39 PST
(In reply to Geoff Brown [:gbrown] from comment #23)

> > I think we've seen enough issues pop
> > up with the mobile cache in general that we're not ready to ship it with or
> > without compression.
> 
> I hadn't heard about this (lately)...which bugs are tracking the problem(s)?

Perhaps Taras can elaborate, it looks like we have start-up time regressions from the cache.
Comment 27 (dormant account) 2012-01-09 15:30:01 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #26)
> (In reply to Geoff Brown [:gbrown] from comment #23)
> 
> Perhaps Taras can elaborate, it looks like we have start-up time regressions
> from the cache.

NETWORK_DISK_CACHE_OPEN is expensive on mobile.

Various histograms that measure time to fetch entry from disk are also on the expensive side(ie HTTP_PAGE_CACHE_READ_TIME).
HTTP_OPEN_TO_FIRST_FROM_CACHE is not obviously better than HTTP_OPEN_TO_FIRST_FROM_RECEIVED...infact the median time is worse.
Comment 28 Bjarne (:bjarne) 2012-01-15 12:15:54 PST
Created attachment 588762 [details] [diff] [review]
Disables compression

No objections to comment #25 so I follow Geoffs pointers in comment #20 and comment #23. This patch disables cache-entry compression by setting the relevant pref to 0.

I am not sure who should review this but the code-change is trivial. Requesting review by Josh - feel free to fwd it to someone else more appropriate.
Comment 29 Bjarne (:bjarne) 2012-01-15 14:58:12 PST
Requesting check-in
Comment 31 Phil Ringnalda (:philor) 2012-01-15 18:48:06 PST
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/0fe6641a2076 because of the fascinating result that we failed tests all over the place in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&onlyunstarred=1&rev=bccc0adeebd7
Comment 32 Bjarne (:bjarne) 2012-01-16 02:54:27 PST
Weird...  afaics no tests were modified as part of bug #648429...

Geoff - any idea off the top of your head? I'll look into this a bit later today but any hint upfront might be helpful.
Comment 33 Bjarne (:bjarne) 2012-01-16 03:45:33 PST
I had a sudden insight and believe I see the problem: nsHttpChannel::InstallCacheListener() sets the compression-marker (metadata "uncompressed-len") unconditionally to 0. nsCacheEntryDescriptor::OpenOutputStream() only applies compression if the desired compression-level is > 0, hence the value of "uncompressed-len" will remain 0 and the cache believes the entry is compressed. IMO InstallCacheListener() should not add the marker if the desired compression-level is < 1.

Geoff - do you have other views on this subject and want to handle it yourself? I can add the simple fix suggested above to the patch for this bug, but it may be useful to spin it off and add a test to check for the special case of desired compression-level = 0 (or in fact for various compression-levels, imo).
Comment 34 Josh Aas 2012-01-16 08:28:30 PST
Just to be sure... Is there any chance the test machines might use cache from a previous run in an unpatched browser? That would have them loading compressed data from the cache in the patched (no-compression) browser, which could be fixed with a clobber.
Comment 35 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-16 10:05:33 PST
(In reply to Bjarne (:bjarne) from comment #32)
> Weird...  afaics no tests were modified as part of bug #648429...

Bug #648429 did add a new xpcshell test: test_compressappend.js. But I think that still works with compression disabled, and it certainly doesn't account for the mass failure.
Comment 36 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-16 10:11:46 PST
(In reply to Bjarne (:bjarne) from comment #33)
> I had a sudden insight and believe I see the problem:
> nsHttpChannel::InstallCacheListener() sets the compression-marker (metadata
> "uncompressed-len") unconditionally to 0.
> nsCacheEntryDescriptor::OpenOutputStream() only applies compression if the
> desired compression-level is > 0, hence the value of "uncompressed-len" will
> remain 0 and the cache believes the entry is compressed. 

I'm not sure that I see the problem exactly. OpenOutputStream will not apply compression, and uncompressed-len will be present and 0. I agree this seems inappropriate but where does this cause a problem? When the item is decompressed, a decompressing wrapper will be selected, but I thought that the decompressing stream would handle this case (return the uncompressed data as-is). I'll check on that...
Comment 37 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-16 11:19:51 PST
Created attachment 588942 [details] [diff] [review]
patch to disable compression v1.1

I guess I was optimistic about the powers of the decompressing stream! I verified that some netwerk xpcshell tests failed with the original patch, then updated the patch to clear the uncompressed-len header when compression is disabled and an entry is being written -- xpcshell tests started to pass reliably. 

I would prefer to not create the uncompressed-len header in the first place, but accessing nsCacheService::CacheCompressionLevel from nsHttpChannel seems non-trivial.

Bjarne - feel free to go forward with your own patch if you have something in progress, or review this one -- whichever you prefer.
Comment 38 Bjarne (:bjarne) 2012-01-16 13:54:08 PST
Comment on attachment 588942 [details] [diff] [review]
patch to disable compression v1.1

Review of attachment 588942 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Geoff Brown [:gbrown] from comment #37)
>  accessing nsCacheService::CacheCompressionLevel from nsHttpChannel seems
> non-trivial

Fair point...  that would require changing one or another public API, which seems rather unnecessary for such a trivial purpose.

r=me but please add a comment in the added if-block explaining why it's there and refer to this bug. I would also like a test, but given the avalanche of errors we see without your fix I guess we can say that getting rid of these in itself is a test...  :)

(You should probably put yourself as the author of the patch, btw.)
Comment 39 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-16 14:07:19 PST
Try: https://tbpl.mozilla.org/?tree=Try&rev=3c7316b8b08b
Comment 40 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-16 14:36:52 PST
Created attachment 589024 [details] [diff] [review]
patch to disable compression v1.1

Added comment and updated patch author.

r=bjarne carried.
Comment 42 Joe Drew (not getting mail) 2012-01-17 17:55:47 PST
https://hg.mozilla.org/mozilla-central/rev/b41beba4c7bb
Comment 43 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-18 10:06:18 PST
Comment on attachment 589024 [details] [diff] [review]
patch to disable compression v1.1

[Approval Request Comment]
Regression caused by (bug #): 648429
User impact if declined: Bug 648429 introduced "cache compression". If entries are compressed in the network disk cache and the browser is subsequently downgraded to a version which does not support cache compression, the compressed entries may cause pages to be displayed incorrectly. Cache compression is also implicated in bugs 712032 (intermittent test failures) and 712277 (crashes on Windows).
Testing completed (on m-c, etc.): on m-c now
Risk to taking this patch (and alternatives if risky): Low
Comment 44 Alex Keybl [:akeybl] 2012-01-22 17:19:04 PST
Comment on attachment 589024 [details] [diff] [review]
patch to disable compression v1.1

[Triage Comment]
Allows people to switch between FF versions without issue - approved for Aurora.
Comment 45 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-01-25 09:40:53 PST
Can someone push to Aurora for me?
Comment 46 Michal Novotny (:michal) 2012-01-25 10:02:16 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/ad9819c41d54
Comment 47 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-01 12:55:07 PST
Unable to reproduce this anymore in Firefox 11.0b5. Please reopen if you can.

Note You need to log in before you can comment on or make changes to this bug.