Closed Bug 355567 Opened 18 years ago Closed 16 years ago

Firefox stores corrupted version of cached JavaScript file (merges two files together)

Categories

(Core :: Networking: Cache, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1b1

People

(Reporter: neverlast01, Assigned: michal)

References

Details

(Keywords: fixed1.9.0.4, verified1.8.1.18)

Attachments

(7 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7

Problem occurs under a specific circumstance but could be an indication of a more serious buffer or cache management problem.  First discovered error after encountering what appeared to be inconsistent JavaScript errors.  Turns out, the problem was reproducible by loading webpages in a specific order that included specific JavaScript files.  The name and size of the JavaScript files seem to affect reproducibility.  The JavaScript error was due to a garbaged version of one of the JavaScript files.  Clicking on the JavaScript error showed the JavaScript source code which was actually two JavaScript files partially merged.

Reproducible: Always

Steps to Reproduce:
Extract attached reproducible test case and then follow these steps:

   1.  Load Page
   2. Clear Private Data
   3. Reload content.html Frame
   4. Reload navTree.html Frame
   5. Reload navTree.html Frame
   6. navTree.html Frame will indicate JavaScript ReferenceError which indicates NavTree.js was corrupted

Actual Results:  
After running the test and producing the JavaScript error and then viewing the code where the error occured you will see that the code that Firefox shows is actually TabPane.js and NavTree.js partially merged together which produces a syntax error.

Expected Results:  
TabPane.js and NavTree.js should not be corrupted.
Extract the contents of this ZIP file and then open "index.html" in web browser.  Follow steps indicated on webpage.  After completing steps a JavaScript error will be thrown.  There will also be another error in the JavaScript console that indicates a "syntax error" occured.  When you click on the syntax error entry in the JavaScript Console you will see that the JavaScript file NavTree.js was corrupted and actually contains portions of TabPane.js.
Comment on attachment 241356 [details]
Reproducible test case (demonstration web pages)

Extract the contents of this ZIP file and then open "index.html" in web
browser.  Follow steps indicated on webpage.  After completing steps a
JavaScript error will be thrown.  Refresh the entire page.
There will also be another error in the
JavaScript console that indicates a "syntax error" occured.  When you click on
the syntax error entry in the JavaScript Console you will see that the
JavaScript file NavTree.js was corrupted and actually contains portions of
TabPane.js.
Changing to critical state because this bug seems to be a problem with managing buffers and corrupting cached versions of files.
Severity: normal → critical
I found this bug "in the wild" - one browser profile not being able to render while another profile in the same browser could (for the same public site). Clearing the browser cache fixes the issue.

I would say that this has been independently confirmed.
I also found this "In the wild" with a DokuWiki installation with Firefox 2.0.3 on Windows XP. The javascript is garbled and when I clear the cache and reload the page, the Javascript is ok.
This file shows the bug in a single file (although it links to external javascript). Instructions are included, but it just involves refreshing the page.

Any idea how to work around this issue?
Forgot to add:

I've only tested the above file on Firefox 2.0.0.14, Windows XP. Managed to recreate on several machines.
Please test it on FF3 as soon as possible.  It will be easier to fix in the new version while every is still paying attention to it.
Test passes in FF3, it isn't affected by this bug.

However, I'm still interested in knowing the cause of this issue so it can be avoided in FF1 & FF2, I work with sites that have a significant number of users with those versions.

I'm surprised this issue remained unfixed for so long, javascript from one file leaking into another is a potential security issue. Also, it could be possible that content from other files could leak into js files.

Depending on how the cache works, it may be possible to embed and execute code hidden in image files.
CCing a couple of networking/cache people.

Gerv
So.... The testcase from comment 1 doesn't reproduce the problem over here.

I tried using the testcase from comment 7, and as far as I can tell the 2007-01-01-01 build fails and the 2007-07-01-01 build passes.  Getting that information took me about 40 minutes (the testcase seems to be _really_ slow to load).

I'd really appreciate someone hunting down a one-day range for trunk when this got fixed.  That would be an excellent starting point for getting this fixed on branches.
Keywords: qawanted
Status: UNCONFIRMED → NEW
Ever confirmed: true
I tested with the testcase from comment 7. I get a fix range between 2008-02-28 and 208-03-01:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-02-28+04&maxdate=2008-03-01+06&cvsroot=%2Fcvsroot
(Note that between the different builds, I had to clear the cache, to get reliable results)

In a debug 1.8.1 build I get this assertion on first load:
###!!! ASSERTION: FindEntry() called for a bound entry.: '!binding', file c:/moz
illa181/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp, line 458
++DOMWINDOW == 12
I guess this might be a useless assert as mentioned in bug 303003.
And this assertion on reload:
###!!! ASSERTION: unexpected progress values: 'progress <= progressMax', file c:
/mozilla181/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp, line 4199
JavaScript error: http://www.bbc.co.uk/glow/0.4.0/dist/widgets/widgets.debug.js,
 line 1498: missing ; before statement
There are 2 bugs open for this assertion: bug 266532 and bug 410558.
I don't see anything cache-related in that range...  and none of the necko-related things seem relevant.
Ugh, I'm really sorry about this, The fix range is between 2007-02-28 and 2007-03-01:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-02-28+04&maxdate=2007-03-01+06&cvsroot=%2Fcvsroot
The fix range in comment 13 was using the year 2008, it has to be the year 2007 (I just rechecked the fix range again).
Hmm.  OK, that matches my range, but the only thing in there that looks related is bug 371576, which also landed on branch (albeit a smaller patch).  And that fix would have just changed timing, I suspect, not addressed underlying cache issues...

Michal, do you think you can take a look at what cache is doing here?
Attached file Bug Workaround
I've found a workaround which may help track down what causes it (details in the file).

Just in time actually, it's started to affect not only the debug versions of our library, but also the latest version of our compressed for-live version.
The basic load sequence during the reload looks like this:

1)  GET for glow.debug.js.  We make a If-Modified-Since request, get back a
    304.  We read the data from cache, reading 138045 bytes (the same as the 
    Content-Length the first time around).
2)  GET for effects.debug.js.  We do NOT send If-Modified-Since, get an HTTP 200
    response just like on the first load.  The size is 73827 bytes (the size of
    the first response for this file).
3)  GET for widgets.debug.js.  We make a If-Modified-Since request, get back a
    304.  We read the data from cache, reading 49152 bytes.  But the original
    HTTP 200 response only had a Content-Length (and data size) of 48511.

So that seems to be the bug.  I would bet that on trunk some timing issue or something is covering this up, but the basic bug is still there....
Doh, I had noticed that too, thought we'd done something wrong server-side with our cache-control headers.

Interestingly, even though we're using cache-control & max-age, Firefox 3 always makes a request for widgets.js, but it gets the other two files from its cache. May be an unrelated bug.
Component: General → Networking: Cache
Product: Firefox → Core
QA Contact: general → networking.cache
Moving to correct component based on bzs debugging and giving it some priority as this sounds really scary.
Flags: wanted1.9.1+
Priority: -- → P1
(In reply to comment #16)
> Michal, do you think you can take a look at what cache is doing here?

I'll try to investigate it.
Assignee: nobody → michal
The problem is again with the collisions. Collisions for disk cache are detected in nsDiskCacheDevice::FindEntry() at line http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#471. But if the first entry with the same hash hasn't been yet written to the disk, then FindEntry() returns null at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#468, new ActiveEntry is then created for the second colliding entry and data of the first cached document is later rewritten. Also the check at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#458 throws assertion for the second entry, since it finds wrong binding to the first entry.

The bug can be reproduced only in 1.8 branch because only here is second entry being opened before the first has been written to the disk. But the problematic code is in mercurial and CVS trunk too. Attached patch fixes the problem for 1.8 branch.
Attachment #340283 - Flags: review?(bzbarsky)
Comment on attachment 340283 [details] [diff] [review]
fix v1 for 1.8 branch

Nice...  This should also apply to 1.9 and trunk, I would think, right?
Attachment #340283 - Flags: superreview+
Attachment #340283 - Flags: review?(bzbarsky)
Attachment #340283 - Flags: review+
Attachment #340283 - Flags: approval1.8.1.18?
(In reply to comment #23)
> Nice...  This should also apply to 1.9 and trunk, I would think, right?

There was a tiny change in the assertion... This one is for 1.9 and trunk.
Attachment #340531 - Flags: approval1.9.0.4?
Keywords: checkin-needed
In unpatched versions of firefox, how can this issue be prevented? What is it about the structure of the files in the test case that triggers the bug?
(In reply to comment #25)
> In unpatched versions of firefox, how can this issue be prevented? What is it
> about the structure of the files in the test case that triggers the bug?

Create structure of web pages so that there aren't collisions. See https://bugzilla.mozilla.org/show_bug.cgi?id=290032#c5
Don't suppose there's a rule an idiot clientslide developer like me could understand? (not enough low level programming knowledge in this brain)
The basic problem is if you have filenames that are pretty similar and our cache doesn't handle that well.  In this case, effects and widgets only differ in this part:

  effects/effects.
  widgets/widgets.

And when we compute the hash of the URI, those both compute to 0, because we xor things that are 4 chars apart, and the different parts repeat exactly 8 chars apart.

In other words, changing the filename will help most immediately.  

If changing filenames is a non-starter, as is appending a query to one filename or the other, you might be able to stick a no-op external script in between the two in hopes of avoiding the "opening an entry when the previous one has not been written to disk" scenario.  Not sure whether that would work.
Pushed changeset 91327a79cb90 to mozilla-central.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Boris: That's excellent, thanks

And thanks to all for looking into this
Attachment #340531 - Attachment description: patch for 1.9 and TRUNK → patch for 1.9 and TRUNK [Checkin: Comment 29]
OS: Windows XP → All
Hardware: PC → All
Target Milestone: --- → mozilla1.9.1b1
Version: unspecified → Trunk
Comment on attachment 340283 [details] [diff] [review]
fix v1 for 1.8 branch

Approved for 1.8.1.18, a=dveditz for release-drivers
Attachment #340283 - Flags: approval1.8.1.18? → approval1.8.1.18+
Comment on attachment 340531 [details] [diff] [review]
patch for 1.9 and TRUNK
[Checkin: Comment 29]

Approved for 1.9.0.4, a=dveditz for release-drivers
Attachment #340531 - Flags: approval1.9.0.4? → approval1.9.0.4+
Shouldn't we put together a testcase for this?
Keywords: testcase-wanted
Flags: in-testsuite?
(In reply to comment #33)
> Shouldn't we put together a testcase for this?

That's not so easy. Such testcase must use two different URLs that have same hash. But hash function for disk cache will be changed as soon as I get review+ for patch #335821 in bug #290032. Then it won't be trivial to find such URLs.
Fixed on both branches.
Verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18pre) Gecko/2008102103 BonEcho/2.0.0.18pre for 1.8.1.18.
When I try to verify this for 1.9.0.4, I see that the test case passes every time with 3.0.3.
(In reply to comment #37)
> When I try to verify this for 1.9.0.4, I see that the test case passes every
> time with 3.0.3.

The bug can be consistently reproduced only on 1.8 branch. See end of comment #22.
I've seen this bug occur in Firefox 3 recently. Sometimes the 2nd file contains a portion of the first, sometimes the 1st file comes back empty. But no, I haven't been able to reproduce it consistently.
To compile: g++ -o ffgenhash ffgenhash.cpp
First let me state that this bug still happens with users using Firefox 3.0.8

This bug was a major issue for my site: http://worldofsolitaire.com

I worked around it a long time ago by using a conditional rule in an .htaccess file that would disable ALL caching of images on the site for Firefox users. This was a horrible thing to need to do, but at the time I couldn't track down the bug within Firefox and having the site be slightly slower is better than showing duplicate/corrupted images.

I changed the conditional on April 19th 2009 (yesterday) to only disable caching for Firefox 2 users.

A few hours later I've received over 10 e-mails from Firefox 3 users (confirmed) that they were seeing duplicate images. So this issue is STILL a problem in Firefox 3.

I decided to create a simple Linux test program that would allow me to check URL's to see if they are generating the same cache hash keys.

To compile in any Linux system: g++ -o ffgenhash ffgenhash.cpp

The code is uploaded in Attachment #373643 [details]

As you can see, here are two real life URL's that generate the same cache hash key:

./ffgenhash "http://worldofsolitaire.com/decks/paris/5/12c.png"
1087949033
./ffgenhash "http://worldofsolitaire.com/decks/paris/5/13s.png"
1087949033

Since I pre-load these images in a Javascript loop, trying to use some sort of empty <script> tag workaround is not possible here.

Indeed I think my only real solution is to modify the URL's for Firefox users in some way to generate a unique cache hash key. So that's the approach I'll use.

By the way, I'm half tempted to create a Firebug addon that will check all resources loaded by a site and give a big error if two resources on the site share a common hash key so the developer is aware. It would be great to run sites like Google maps through this as I've seen weird things with those images over the past few years :)
Robert, it's odd that you're seeing this with Firefox 3, since this was fixed in 3.0.4.  Is it possible that you're seeing some other similar) issue?

Note that this bugfix made the cache handle the hash collision correctly, not prevent it from happening at all, so just seeing a hash collision doesn't mean that much...
Robert, just to be clear, what you should do is file a new bug, with clear steps to reproduce on your site that show the bug in 3.0.8.
(In reply to comment #43)
> Robert, just to be clear, what you should do is file a new bug, with clear
> steps to reproduce on your site that show the bug in 3.0.8.

Will do.

First however I'm gonna implement a naming workaround so that users  still using Firefox < 3.0.4 will work as several Linux distro's still default to Firefox 2.

After that when I have some time to create a test case, I'll create a new bug :)
You need to log in before you can comment on or make changes to this bug.