Last Comment Bug 355567 - Firefox stores corrupted version of cached JavaScript file (merges two files together)
: Firefox stores corrupted version of cached JavaScript file (merges two files ...
Status: RESOLVED FIXED
: fixed1.9.0.4, verified1.8.1.18
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All All
: P1 critical with 4 votes (vote)
: mozilla1.9.1b1
Assigned To: Michal Novotny (:michal)
:
Mentors:
: 367205 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-10-05 12:57 PDT by Phillip
Modified: 2009-04-20 09:25 PDT (History)
21 users (show)
jonas: wanted1.9.1+
samuel.sidler+old: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Reproducible test case (demonstration web pages) (2.41 KB, application/zip)
2006-10-05 13:02 PDT, Phillip
no flags Details
Single file bug demonstration (2.08 KB, text/html)
2008-06-11 13:48 PDT, Jake Archibald
no flags Details
Bug Workaround (2.62 KB, text/html)
2008-09-17 03:39 PDT, Jake Archibald
no flags Details
HTTP log of the two loads (first success, then failure) in Firefox 2 (273.08 KB, text/plain)
2008-09-17 06:57 PDT, Boris Zbarsky [:bz]
no flags Details
fix v1 for 1.8 branch (1.39 KB, patch)
2008-09-24 20:32 PDT, Michal Novotny (:michal)
bzbarsky: review+
bzbarsky: superreview+
dveditz: approval1.8.1.18+
Details | Diff | Review
patch for 1.9 and TRUNK [Checkin: Comment 29] (1.47 KB, patch)
2008-09-26 01:57 PDT, Michal Novotny (:michal)
dveditz: approval1.9.0.4+
Details | Diff | Review
simple program to generate a hash the same way firefox does (476 bytes, text/x-c++src)
2009-04-20 06:53 PDT, Robert Schultz
no flags Details

Description Phillip 2006-10-05 12:57:00 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7

Problem occurs under a specific circumstance but could be an indication of a more serious buffer or cache management problem.  First discovered error after encountering what appeared to be inconsistent JavaScript errors.  Turns out, the problem was reproducible by loading webpages in a specific order that included specific JavaScript files.  The name and size of the JavaScript files seem to affect reproducibility.  The JavaScript error was due to a garbaged version of one of the JavaScript files.  Clicking on the JavaScript error showed the JavaScript source code which was actually two JavaScript files partially merged.

Reproducible: Always

Steps to Reproduce:
Extract attached reproducible test case and then follow these steps:

   1.  Load Page
   2. Clear Private Data
   3. Reload content.html Frame
   4. Reload navTree.html Frame
   5. Reload navTree.html Frame
   6. navTree.html Frame will indicate JavaScript ReferenceError which indicates NavTree.js was corrupted

Actual Results:  
After running the test and producing the JavaScript error and then viewing the code where the error occured you will see that the code that Firefox shows is actually TabPane.js and NavTree.js partially merged together which produces a syntax error.

Expected Results:  
TabPane.js and NavTree.js should not be corrupted.
Comment 1 Phillip 2006-10-05 13:02:14 PDT
Created attachment 241356 [details]
Reproducible test case (demonstration web pages)

Extract the contents of this ZIP file and then open "index.html" in web browser.  Follow steps indicated on webpage.  After completing steps a JavaScript error will be thrown.  There will also be another error in the JavaScript console that indicates a "syntax error" occured.  When you click on the syntax error entry in the JavaScript Console you will see that the JavaScript file NavTree.js was corrupted and actually contains portions of TabPane.js.
Comment 2 Phillip 2006-10-05 13:05:22 PDT
Comment on attachment 241356 [details]
Reproducible test case (demonstration web pages)

Extract the contents of this ZIP file and then open "index.html" in web
browser.  Follow steps indicated on webpage.  After completing steps a
JavaScript error will be thrown.  Refresh the entire page.
There will also be another error in the
JavaScript console that indicates a "syntax error" occured.  When you click on
the syntax error entry in the JavaScript Console you will see that the
JavaScript file NavTree.js was corrupted and actually contains portions of
TabPane.js.
Comment 3 Phillip 2006-10-09 08:01:10 PDT
Changing to critical state because this bug seems to be a problem with managing buffers and corrupting cached versions of files.
Comment 4 Mike Park 2007-01-16 19:23:52 PST
*** Bug 367205 has been marked as a duplicate of this bug. ***
Comment 5 Mike Park 2007-01-16 19:26:53 PST
I found this bug "in the wild" - one browser profile not being able to render while another profile in the same browser could (for the same public site). Clearing the browser cache fixes the issue.

I would say that this has been independently confirmed.
Comment 6 strauchdieb 2007-03-26 03:09:18 PDT
I also found this "In the wild" with a DokuWiki installation with Firefox 2.0.3 on Windows XP. The javascript is garbled and when I clear the cache and reload the page, the Javascript is ok.
Comment 7 Jake Archibald 2008-06-11 13:48:10 PDT
Created attachment 324673 [details]
Single file bug demonstration

This file shows the bug in a single file (although it links to external javascript). Instructions are included, but it just involves refreshing the page.

Any idea how to work around this issue?
Comment 8 Jake Archibald 2008-06-11 13:57:01 PDT
Forgot to add:

I've only tested the above file on Firefox 2.0.0.14, Windows XP. Managed to recreate on several machines.
Comment 9 John J. Barton 2008-06-11 16:53:12 PDT
Please test it on FF3 as soon as possible.  It will be easier to fix in the new version while every is still paying attention to it.
Comment 10 Jake Archibald 2008-06-12 01:24:05 PDT
Test passes in FF3, it isn't affected by this bug.

However, I'm still interested in knowing the cause of this issue so it can be avoided in FF1 & FF2, I work with sites that have a significant number of users with those versions.

I'm surprised this issue remained unfixed for so long, javascript from one file leaking into another is a potential security issue. Also, it could be possible that content from other files could leak into js files.

Depending on how the cache works, it may be possible to embed and execute code hidden in image files.
Comment 11 Gervase Markham [:gerv] 2008-09-11 08:35:39 PDT
CCing a couple of networking/cache people.

Gerv
Comment 12 Boris Zbarsky [:bz] 2008-09-11 12:28:48 PDT
So.... The testcase from comment 1 doesn't reproduce the problem over here.

I tried using the testcase from comment 7, and as far as I can tell the 2007-01-01-01 build fails and the 2007-07-01-01 build passes.  Getting that information took me about 40 minutes (the testcase seems to be _really_ slow to load).

I'd really appreciate someone hunting down a one-day range for trunk when this got fixed.  That would be an excellent starting point for getting this fixed on branches.
Comment 13 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2008-09-15 03:21:13 PDT
I tested with the testcase from comment 7. I get a fix range between 2008-02-28 and 208-03-01:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-02-28+04&maxdate=2008-03-01+06&cvsroot=%2Fcvsroot
(Note that between the different builds, I had to clear the cache, to get reliable results)

In a debug 1.8.1 build I get this assertion on first load:
###!!! ASSERTION: FindEntry() called for a bound entry.: '!binding', file c:/moz
illa181/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp, line 458
++DOMWINDOW == 12
I guess this might be a useless assert as mentioned in bug 303003.
And this assertion on reload:
###!!! ASSERTION: unexpected progress values: 'progress <= progressMax', file c:
/mozilla181/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp, line 4199
JavaScript error: http://www.bbc.co.uk/glow/0.4.0/dist/widgets/widgets.debug.js,
 line 1498: missing ; before statement
There are 2 bugs open for this assertion: bug 266532 and bug 410558.
Comment 14 Boris Zbarsky [:bz] 2008-09-15 11:28:25 PDT
I don't see anything cache-related in that range...  and none of the necko-related things seem relevant.
Comment 15 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2008-09-15 12:56:21 PDT
Ugh, I'm really sorry about this, The fix range is between 2007-02-28 and 2007-03-01:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-02-28+04&maxdate=2007-03-01+06&cvsroot=%2Fcvsroot
The fix range in comment 13 was using the year 2008, it has to be the year 2007 (I just rechecked the fix range again).
Comment 16 Boris Zbarsky [:bz] 2008-09-15 13:28:56 PDT
Hmm.  OK, that matches my range, but the only thing in there that looks related is bug 371576, which also landed on branch (albeit a smaller patch).  And that fix would have just changed timing, I suspect, not addressed underlying cache issues...

Michal, do you think you can take a look at what cache is doing here?
Comment 17 Jake Archibald 2008-09-17 03:39:24 PDT
Created attachment 339035 [details]
Bug Workaround

I've found a workaround which may help track down what causes it (details in the file).

Just in time actually, it's started to affect not only the debug versions of our library, but also the latest version of our compressed for-live version.
Comment 18 Boris Zbarsky [:bz] 2008-09-17 06:57:16 PDT
Created attachment 339058 [details]
HTTP log of the two loads (first success, then failure) in Firefox 2

The basic load sequence during the reload looks like this:

1)  GET for glow.debug.js.  We make a If-Modified-Since request, get back a
    304.  We read the data from cache, reading 138045 bytes (the same as the 
    Content-Length the first time around).
2)  GET for effects.debug.js.  We do NOT send If-Modified-Since, get an HTTP 200
    response just like on the first load.  The size is 73827 bytes (the size of
    the first response for this file).
3)  GET for widgets.debug.js.  We make a If-Modified-Since request, get back a
    304.  We read the data from cache, reading 49152 bytes.  But the original
    HTTP 200 response only had a Content-Length (and data size) of 48511.

So that seems to be the bug.  I would bet that on trunk some timing issue or something is covering this up, but the basic bug is still there....
Comment 19 Jake Archibald 2008-09-17 08:01:09 PDT
Doh, I had noticed that too, thought we'd done something wrong server-side with our cache-control headers.

Interestingly, even though we're using cache-control & max-age, Firefox 3 always makes a request for widgets.js, but it gets the other two files from its cache. May be an unrelated bug.
Comment 20 Jonas Sicking (:sicking) 2008-09-17 08:03:03 PDT
Moving to correct component based on bzs debugging and giving it some priority as this sounds really scary.
Comment 21 Michal Novotny (:michal) 2008-09-22 12:41:11 PDT
(In reply to comment #16)
> Michal, do you think you can take a look at what cache is doing here?

I'll try to investigate it.
Comment 22 Michal Novotny (:michal) 2008-09-24 20:32:25 PDT
Created attachment 340283 [details] [diff] [review]
fix v1 for 1.8 branch

The problem is again with the collisions. Collisions for disk cache are detected in nsDiskCacheDevice::FindEntry() at line http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#471. But if the first entry with the same hash hasn't been yet written to the disk, then FindEntry() returns null at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#468, new ActiveEntry is then created for the second colliding entry and data of the first cached document is later rewritten. Also the check at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#458 throws assertion for the second entry, since it finds wrong binding to the first entry.

The bug can be reproduced only in 1.8 branch because only here is second entry being opened before the first has been written to the disk. But the problematic code is in mercurial and CVS trunk too. Attached patch fixes the problem for 1.8 branch.
Comment 23 Boris Zbarsky [:bz] 2008-09-25 20:18:01 PDT
Comment on attachment 340283 [details] [diff] [review]
fix v1 for 1.8 branch

Nice...  This should also apply to 1.9 and trunk, I would think, right?
Comment 24 Michal Novotny (:michal) 2008-09-26 01:57:26 PDT
Created attachment 340531 [details] [diff] [review]
patch for 1.9 and TRUNK
[Checkin: Comment 29]

(In reply to comment #23)
> Nice...  This should also apply to 1.9 and trunk, I would think, right?

There was a tiny change in the assertion... This one is for 1.9 and trunk.
Comment 25 Jake Archibald 2008-09-26 02:45:53 PDT
In unpatched versions of firefox, how can this issue be prevented? What is it about the structure of the files in the test case that triggers the bug?
Comment 26 Michal Novotny (:michal) 2008-09-26 03:31:34 PDT
(In reply to comment #25)
> In unpatched versions of firefox, how can this issue be prevented? What is it
> about the structure of the files in the test case that triggers the bug?

Create structure of web pages so that there aren't collisions. See https://bugzilla.mozilla.org/show_bug.cgi?id=290032#c5
Comment 27 Jake Archibald 2008-09-26 06:19:02 PDT
Don't suppose there's a rule an idiot clientslide developer like me could understand? (not enough low level programming knowledge in this brain)
Comment 28 Boris Zbarsky [:bz] 2008-09-26 06:29:57 PDT
The basic problem is if you have filenames that are pretty similar and our cache doesn't handle that well.  In this case, effects and widgets only differ in this part:

  effects/effects.
  widgets/widgets.

And when we compute the hash of the URI, those both compute to 0, because we xor things that are 4 chars apart, and the different parts repeat exactly 8 chars apart.

In other words, changing the filename will help most immediately.  

If changing filenames is a non-starter, as is appending a query to one filename or the other, you might be able to stick a no-op external script in between the two in hopes of avoiding the "opening an entry when the previous one has not been written to disk" scenario.  Not sure whether that would work.
Comment 29 Boris Zbarsky [:bz] 2008-09-26 06:34:16 PDT
Pushed changeset 91327a79cb90 to mozilla-central.
Comment 30 Jake Archibald 2008-09-26 06:47:48 PDT
Boris: That's excellent, thanks

And thanks to all for looking into this
Comment 31 Daniel Veditz [:dveditz] 2008-10-01 15:02:28 PDT
Comment on attachment 340283 [details] [diff] [review]
fix v1 for 1.8 branch

Approved for 1.8.1.18, a=dveditz for release-drivers
Comment 32 Daniel Veditz [:dveditz] 2008-10-01 15:03:40 PDT
Comment on attachment 340531 [details] [diff] [review]
patch for 1.9 and TRUNK
[Checkin: Comment 29]

Approved for 1.9.0.4, a=dveditz for release-drivers
Comment 33 Daniel Veditz [:dveditz] 2008-10-01 15:04:23 PDT
Shouldn't we put together a testcase for this?
Comment 34 Michal Novotny (:michal) 2008-10-03 17:25:19 PDT
(In reply to comment #33)
> Shouldn't we put together a testcase for this?

That's not so easy. Such testcase must use two different URLs that have same hash. But hash function for disk cache will be changed as soon as I get review+ for patch #335821 in bug #290032. Then it won't be trivial to find such URLs.
Comment 35 Boris Zbarsky [:bz] 2008-10-07 08:41:21 PDT
Fixed on both branches.
Comment 36 Al Billings [:abillings] 2008-10-21 14:21:13 PDT
Verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18pre) Gecko/2008102103 BonEcho/2.0.0.18pre for 1.8.1.18.
Comment 37 Al Billings [:abillings] 2008-10-21 14:29:20 PDT
When I try to verify this for 1.9.0.4, I see that the test case passes every time with 3.0.3.
Comment 38 Michal Novotny (:michal) 2008-10-21 14:56:54 PDT
(In reply to comment #37)
> When I try to verify this for 1.9.0.4, I see that the test case passes every
> time with 3.0.3.

The bug can be consistently reproduced only on 1.8 branch. See end of comment #22.
Comment 39 Jake Archibald 2008-10-21 15:28:41 PDT
I've seen this bug occur in Firefox 3 recently. Sometimes the 2nd file contains a portion of the first, sometimes the 1st file comes back empty. But no, I haven't been able to reproduce it consistently.
Comment 40 Robert Schultz 2009-04-20 06:53:46 PDT
Created attachment 373643 [details]
simple program to generate a hash the same way firefox does

To compile: g++ -o ffgenhash ffgenhash.cpp
Comment 41 Robert Schultz 2009-04-20 06:55:36 PDT
First let me state that this bug still happens with users using Firefox 3.0.8

This bug was a major issue for my site: http://worldofsolitaire.com

I worked around it a long time ago by using a conditional rule in an .htaccess file that would disable ALL caching of images on the site for Firefox users. This was a horrible thing to need to do, but at the time I couldn't track down the bug within Firefox and having the site be slightly slower is better than showing duplicate/corrupted images.

I changed the conditional on April 19th 2009 (yesterday) to only disable caching for Firefox 2 users.

A few hours later I've received over 10 e-mails from Firefox 3 users (confirmed) that they were seeing duplicate images. So this issue is STILL a problem in Firefox 3.

I decided to create a simple Linux test program that would allow me to check URL's to see if they are generating the same cache hash keys.

To compile in any Linux system: g++ -o ffgenhash ffgenhash.cpp

The code is uploaded in Attachment #373643 [details]

As you can see, here are two real life URL's that generate the same cache hash key:

./ffgenhash "http://worldofsolitaire.com/decks/paris/5/12c.png"
1087949033
./ffgenhash "http://worldofsolitaire.com/decks/paris/5/13s.png"
1087949033

Since I pre-load these images in a Javascript loop, trying to use some sort of empty <script> tag workaround is not possible here.

Indeed I think my only real solution is to modify the URL's for Firefox users in some way to generate a unique cache hash key. So that's the approach I'll use.

By the way, I'm half tempted to create a Firebug addon that will check all resources loaded by a site and give a big error if two resources on the site share a common hash key so the developer is aware. It would be great to run sites like Google maps through this as I've seen weird things with those images over the past few years :)
Comment 42 Boris Zbarsky [:bz] 2009-04-20 09:18:03 PDT
Robert, it's odd that you're seeing this with Firefox 3, since this was fixed in 3.0.4.  Is it possible that you're seeing some other similar) issue?

Note that this bugfix made the cache handle the hash collision correctly, not prevent it from happening at all, so just seeing a hash collision doesn't mean that much...
Comment 43 Boris Zbarsky [:bz] 2009-04-20 09:18:51 PDT
Robert, just to be clear, what you should do is file a new bug, with clear steps to reproduce on your site that show the bug in 3.0.8.
Comment 44 Robert Schultz 2009-04-20 09:25:58 PDT
(In reply to comment #43)
> Robert, just to be clear, what you should do is file a new bug, with clear
> steps to reproduce on your site that show the bug in 3.0.8.

Will do.

First however I'm gonna implement a naming workaround so that users  still using Firefox < 3.0.4 will work as several Linux distro's still default to Firefox 2.

After that when I have some time to create a test case, I'll create a new bug :)

Note You need to log in before you can comment on or make changes to this bug.