Closed Bug 355567 Opened 14 years ago Closed 12 years ago
Firefox stores corrupted version of cached Java
Script file (merges two files together)
2.41 KB, application/zip
2.08 KB, text/html
2.62 KB, text/html
273.08 KB, text/plain
1.39 KB, patch
|Details | Diff | Splinter Review|
1.47 KB, patch
|Details | Diff | Splinter Review|
476 bytes, text/x-c++src
Changing to critical state because this bug seems to be a problem with managing buffers and corrupting cached versions of files.
Severity: normal → critical
I found this bug "in the wild" - one browser profile not being able to render while another profile in the same browser could (for the same public site). Clearing the browser cache fixes the issue. I would say that this has been independently confirmed.
Forgot to add: I've only tested the above file on Firefox 184.108.40.206, Windows XP. Managed to recreate on several machines.
Please test it on FF3 as soon as possible. It will be easier to fix in the new version while every is still paying attention to it.
CCing a couple of networking/cache people. Gerv
So.... The testcase from comment 1 doesn't reproduce the problem over here. I tried using the testcase from comment 7, and as far as I can tell the 2007-01-01-01 build fails and the 2007-07-01-01 build passes. Getting that information took me about 40 minutes (the testcase seems to be _really_ slow to load). I'd really appreciate someone hunting down a one-day range for trunk when this got fixed. That would be an excellent starting point for getting this fixed on branches.
I don't see anything cache-related in that range... and none of the necko-related things seem relevant.
Ugh, I'm really sorry about this, The fix range is between 2007-02-28 and 2007-03-01: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-02-28+04&maxdate=2007-03-01+06&cvsroot=%2Fcvsroot The fix range in comment 13 was using the year 2008, it has to be the year 2007 (I just rechecked the fix range again).
Hmm. OK, that matches my range, but the only thing in there that looks related is bug 371576, which also landed on branch (albeit a smaller patch). And that fix would have just changed timing, I suspect, not addressed underlying cache issues... Michal, do you think you can take a look at what cache is doing here?
I've found a workaround which may help track down what causes it (details in the file). Just in time actually, it's started to affect not only the debug versions of our library, but also the latest version of our compressed for-live version.
The basic load sequence during the reload looks like this: 1) GET for glow.debug.js. We make a If-Modified-Since request, get back a 304. We read the data from cache, reading 138045 bytes (the same as the Content-Length the first time around). 2) GET for effects.debug.js. We do NOT send If-Modified-Since, get an HTTP 200 response just like on the first load. The size is 73827 bytes (the size of the first response for this file). 3) GET for widgets.debug.js. We make a If-Modified-Since request, get back a 304. We read the data from cache, reading 49152 bytes. But the original HTTP 200 response only had a Content-Length (and data size) of 48511. So that seems to be the bug. I would bet that on trunk some timing issue or something is covering this up, but the basic bug is still there....
Doh, I had noticed that too, thought we'd done something wrong server-side with our cache-control headers. Interestingly, even though we're using cache-control & max-age, Firefox 3 always makes a request for widgets.js, but it gets the other two files from its cache. May be an unrelated bug.
Component: General → Networking: Cache
Product: Firefox → Core
QA Contact: general → networking.cache
Moving to correct component based on bzs debugging and giving it some priority as this sounds really scary.
Priority: -- → P1
(In reply to comment #16) > Michal, do you think you can take a look at what cache is doing here? I'll try to investigate it.
Assignee: nobody → michal
The problem is again with the collisions. Collisions for disk cache are detected in nsDiskCacheDevice::FindEntry() at line http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#471. But if the first entry with the same hash hasn't been yet written to the disk, then FindEntry() returns null at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#468, new ActiveEntry is then created for the second colliding entry and data of the first cached document is later rewritten. Also the check at http://mxr.mozilla.org/mozilla1.8/source/netwerk/cache/src/nsDiskCacheDevice.cpp#458 throws assertion for the second entry, since it finds wrong binding to the first entry. The bug can be reproduced only in 1.8 branch because only here is second entry being opened before the first has been written to the disk. But the problematic code is in mercurial and CVS trunk too. Attached patch fixes the problem for 1.8 branch.
Comment on attachment 340283 [details] [diff] [review] fix v1 for 1.8 branch Nice... This should also apply to 1.9 and trunk, I would think, right?
(In reply to comment #23) > Nice... This should also apply to 1.9 and trunk, I would think, right? There was a tiny change in the assertion... This one is for 1.9 and trunk.
Attachment #340531 - Flags: approval220.127.116.11?
In unpatched versions of firefox, how can this issue be prevented? What is it about the structure of the files in the test case that triggers the bug?
(In reply to comment #25) > In unpatched versions of firefox, how can this issue be prevented? What is it > about the structure of the files in the test case that triggers the bug? Create structure of web pages so that there aren't collisions. See https://bugzilla.mozilla.org/show_bug.cgi?id=290032#c5
Don't suppose there's a rule an idiot clientslide developer like me could understand? (not enough low level programming knowledge in this brain)
The basic problem is if you have filenames that are pretty similar and our cache doesn't handle that well. In this case, effects and widgets only differ in this part: effects/effects. widgets/widgets. And when we compute the hash of the URI, those both compute to 0, because we xor things that are 4 chars apart, and the different parts repeat exactly 8 chars apart. In other words, changing the filename will help most immediately. If changing filenames is a non-starter, as is appending a query to one filename or the other, you might be able to stick a no-op external script in between the two in hopes of avoiding the "opening an entry when the previous one has not been written to disk" scenario. Not sure whether that would work.
Pushed changeset 91327a79cb90 to mozilla-central.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Boris: That's excellent, thanks And thanks to all for looking into this
Attachment #340531 - Attachment description: patch for 1.9 and TRUNK → patch for 1.9 and TRUNK [Checkin: Comment 29]
Comment on attachment 340283 [details] [diff] [review] fix v1 for 1.8 branch Approved for 18.104.22.168, a=dveditz for release-drivers
Attachment #340283 - Flags: approval22.214.171.124? → approval126.96.36.199+
Comment on attachment 340531 [details] [diff] [review] patch for 1.9 and TRUNK [Checkin: Comment 29] Approved for 188.8.131.52, a=dveditz for release-drivers
Attachment #340531 - Flags: approval184.108.40.206? → approval220.127.116.11+
Shouldn't we put together a testcase for this?
(In reply to comment #33) > Shouldn't we put together a testcase for this? That's not so easy. Such testcase must use two different URLs that have same hash. But hash function for disk cache will be changed as soon as I get review+ for patch #335821 in bug #290032. Then it won't be trivial to find such URLs.
Verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:18.104.22.168pre) Gecko/2008102103 BonEcho/22.214.171.124pre for 126.96.36.199.
When I try to verify this for 188.8.131.52, I see that the test case passes every time with 3.0.3.
(In reply to comment #37) > When I try to verify this for 184.108.40.206, I see that the test case passes every > time with 3.0.3. The bug can be consistently reproduced only on 1.8 branch. See end of comment #22.
I've seen this bug occur in Firefox 3 recently. Sometimes the 2nd file contains a portion of the first, sometimes the 1st file comes back empty. But no, I haven't been able to reproduce it consistently.
To compile: g++ -o ffgenhash ffgenhash.cpp
Robert, it's odd that you're seeing this with Firefox 3, since this was fixed in 3.0.4. Is it possible that you're seeing some other similar) issue? Note that this bugfix made the cache handle the hash collision correctly, not prevent it from happening at all, so just seeing a hash collision doesn't mean that much...
Robert, just to be clear, what you should do is file a new bug, with clear steps to reproduce on your site that show the bug in 3.0.8.
(In reply to comment #43) > Robert, just to be clear, what you should do is file a new bug, with clear > steps to reproduce on your site that show the bug in 3.0.8. Will do. First however I'm gonna implement a naming workaround so that users still using Firefox < 3.0.4 will work as several Linux distro's still default to Firefox 2. After that when I have some time to create a test case, I'll create a new bug :)
You need to log in before you can comment on or make changes to this bug.