Last Comment Bug 325765 - Cache causes crash crossing architecture [@ nsDiskCacheRecord::HashNumber].[@ nsDiskCacheMap::FindRecord]
: Cache causes crash crossing architecture [@ nsDiskCacheRecord::HashNumber].[@...
Status: RESOLVED FIXED
[camino-1.0][camino-1.0.1][camino-1.0...
: fixed1.8.0.2, fixed1.8.1
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: 1.8 Branch
: PowerPC Mac OS X
: -- critical (vote)
: ---
Assigned To: Mark Mentovai
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-02-03 09:46 PST by Stuart Morgan
Modified: 2007-08-10 11:27 PDT (History)
8 users (show)
dveditz: blocking1.8.0.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Stack trace (ppc with x86 cache) (22.30 KB, text/plain)
2006-02-03 16:38 PST, Mark Mentovai
no flags Details
Byte-swap the entire cache header (2.51 KB, patch)
2006-02-03 21:39 PST, Mark Mentovai
darin.moz: review+
sfraser_bugs: superreview+
Details | Diff | Splinter Review
Branch version (2.57 KB, patch)
2006-02-03 21:40 PST, Mark Mentovai
darin.moz: approval‑branch‑1.8.1+
Details | Diff | Splinter Review
1.8.0 branch version (1.83 KB, patch)
2006-02-04 16:01 PST, Mark Mentovai
darin.moz: review+
dveditz: approval1.8.0.2+
Details | Diff | Splinter Review

Description Stuart Morgan 2006-02-03 09:46:55 PST
Try saying that three times fast.

Camino and Firefox will both crash on first lanch if the previous launch was under the opposite architecture (because someone copied their entire library from an old machine, got a universal version after having run under Rosetta, or just toggled Rosetta on and off for kicks).  The crash doesn't reproduce if you manually delete the cache before launching the other way.
Comment 1 Mark Mentovai 2006-02-03 16:38:28 PST
Created attachment 210651 [details]
Stack trace (ppc with x86 cache)

Oh yeah.  It's all about the cache.

Also TB14724380H.
Comment 2 Mark Mentovai 2006-02-03 21:39:53 PST
Created attachment 210674 [details] [diff] [review]
Byte-swap the entire cache header

This is crashing because the nsDiskCacheHeader::mBucketUsage array is not being byte-swapped when read from/written to disk.  See Swap and Unswap:

http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheMap.h#332

nsDiskCacheMap::FindRecord was then getting bad data for the number of buckets on a table entry, so attempts to get the buckets would fail.  Instead of records[1], it would look for records[16777216].  Kaboom.

http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheMap.cpp#446

The cache would be marked dirty on first launch before crashing and therefore discarded on a subsequent launch.

The disk cache is supposed to be endian-neutral and stored on disk in big-endian format.  (Re)introducing proper byte-swapping renders all existing caches unusable on little-endian systems (for the above reason), so the disk cache version number is bumped.  I selected 1.9 for the trunk, reserving 1.8 for the same change on the branch - the trunk and branch are currently at 1.7 and 1.6, respectively.
Comment 3 Mark Mentovai 2006-02-03 21:40:28 PST
Created attachment 210675 [details] [diff] [review]
Branch version
Comment 4 Darin Fisher 2006-02-04 08:43:57 PST
Comment on attachment 210674 [details] [diff] [review]
Byte-swap the entire cache header

r=darin
Comment 5 Alfred Kayser 2006-02-04 11:36:17 PST
mEvictionRank was never swapped, and mBucketUsage was introduced by me.
So, at least for the last one, my apologies for this crasher.
Comment 6 Mark Mentovai 2006-02-04 12:29:22 PST
mEvictionRank never hurt because it's never used for pointer arithmetic, only in comparisons.  It was wrong, but apparently nobody ever noticed.
Comment 7 Darin Fisher 2006-02-04 13:46:56 PST
Comment on attachment 210675 [details] [diff] [review]
Branch version

a=darin for the 1.8 branch
Comment 8 Mark Mentovai 2006-02-04 14:02:21 PST
Fixed on trunk, MOZILLA_1_8_BRANCH, and CAMINO_1_0_BRANCH.
Comment 9 Mark Mentovai 2006-02-04 14:02:51 PST
Comment on attachment 210675 [details] [diff] [review]
Branch version

I'd like this for 1.8.0.2.
Comment 10 Darin Fisher 2006-02-04 14:56:38 PST
It'd be nice if we could avoid discarding the cache for all FF 1.5 users.  Maybe some migration code is the answer, or perhaps we should make this change be XP_MACOSX only on the 1.8.0 branch.  Neither is a great solution, but tossing out 50 MB of cache for users behind a modem isn't very nice either.
Comment 11 Mark Mentovai 2006-02-04 16:01:02 PST
Created attachment 210720 [details] [diff] [review]
1.8.0 branch version

Considering that the only caches out there on Macs right now are big-endian, we can handle this with #ifdef XP_MACOSX and not bumping the cache version on the 1.8.0 branch.  The only people who will get shafted this way are those who used an unofficial Firefox 1.5 x86 build.
Comment 12 Mark Mentovai 2006-02-04 16:05:54 PST
We can spin a new unofficial x86 Firefox build equivalent to 1.5.0.1 with this 1.8.0 patch, so users running the unofficials will crash first using THAT, and not with an official release.

(http://wiki.mozilla.org/Mac:Intel)
Comment 13 Daniel Veditz [:dveditz] 2006-02-21 15:19:35 PST
Comment on attachment 210720 [details] [diff] [review]
1.8.0 branch version

approved for 180 branch, a=dveditz for drivers
Comment 14 Mark Mentovai 2006-02-21 19:23:34 PST
1.8.0 branch version checked in on 1.8.0 branch.

Note You need to log in before you can comment on or make changes to this bug.