Last Comment Bug 529733 - Crash in nsCacheMetaData::SetElement
: Crash in nsCacheMetaData::SetElement
Status: RESOLVED FIXED
[watching crash-stats]
: crash, regression, reproducible
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All All
: -- critical with 6 votes (vote)
: mozilla9
Assigned To: Alfred Kayser
:
Mentors:
http://lurkmore.ru/Caramelldansen
: 634920 640229 659833 664320 668945 669557 675007 (view as bug list)
Depends on: 726417
Blocks: 394425
  Show dependency treegraph
 
Reported: 2009-11-18 21:30 PST by Mats Palmgren (vacation)
Modified: 2013-12-27 14:19 PST (History)
41 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wanted


Attachments
V1: Make MetaData more crash proof (1.96 KB, patch)
2009-12-29 04:19 PST, Alfred Kayser
bzbarsky: review+
Details | Diff | Splinter Review
V2: Check the metadata structure before using it (2.21 KB, patch)
2011-03-08 01:05 PST, Alfred Kayser
bzbarsky: review-
Details | Diff | Splinter Review
V3: More thorough check of metadata format (2.34 KB, patch)
2011-03-09 00:31 PST, Alfred Kayser
michal.novotny: review-
Details | Diff | Splinter Review
V4: Check for even number of zero's (2.62 KB, patch)
2011-03-11 07:10 PST, Alfred Kayser
michal.novotny: review+
Details | Diff | Splinter Review
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE (2.63 KB, patch)
2011-07-27 08:50 PDT, Alfred Kayser
jduell.mcbugs: review+
Details | Diff | Splinter Review

Description Mats Palmgren (vacation) 2009-11-18 21:30:42 PST
Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)]
is currently #45 in 1.9.3 top crash list (all crashes on Windows).
Looks like a regression from bug 394425.

memmove	 MEMCPY.ASM:271
nsCacheMetaData::SetElement	netwerk/cache/src/nsCacheMetaData.cpp:98
nsCacheEntryDescriptor::SetMetaDataElement	netwerk/cache/src/nsCacheEntryDescriptor.cpp:470
nsHttpChannel::AddCacheEntryHeaders	netwerk/protocol/http/src/nsHttpChannel.cpp:2603
nsHttpChannel::InitCacheEntry	netwerk/protocol/http/src/nsHttpChannel.cpp:2510
nsHttpChannel::ProcessNormal	netwerk/protocol/http/src/nsHttpChannel.cpp:1113
nsHttpChannel::ProcessResponse	netwerk/protocol/http/src/nsHttpChannel.cpp:1014
nsHttpChannel::OnStartRequest	netwerk/protocol/http/src/nsHttpChannel.cpp:5135
nsInputStreamPump::OnStateStart	netwerk/base/src/nsInputStreamPump.cpp:439
nsInputStreamPump::OnInputStreamReady	netwerk/base/src/nsInputStreamPump.cpp:395
nsInputStreamReadyEvent::Run	xpcom/io/nsStreamUtils.cpp:191
nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:527 
...

List of crashes with this signature in the past week:
http://crash-stats.mozilla.com/report/list?branch=1.9.3&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=memmove%20|%20nsCacheMetaData%3A%3ASetElement%28char%20const*%2C%20char%20const*%29
Comment 1 Alfred Kayser 2009-11-19 04:00:08 PST
Hard to debug without a specific reproduction scenario...
Comment 2 IU 2009-12-02 08:11:02 PST
I've experienced this crash.  It happened repeatedly once I tried to restart after experiencing an [@RtlDeleteCriticalSection ] crash when trying to download from majorgeeks.com while watching a DivX movie via the DivX web player plugin: http://crash-stats.mozilla.com/report/index/bp-5f7a056e-63e8-4946-9575-fa9162091202

After that, I kept getting the [@memmove | nsCacheMetaData::SetElement(char const*, char const*) ] crash every time I tried to start Firefox, until I started Firefox in safe mode.  After that, everything was okay. So seems it could be Firefox having problems dealing with cache corruption issues.

Related crash reports:
http://crash-stats.mozilla.com/report/index/bp-ab949ab4-d0a0-49f2-9ae8-2c2392091202
http://crash-stats.mozilla.com/report/index/bp-2ca3ffc4-d325-4197-a3a1-4b0362091202
http://crash-stats.mozilla.com/report/index/bp-7874bf8e-9b7f-48ca-833c-60afe2091202
http://crash-stats.mozilla.com/report/index/bp-1b9ec0a1-b975-4f14-ac83-278df2091202
Comment 3 Alfred Kayser 2009-12-29 04:19:33 PST
Created attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

The crashes all seems to be related to corrupted cache data.
What seems to happen is that the MetaData string is corrupted, meaning that the key/value pair is no longer terminated with a '\0'.
In other words, when scanning the buffer, either a key or a value is not zero terminated within the available buffer. 

So, this patch adds two measures:
1. make sure that the buffer always has two additional zero's, so that any strlen's looking for the '\0' always end within the bufferspace.
2. make sure the memmove (that crashes) only is done when the source and destination ranges are within the buffer space.
Comment 4 Alfred Kayser 2010-01-01 03:09:06 PST
Tested locally, and on tryserver, passing all tests.
Comment 5 Johnny Stenback (:jst, jst@mozilla.com) 2010-01-26 17:47:56 PST
Not blocking since this seems like a low frequency crash (not in the top 300 for Firefox 3.6), but we'd obviously take a safe fix more or less any time.
Comment 6 Michal Novotny (:michal) 2010-01-27 08:45:54 PST
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

Although this patch would fix the problem, I would prefer to fix the wrong memory handling. IMHO it should be sufficient to ensure that

1) the last byte of metadata is \0 when copying data in nsCacheMetaData::UnflattenMetaData()

2) pointer to value doesn't go outside limits
http://hg.mozilla.org/mozilla-central/annotate/3cdf3bd2608c/netwerk/cache/src/nsCacheMetaData.cpp#l53
http://hg.mozilla.org/mozilla-central/annotate/3cdf3bd2608c/netwerk/cache/src/nsCacheMetaData.cpp#l152
Comment 7 V. Korn 2010-10-20 09:29:07 PDT
Hello, since today build (Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101020 Firefox/4.0b8pre) i getting crash with such signature every time i visiting URL 
http://lurkmore.ru/Caramelldansen

Windows gives me message about error in plugin-container.exe and Firefox crashing at the same time.
For me it seems that Firefox is crashing at moment when youtube video is loading.

Crash ids:

bp-8ef42504-a298-4fe7-878b-ff4522101020
bp-3dd67acd-6cb5-4dcb-bfa7-eff272101020
bp-e5cde7ef-e292-42d1-9a7e-ba1912101020
bp-8fa3ee22-9d3d-4699-8091-e08032101020
bp-1028b789-8da2-4c3a-a9a5-871372101020
bp-04024108-5c56-4588-9a59-e335e2101020

I hope it helps
Flash plugin is 10.1.85.3
Comment 8 Mats Palmgren (vacation) 2010-10-20 13:48:42 PDT
Works fine for me using 20101020 with Flash 10.1.85.3 on WinXP.
V. Korn, I would guess one of your extensions is required for the crash
to occur.  Can you figure out which one it is?
Comment 9 Scoobidiver (away) 2011-02-22 03:24:35 PST
It is #9 top crasher in 4.0b12pre and #65 top crasher in 4.0b11.
Comment 10 Scoobidiver (away) 2011-03-04 01:08:41 PST
It is #67 top crasher in 4.0b12.

In reply to comment 8
> Can you figure out which one it is?
Session manager seems to be the likely culprit:
  memmove | nsCacheMetaData::SetElement(char const*, char const*)|EXCEPTION_ACCESS_VIOLATION_READ (54 crashes)
     89% (48/54) vs.   0% (245/62143) {1280606b-2510-4fe0-97ef-9b5a22eafe30} (Session Manager, https://addons.mozilla.org/addon/2324)
         74% (40/54) vs.   0% (225/62143) 0.7.4
         15% (8/54) vs.   0% (15/62143) 0.7.4pre
Comment 11 Alfred Kayser 2011-03-04 05:04:48 PST
I think that the patch of comment #6 will prevent some crashes.
I would add an extra check to ensure that:
91 // Update the value in place

92 newSize -= oldValueSize; 
doesn't result in a negative size.
This could happen if the data in the key/value string is malformed,
then oldValueSize could be much bigger than mMetaData + valueSize.
So, an extra check would be:
 NS_ASSERTION(newSize > oldValueSize, "oldValueSize out of range");
just before line 92.
Comment 12 Alfred Kayser 2011-03-04 05:07:44 PST
And 'UnflattenMetadata' could at least check that the metadata string ends with 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0 construct before using it.
Adding two zero's at the end will at least prevent getting out of bounds.
Comment 13 Alfred Kayser 2011-03-04 05:19:37 PST
Requesting blocking, as this one is now in position 22 on the 4.0b13 top crasher list, and there is a simple and safe patch that will reduce the change of crashing.
Comment 14 Mike Beltzner [:beltzner, not reading bugmail] 2011-03-04 05:57:06 PST
We're done for RC unless there is a showstopper. This does not feel like a showstopper to me - #22 topcrash is meaningless to me without knowing:

 - volume / user
 - what platform it affects
Comment 15 chris hofmann 2011-03-04 07:17:08 PST
relevant stats

its pretty much a start up crash with evidence people might be giving up on 4.0 when hitting it (low repeats).   I'm a bit confused on reading the data around some spikes in daily volume.  they really don't map well to release of betas when we might see higher volume of new installs and updates, but maybe there is a delayed (or press cycle) effect in play there.

daily crash volume
         memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
20110201 23
20110202 38
20110203 83
20110204 51
20110205 105
20110206 76
20110207 48
20110208 51    beta11
20110209 59
20110210 54
20110211 51
20110212 52
20110213 74
20110214 70
20110215 166
20110216 120
20110217 156
20110218 80
20110219 66
20110220 78
20110221 101
20110222 107
20110223 109
20110224 104
20110225 83     - beta12 released
20110226 67
20110227 57
20110228 117
20110301 114
20110302 83


Correlation to startup or time of session
83 total crashes for memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. on 20110302-crashdata.csv
19 startup crashes inside 30 sec.
50 startup crashes inside 3 min.
33 repeated crashes inside 3 min. of last crash

checking --- memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. 20110302-crashdata.csv
found in: 4.0b12 4.0b11 4.0b13pre 4.0b7 4.0b10
release total-crashes
              memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. crashes
                         pct.
all     358047  83      0.000231813
4.0b12  67246   72      0.0010707
4.0b11  15135   6       0.000396432
4.0b13pre 1586  3       0.00189155
4.0b7   2419    1       0.000413394
4.0b10  5515    1       0.000181324

os breakdown
memmove...nsCacheMetaData::SetElement.char.const.,.char.const..Total 83
Win5.1  0.14
Win6.0  0.05
Win6.1  0.81
Comment 16 Mike Beltzner [:beltzner, not reading bugmail] 2011-03-04 11:36:57 PST
Renominate if we believe that this is a bigger problem than we're seeing, but right now looks like it might be some corrupted caches, etc, not gonna hold release.
Comment 17 Boris Zbarsky [:bz] 2011-03-04 11:47:56 PST
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

r=me
Comment 18 chris hofmann 2011-03-04 12:09:55 PST
added [fx4-rc-ridealong] to get this on the radar 
- start up crashes on a brand new firefox 4 seen by hundreds of people a day can hurt adoption, especially if they hit unsuspecting high profile press and bloggers.

bz, you should remove the ridealong tag if there is inordinate risk, or the patch needs more time or baking.
Comment 19 Michal Novotny (:michal) 2011-03-06 04:25:55 PST
> +   // Clear the new bytes and the two fence bytes
> +   memset(buf + mBufferSize, 0, bufSize - mBufferSize + 2);

Why not to clear just the 2 fence bytes?


> And 'UnflattenMetadata' could at least check that the metadata string ends with
> 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0
> construct before using it.
> Adding two zero's at the end will at least prevent getting out of bounds.

This would be IMO the correct solution and we wouldn't need these fence bytes. The difference is that with the current fix we unflatten incorrect metadata and use it. We only fail when we try to update it. On the other hand with a proper check in UnflattenMetadata() we would fail already while loading the cache entry.
Comment 20 Michal Novotny (:michal) 2011-03-06 04:52:17 PST
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

> +   // Clear the new bytes and the two fence bytes
> +   memset(buf + mBufferSize, 0, bufSize - mBufferSize + 2);

Why not to clear just the 2 fence bytes?


> And 'UnflattenMetadata' could at least check that the metadata string ends with
> 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0
> construct before using it.
> Adding two zero's at the end will at least prevent getting out of bounds.

This would be IMO the correct solution and we wouldn't need these fence bytes. The difference is that with the current fix we unflatten incorrect metadata and use it. We only fail when we try to update it. On the other hand with a proper check in UnflattenMetadata() we would fail already while loading the cache entry.
Comment 21 Michal Novotny (:michal) 2011-03-06 15:57:24 PST
*** Bug 634920 has been marked as a duplicate of this bug. ***
Comment 22 Alfred Kayser 2011-03-08 01:05:42 PST
Created attachment 517680 [details] [diff] [review]
V2: Check the metadata structure before using it

This one is probably enough, but possibly we need to combine it with the crashproofing check of V1.
Comment 23 Boris Zbarsky [:bz] 2011-03-08 13:27:13 PST
Comment on attachment 517680 [details] [diff] [review]
V2: Check the metadata structure before using it

I don't know enough about the metadata format to mark r+ here (Michal might be a better choice), but this patch doesn't look like it was ever tested... Specifically, the C++ "!=" operator doesn't do any assignment, so |odd| always ends up false.
Comment 24 Alfred Kayser 2011-03-09 00:31:28 PST
Created attachment 518001 [details] [diff] [review]
V3: More thorough check of metadata format

Michal, I have based this patch on your patch from the dup'ed bug, and made the check even more thorough/crash proof. Can you review this?
Comment 25 Michal Novotny (:michal) 2011-03-09 12:01:23 PST
I'll review it today.
Comment 26 Johnny Stenback (:jst, jst@mozilla.com) 2011-03-09 12:55:47 PST
*** Bug 640229 has been marked as a duplicate of this bug. ***
Comment 27 Michal Novotny (:michal) 2011-03-09 15:38:29 PST
Comment on attachment 518001 [details] [diff] [review]
V3: More thorough check of metadata format

> +   // Check if the metadata ends with a zero byte
> +   // and if longer than 1 byte, with two zero's.
...
> +   // Check that there are an odd number of zero bytes
> +   // to match the pattern { key \0 value \0 } \0

This is wrong. There is no extra null at the end of the data. I.e. we should check just the ending null and the count of nulls should be even.


> +    NS_ASSERTION(data == limit, "Metadata corrupted");
...
> +    NS_ASSERTION(data <= limit, "Metadata corrupted");

Why did you omit these assertions from my patch? (those at the end of GetElement() and VisitElements()). Either we should add them too, or we could assume that the crash is caused only by unflattening corrupted metadata and in this case we don't need to add any assertion/abort in GetElement() and VisitElements().
Comment 28 Mike Beltzner [:beltzner, not reading bugmail] 2011-03-10 11:39:43 PST
No reviewed patch, no ridealong status, but bsmedberg thinks we should keep it in the nomination queue and watch the crash stats on RC.
Comment 29 chris hofmann 2011-03-10 14:28:30 PST
> watch the crash stats on RC

here is what to watch for increases in.

there was a small residual crash that was 4.0x only of about 20-30 crashes per day up up until b10.  after b10 volume is progressively higher.  with the recent growth in users its hard to say if there was another regressiion spike around b11 or b12.

looks like a combined 100-150 crashes per day through b12.

         memcpy...nsCacheMetaData::SetElement.char.const...char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110301   
20110302 4 4.0b13pre2011030103 	4 , 
20110303   
20110304 2 4.0b13pre2011030103 	2 , 
20110305 3  	2 4.0b13pre2011030403, 
        		1 4.0b13pre2011030103, 
20110306 5  	4 4.0b13pre2011030503, 
        		1 4.0b13pre2011030603, 
20110307 3  	2 4.0b13pre2011030703, 
        		1 4.0b112011020314, 
20110308 15  	9 4.0b13pre2011030703, 
        		3 4.0b13pre2011030803, 	3 4.0b13pre2011030603, 
20110309 9 4.0b13pre2011030803 	9 , 

        memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110301 114  	109 4.0b122011022221, 
        		4 4.0b112011020314, 	1 4.0b13pre2011030103, 
20110302 83  	72 4.0b122011022221, 
        		6 4.0b112011020314, 	2 4.0b13pre2011030203, 
        		1 4.0b72010110414, 	1 4.0b13pre2011030103, 
        		1 4.0b102011012116, 
20110303 139  	108 4.0b122011022221, 
        		16 4.0b13pre2011030303, 	7 4.0b112011020314, 
        		6 4.0b82010121417, 	1 4.0b22010072019, 
        		1 4.0b13pre2011030203, 
20110304 142  	132 4.0b122011022221, 
        		4 4.0b82010121417, 	2 4.0b13pre2011030312, 
        		2 4.0b112011020314, 	1 4.0b13pre2011030303, 
        		1 4.0b13pre2011022803, 
20110305 118  	115 4.0b122011022221, 
        		2 4.0b112011020314, 	1 4.0b32010080519, 
20110306 66  	61 4.0b122011022221, 
        		2 4.0b112011020314, 	1 4.0b22010072019, 
        		1 4.0b13pre2011030312, 	1 4.02011030319, 
20110307 132  	130 4.0b122011022221, 
        		1 4.0b62010091408, 	1 4.0b112011020314, 
20110308 91  	74 4.0b122011022221, 
        		8 4.0b112011020314, 	6 4.0b13pre2011030312, 
        		3 4.0b102011012116, 
20110309 98  	68 4.0b122011022221, 
        		13 4.02011030319, 	12 4.0b13pre2011030312, 
        		3 4.0b13pre2011030603, 	1 4.0b92011011019, 
        		1 4.0b12010063014,
Comment 30 Alfred Kayser 2011-03-11 07:10:44 PST
Created attachment 518711 [details] [diff] [review]
V4: Check for even number of zero's

I'll ask review as soon as I have tested this more thoroughly.

As for the review comments:
1. It is indeed an even number of zero's to check for, with one zero byte at the end.
2. Added the two assert's, but I think they won't catch anything.

With the check for *even* number of zero bytes included, the metadata buffer handling cannot fail.
Comment 31 Mike Beltzner [:beltzner, not reading bugmail] 2011-03-14 11:41:17 PDT
bsmedberg said that it wouldn't be that important to be taken on branch, so we'll get it for Firefox 5.
Comment 32 chris hofmann 2011-03-14 11:47:52 PDT
Your comment was:

    update on the stats

    ound in: 4.0 4.0b12 4.0b13pre 4.0b11
    release total-crashes
                  memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
    crashes
                             pct.
    all      370362    102    0.000275406
    4.0    32055    53    0.00165341
    4.0b12    62897    46    0.000731354
    4.0b13pre    1376    2    0.00145349
    4.0b11    7416    1    0.000134844

    os breakdown
    memmove...nsCacheMetaData::SetElement.char.const.,.char.const..Total 102
    Win5.1  0.12
    Win6.0  0.09
    Win6.1  0.79


    Correlation to startup or time of session
    102 total crashes for
    memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. on
    20110310-crashdata.csv
    49 startup crashes inside 30 sec.
    70 startup crashes inside 3 min.


     urls for testing
       9 http://www.mediaite.com/
       6 \N
       5 http://alayia-bos.mhcstudios.com/
       3 http://www.wowprogress.com/character/eu/crushridge/Bellecosce
       3 http://www.tumblr.com/dashboard
       3 http://www.sexadir.co.il/forum/default.asp
       3 http://tavria.org.ua/forum/private.php?do=showpm&pmid=242321
       2 http://www.slickdeals.net/
       2 http://www.poterie-du-monde.com/poteries-classiques,fr,2,31.cfm

    a little lower volume over the weekend.

             memmove...nsCacheMetaData::SetElement.char.const.,.char.const..

    date     total    breakdown by build
             crashes  count build, count build, ...

    20110310 102      53 4.02011030319, 
                    46 4.0b122011022221,     2 4.0b13pre2011030312, 
                    1 4.0b112011020314, 
    20110311 129      105 4.02011030319, 
                    18 4.0b122011022221,     3 4.0b13pre2011030312, 
                    1 4.0b82010121417,     1 4.0b13pre2011031003, 
                    1 4.0b102011012116, 
    20110312 90      73 4.02011030319, 
                    8 4.0b13pre2011031203,     4 4.0b122011022221, 
                    3 4.0b13pre2011031103,     1 4.0b32010080519, 
                    1 4.0b13pre2011030312, 
    20110313 93      77 4.02011030319, 
                    7 4.0b13pre2011031203,     5 4.0b122011022221, 
                    4 4.0b13pre2011031303,
Comment 33 katgamer@gmail.com 2011-03-19 12:53:06 PDT
I strongly disagree that this is not a major bug - it is. It is preventing me from visiting ne of my favorite forums at http://forums.groundspeak.com/GC/, and you can see my troubleshooting here http://forums.mozillazine.org/viewtopic.php?f=38&t=2135783. It crashes with all add-ons disabled so it is clearly a fault in FF4 and not the add-ons.
Comment 34 Michael Kraft [:morac] 2011-03-28 12:15:20 PDT
(In reply to comment #10)
> It is #67 top crasher in 4.0b12.
> 
> In reply to comment 8
> > Can you figure out which one it is?
> Session manager seems to be the likely culprit:
>   memmove | nsCacheMetaData::SetElement(char const*, char
> const*)|EXCEPTION_ACCESS_VIOLATION_READ (54 crashes)
>      89% (48/54) vs.   0% (245/62143) {1280606b-2510-4fe0-97ef-9b5a22eafe30}
> (Session Manager, https://addons.mozilla.org/addon/2324)
>          74% (40/54) vs.   0% (225/62143) 0.7.4
>          15% (8/54) vs.   0% (15/62143) 0.7.4pre


How did you come to that conclusion?  From my brief check, while a number of people seeing this crash have Session Manager installed, many do not.  Some don't have any add-ons installed.  For example:

bp-f7639368-cce8-4096-9348-59f5f2110328
bp-ef6d5e4d-67f7-4583-86db-d2e1f2110327
bp-f15ee523-155c-406b-b43d-18adf2110328


I will mention though that Session Manager has an implementation of "Cache fixer" in it (from a ways back) which removes the "dirty" flag from the _CACHE_MAP_ on startup which prevents the cache from being wiped on a crash. See http://imglikeopera.mozdev.org/cache_fixer.html.  

It is currently enabled by default, but I had planned to change that to be disabled by default.  If this is somehow corrupting the cache data, I can simply remove the feature.
Comment 35 Boris Zbarsky [:bz] 2011-03-28 12:40:06 PDT
Michael, it's quite possible that there are several things that can cause the cache corruption and that Session Manager is one of them.

What the numbers above are saying that of the 54 people who hit this crash, 48 had Session Saver installed and 6 did not (hence it's not the _only_ reason for the crash).  In general, of 62143 people submitting crash reports, for all crashes, 245 had Session Saver installed, so it's not like most our user base is using Session Saver.
Comment 36 chris hofmann 2011-03-28 13:12:17 PDT
up to over 600 crashes per day.

         memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...


20110326 506  	492 4.02011031805, 
        		6 4.02011030319, 	5 4.0b122011022221, 
        		1 4.2a1pre2011032603, 	1 4.0b82010121417, 
        		1 4.0b72010110414, 
20110327 610  	595 4.02011031805, 
        		13 4.0b122011022221, 	2 4.02011030319, 


> Created attachment 518711 [details] [diff] [review]
> V4: Check for even number of zero's
> I'll ask review as soon as I have tested this more thoroughly.

any progress on the testing and review?

are there try builds available for any one who sees this so they could help out in testing?
Comment 37 Alfred Kayser 2011-03-29 05:32:54 PDT
Resetting the 'dirty' flag of the cache, and thereby using a dirty cache will cause problems...
Comment 38 George 2011-03-30 01:24:00 PDT
I had this crash going on since yesterday. It crashed over 10-20 times.

It all started when FF4 hanged on a URL that had bad url redirection loop.
The memory usage went up to 1.5GB and the CPU usage to 50%, basically using one CPU core to 100%.

I tried to quit FF and after a long wait of swapping and swapping it eventually crashed.

I tried to restore the session but it kept crashing every time.

In the end I tried to start FF without restoring session. It started ok. When I though tried to load a backup of previous session via session manager it crashed again.

So slowly slowly I started to narrow down which tab was causing the crash.
I identified the tab and URL and tried the same session that was causing the crash on a different pc with FF4. It didn't crash on other pc.

I then restarted FF4 (not restoring any session) and tried typing the URL that was causing the problem. It crashed again. So this proved to me that the problem was not the session manager or the state of the tab / session. It must have been the cache. So I cleared the cache and the crashes stopped.

So clearly the problem was with corrupted data in the cache in relation to that specific URL.
Comment 39 Mats Palmgren (vacation) 2011-03-30 02:29:21 PDT
If anyone can reproduce this crash in future, please make a backup copy your
cache files so we can analyze them and try to figure out what the bug is.
Comment 40 Paul [pwd] 2011-04-14 04:50:40 PDT
Could bug 616117 be related to this?
Comment 41 Richard Newman [:rnewman] 2011-04-15 18:01:26 PDT
Bug 634920 was duped to this, but this bug is for Windows. Can we either un-dupe, or broaden the platform for this?

I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of ~/Library/Caches/Firefox.
Comment 42 Mats Palmgren (vacation) 2011-04-15 19:42:45 PDT
> I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of
> ~/Library/Caches/Firefox.

Great.  Please inform him that cache files may contain sensitive data and
when attaching it here he should select the Privacy checkbox to restrict
access to people with security bug access.
http://www.mozilla.org/projects/security/secgrouplist.html
Comment 43 Richard Newman [:rnewman] 2011-04-16 13:40:45 PDT
(In reply to comment #42)
> > I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of
> > ~/Library/Caches/Firefox.
> 
> Great.  Please inform him that cache files may contain sensitive data and
> when attaching it here he should select the Privacy checkbox to restrict
> access to people with security bug access.
> http://www.mozilla.org/projects/security/secgrouplist.html

Cache directory is 1.6GB uncompressed. Can you suggest a subset of files that would be helpful, rather than me schlepping the whole thing around?

Once the cache directory has been obtained, can I clear the cache to allow her to continue using Firefox without frequent crashes?

This was the last crash:

https://crash-stats.mozilla.com/report/index/1203ef99-943f-4980-b625-e461c2110416
Comment 44 Mats Palmgren (vacation) 2011-04-16 16:55:14 PDT
> Cache directory is 1.6GB uncompressed. Can you suggest a subset of files

I'm not really an expert on the network cache but I think we only need
the Cache/ folder and its contents, not the *.mfasl files or urlclassifier*
(that should make it a lot smaller)

> Once the cache directory has been obtained, can I clear the cache...

Before you clear the cache, please check the disk usage (output from the
"df -k" command for example).  Also, load about:config and type "cache"
in the filter box, then note any values for browser.disk.cache.*
that has status "user set" (you can click the column head to sort them).

Thanks for your help, it's really appreciated.
Comment 45 Richard Newman [:rnewman] 2011-04-17 12:55:54 PDT
(In reply to comment #44)
> > Cache directory is 1.6GB uncompressed. Can you suggest a subset of files
> 
> I'm not really an expert on the network cache but I think we only need
> the Cache/ folder and its contents, not the *.mfasl files or urlclassifier*
> (that should make it a lot smaller)

The Cache/ folder is 1.5GB. urlclassifier* is 45MB. XUL.mfasl is 1.7MB.

Of the 150251 cache files, only 11043 were modified within a day of the reported crash. That should bring the size of the cache archive down to perhaps 70MB compressed.

Is there a better way to narrow down a working set?
 

> Before you clear the cache, please check the disk usage...

Thanks Mats. There's plenty of free disk:

Filesystem    1024-blocks     Used Available Capacity  Mounted on
/dev/disk0s2    487919600 85087324 402576276    18%    /
devfs                 109      109         0   100%    /dev
map -hosts              0        0         0   100%    /net
map auto_home           0        0         0   100%    /home


These are the only user-set cache params:

browser.cache.disk.capacity: 1048576
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size_cached_value: 1048576
Comment 46 Michal Novotny (:michal) 2011-04-18 06:21:34 PDT
First of all, please backup the whole Cache directory (while firefox is not running). Even normal usage of the browser could evict broken entries so we would lose them. Then there are 2 options (let me know which do you prefer).

1) We could see which entry is broken from the log in case it crashes again. Start NSPR logging (see https://developer.mozilla.org/en/HTTP_Logging) and make sure you're logging http and cache activity (i.e. NSPR_LOG_MODULES=nsHttp:5,cache:5). When the browser crashes attach the log to this bug.

2) I could write a small utility that would go through the cache and point out the wrong entry.
Comment 47 Richard Newman [:rnewman] 2011-04-18 09:37:07 PDT
(In reply to comment #46)
> First of all, please backup the whole Cache directory (while firefox is not
> running).

I have a complete snapshot.

> 2) I could write a small utility that would go through the cache and point out
> the wrong entry.

This would be the most convenient option for me. (And certainly most reusable!)

This is my mum's machine, and there's an 8-hour time difference, with commensurate latency; anything I can do over SSH without her involvement would make my day!
Comment 48 itcheg 2011-05-18 06:42:48 PDT
I can confirm that I had this crash for one particular site, going on for weeks, reproducible 100%, every single time I would go to the site it would crash - even with all extensions disabled (I do have session manager installed).

I cleared the cash and have not had a crash since.
Comment 49 Michael Kraft [:morac] 2011-05-25 21:41:41 PDT
*** Bug 659833 has been marked as a duplicate of this bug. ***
Comment 50 Harvey Chapman 2011-05-25 22:23:12 PDT
(In reply to comment #49)
> *** Bug 659833 has been marked as a duplicate of this bug. ***

Should I save my cache first before I clear it?
Comment 51 Andrew Hurle [:ahurle] 2011-05-27 02:05:30 PDT
I'm experiencing this crash as well.  I visit a page and Firefox crashes while loading it.  I'm always able to reproduce it on the page it crashed on after I restart Firefox.  Sometimes I need to refresh a few times to get it to happen.  I also got it to crash while in safe mode on one of these problematic pages, but, interestingly, it wasn't recognized as being related to this bug.  I do have Session Manager installed and enabled.  Windows 7 64, Firefox 4.0.1.

Here is a list of crashes I've reported since they first started happening in this pattern (roughly).  I didn't check all of them, but I know at least half refer to this bug.


bp-9d510644-18c3-45af-82b9-84a212110527    5/27/2011 4:14 AM
^^^Crash in safe mode on the same page, but not related to this bug

bp-ffb4270f-e62b-4c7e-a246-d7ed62110526    5/27/2011 2:25 AM
bp-dd11c7d6-2c67-46d4-8833-7ec2a2110526    5/27/2011 1:32 AM
bp-5e7775a5-bd1e-4e6f-8b3a-9739d2110526    5/27/2011 1:15 AM
bp-9b28e27e-f811-4b63-8a35-954a42110526    5/27/2011 1:14 AM
bp-fcdbea69-21f2-4d53-aca3-042472110526    5/27/2011 1:11 AM
bp-484df0a3-a51d-4735-be76-36e082110526    5/27/2011 12:41 AM
bp-72b1d5e6-012d-4ecd-a745-78aa52110526    5/27/2011 12:11 AM
bp-d47df0dc-b1f1-478a-9e21-7be192110526    5/27/2011 12:08 AM
bp-0b12e2a3-0e0b-4878-85c1-e38082110526    5/27/2011 12:07 AM
bp-f7e56991-254b-45fc-8cb8-1ad8e2110526    5/26/2011 11:25 PM
^^^I believe all of these were on the same page:
http://support.github.com/discussions/repos/5439-git-and-renaming-remote-branches

bp-ca5bbae9-8049-4b49-89b3-aa4892110525    5/25/2011 4:09 PM
bp-3bf6255b-7732-424a-a089-b94242110524    5/24/2011 9:09 PM
bp-d435a8be-fcf2-4bf6-82cc-597c02110524    5/24/2011 6:38 PM
bp-4e5e3ddd-3225-498d-b457-aa6342110521    5/21/2011 1:47 PM
bp-a014c557-4e70-49c9-872b-750e52110520    5/20/2011 4:58 AM
bp-c2fbffa3-0c83-4bf0-94ae-74b842110520    5/20/2011 4:55 AM
bp-1e1d35fd-430c-4a66-ba36-44c532110516    5/16/2011 9:50 PM
bp-6f31fdae-3f7d-46fc-b60c-610df2110516    5/16/2011 9:48 PM
^^^First crash related to this bug

bp-275bc3cd-dff6-4702-accd-883902110516    5/16/2011 3:30 PM
bp-83981cd5-225c-4776-90fc-8dc112110511    5/11/2011 3:07 PM
^^^First crash for about 3 months

I checked out what Firefox listed for my cache, and there is definitely something weird going on.  Some files say they've been accessed slightly less than a billion times, some 700,000.  These large numbers will often repeat themselves for different files.  Some files say they were last modified in 1982 or 2029.  I looked at some of these anomalous files and their contents usually don't make sense.  A GIF file will say that the server responded with some javascript, stuff like that.  I've also been experiencing images that don't make sense on some websites, I assume because of this cache issue.  For example, the background of this page had a bunch of stars on it before I refreshed.


I made a copy of my cache directory.  I also logged when the crash happened as suggested in comment 46.  I copied over all files that were modified on or after the 26th (when my most recent string of crashes happened) and 7zipped them up (36 megs).  However, when I went to attach the archive and log file, I see no privacy checkbox as suggested in comment 42.  Is there any way to privately attach my files?
Comment 52 chris hofmann 2011-05-27 07:47:33 PDT
5.0 regression that we really should get on top of.  looks like there are some thoughts on a patch, but they never made it to check-in state.  looks like it should work its way into the new networking team priorities.
Comment 53 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2011-05-27 07:53:08 PDT
5.0 regression? This bug was filed against 4.0 betas. chofmann, are you saying that this crash is much higher in 5.0 betas than 4.0 betas, or something else? I see it as #34 in the list, which isn't very high to be tracking against.
Comment 54 Scoobidiver (away) 2011-05-27 07:57:56 PDT
It is #72 top crasher in 4.0.1 and #25 top crasher in 5.0b2 (without hangs).
Comment 55 chris hofmann 2011-05-27 11:23:03 PDT
> 5.0 regression

er, sorry. 4.0 regression.  

it appears that it might also be rising in volume associated with major update offers and other increases in migration from 3.x up to 4.0.x and 5.0, then volume declines when the influx of new updates levels off.

we hit a peak of 339 crashes per day on may 18.

        memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110518 339  	321 4.0.12011041322, 
        		12 4.02011031805, 	2 5.02011042714, 
        		1 6.0a12011051703, 	1 6.0a12011051603, 
        		1 6.0a12011051003, 	1 4.0b122011022221,
Comment 56 Paulo F 2011-05-28 13:07:31 PDT
I was having this problem on specific pages for awhile now, even with all add-ons and plugins disabled.

It finally resolved itself when I told Firefox to clear out its ~650MB (auto-managed cache size), which took a good couple of mins.

Now all is good. Very bizarre.
Comment 57 Michal Novotny (:michal) 2011-05-30 15:50:59 PDT
(In reply to comment #51)
> However, when I went to attach the archive and log file, I see no privacy
> checkbox as suggested in comment 42.  Is there any way to privately attach
> my files?

I'm not sure but I think that you can check the security flag only while creating a new bug. Anyway, feel free to send me the files directly via email.
Comment 58 Boris Zbarsky [:bz] 2011-05-30 17:50:24 PDT
There is in fact a per-attachment privacy flag, but only security group members can set it, and only security group members can see the resulting attachment...
Comment 59 justin.merrow 2011-06-02 22:27:34 PDT
After spending days pulling my hair out while firefox crashed over and over, I was finally able to fix this problem by clearing the cache.  Because I was seeing this crash with 100% consistency on a number of specific unrelated sites, I'm tempted to think that the problem is being caused by firefox doing something which is corrupting data in the cache.  I wish I could remember the exact circumstances under which I first saw this, but unfortunately I don't.  All of that said, the browser really should be robust enough to handle a corrupted cache - even if that means automatically dumping the cache on the next launch after a corrupt cache causes a crash, or, if you're reluctant to delete a user's data, just don't load the cache and give the user a message explaining the issue and giving them the option to immediately clear the cache permanently.
Comment 60 Marcia Knous [:marcia - use ni] 2011-06-06 15:56:38 PDT
There is a "Privacy" checkbox at the bottom of the attachment pane if you want to add what you captured. We will probably need more information than that in order to be able to reproduce this bug.


> I made a copy of my cache directory.  I also logged when the crash happened
> as suggested in comment 46.  I copied over all files that were modified on
> or after the 26th (when my most recent string of crashes happened) and
> 7zipped them up (36 megs).  However, when I went to attach the archive and
> log file, I see no privacy checkbox as suggested in comment 42.  Is there
> any way to privately attach my files?
Comment 61 Matthias Versen [:Matti] 2011-06-14 16:30:50 PDT
*** Bug 664320 has been marked as a duplicate of this bug. ***
Comment 62 Johnathan Nightingale [:johnath] 2011-06-27 14:51:20 PDT
Clearing tracking flag - this will be tracked by crashkill going forward.
Comment 63 Tyler Downer [:Tyler] 2011-07-02 08:56:48 PDT
*** Bug 668945 has been marked as a duplicate of this bug. ***
Comment 64 Asa Dotzler [:asa] 2011-07-17 22:18:55 PDT
Sheila, can you or someone else on crashkill update us on this? Do we still need to be tracking it for Firefox 6 Beta?
Comment 65 Scoobidiver (away) 2011-07-18 01:11:45 PDT
It is #5 top browser crasher in 6.0b2.
Comment 66 Alfred Kayser 2011-07-18 01:30:50 PDT
So, as it is now in the top 5, and there is a patch available, can this patch be baked on the nightly branch, and possibly even on Aurora, so that we can test whether this fixes the problems?
Comment 67 Boris Zbarsky [:bz] 2011-07-18 07:10:04 PDT
Doesn't the patch need review?
Comment 68 Jason Duell [:jduell] (needinfo me) 2011-07-19 12:52:28 PDT
Comment on attachment 518711 [details] [diff] [review]
V4: Check for even number of zero's

Review of attachment 518711 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 69 Michal Novotny (:michal) 2011-07-22 07:58:31 PDT
Comment on attachment 518711 [details] [diff] [review]
V4: Check for even number of zero's

> +        NS_ABORT_IF_FALSE(value < limit, "Cache Metadata corrupted");
>          if (strcmp(data, key) == 0)
>              return value;
>  
>          // Skip value part
>          data = value + strlen(value) + 1;
>      }
> +    NS_ASSERTION(data == limit, "Metadata corrupted");

To be consistent, either change NS_ABORT_IF_FALSE to NS_ASSERTION or the other way round.
Comment 70 Scoobidiver (away) 2011-07-27 06:44:29 PDT
Does the patch land on m-c?
Comment 71 Boris Zbarsky [:bz] 2011-07-27 08:37:56 PDT
Once someone updates it to comment 69, right?
Comment 72 Alfred Kayser 2011-07-27 08:50:24 PDT
Created attachment 548801 [details] [diff] [review]
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE
Comment 73 Matthias Versen [:Matti] 2011-07-28 14:36:10 PDT
*** Bug 675007 has been marked as a duplicate of this bug. ***
Comment 74 Matthias Versen [:Matti] 2011-07-28 14:37:04 PDT
*** Bug 669557 has been marked as a duplicate of this bug. ***
Comment 75 Josef Davies-Coates 2011-07-28 15:07:56 PDT
Just wanted to say that I was experiencing this bug and also have session manager installed (although it was reproducible with it disabled).
Comment 76 Johnny Stenback (:jst, jst@mozilla.com) 2011-08-01 14:46:28 PDT
What needs to happen here, we have a patch, but no reviews. It's a high top crasher, seems we need to proceed here if we know what's needed, or come up with additional steps to learn more about this top crash.
Comment 77 contrazz 2011-08-01 17:22:20 PDT
I reported that I was using Session Manager, but only mentioned it as a diagnostic tool.  I have no reason to believe that Session Manager is a contributor to the problem.
Comment 78 Jason Duell [:jduell] (needinfo me) 2011-08-01 21:35:38 PDT
Comment on attachment 548801 [details] [diff] [review]
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE

This just has minor fixups that michal wanted from last +r version, so I'm going to guessing we can forward the +r on it.
Comment 79 christian 2011-08-08 14:43:22 PDT
We haven't got any new information and don't think we need to track it for 6 (as it has been seen all the way back to 1.9.3)
Comment 80 Sheila Mooney 2011-08-08 14:44:56 PDT
So do we think we have a fix for this now? We aren't going to track it for 6.0 anymore but it regressed somehow with 5.0 so it would be great to get fixed.
Comment 81 Scoobidiver (away) 2011-08-09 00:50:09 PDT
It is now only #131 top browser crasher in 5.0 (drop 89 ranks in one week) and #108 in 6.0 (drop 63 ranks in one week).

The drop started with the release of version 0.7.6 of Session Manager on July 27 that removed Cache Fixer:
https://addons.mozilla.org/firefox/addon/session-manager/
See also 5.0 correlations by add-ons:
  memmove | nsCacheMetaData::SetElement(char const*, char const*)|EXCEPTION_ACCESS_VIOLATION_READ (31 crashes)
     32% (10/31) vs.   0% (308/90925) {1280606b-2510-4fe0-97ef-9b5a22eafe30} (Session Manager, https://addons.mozilla.org/addon/2324)
         26% (8/31) vs.   0% (149/90925) 0.7.5
          6% (2/31) vs.   0% (158/90925) 0.7.6
          0% (0/31) vs.   0% (1/90925) 0.7.6.1

I renamed the bug summary.
Comment 82 Jason Duell [:jduell] (needinfo me) 2011-08-22 15:11:44 PDT
Comment on attachment 548801 [details] [diff] [review]
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE

Michal: so it seems much of the crashing was Session manager related:  but do we still want to land this patch?  I'm guessing the answer is yes.
Comment 83 Michal Novotny (:michal) 2011-08-22 15:35:47 PDT
Yes, we should land this patch.
Comment 84 Ed Morley [:emorley] 2011-08-23 04:05:35 PDT
In my queue, which is being sent to try and then presuming green, I'll land on inbound later today.
Comment 86 Marco Bonardo [::mak] (Away 6-20 Aug) 2011-08-24 01:37:33 PDT
http://hg.mozilla.org/mozilla-central/rev/72f1510e3f31

Note You need to log in before you can comment on or make changes to this bug.