Closed Bug 529733 Opened 15 years ago Closed 13 years ago

Crash in nsCacheMetaData::SetElement

Categories

(Core :: Networking: Cache, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla9
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: MatsPalmgren_bugz, Assigned: alfredkayser)

References

()

Details

(Keywords: crash, regression, reproducible, Whiteboard: [watching crash-stats])

Crash Data

Attachments

(1 file, 4 obsolete files)

Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)]
is currently #45 in 1.9.3 top crash list (all crashes on Windows).
Looks like a regression from bug 394425.

memmove	 MEMCPY.ASM:271
nsCacheMetaData::SetElement	netwerk/cache/src/nsCacheMetaData.cpp:98
nsCacheEntryDescriptor::SetMetaDataElement	netwerk/cache/src/nsCacheEntryDescriptor.cpp:470
nsHttpChannel::AddCacheEntryHeaders	netwerk/protocol/http/src/nsHttpChannel.cpp:2603
nsHttpChannel::InitCacheEntry	netwerk/protocol/http/src/nsHttpChannel.cpp:2510
nsHttpChannel::ProcessNormal	netwerk/protocol/http/src/nsHttpChannel.cpp:1113
nsHttpChannel::ProcessResponse	netwerk/protocol/http/src/nsHttpChannel.cpp:1014
nsHttpChannel::OnStartRequest	netwerk/protocol/http/src/nsHttpChannel.cpp:5135
nsInputStreamPump::OnStateStart	netwerk/base/src/nsInputStreamPump.cpp:439
nsInputStreamPump::OnInputStreamReady	netwerk/base/src/nsInputStreamPump.cpp:395
nsInputStreamReadyEvent::Run	xpcom/io/nsStreamUtils.cpp:191
nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:527 
...

List of crashes with this signature in the past week:
http://crash-stats.mozilla.com/report/list?branch=1.9.3&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=memmove%20|%20nsCacheMetaData%3A%3ASetElement%28char%20const*%2C%20char%20const*%29
Hard to debug without a specific reproduction scenario...
Version: unspecified → Trunk
I've experienced this crash.  It happened repeatedly once I tried to restart after experiencing an [@RtlDeleteCriticalSection ] crash when trying to download from majorgeeks.com while watching a DivX movie via the DivX web player plugin: http://crash-stats.mozilla.com/report/index/bp-5f7a056e-63e8-4946-9575-fa9162091202

After that, I kept getting the [@memmove | nsCacheMetaData::SetElement(char const*, char const*) ] crash every time I tried to start Firefox, until I started Firefox in safe mode.  After that, everything was okay. So seems it could be Firefox having problems dealing with cache corruption issues.

Related crash reports:
http://crash-stats.mozilla.com/report/index/bp-ab949ab4-d0a0-49f2-9ae8-2c2392091202
http://crash-stats.mozilla.com/report/index/bp-2ca3ffc4-d325-4197-a3a1-4b0362091202
http://crash-stats.mozilla.com/report/index/bp-7874bf8e-9b7f-48ca-833c-60afe2091202
http://crash-stats.mozilla.com/report/index/bp-1b9ec0a1-b975-4f14-ac83-278df2091202
The crashes all seems to be related to corrupted cache data.
What seems to happen is that the MetaData string is corrupted, meaning that the key/value pair is no longer terminated with a '\0'.
In other words, when scanning the buffer, either a key or a value is not zero terminated within the available buffer. 

So, this patch adds two measures:
1. make sure that the buffer always has two additional zero's, so that any strlen's looking for the '\0' always end within the bufferspace.
2. make sure the memmove (that crashes) only is done when the source and destination ranges are within the buffer space.
Assignee: nobody → alfredkayser
Status: NEW → ASSIGNED
Attachment #419406 - Flags: review?(shaver)
Tested locally, and on tryserver, passing all tests.
Not blocking since this seems like a low frequency crash (not in the top 300 for Firefox 3.6), but we'd obviously take a safe fix more or less any time.
blocking2.0: ? → -
status2.0: --- → wanted
Attachment #419406 - Flags: review?(shaver) → review?(michal.novotny)
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

Although this patch would fix the problem, I would prefer to fix the wrong memory handling. IMHO it should be sufficient to ensure that

1) the last byte of metadata is \0 when copying data in nsCacheMetaData::UnflattenMetaData()

2) pointer to value doesn't go outside limits
http://hg.mozilla.org/mozilla-central/annotate/3cdf3bd2608c/netwerk/cache/src/nsCacheMetaData.cpp#l53
http://hg.mozilla.org/mozilla-central/annotate/3cdf3bd2608c/netwerk/cache/src/nsCacheMetaData.cpp#l152
Attachment #419406 - Flags: review?(michal.novotny)
Hello, since today build (Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101020 Firefox/4.0b8pre) i getting crash with such signature every time i visiting URL 
http://lurkmore.ru/Caramelldansen

Windows gives me message about error in plugin-container.exe and Firefox crashing at the same time.
For me it seems that Firefox is crashing at moment when youtube video is loading.

Crash ids:

bp-8ef42504-a298-4fe7-878b-ff4522101020
bp-3dd67acd-6cb5-4dcb-bfa7-eff272101020
bp-e5cde7ef-e292-42d1-9a7e-ba1912101020
bp-8fa3ee22-9d3d-4699-8091-e08032101020
bp-1028b789-8da2-4c3a-a9a5-871372101020
bp-04024108-5c56-4588-9a59-e335e2101020

I hope it helps
Flash plugin is 10.1.85.3
Works fine for me using 20101020 with Flash 10.1.85.3 on WinXP.
V. Korn, I would guess one of your extensions is required for the crash
to occur.  Can you figure out which one it is?
It is #9 top crasher in 4.0b12pre and #65 top crasher in 4.0b11.
It is #67 top crasher in 4.0b12.

In reply to comment 8
> Can you figure out which one it is?
Session manager seems to be the likely culprit:
  memmove | nsCacheMetaData::SetElement(char const*, char const*)|EXCEPTION_ACCESS_VIOLATION_READ (54 crashes)
     89% (48/54) vs.   0% (245/62143) {1280606b-2510-4fe0-97ef-9b5a22eafe30} (Session Manager, https://addons.mozilla.org/addon/2324)
         74% (40/54) vs.   0% (225/62143) 0.7.4
         15% (8/54) vs.   0% (15/62143) 0.7.4pre
Keywords: reproducible
Summary: Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] → Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] mainly with Session Manager
I think that the patch of comment #6 will prevent some crashes.
I would add an extra check to ensure that:
91 // Update the value in place

92 newSize -= oldValueSize; 
doesn't result in a negative size.
This could happen if the data in the key/value string is malformed,
then oldValueSize could be much bigger than mMetaData + valueSize.
So, an extra check would be:
 NS_ASSERTION(newSize > oldValueSize, "oldValueSize out of range");
just before line 92.
And 'UnflattenMetadata' could at least check that the metadata string ends with 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0 construct before using it.
Adding two zero's at the end will at least prevent getting out of bounds.
Requesting blocking, as this one is now in position 22 on the 4.0b13 top crasher list, and there is a simple and safe patch that will reduce the change of crashing.
blocking2.0: - → ?
Attachment #419406 - Flags: review?(cbiesinger)
We're done for RC unless there is a showstopper. This does not feel like a showstopper to me - #22 topcrash is meaningless to me without knowing:

 - volume / user
 - what platform it affects
relevant stats

its pretty much a start up crash with evidence people might be giving up on 4.0 when hitting it (low repeats).   I'm a bit confused on reading the data around some spikes in daily volume.  they really don't map well to release of betas when we might see higher volume of new installs and updates, but maybe there is a delayed (or press cycle) effect in play there.

daily crash volume
         memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
20110201 23
20110202 38
20110203 83
20110204 51
20110205 105
20110206 76
20110207 48
20110208 51    beta11
20110209 59
20110210 54
20110211 51
20110212 52
20110213 74
20110214 70
20110215 166
20110216 120
20110217 156
20110218 80
20110219 66
20110220 78
20110221 101
20110222 107
20110223 109
20110224 104
20110225 83     - beta12 released
20110226 67
20110227 57
20110228 117
20110301 114
20110302 83


Correlation to startup or time of session
83 total crashes for memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. on 20110302-crashdata.csv
19 startup crashes inside 30 sec.
50 startup crashes inside 3 min.
33 repeated crashes inside 3 min. of last crash

checking --- memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. 20110302-crashdata.csv
found in: 4.0b12 4.0b11 4.0b13pre 4.0b7 4.0b10
release total-crashes
              memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. crashes
                         pct.
all     358047  83      0.000231813
4.0b12  67246   72      0.0010707
4.0b11  15135   6       0.000396432
4.0b13pre 1586  3       0.00189155
4.0b7   2419    1       0.000413394
4.0b10  5515    1       0.000181324

os breakdown
memmove...nsCacheMetaData::SetElement.char.const.,.char.const..Total 83
Win5.1  0.14
Win6.0  0.05
Win6.1  0.81
blocking2.0: ? → .x+
Renominate if we believe that this is a bigger problem than we're seeing, but right now looks like it might be some corrupted caches, etc, not gonna hold release.
Attachment #419406 - Flags: review?(bzbarsky)
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

r=me
Attachment #419406 - Flags: review?(bzbarsky) → review+
added [fx4-rc-ridealong] to get this on the radar 
- start up crashes on a brand new firefox 4 seen by hundreds of people a day can hurt adoption, especially if they hit unsuspecting high profile press and bloggers.

bz, you should remove the ridealong tag if there is inordinate risk, or the patch needs more time or baking.
Whiteboard: [fx4-rc-ridealong]
> +   // Clear the new bytes and the two fence bytes
> +   memset(buf + mBufferSize, 0, bufSize - mBufferSize + 2);

Why not to clear just the 2 fence bytes?


> And 'UnflattenMetadata' could at least check that the metadata string ends with
> 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0
> construct before using it.
> Adding two zero's at the end will at least prevent getting out of bounds.

This would be IMO the correct solution and we wouldn't need these fence bytes. The difference is that with the current fix we unflatten incorrect metadata and use it. We only fail when we try to update it. On the other hand with a proper check in UnflattenMetadata() we would fail already while loading the cache entry.
Comment on attachment 419406 [details] [diff] [review]
V1: Make MetaData more crash proof

> +   // Clear the new bytes and the two fence bytes
> +   memset(buf + mBufferSize, 0, bufSize - mBufferSize + 2);

Why not to clear just the 2 fence bytes?


> And 'UnflattenMetadata' could at least check that the metadata string ends with
> 2 zero's, or even better check whether it is a valid { key \0 value \0 } \0
> construct before using it.
> Adding two zero's at the end will at least prevent getting out of bounds.

This would be IMO the correct solution and we wouldn't need these fence bytes. The difference is that with the current fix we unflatten incorrect metadata and use it. We only fail when we try to update it. On the other hand with a proper check in UnflattenMetadata() we would fail already while loading the cache entry.
This one is probably enough, but possibly we need to combine it with the crashproofing check of V1.
Attachment #517680 - Flags: review?(bzbarsky)
Comment on attachment 517680 [details] [diff] [review]
V2: Check the metadata structure before using it

I don't know enough about the metadata format to mark r+ here (Michal might be a better choice), but this patch doesn't look like it was ever tested... Specifically, the C++ "!=" operator doesn't do any assignment, so |odd| always ends up false.
Attachment #517680 - Flags: review?(bzbarsky) → review-
Michal, I have based this patch on your patch from the dup'ed bug, and made the check even more thorough/crash proof. Can you review this?
Attachment #419406 - Attachment is obsolete: true
Attachment #517680 - Attachment is obsolete: true
Attachment #518001 - Flags: review?(michal.novotny)
Attachment #419406 - Flags: review?(cbiesinger)
I'll review it today.
Comment on attachment 518001 [details] [diff] [review]
V3: More thorough check of metadata format

> +   // Check if the metadata ends with a zero byte
> +   // and if longer than 1 byte, with two zero's.
...
> +   // Check that there are an odd number of zero bytes
> +   // to match the pattern { key \0 value \0 } \0

This is wrong. There is no extra null at the end of the data. I.e. we should check just the ending null and the count of nulls should be even.


> +    NS_ASSERTION(data == limit, "Metadata corrupted");
...
> +    NS_ASSERTION(data <= limit, "Metadata corrupted");

Why did you omit these assertions from my patch? (those at the end of GetElement() and VisitElements()). Either we should add them too, or we could assume that the crash is caused only by unflattening corrupted metadata and in this case we don't need to add any assertion/abort in GetElement() and VisitElements().
Attachment #518001 - Flags: review?(michal.novotny) → review-
No reviewed patch, no ridealong status, but bsmedberg thinks we should keep it in the nomination queue and watch the crash stats on RC.
blocking2.0: .x+ → ?
Whiteboard: [fx4-rc-ridealong] → [watching crash-stats]
Summary: Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] mainly with Session Manager → Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] [@ memcpy | nsCacheMetaData::SetElement(char const*, char const*) ] mainly with Session Manager
> watch the crash stats on RC

here is what to watch for increases in.

there was a small residual crash that was 4.0x only of about 20-30 crashes per day up up until b10.  after b10 volume is progressively higher.  with the recent growth in users its hard to say if there was another regressiion spike around b11 or b12.

looks like a combined 100-150 crashes per day through b12.

         memcpy...nsCacheMetaData::SetElement.char.const...char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110301   
20110302 4 4.0b13pre2011030103 	4 , 
20110303   
20110304 2 4.0b13pre2011030103 	2 , 
20110305 3  	2 4.0b13pre2011030403, 
        		1 4.0b13pre2011030103, 
20110306 5  	4 4.0b13pre2011030503, 
        		1 4.0b13pre2011030603, 
20110307 3  	2 4.0b13pre2011030703, 
        		1 4.0b112011020314, 
20110308 15  	9 4.0b13pre2011030703, 
        		3 4.0b13pre2011030803, 	3 4.0b13pre2011030603, 
20110309 9 4.0b13pre2011030803 	9 , 

        memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110301 114  	109 4.0b122011022221, 
        		4 4.0b112011020314, 	1 4.0b13pre2011030103, 
20110302 83  	72 4.0b122011022221, 
        		6 4.0b112011020314, 	2 4.0b13pre2011030203, 
        		1 4.0b72010110414, 	1 4.0b13pre2011030103, 
        		1 4.0b102011012116, 
20110303 139  	108 4.0b122011022221, 
        		16 4.0b13pre2011030303, 	7 4.0b112011020314, 
        		6 4.0b82010121417, 	1 4.0b22010072019, 
        		1 4.0b13pre2011030203, 
20110304 142  	132 4.0b122011022221, 
        		4 4.0b82010121417, 	2 4.0b13pre2011030312, 
        		2 4.0b112011020314, 	1 4.0b13pre2011030303, 
        		1 4.0b13pre2011022803, 
20110305 118  	115 4.0b122011022221, 
        		2 4.0b112011020314, 	1 4.0b32010080519, 
20110306 66  	61 4.0b122011022221, 
        		2 4.0b112011020314, 	1 4.0b22010072019, 
        		1 4.0b13pre2011030312, 	1 4.02011030319, 
20110307 132  	130 4.0b122011022221, 
        		1 4.0b62010091408, 	1 4.0b112011020314, 
20110308 91  	74 4.0b122011022221, 
        		8 4.0b112011020314, 	6 4.0b13pre2011030312, 
        		3 4.0b102011012116, 
20110309 98  	68 4.0b122011022221, 
        		13 4.02011030319, 	12 4.0b13pre2011030312, 
        		3 4.0b13pre2011030603, 	1 4.0b92011011019, 
        		1 4.0b12010063014,
I'll ask review as soon as I have tested this more thoroughly.

As for the review comments:
1. It is indeed an even number of zero's to check for, with one zero byte at the end.
2. Added the two assert's, but I think they won't catch anything.

With the check for *even* number of zero bytes included, the metadata buffer handling cannot fail.
Attachment #518001 - Attachment is obsolete: true
blocking2.0: ? → -
bsmedberg said that it wouldn't be that important to be taken on branch, so we'll get it for Firefox 5.
Your comment was:

    update on the stats

    ound in: 4.0 4.0b12 4.0b13pre 4.0b11
    release total-crashes
                  memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
    crashes
                             pct.
    all      370362    102    0.000275406
    4.0    32055    53    0.00165341
    4.0b12    62897    46    0.000731354
    4.0b13pre    1376    2    0.00145349
    4.0b11    7416    1    0.000134844

    os breakdown
    memmove...nsCacheMetaData::SetElement.char.const.,.char.const..Total 102
    Win5.1  0.12
    Win6.0  0.09
    Win6.1  0.79


    Correlation to startup or time of session
    102 total crashes for
    memmove...nsCacheMetaData::SetElement.char.const.,.char.const.. on
    20110310-crashdata.csv
    49 startup crashes inside 30 sec.
    70 startup crashes inside 3 min.


     urls for testing
       9 http://www.mediaite.com/
       6 \N
       5 http://alayia-bos.mhcstudios.com/
       3 http://www.wowprogress.com/character/eu/crushridge/Bellecosce
       3 http://www.tumblr.com/dashboard
       3 http://www.sexadir.co.il/forum/default.asp
       3 http://tavria.org.ua/forum/private.php?do=showpm&pmid=242321
       2 http://www.slickdeals.net/
       2 http://www.poterie-du-monde.com/poteries-classiques,fr,2,31.cfm

    a little lower volume over the weekend.

             memmove...nsCacheMetaData::SetElement.char.const.,.char.const..

    date     total    breakdown by build
             crashes  count build, count build, ...

    20110310 102      53 4.02011030319, 
                    46 4.0b122011022221,     2 4.0b13pre2011030312, 
                    1 4.0b112011020314, 
    20110311 129      105 4.02011030319, 
                    18 4.0b122011022221,     3 4.0b13pre2011030312, 
                    1 4.0b82010121417,     1 4.0b13pre2011031003, 
                    1 4.0b102011012116, 
    20110312 90      73 4.02011030319, 
                    8 4.0b13pre2011031203,     4 4.0b122011022221, 
                    3 4.0b13pre2011031103,     1 4.0b32010080519, 
                    1 4.0b13pre2011030312, 
    20110313 93      77 4.02011030319, 
                    7 4.0b13pre2011031203,     5 4.0b122011022221, 
                    4 4.0b13pre2011031303,
I strongly disagree that this is not a major bug - it is. It is preventing me from visiting ne of my favorite forums at http://forums.groundspeak.com/GC/, and you can see my troubleshooting here http://forums.mozillazine.org/viewtopic.php?f=38&t=2135783. It crashes with all add-ons disabled so it is clearly a fault in FF4 and not the add-ons.
(In reply to comment #10)
> It is #67 top crasher in 4.0b12.
> 
> In reply to comment 8
> > Can you figure out which one it is?
> Session manager seems to be the likely culprit:
>   memmove | nsCacheMetaData::SetElement(char const*, char
> const*)|EXCEPTION_ACCESS_VIOLATION_READ (54 crashes)
>      89% (48/54) vs.   0% (245/62143) {1280606b-2510-4fe0-97ef-9b5a22eafe30}
> (Session Manager, https://addons.mozilla.org/addon/2324)
>          74% (40/54) vs.   0% (225/62143) 0.7.4
>          15% (8/54) vs.   0% (15/62143) 0.7.4pre


How did you come to that conclusion?  From my brief check, while a number of people seeing this crash have Session Manager installed, many do not.  Some don't have any add-ons installed.  For example:

bp-f7639368-cce8-4096-9348-59f5f2110328
bp-ef6d5e4d-67f7-4583-86db-d2e1f2110327
bp-f15ee523-155c-406b-b43d-18adf2110328


I will mention though that Session Manager has an implementation of "Cache fixer" in it (from a ways back) which removes the "dirty" flag from the _CACHE_MAP_ on startup which prevents the cache from being wiped on a crash. See http://imglikeopera.mozdev.org/cache_fixer.html.  

It is currently enabled by default, but I had planned to change that to be disabled by default.  If this is somehow corrupting the cache data, I can simply remove the feature.
Michael, it's quite possible that there are several things that can cause the cache corruption and that Session Manager is one of them.

What the numbers above are saying that of the 54 people who hit this crash, 48 had Session Saver installed and 6 did not (hence it's not the _only_ reason for the crash).  In general, of 62143 people submitting crash reports, for all crashes, 245 had Session Saver installed, so it's not like most our user base is using Session Saver.
up to over 600 crashes per day.

         memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...


20110326 506  	492 4.02011031805, 
        		6 4.02011030319, 	5 4.0b122011022221, 
        		1 4.2a1pre2011032603, 	1 4.0b82010121417, 
        		1 4.0b72010110414, 
20110327 610  	595 4.02011031805, 
        		13 4.0b122011022221, 	2 4.02011030319, 


> Created attachment 518711 [details] [diff] [review]
> V4: Check for even number of zero's
> I'll ask review as soon as I have tested this more thoroughly.

any progress on the testing and review?

are there try builds available for any one who sees this so they could help out in testing?
Resetting the 'dirty' flag of the cache, and thereby using a dirty cache will cause problems...
I had this crash going on since yesterday. It crashed over 10-20 times.

It all started when FF4 hanged on a URL that had bad url redirection loop.
The memory usage went up to 1.5GB and the CPU usage to 50%, basically using one CPU core to 100%.

I tried to quit FF and after a long wait of swapping and swapping it eventually crashed.

I tried to restore the session but it kept crashing every time.

In the end I tried to start FF without restoring session. It started ok. When I though tried to load a backup of previous session via session manager it crashed again.

So slowly slowly I started to narrow down which tab was causing the crash.
I identified the tab and URL and tried the same session that was causing the crash on a different pc with FF4. It didn't crash on other pc.

I then restarted FF4 (not restoring any session) and tried typing the URL that was causing the problem. It crashed again. So this proved to me that the problem was not the session manager or the state of the tab / session. It must have been the cache. So I cleared the cache and the crashes stopped.

So clearly the problem was with corrupted data in the cache in relation to that specific URL.
If anyone can reproduce this crash in future, please make a backup copy your
cache files so we can analyze them and try to figure out what the bug is.
Could bug 616117 be related to this?
Bug 634920 was duped to this, but this bug is for Windows. Can we either un-dupe, or broaden the platform for this?

I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of ~/Library/Caches/Firefox.
> I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of
> ~/Library/Caches/Firefox.

Great.  Please inform him that cache files may contain sensitive data and
when attaching it here he should select the Privacy checkbox to restrict
access to people with security bug access.
http://www.mozilla.org/projects/security/secgrouplist.html
OS: Windows XP → All
Hardware: x86 → All
(In reply to comment #42)
> > I have a user who's seeing this crash on Mac. Seeing if I can get a tarball of
> > ~/Library/Caches/Firefox.
> 
> Great.  Please inform him that cache files may contain sensitive data and
> when attaching it here he should select the Privacy checkbox to restrict
> access to people with security bug access.
> http://www.mozilla.org/projects/security/secgrouplist.html

Cache directory is 1.6GB uncompressed. Can you suggest a subset of files that would be helpful, rather than me schlepping the whole thing around?

Once the cache directory has been obtained, can I clear the cache to allow her to continue using Firefox without frequent crashes?

This was the last crash:

https://crash-stats.mozilla.com/report/index/1203ef99-943f-4980-b625-e461c2110416
> Cache directory is 1.6GB uncompressed. Can you suggest a subset of files

I'm not really an expert on the network cache but I think we only need
the Cache/ folder and its contents, not the *.mfasl files or urlclassifier*
(that should make it a lot smaller)

> Once the cache directory has been obtained, can I clear the cache...

Before you clear the cache, please check the disk usage (output from the
"df -k" command for example).  Also, load about:config and type "cache"
in the filter box, then note any values for browser.disk.cache.*
that has status "user set" (you can click the column head to sort them).

Thanks for your help, it's really appreciated.
(In reply to comment #44)
> > Cache directory is 1.6GB uncompressed. Can you suggest a subset of files
> 
> I'm not really an expert on the network cache but I think we only need
> the Cache/ folder and its contents, not the *.mfasl files or urlclassifier*
> (that should make it a lot smaller)

The Cache/ folder is 1.5GB. urlclassifier* is 45MB. XUL.mfasl is 1.7MB.

Of the 150251 cache files, only 11043 were modified within a day of the reported crash. That should bring the size of the cache archive down to perhaps 70MB compressed.

Is there a better way to narrow down a working set?
 

> Before you clear the cache, please check the disk usage...

Thanks Mats. There's plenty of free disk:

Filesystem    1024-blocks     Used Available Capacity  Mounted on
/dev/disk0s2    487919600 85087324 402576276    18%    /
devfs                 109      109         0   100%    /dev
map -hosts              0        0         0   100%    /net
map auto_home           0        0         0   100%    /home


These are the only user-set cache params:

browser.cache.disk.capacity: 1048576
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size_cached_value: 1048576
First of all, please backup the whole Cache directory (while firefox is not running). Even normal usage of the browser could evict broken entries so we would lose them. Then there are 2 options (let me know which do you prefer).

1) We could see which entry is broken from the log in case it crashes again. Start NSPR logging (see https://developer.mozilla.org/en/HTTP_Logging) and make sure you're logging http and cache activity (i.e. NSPR_LOG_MODULES=nsHttp:5,cache:5). When the browser crashes attach the log to this bug.

2) I could write a small utility that would go through the cache and point out the wrong entry.
(In reply to comment #46)
> First of all, please backup the whole Cache directory (while firefox is not
> running).

I have a complete snapshot.

> 2) I could write a small utility that would go through the cache and point out
> the wrong entry.

This would be the most convenient option for me. (And certainly most reusable!)

This is my mum's machine, and there's an 8-hour time difference, with commensurate latency; anything I can do over SSH without her involvement would make my day!
I can confirm that I had this crash for one particular site, going on for weeks, reproducible 100%, every single time I would go to the site it would crash - even with all extensions disabled (I do have session manager installed).

I cleared the cash and have not had a crash since.
(In reply to comment #49)
> *** Bug 659833 has been marked as a duplicate of this bug. ***

Should I save my cache first before I clear it?
I'm experiencing this crash as well.  I visit a page and Firefox crashes while loading it.  I'm always able to reproduce it on the page it crashed on after I restart Firefox.  Sometimes I need to refresh a few times to get it to happen.  I also got it to crash while in safe mode on one of these problematic pages, but, interestingly, it wasn't recognized as being related to this bug.  I do have Session Manager installed and enabled.  Windows 7 64, Firefox 4.0.1.

Here is a list of crashes I've reported since they first started happening in this pattern (roughly).  I didn't check all of them, but I know at least half refer to this bug.


bp-9d510644-18c3-45af-82b9-84a212110527    5/27/2011 4:14 AM
^^^Crash in safe mode on the same page, but not related to this bug

bp-ffb4270f-e62b-4c7e-a246-d7ed62110526    5/27/2011 2:25 AM
bp-dd11c7d6-2c67-46d4-8833-7ec2a2110526    5/27/2011 1:32 AM
bp-5e7775a5-bd1e-4e6f-8b3a-9739d2110526    5/27/2011 1:15 AM
bp-9b28e27e-f811-4b63-8a35-954a42110526    5/27/2011 1:14 AM
bp-fcdbea69-21f2-4d53-aca3-042472110526    5/27/2011 1:11 AM
bp-484df0a3-a51d-4735-be76-36e082110526    5/27/2011 12:41 AM
bp-72b1d5e6-012d-4ecd-a745-78aa52110526    5/27/2011 12:11 AM
bp-d47df0dc-b1f1-478a-9e21-7be192110526    5/27/2011 12:08 AM
bp-0b12e2a3-0e0b-4878-85c1-e38082110526    5/27/2011 12:07 AM
bp-f7e56991-254b-45fc-8cb8-1ad8e2110526    5/26/2011 11:25 PM
^^^I believe all of these were on the same page:
http://support.github.com/discussions/repos/5439-git-and-renaming-remote-branches

bp-ca5bbae9-8049-4b49-89b3-aa4892110525    5/25/2011 4:09 PM
bp-3bf6255b-7732-424a-a089-b94242110524    5/24/2011 9:09 PM
bp-d435a8be-fcf2-4bf6-82cc-597c02110524    5/24/2011 6:38 PM
bp-4e5e3ddd-3225-498d-b457-aa6342110521    5/21/2011 1:47 PM
bp-a014c557-4e70-49c9-872b-750e52110520    5/20/2011 4:58 AM
bp-c2fbffa3-0c83-4bf0-94ae-74b842110520    5/20/2011 4:55 AM
bp-1e1d35fd-430c-4a66-ba36-44c532110516    5/16/2011 9:50 PM
bp-6f31fdae-3f7d-46fc-b60c-610df2110516    5/16/2011 9:48 PM
^^^First crash related to this bug

bp-275bc3cd-dff6-4702-accd-883902110516    5/16/2011 3:30 PM
bp-83981cd5-225c-4776-90fc-8dc112110511    5/11/2011 3:07 PM
^^^First crash for about 3 months

I checked out what Firefox listed for my cache, and there is definitely something weird going on.  Some files say they've been accessed slightly less than a billion times, some 700,000.  These large numbers will often repeat themselves for different files.  Some files say they were last modified in 1982 or 2029.  I looked at some of these anomalous files and their contents usually don't make sense.  A GIF file will say that the server responded with some javascript, stuff like that.  I've also been experiencing images that don't make sense on some websites, I assume because of this cache issue.  For example, the background of this page had a bunch of stars on it before I refreshed.


I made a copy of my cache directory.  I also logged when the crash happened as suggested in comment 46.  I copied over all files that were modified on or after the 26th (when my most recent string of crashes happened) and 7zipped them up (36 megs).  However, when I went to attach the archive and log file, I see no privacy checkbox as suggested in comment 42.  Is there any way to privately attach my files?
5.0 regression that we really should get on top of.  looks like there are some thoughts on a patch, but they never made it to check-in state.  looks like it should work its way into the new networking team priorities.
5.0 regression? This bug was filed against 4.0 betas. chofmann, are you saying that this crash is much higher in 5.0 betas than 4.0 betas, or something else? I see it as #34 in the list, which isn't very high to be tracking against.
It is #72 top crasher in 4.0.1 and #25 top crasher in 5.0b2 (without hangs).
> 5.0 regression

er, sorry. 4.0 regression.  

it appears that it might also be rising in volume associated with major update offers and other increases in migration from 3.x up to 4.0.x and 5.0, then volume declines when the influx of new updates levels off.

we hit a peak of 339 crashes per day on may 18.

        memmove...nsCacheMetaData::SetElement.char.const.,.char.const..
date     total    breakdown by build
         crashes  count build, count build, ...

20110518 339  	321 4.0.12011041322, 
        		12 4.02011031805, 	2 5.02011042714, 
        		1 6.0a12011051703, 	1 6.0a12011051603, 
        		1 6.0a12011051003, 	1 4.0b122011022221,
I was having this problem on specific pages for awhile now, even with all add-ons and plugins disabled.

It finally resolved itself when I told Firefox to clear out its ~650MB (auto-managed cache size), which took a good couple of mins.

Now all is good. Very bizarre.
(In reply to comment #51)
> However, when I went to attach the archive and log file, I see no privacy
> checkbox as suggested in comment 42.  Is there any way to privately attach
> my files?

I'm not sure but I think that you can check the security flag only while creating a new bug. Anyway, feel free to send me the files directly via email.
There is in fact a per-attachment privacy flag, but only security group members can set it, and only security group members can see the resulting attachment...
After spending days pulling my hair out while firefox crashed over and over, I was finally able to fix this problem by clearing the cache.  Because I was seeing this crash with 100% consistency on a number of specific unrelated sites, I'm tempted to think that the problem is being caused by firefox doing something which is corrupting data in the cache.  I wish I could remember the exact circumstances under which I first saw this, but unfortunately I don't.  All of that said, the browser really should be robust enough to handle a corrupted cache - even if that means automatically dumping the cache on the next launch after a corrupt cache causes a crash, or, if you're reluctant to delete a user's data, just don't load the cache and give the user a message explaining the issue and giving them the option to immediately clear the cache permanently.
There is a "Privacy" checkbox at the bottom of the attachment pane if you want to add what you captured. We will probably need more information than that in order to be able to reproduce this bug.


> I made a copy of my cache directory.  I also logged when the crash happened
> as suggested in comment 46.  I copied over all files that were modified on
> or after the 26th (when my most recent string of crashes happened) and
> 7zipped them up (36 megs).  However, when I went to attach the archive and
> log file, I see no privacy checkbox as suggested in comment 42.  Is there
> any way to privately attach my files?
Crash Signature: [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] [@ memcpy | nsCacheMetaData::SetElement(char const*, char const*) ]
Crash Signature: [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] [@ memcpy | nsCacheMetaData::SetElement(char const*, char const*) ] → [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] [@ memcpy | nsCacheMetaData::SetElement(char const*, char const*) ]
Clearing tracking flag - this will be tracked by crashkill going forward.
Sheila, can you or someone else on crashkill update us on this? Do we still need to be tracking it for Firefox 6 Beta?
It is #5 top browser crasher in 6.0b2.
So, as it is now in the top 5, and there is a patch available, can this patch be baked on the nightly branch, and possibly even on Aurora, so that we can test whether this fixes the problems?
Doesn't the patch need review?
Comment on attachment 518711 [details] [diff] [review]
V4: Check for even number of zero's

Review of attachment 518711 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #518711 - Flags: review?(michal.novotny)
Comment on attachment 518711 [details] [diff] [review]
V4: Check for even number of zero's

> +        NS_ABORT_IF_FALSE(value < limit, "Cache Metadata corrupted");
>          if (strcmp(data, key) == 0)
>              return value;
>  
>          // Skip value part
>          data = value + strlen(value) + 1;
>      }
> +    NS_ASSERTION(data == limit, "Metadata corrupted");

To be consistent, either change NS_ABORT_IF_FALSE to NS_ASSERTION or the other way round.
Attachment #518711 - Flags: review?(michal.novotny) → review+
Does the patch land on m-c?
Once someone updates it to comment 69, right?
Attachment #518711 - Attachment is obsolete: true
Just wanted to say that I was experiencing this bug and also have session manager installed (although it was reproducible with it disabled).
What needs to happen here, we have a patch, but no reviews. It's a high top crasher, seems we need to proceed here if we know what's needed, or come up with additional steps to learn more about this top crash.
I reported that I was using Session Manager, but only mentioned it as a diagnostic tool.  I have no reason to believe that Session Manager is a contributor to the problem.
Comment on attachment 548801 [details] [diff] [review]
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE

This just has minor fixups that michal wanted from last +r version, so I'm going to guessing we can forward the +r on it.
Attachment #548801 - Flags: review+
We haven't got any new information and don't think we need to track it for 6 (as it has been seen all the way back to 1.9.3)
So do we think we have a fix for this now? We aren't going to track it for 6.0 anymore but it regressed somehow with 5.0 so it would be great to get fixed.
It is now only #131 top browser crasher in 5.0 (drop 89 ranks in one week) and #108 in 6.0 (drop 63 ranks in one week).

The drop started with the release of version 0.7.6 of Session Manager on July 27 that removed Cache Fixer:
https://addons.mozilla.org/firefox/addon/session-manager/
See also 5.0 correlations by add-ons:
  memmove | nsCacheMetaData::SetElement(char const*, char const*)|EXCEPTION_ACCESS_VIOLATION_READ (31 crashes)
     32% (10/31) vs.   0% (308/90925) {1280606b-2510-4fe0-97ef-9b5a22eafe30} (Session Manager, https://addons.mozilla.org/addon/2324)
         26% (8/31) vs.   0% (149/90925) 0.7.5
          6% (2/31) vs.   0% (158/90925) 0.7.6
          0% (0/31) vs.   0% (1/90925) 0.7.6.1

I renamed the bug summary.
Summary: Crash [@ memmove | nsCacheMetaData::SetElement(char const*, char const*)] [@ memcpy | nsCacheMetaData::SetElement(char const*, char const*) ] mainly with Session Manager → Crash in nsCacheMetaData::SetElement
Comment on attachment 548801 [details] [diff] [review]
V4: Replaced NS_ASSERTION with NS_ABORT_IF_FALSE

Michal: so it seems much of the crashing was Session manager related:  but do we still want to land this patch?  I'm guessing the answer is yes.
Yes, we should land this patch.
In my queue, which is being sent to try and then presuming green, I'll land on inbound later today.
http://hg.mozilla.org/mozilla-central/rev/72f1510e3f31
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 726417

restricting comments to avoid spam

Restrict Comments: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: