Closed Bug 519356 Opened 15 years ago Closed 13 years ago

startup Crash at [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] and [@ XPT_DestroyArena]

Categories

(Core :: General, defect, P2)

1.9.1 Branch
x86
Windows Vista

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- -

People

(Reporter: cmtalbert, Unassigned)

Details

(Keywords: crash, intl, topcrash, Whiteboard: [crashkill][tbird topcrash-])

Attachments

(1 file)

This appears to be caused by running in Windows compatibility mode on Windows Vista, by malware, or by both.  It appears to be a startup crash in which we free something during startup after it has already been freed.

We need to attempt to reproduce this by running it in Vista under compatibility mode.

Dbaron, can you get an update on the DLL correlation for this once you get more crash data?  That way we might understand what malware, if any, is horking our heap.
OS: Mac OS X → Windows Vista
The correlation data at http://dbaron.org/mozilla/topcrash-modules show:

  arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena (48 crashes)
    100% (48/48) vs.   2% (123/7100) shimeng.dll
     50% (24/48) vs.   0% (32/7100) aclayers.dll
     50% (24/48) vs.   1% (49/7100) AcLayers.dll
     50% (24/48) vs.   1% (56/7100) AcGenral.dll
     50% (24/48) vs.   1% (61/7100) acgenral.dll

I'm told those libraries are associated with Vista compatibility mode, so this crash may just mean that we crash on startup when running in Vista compatibility mode, at least in some cases.
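For readers unfamiliar with the report format: each row compares how often a module was loaded in crashes with this signature against how often it was loaded in the whole crash sample (presumably all crashes for the product on that day). Below is a minimal Python sketch of that arithmetic using the counts quoted above. The class and function names are made up for illustration and are not part of the actual topcrash-modules tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleCorrelation:
    """One row of the report (hypothetical container, not the real tool's type)."""
    module: str
    with_signature: int   # crashes with this signature that had the module loaded
    signature_total: int  # all crashes with this signature
    overall: int          # crashes of any signature that had the module loaded
    overall_total: int    # all crashes in the sample

    @property
    def signature_share(self) -> float:
        return self.with_signature / self.signature_total

    @property
    def overall_share(self) -> float:
        return self.overall / self.overall_total


def format_row(row: ModuleCorrelation) -> str:
    """Render a row in roughly the same shape as the report above."""
    return (
        f"{row.signature_share:.0%} ({row.with_signature}/{row.signature_total}) vs. "
        f"{row.overall_share:.0%} ({row.overall}/{row.overall_total}) {row.module}"
    )


# Counts copied from the report quoted above.
ROWS = [
    ModuleCorrelation("shimeng.dll", 48, 48, 123, 7100),
    ModuleCorrelation("aclayers.dll", 24, 48, 32, 7100),
    ModuleCorrelation("AcLayers.dll", 24, 48, 49, 7100),
    ModuleCorrelation("AcGenral.dll", 24, 48, 56, 7100),
    ModuleCorrelation("acgenral.dll", 24, 48, 61, 7100),
]

if __name__ == "__main__":
    for row in ROWS:
        print(format_row(row))
```

The report lists differently cased spellings of the same DLL name as separate rows, so the two AcLayers rows and the two AcGenral rows most likely describe one module each.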
Blocking 1.9.2+ per CrashKill effort.
Flags: blocking1.9.2+
One data point - I've been running in XPSP2 compatibility mode here for a little bit in 3.5 and haven't seen any issues. Windows 2000 also seems to work ok. Anything below that results in a failed launch, but I think that's to be expected.
Interesting results using this. Version info APIs are hooked, so you get the XP interface on Vista. We would also probably make some code path decisions based on this.

I found some good info on all this on a couple of MS blogs:

http://www.nynaeve.net/?p=62
http://www.alex-ionescu.com/?p=39
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Priority: -- → P2
What's interesting is that loading in one of those compat modes below 2000 results in the crashreporter coming up, but I can't get it to fall into windbg to save my life.  And while the crashreporter claims to have submitted the report successfully, I don't see any crashes with this signature in the last hour and I've crashed it at least 2 dozen times trying to get windbg to grab the crash.  I'm not sure if it is throttled or what.  There are no references to crashIDs in my crashes directory in the profile either :(

Will continue testing.
iirc if Firefox is directly launched (e.g. CreateProcess, possibly ShellExecute using the path to firefox.exe) by an app that is in compatibility mode it will inherit the compatibility mode from the app that launched it.
The vast majority of these crashes have very low uptimes, which suggests that they're crashes on startup.  Then again, the stack also suggests that it's on startup.
Can we try to run these pages on Windows 7 in compat mode?
Things we could try:
 * running in compatibility mode on both windows 7 and vista
 * starting on a clean profile in compatibility mode (since xpt reading is different on first startup)

Also, it looks like [@ XPCJSRuntime::TraceXPConnectRoots(JSTracer*, int)] is also strongly correlated with shimeng.dll, AcLayers.dll, and AcGenral.dll.
re: comment 8
yeah. and just about no crash reports have urls associated with the crash

domains of sites
 773 //
   5 \N//
   1 http://www.tuenti.com
   1 http://www.orkut.com.br
   1 http://www.online.no
   1 http://www.last.fm
   1 http://mlb.mlb.com
   1 about:blank//


784 total crashes for XPT_DestroyArena on 20090929-crashdata.csv
469 start up crashes inside 3 minutes

XPT_DestroyArena signature breakdown
signature distribution
       2
signature list
 781 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena
   3 XPT_DestroyArena

os breakdown
 378  Windows NT 5.1.2600 Service Pack 3
 333  Windows NT 5.1.2600 Service Pack 2
  21  Windows NT 5.1.2600 Dodatek Service Pack 3
  12  Windows NT 5.1.2600 Service Pack 1
  11  Windows NT 5.1.2600 Dodatek Service Pack 2
  11  Windows NT 5.1.2600
   4  Windows 4.10.67766446
   2  Windows NT 5.1.2600 Service Pack 3, v.3264
   2  Windows 4.0.950
   1  Windows NT 6.0.6002 Service Pack 2
   1  Windows NT 6.0.6000
   1  Windows NT 5.2.3790 Service Pack 2
   1  Windows NT 5.1.2600 Szervizcsomag 3
   1  Windows NT 5.1.2600 Szervizcsomag 2
   1  Windows NT 5.1.2600 Service Pack 2, v.2096
   1  Windows 4.10.67766446 Service Pack 3
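Many rows in the OS breakdown differ only in the localized service pack label ("Dodatek Service Pack" is Polish, "Szervizcsomag" is Hungarian), which makes the dominance of NT 5.1 (Windows XP) harder to see. A rough Python sketch of folding the rows by NT major.minor version follows. The regular expression and function names are assumptions for illustration, not how the crash-stats scripts actually work.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches e.g. "Windows NT 5.1.2600 ..." and captures "5.1".
_NT_VERSION = re.compile(r"^Windows NT (\d+\.\d+)\.\d+")


def os_family(os_string: str) -> str:
    """Map a raw OS string to a coarse family, ignoring service pack text."""
    match = _NT_VERSION.match(os_string.strip())
    if match:
        return f"Windows NT {match.group(1)}"
    return "other"  # e.g. the "Windows 4.x" rows, likely Win9x compat mode


def aggregate(rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Sum crash counts per OS family."""
    totals: Counter = Counter()
    for count, name in rows:
        totals[os_family(name)] += count
    return dict(totals)


if __name__ == "__main__":
    # A few rows copied from the breakdown above, not the full table.
    sample = [
        (378, "Windows NT 5.1.2600 Service Pack 3"),
        (21, "Windows NT 5.1.2600 Dodatek Service Pack 3"),
        (1, "Windows NT 5.1.2600 Szervizcsomag 3"),
        (1, "Windows NT 6.0.6002 Service Pack 2"),
        (4, "Windows 4.10.67766446"),
    ]
    print(aggregate(sample))
```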
Sounds like we should at least detect that we're running in compat mode, and if so, maybe offer to the user to restart in normal mode?  Maybe always restart in normal mode.  It'd be interesting to find out why we'd ever be launched in compat mode, but I think rs might be on to something -- if Firefox isn't open, and another app that is in compat mode attempts to launch the default browser to open a URL, we'll get launched in compat mode.

... is that really true?  That seems excessive, like I'm pretty sure IE wouldn't like running in compat mode if IE was the default browser.  Is there some way to prevent us from ever running in compat mode?
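As a rough illustration of what "detect that we're running in compat mode" could look like: to the best of my knowledge, Windows exposes the active compatibility layers to a process and its children through the `__COMPAT_LAYER` environment variable. The Python sketch below assumes that is the mechanism in play here, which nobody in this bug has confirmed. The function names and the layer names in the example are illustrative only.

```python
import os
from typing import Dict, List, Mapping, Optional

# Assumption: this is the variable through which compat layers are inherited.
COMPAT_ENV_VAR = "__COMPAT_LAYER"


def compat_layers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the compat layers named in the environment, or [] if none."""
    env = os.environ if env is None else env
    return env.get(COMPAT_ENV_VAR, "").split()


def environment_without_compat(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with the compat variable removed.

    A relaunch using this environment would only avoid layers inherited from
    the parent process; it would not undo a layer configured for the
    executable itself (file properties or registry).
    """
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.upper() != COMPAT_ENV_VAR}


if __name__ == "__main__":
    layers = compat_layers()
    if layers:
        print("Launched with compatibility layers:", ", ".join(layers))
    else:
        print("No inherited compatibility layer detected")
```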
(In reply to comment #12)
>...
> ... is that really true?  That seems excessive, like I'm pretty sure IE
> wouldn't like running in compat mode if IE was the default browser.  Is there
> some way to prevent us from ever running in compat mode?
I'm very sure it is true when the app launches us using CreateProcess, though I'm not sure about when it uses ShellExecute and specifies the path to firefox.exe. I am pretty sure it doesn't happen when using ShellExecute with a file / protocol that Firefox handles. Basically, apps shouldn't be using CreateProcess to launch other apps, but a few did so in the past, which caused us to run in 256 color mode because the app that launched us required 256 color mode.
Whiteboard: [crashkill]
I've tried every compatibility setting on Vista and Windows 7 with the 3.5.x nightly and the 3.5.4 release. I tried:
(on Windows 7):
* Windows Vista
* Vista sp1
* Vista sp2
* XP sp 2
* XP sp 3
* 2000
* NT 4
* Server 2003

(on Windows Vista):
* Server 2003
* XP SP 2
* 2000
* NT 4

I have not been successful getting this bug to reproduce in any of these settings.  I've never crashed at all in this compatibility testing, unless I try to run under Windows 9x compat mode.
In today's crashkill meeting dmandelin asked me whether or not this might be related to the changes he's made in bug 515211.  I decided to look into it this afternoon.  

= Short answer =
No, it doesn't look related

= Long answer =
It seems that this comes about because we are throwing an access violation when trying to free the mStructArena member of an xptiWorkingSet object.

The mStructArena is used to allocate space for all sorts of objects of different sizes [1].  But, the xptiWorkingSet involved in this crash is allocated locally in xptiInterfaceInfoManager::AutoRegisterInterfaces [2] and deallocated a short time later when we return from that function thinking that no files have changed [3].

Following the code through (if I'm reading correctly, I'm building a 1.9.2 tree to test this out), we only make one call between these two points that allocates something on the mStructArena block.  That is done by xptiManifest::Read when it builds interface objects using xptiInterfaceEntry::NewEntry for the interfaces it finds in the xpti.dat manifest [4].

= Hypothesis =
Then from looking at the code, either that allocation is not allocating the proper size for the interfaceEntries it is trying to store or we are hitting some odd failure case where we fail to read the manifest but still believe that things have not changed - i.e. the value of ok returned from the xptiManifest::Read indicates an error that is fatal in some cases.

Because this crash is so strongly correlated with running in compatibility mode, I wonder if we are getting an inconsistent value for sizeof when we are trying to do the malloc.  For instance if the "compatible" mode has a different default byte packing size for structs than the "normal" mode on a given OS, perhaps we'd see some sort of inconsistency.  But, were that the case, I would expect this to be more reproducible than it is.
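To make the packing hypothesis concrete: the same field list can occupy a different number of bytes depending on the alignment rules applied, so code that allocates under one rule and writes under another would overrun its block. The Python sketch below uses the standard `struct` module to show the size difference. The record layout is hypothetical and is not the real xptiInterfaceEntry.

```python
import struct

# Hypothetical record: a 1-byte flag followed by a 4-byte unsigned integer.
ALIGNED_FORMAT = "@BI"  # "@": native sizes with native alignment (padding allowed)
PACKED_FORMAT = "=BI"   # "=": standard sizes, no alignment padding


def record_size(fmt: str) -> int:
    """Size in bytes of one record under the given layout rule."""
    return struct.calcsize(fmt)


def shortfall(allocated_with: str, written_with: str) -> int:
    """Bytes written past the allocation if the two layout rules disagree."""
    return max(0, record_size(written_with) - record_size(allocated_with))


if __name__ == "__main__":
    print("aligned size:", record_size(ALIGNED_FORMAT))
    print("packed size: ", record_size(PACKED_FORMAT))
    print("overrun if allocated packed but written aligned:",
          shortfall(PACKED_FORMAT, ALIGNED_FORMAT))
```

In a compiled C++ binary the value of sizeof is fixed at build time, so a mismatch of this kind would have to come from something other than the OS mode the binary runs under.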

= Requests =
I'm building 1.9.2 right now to test my theory and make sure that I'm right about the xptiInterfaceEntry::NewEntry being the only thing allocating to the mStructArena.  If I could get a full dump of one of these crashes that would help me to see what that "ok" value is coming back from xptiManifest::Read and might shed more light on what is going on.  How do I get my hands on one of those dumps for this crash?

[1]: http://mxr.mozilla.org/mozilla1.9.2/ident?i=GetStructArena
[2]: http://mxr.mozilla.org/mozilla1.9.2/source/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp#1893
[3]: http://mxr.mozilla.org/mozilla1.9.2/source/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp#1923
[4]: http://mxr.mozilla.org/mozilla1.9.2/source/xpcom/reflect/xptinfo/src/xptiManifest.cpp#635
(In reply to comment #15) 
> The mStructArena is used to allocate space for all sorts of objects of
> different sizes [1].  But, the xptiWorkingSet involved in this crash is
> allocated locally in xptiInterfaceInfoManager::AutoRegisterInterfaces [2] and
> deallocated a short time later when we return from that function thinking that
> no files have changed [3].
> 
> Following the code through (if I'm reading correctly, I'm building a 1.9.2 tree
> to test this out), we only make one call between these two points that
> allocates something on the mStructArena block.  That is done by
> xptiManifest::Read when it builds interface objects using
> xptiInterfaceEntry::NewEntry for the interfaces it finds in the xpti.dat
> manifest [4].
I did some debugging this morning on Windows 7 with my shiny new 1.9.2 debug build.  During a first start up with a new profile, we hit this code on the second registration startup.  The xptiInterfaceEntry::NewEntry code is definitely the only call to alloc in the structArena when we enter xptiInterfaceInfoManager::AutoRegisterInterfaces and leave via the switch statement with mode == NO_FILES_CHANGED.

Running inside compatibility mode didn't change anything in the structArena alignment settings or the way that it allocated items, so my hypothesis about byte alignment and packing doesn't look valid.
Filed bug 527579 for a minidump of this crash.
Never nominated, but marking blocking1.9.2- to explicitly mark [CrashKill] bugs as either blocking or not.  If we can get a patch before RC, we should really consider taking it.
Flags: blocking1.9.2+ → blocking1.9.2-
blocking2.0: --- → ?
Could this be related to the arena_run_dalloc valgrind error (called by arena_dalloc_small) in bug 457223?
That bug has now been marked WFM, and its comments indicate that it dealt mostly with a crash very near startup, where this bug might not be.

I've got some new data that indicates the biggest number of people that hit this signature are hitting the crash within 3 seconds of startup.

data from 2010 01 29
seconds after startup
  count
0 640 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
1 244 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
2  50 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
3  20 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
4   7 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
5   9 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
6   2 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
7   8 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
8   2 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
9   5 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
10  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
11  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
12  2 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
13  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
14  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
15  3 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
16  2 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
17  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
19  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
60  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
69  1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
131 1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
467 1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena  
500 1 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena
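The "within 3 seconds" claim can be checked directly from the table. A small Python sketch that computes the cumulative share of crashes at or below a given uptime, using the counts listed above; the function name is just for illustration:

```python
from typing import Dict

# seconds of uptime -> crash count, copied from the 2010 01 29 table above
UPTIME_HISTOGRAM: Dict[int, int] = {
    0: 640, 1: 244, 2: 50, 3: 20, 4: 7, 5: 9, 6: 2, 7: 8, 8: 2, 9: 5,
    10: 1, 11: 1, 12: 2, 13: 1, 14: 1, 15: 3, 16: 2, 17: 1, 19: 1,
    60: 1, 69: 1, 131: 1, 467: 1, 500: 1,
}


def share_within(histogram: Dict[int, int], max_seconds: int) -> float:
    """Fraction of crashes whose uptime is <= max_seconds."""
    total = sum(histogram.values())
    early = sum(count for secs, count in histogram.items() if secs <= max_seconds)
    return early / total


if __name__ == "__main__":
    for limit in (0, 1, 3, 10):
        print(f"<= {limit:>2}s: {share_within(UPTIME_HISTOGRAM, limit):.1%}")
```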
#23 top crash and jumped 180 slots yesterday in very early firefox 3.6.2 data, so it might also be related to migration and update.
#14 for v3.0.3 Thunderbird. Been a steady topcrash since Thunderbird v3.0

all startup crashes. mostly 0-1 sec, maybe 5 is >1sec going as high as 24sec
70-80% of comments are not in english.  all ms-windows

no 1.9.2 or 1.9.3 crashes, but we have very few testers there

typical crash is bp-067dc681-a31f-461b-a03a-ae7f82100307 v3.0.3, 24sec uptime
0	mozcrt19.dll	arena_dalloc_small	 objdir-tb/mozilla/memory/jemalloc/src/jemalloc.c:4102
1	mozcrt19.dll	arena_dalloc	objdir-tb/mozilla/memory/jemalloc/src/jemalloc.c:4225
2	mozcrt19.dll	free	objdir-tb/mozilla/memory/jemalloc/src/jemalloc.c:6012
3	xpcom_core.dll	XPT_DestroyArena	xpcom/typelib/xpt/src/xpt_arena.c:177
4	xpcom_core.dll	xptiWorkingSet::~xptiWorkingSet	xpcom/reflect/xptinfo/src/xptiWorkingSet.cpp:229
5	xpcom_core.dll	xptiInterfaceInfoManager::Release	xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp:50
6	xpcom_core.dll	xptiInterfaceInfoManager::FreeInterfaceInfoManager	xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp:112
Whiteboard: [crashkill] → [crashkill][tbird crash]
Looking at a sample of all the crashes for this signature on 2010 04 26 it looks like 100% of reports have addon compatibility marked "unknown"
Where does this crash stand now in crash stats?  A quick search gets three crashes in the past five weeks.   Clearing the nom for now, but let's get updated info on this and re-nom if it's still an issue.
blocking2.0: ? → ---
I still see 888-1048 crashes per day over the month of May and a ton of crashes in this query spanning within 1 week of 05/28/2010

http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=arena_dalloc_small%20|%20arena_dalloc%20|%20free%20|%20XPT_DestroyArena&date=05%2F28%2F2010%2014%3A20%3A37&range_value=1&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&signature=arena_dalloc_small%20|%20arena_dalloc%20|%20free%20|%20XPT_DestroyArena

I'm guessing that unless we explicitly fixed something this still lurks on the trunk waiting for high volume of beta, RC, and Final users to expose.  This is also 99% WinXP so we might not see as many trunk users running that OS.

here is a sample of stats from 2010 05 27 that suggests this bug is still alive and well.  The stats and comments suggest that the crash might be associated with the following factors

1. Vista compatibility mode binaries  (comment 2)
2. Executing on/porting WinXP as the OS  (comment 11 and os_breakdown below)
3. Possible Firefox addon compatibility checking turned off (stats below)
4. boom!  crash within 3 seconds of start up.

#3 is strange because the small sample of reports I looked at show no addons installed.

here are the stats from 2010 05 27

signature list
 919 arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena

Correlation to releases

checking --- arena_dalloc_small...arena_dalloc...free...XPT_DestroyArena 20100527-crashdata.csv
found in: 3.6.3 3.5.9 3.6 3.5b4 3.5.7 3.5.6 3.5.5 3.5.8 3.5 3.0.19 3.0.13 3.5b99 3.6b5 3.5.3 3.0.5 3.6b2 3.6.2 3.1b2 3.0.10 3.6.4 3.5.1 3.0.11 3.0.1 3.6b1 3.5.4 3.5.2 3.1b3 3.0 3.6b4 3.5.10 3.0.6 3.0.4 3.0.18 3.0.16 3.0.15 3.0.12
columns: release, total-crashes, arena_dalloc_small...arena_dalloc...free...XPT_DestroyArena crashes, pct. (expressed as a fraction of total-crashes)
all     367855  919     0.00249827
3.6.3   255738  481     0.00188083
3.5.9   32443   189     0.0058256
3.6     12079   33      0.00273201
3.5b4   738     27      0.0365854
3.5.7   1591    25      0.0157134
3.5.6   821     13      0.0158343
3.5.5   1604    13      0.00810474
3.5.8   1076    12      0.0111524
3.5     1320    12      0.00909091
3.0.19  11571   11      0.000950652
3.0.13  163     10      0.0613497
3.5b99  180     9       0.05
3.6b5   836     8       0.00956938
3.5.3   1467    8       0.00545331
3.0.5   1060    8       0.00754717
3.6b2   449     7       0.0155902
3.6.2   3599    14      0.00388997
3.1b2   284     7       0.0246479
3.0.10  972     7       0.00720165
3.6.4   23009   6       0.000260768
3.5.1   506     3       0.00592885
3.0.11  534     3       0.00561798
3.0.1   1600    3       0.001875
3.6b1   512     2       0.00390625
3.5.4   1100    29      0.0263636
3.5.2   1112    2       0.00179856
3.1b3   817     2       0.00244798
3.0     1174    2       0.00170358
3.6b4   719     1       0.00139082
3.5.10  339     1       0.00294985
3.0.6   431     1       0.00232019
3.0.4   615     1       0.00162602
3.0.18  275     1       0.00363636
3.0.16  129     1       0.00775194
3.0.15  513     1       0.00194932
3.0.12  72      1       0.0138889
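The last column of the table is the signature's crash count divided by the release's total crash count, so it is a fraction rather than a percentage (0.0025 means 0.25%). A short Python sketch that recomputes it for a few rows copied from the table and ranks releases by that rate; the names are illustrative:

```python
from typing import List, Tuple

# (release, total crashes, crashes with this signature), copied from the table above
ROWS: List[Tuple[str, int, int]] = [
    ("all", 367855, 919),
    ("3.6.3", 255738, 481),
    ("3.5.9", 32443, 189),
    ("3.5b4", 738, 27),
    ("3.0.13", 163, 10),
    ("3.6.4", 23009, 6),
]


def signature_rate(total: int, with_signature: int) -> float:
    """Fraction of a release's crashes that carry this signature."""
    return with_signature / total


def ranked_by_rate(rows: List[Tuple[str, int, int]]) -> List[Tuple[str, float]]:
    """Releases sorted from highest to lowest signature rate."""
    rates = [(release, signature_rate(total, hits)) for release, total, hits in rows]
    return sorted(rates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for release, rate in ranked_by_rate(ROWS):
        print(f"{release:<8}{rate:.6g}  ({rate:.2%})")
```

Ranking by rate rather than by raw count is only a suggestion here. Releases with few total crashes have noisy rates, so the small-sample rows should be read with caution.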

os breakdown
arena_dalloc_small...arena_dalloc...free...XPT_DestroyArena  Total 919
Win5.1  0.99
Win6.0  0.002
Win6.0  0.002

addons_checked
   2 checked
 917 [unknown]
blocking2.0: --- → ?
It's also interesting that a pretty high pct of the comments for these crashes are non-English.  The few that were in English hint at possible profile corruption.   Maybe limi could use e-mail contacts from this signature in his search for corrupt profiles that might cause slowdowns and crashes.


Ever since the automatic update done on mozilla friday 5-21-10 the next time I opened my browser it crashed.  Please advise.  Mozilla firefox is the only brower I have so I am dead in the water at this point. What can be done to fix this problem?
My firefox has "crashed" twice this week.  It comes up in "Safe Mode" distorting all the screen with oversized and distorted icon/print.
Depends on: profile-corrupt
Summary: Crash at [@arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] → startup Crash at [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena]
The relevant code is debug-only in firefox 4, so this doesn't block. It would be interesting to see whether this crash got moved somewhere else or disappeared entirely.
blocking2.0: ? → -
with respect to Thunderbird, agree with prior assessments about high correlation to XP, and non-English.  This will be a top 30 crash for version 3.1.3
Whiteboard: [crashkill][tbird crash] → [crashkill][tbird topcrash]
adding intl, given the disproportionate non-english comments

some @ XPT_DestroyArena, e.g. bp-1166e1b5-8fd6-4d3a-8da4-4c0752100904 (TB3.1.2), and firefox examples [1] (of which more than a tiny % are not startup crashes, and each of these has a URL):
bp-10a5a0ef-db5d-4a6c-92c7-a83d72100913
bp-b0af4d76-5944-4635-8a87-68e282100914
bp-2429b555-1a1f-474f-b55b-1eacc2100906
bp-dd9065dc-dd9a-488f-b9d1-0812b2100916

[1] https://crash-stats.mozilla.com/report/list?product=Firefox&build_id=&query_search=signature&query_type=exact&query=XPT_DestroyArena&date=09%2F16%2F2010%2016%3A02%3A25&range_value=4&range_unit=weeks&hang_type=any&process_type=all&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&signature=XPT_DestroyArena&missing_sig=&page=1
Severity: normal → critical
Keywords: crash, intl
Summary: startup Crash at [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] → startup Crash at [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] and [@ XPT_DestroyArena]
Crash Signature: [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] [@ XPT_DestroyArena]
As expected, crash does not exist in Thunderbird version 5.
only 3 in past month for version 3.1.11
bp-968a458c-5b43-43d4-9428-de5772110705 is one

For firefox, the crashes ended long ago, with only one for 4.0.1 in the past 16 weeks.
https://crash-stats.mozilla.com/report/list?product=Firefox&branch=1.9.3&branch=2.0&branch=2.2&branch=5&branch=6&branch=7&branch=8&query_search=signature&query_type=exact&query=arena_dalloc_small%20%7C%20arena_dalloc%20%7C%20free%20%7C%20XPT_DestroyArena&reason_type=contains&date=07%2F26%2F2011%2015%3A39%3A31&range_value=16&range_unit=weeks&hang_type=any&process_type=any&do_query=1&admin=1&signature=arena_dalloc_small%20%7C%20arena_dalloc%20%7C%20free%20%7C%20XPT_DestroyArena
Status: NEW → RESOLVED
Crash Signature: [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] [@ XPT_DestroyArena] → [@ arena_dalloc_small | arena_dalloc | free | XPT_DestroyArena] [@ XPT_DestroyArena]
Closed: 13 years ago
Resolution: --- → WORKSFORME
Whiteboard: [crashkill][tbird topcrash] → [crashkill][tbird topcrash-]