Closed Bug 436375 Opened 16 years ago Closed 16 years ago

crashes [@arena_dalloc_small] (Tablet PC) (a11y related?)

Categories

(Core :: General, defect)

x86
Windows XP
defect
Not set
normal

RESOLVED DUPLICATE of bug 432467

People

(Reporter: ted, Unassigned)

Details

A friend of mine has been seeing lots of crashes on RC1. I got him to try today's nightly, and he's still crashing. No extensions enabled, and he's hitting it pretty frequently. His recent crash report (in the URL field) has nsAccessible on the stack, and he is using a tablet PC, so I suspect some a11y badness here.
The line indicated by the stack says:
   return mDOMNode ? GetRole(aRole) : NS_ERROR_FAILURE;  // Node already shut down 

However, there is a bunch of JavaScript-related stuff on top of that, so I tentatively suspect that we're not the direct cause. Alexander, Ginn, any thoughts?
If it is a release build and it crashed in free(), we don't know whether the stack is related to the real problem or not.

Right, I'm just making educated guesses here given the a11y on the stack, and the fact that he's using a Tablet PC. He's experiencing these crashes pretty frequently, and he's not using any binary extensions.
Also, stack from the crash report, for easier access:
Frame	Module	Signature	Source
0	mozcrt19.dll	arena_dalloc_small	jemalloc.c:4094
1	mozcrt19.dll	arena_dalloc	jemalloc.c:4196
2	mozcrt19.dll	free	jemalloc.c:6009
3	xul.dll	XPCWrappedNative::CallMethod	mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2564
4	xul.dll	XPC_WN_CallMethod	mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1473
5	js3250.dll	js_Invoke	mozilla/js/src/jsinterp.c:1297
6	js3250.dll	js_InternalInvoke	mozilla/js/src/jsinterp.c:1369
7	js3250.dll	js_TryMethod	mozilla/js/src/jsobj.c:4782
8	js3250.dll	js_DefaultValue	mozilla/js/src/jsobj.c:4081
9	js3250.dll	js_ValueToString	mozilla/js/src/jsstr.c:2694
10	js3250.dll	js_ReportUncaughtException	mozilla/js/src/jsexn.c:1297
11	js3250.dll	JS_CallFunctionValue	mozilla/js/src/jsapi.c:5055
12	xul.dll	nsXPCWrappedJSClass::CallQueryInterfaceOnJSObject	mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp:279
13	xul.dll	nsXPCWrappedJSClass::DelegatedQueryInterface	mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp:668
14	xul.dll	nsXPCWrappedJS::AggregatedQueryInterface	mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp:144
15	xul.dll	nsBindingManager::GetBindingImplementation	mozilla/content/xbl/src/nsBindingManager.cpp:1171
16	xul.dll	nsAccessible::GetFinalRole	mozilla/accessible/src/base/nsAccessible.cpp:2040 
Related to bug 418142?
After some more poking, shaver suspects heap corruption.
I am on a Tablet PC and see a lot of these crashes. I just checked a few of my most recent crash reports, and at least one of them seems to point to accessibility:

http://crash-stats.mozilla.com/report/index/c0f0a605-2cbb-11dd-9584-001321b13766

Crash reports unrelated to accessibility:
http://crash-stats.mozilla.com/report/index/d5d1803d-2d76-11dd-b2a2-001cc45a2ce4
http://crash-stats.mozilla.com/report/index/7fab10ab-2cf3-11dd-bd65-001321b13766

Summary: crashes [@arena_dalloc_small] (a11y related?) → crashes [@arena_dalloc_small] (Tablet PC) (a11y related?)
Crash also happens when running minefield (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008052907 Minefield/3.0pre) in safe mode. I can fairly consistently cause the crash by doing the following:

1. Using the location bar, navigate to gmail.
2. CTRL-T to create a new tab, then using the location bar, navigate to gmail in the new tab.
3. While the second tab is loading, close the first tab, then go to step 2.

It only takes two or three iterations before it crashes.
Here's a mozcrt19.dll built with MOZ_MEMORY_DEBUG:
http://people.mozilla.com/~tmielczarek/mozcrt19.zip

stuart suggests you set this environment variable:
MALLOC_OPTIONS=12fJ

I dunno where this output goes, maybe run with -console ?
Try swapping that dll into your firefox install dir (maybe back up the old copy if you expect to get back to a known state) and reproducing the problem.
I am running minefield with the MOZ_MEMORY_DEBUG dll, but I can't figure out how to capture its output. (When I run it with -console, the console also crashes when the browser does.)

Setting MALLOC_OPTIONS=12fJ causes the browser to fail to start.
Jason: any ideas on how we can debug this?
The actual cause of the crash is likely to occur well before the crash actually happens, which could make this tough to figure out.  If the steps to reproduce in comment #8 suffice in general, I can try to figure this crash out with a modified build.
Status: NEW → ASSIGNED
Jason: unfortunately, the crash seems to be limited to the Tablet PC OS. Timeless suggested some tools for heap debugging on Win32, but those aren't going to be useful in conjunction with jemalloc, are they?
If the bug is a double-free or something like that, then it would probably be possible to find the problem with a non-jemalloc build and the heap debug tools.  If the problem is more subtle memory corruption due to a buffer overflow, then this is going to be really hard to figure out.
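To make the distinction above concrete, here is a minimal sketch of what a debugging heap checks at free time. It is a toy, not jemalloc and not the Windows heap: ToyDebugHeap, FreeResult, kGuardSize and kGuardByte are hypothetical names invented for this illustration. The toy flags a double free at the second Free() call itself, but it only notices an overrun when the damaged block is eventually freed, which may be long after the write that caused it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical toy debug heap for illustration only.
enum class FreeResult { Ok, DoubleFree, Overrun, Unknown };

constexpr std::size_t kGuardSize = 8;       // bytes of guard after each block
constexpr unsigned char kGuardByte = 0xFD;  // known fill pattern for the guard

class ToyDebugHeap {
 public:
  // Allocates `size` usable bytes followed by a guard region.
  void* Alloc(std::size_t size) {
    Block block;
    block.storage = std::make_unique<unsigned char[]>(size + kGuardSize);
    std::memset(block.storage.get() + size, kGuardByte, kGuardSize);
    block.size = size;
    block.live = true;
    void* user = block.storage.get();
    const bool inserted = blocks_.emplace(user, std::move(block)).second;
    assert(inserted);  // freed blocks stay quarantined, so addresses are unique
    (void)inserted;
    return user;
  }

  // Classifies the free instead of crashing. Freed blocks are kept
  // (quarantined) until the heap is destroyed so a second free can be seen.
  FreeResult Free(void* ptr) {
    auto it = blocks_.find(ptr);
    if (it == blocks_.end()) return FreeResult::Unknown;
    Block& block = it->second;
    if (!block.live) return FreeResult::DoubleFree;
    block.live = false;
    const unsigned char* guard = block.storage.get() + block.size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
      if (guard[i] != kGuardByte) return FreeResult::Overrun;
    }
    return FreeResult::Ok;
  }

 private:
  struct Block {
    std::unique_ptr<unsigned char[]> storage;
    std::size_t size = 0;
    bool live = false;
  };
  std::unordered_map<void*, Block> blocks_;
};
```

Freeing the same pointer twice yields FreeResult::DoubleFree on the second call, while writing one byte past an 8-byte block yields FreeResult::Overrun only when that block is later freed. An overrun that lands on a neighbouring live object instead of a guard region would go unnoticed by a check like this, which is why the second case is so much harder to track down.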
Here's a build of the RC2 code, without jemalloc:
http://mavra.perilith.com/~luser/firefox-3.0pre.en-US.win32.zip

timeless suggested that you could use gflags.exe to do Windows heap debugging to try to determine the cause of this crash:
http://msdn.microsoft.com/en-us/library/ms792858.aspx

A few other links about using gflags:
http://support.microsoft.com/kb/300966
http://support.microsoft.com/kb/286470
Status: ASSIGNED → NEW
I have the same problem with my tablet notebook. Firefox crashes randomly: sometimes it crashes on startup, and sometimes it crashes after I have used it for more than an hour. I have not noticed any particular action that triggers the crash.

It crashes even in safe mode. I can confirm that the tablet notebook is the cause: when I transferred my whole profile to Firefox Portable and ran it on the tablet notebook, it crashed. When I ran it on another Vista PC with the same add-ons and settings, it did not crash.

It seems to have started after I upgraded to Vista SP1. However, even after I uninstalled SP1, it still happens.
With these steps I was able to reliably reproduce a crash on my computer - even though not all crash reports were arena_dalloc_small crashes.

I was able to get a crash every single time with the following steps:
* Login to Gmail
* Click on "Documents" link in Gmail to open Google Documents
* While Google Documents is still loading in new tab, close Gmail tab (using tab close button)

After a second or two (towards the end of loading Google Documents) Firefox will crash. If it does not, go to location bar and start typing - this will definitely crash it for me.


Here are some of the crash reports I got with this method:
http://crash-stats.mozilla.com/report/index/93eac411-3136-11dd-8fb9-0013211cbf8a
http://crash-stats.mozilla.com/report/index/80fe975c-3136-11dd-a298-001cc45a2c28
http://crash-stats.mozilla.com/report/index/e306679e-3137-11dd-889f-001cc45a2c28
I am not 100% certain this is the same bug, but since the way to reproduce it looks very much the same, it might well be.

Note that I am using the mozcrt19.dll provided in comment #9!

Ok, I've got Ryan running this build with heap debugging, and the stack he gets from that is:
ChildEBP RetAddr
0012fb98 104fdaff xul!nsVoidArray::EnumerateForwards+0x19 [d:\build\mozilla-1.9\obj-firefox-opt-libxul\xpcom\build\nsvoidarray.cpp @ 676]
0012fba8 104e1192 xul!nsCOMArray_base::Clear+0xf [d:\build\mozilla-1.9\obj-firefox-opt-libxul\xpcom\build\nscomarray.cpp @ 159]
0012fc0c 104deba0 xul!nsDocAccessible::FlushPendingEvents+0x3ad [d:\build\mozilla-1.9\mozilla\accessible\src\base\nsdocaccessible.cpp @ 1659]
0012fc14 10520289 xul!nsDocAccessible::FlushEventsCallback+0xe [d:\build\mozilla-1.9\mozilla\accessible\src\base\nsdocaccessible.cpp @ 1676]
0012fc2c 105203f9 xul!nsTimerImpl::Fire+0x94 [d:\build\mozilla-1.9\mozilla\xpcom\threads\nstimerimpl.cpp @ 400]
0012fc34 105176bc xul!nsTimerEvent::Run+0x1b [d:\build\mozilla-1.9\mozilla\xpcom\threads\nstimerimpl.cpp @ 492]
0012fc54 104fed69 xul!nsThread::ProcessNextEvent+0xc3 [d:\build\mozilla-1.9\mozilla\xpcom\threads\nsthread.cpp @ 511]
0012fc68 1049c3e1 xul!NS_ProcessNextEvent_P+0x20 [d:\build\mozilla-1.9\obj-firefox-opt-libxul\xpcom\build\nsthreadutils.cpp @ 227]
0012fc7c 103d3a59 xul!nsBaseAppShell::Run+0x28 [d:\build\mozilla-1.9\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 170]
0012fc88 100082bf xul!nsAppStartup::Run+0x1e [d:\build\mozilla-1.9\mozilla\toolkit\components\startup\src\nsappstartup.cpp @ 182]
0012ff10 0040133c xul!XRE_main+0x14b0 [d:\build\mozilla-1.9\mozilla\toolkit\xre\nsapprunner.cpp @ 3172]
0012ff48 0040144e firefox!NS_internal_main+0x1b0 [d:\build\mozilla-1.9\mozilla\browser\app\nsbrowserapp.cpp @ 159]
0012ff7c 004015d6 firefox!wmain+0xf3 [d:\build\mozilla-1.9\mozilla\toolkit\xre\nswindowswmain.cpp @ 89]
0012ffc0 7c817067 firefox!__tmainCRTStartup+0x10f [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 594]
0012fff0 00000000 kernel32!BaseProcessStart+0x23

The line in FlushPendingEvents is:
mEventsToFire.Clear();
So I guess one of the entries in mEventsToFire has already been freed elsewhere, hence the heap corruption.
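As a rough illustration of that guess, the sketch below shows how an array that holds one reference per entry can end up releasing an object that a stray release elsewhere has already destroyed. This is not the nsDocAccessible code and makes no claim about the actual root cause: ToyEvent, ToyEventQueue, ReleaseLog and RunScenario are hypothetical names, and the toy records the bad release instead of really freeing memory twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reference-counted event object.
struct ToyEvent {
  int refCount = 0;
  bool destroyed = false;
};

// Records what happened instead of freeing memory. In a real build, each
// release-after-destroy would be a second free() of the same block.
struct ReleaseLog {
  int destroyedCount = 0;
  int releaseAfterDestroy = 0;
};

void AddRef(ToyEvent& event) { ++event.refCount; }

void Release(ToyEvent& event, ReleaseLog& log) {
  if (event.destroyed) {
    ++log.releaseAfterDestroy;  // the object is already gone
    return;
  }
  assert(event.refCount > 0);
  if (--event.refCount == 0) {
    event.destroyed = true;
    ++log.destroyedCount;
  }
}

// Hypothetical queue that owns exactly one reference per stored entry.
class ToyEventQueue {
 public:
  void Append(ToyEvent& event) {
    AddRef(event);
    entries_.push_back(&event);
  }

  // Releases every entry, then empties the array.
  void Clear(ReleaseLog& log) {
    for (ToyEvent* event : entries_) Release(*event, log);
    entries_.clear();
  }

  std::size_t Length() const { return entries_.size(); }

 private:
  std::vector<ToyEvent*> entries_;
};

// One correctly balanced event, and one that code outside the queue
// releases without owning a reference of its own.
ReleaseLog RunScenario() {
  ToyEvent balanced;
  ToyEvent unbalanced;
  ReleaseLog log;
  ToyEventQueue queue;
  queue.Append(balanced);
  queue.Append(unbalanced);
  Release(unbalanced, log);  // stray release drops the queue's only reference
  queue.Clear(log);          // Clear() now touches a destroyed entry
  return log;
}
```

In this scenario both events are destroyed exactly once, and Clear() then performs one extra release on the already-destroyed entry. The stray release itself looks harmless when it happens; the damage only shows up later inside Clear(), which matches the shape of the stack above.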
Looks like a dupe of bug 432467.

Can you try the patch? Thanks!
Ted, the "Patch v2" attachment in bug 432467 contains a possible fix for this. Originally, this seemed to be confined to certain web apps like Plone Editor, but it appears the problem is more widespread than we all thought.
Here's a build with the patch from bug 432467:
http://mavra.perilith.com/~luser/firefox-3.0pre.en-US.win32.zip

Ryan (or anyone else experiencing this crash), can you try using this build and see if it fixes the crashing for you?
Well, I can't get the crash to occur using that patched build, so I guess that's the solution. Thanks, everybody!
Thanks for all the debugging, Ryan.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
Thanks for all your help. I am using the Firefox build from comment #23. It seems OK, but I need to use it for several more days. However, what happens when there is a new release of Firefox? Do I keep using this build or upgrade to the new release?

Today Firefox 3.0 RC2 was released. Should I upgrade to it, and does it fix the problem?
(In reply to comment #28)
> Today Firefox 3.0 RC2 was released. Should I upgrade to it, and does it fix the
> problem?

Please continue using the test build from comment #23. Firefox 3.0 RC2 does not fix this problem yet.
The official Firefox 3.0 release is coming. Should I upgrade to it?
(In reply to comment #30)
> The official Firefox 3.0 release is coming. Should I upgrade to it?
> 

no, it's not fixed yet. It's the same bits as RC2 on Windows, AFAIK.
I am seeing this bug (I see jemalloc-related crashes all the time.) according to the steps in #18. I am not on a tablet OS though (afaik, anyway). I triggered this three times in a row now by closing the gmail tab after opening documents, but every time it refuses to report the crash. 

I assume there is something in my XP that can be disabled so it does not behave like a Tablet OS somehow.
(In reply to comment #32)
> I am seeing this bug (I see jemalloc-related crashes all the time.) according
> to the steps in #18. I am not on a tablet OS though (afaik, anyway). I
> triggered this three times in a row now by closing the gmail tab after opening
> documents, but every time it refuses to report the crash. 

John, can you try the build linked to in comment#23 and see if it fixes your issue?
It does solve the Gmail tab-closing crash. I will report back after my work day here ends; I normally never see a day without crashes.
No crashes all of yesterday. I'd say it's good.
Fix for this was checked in on Gecko 1.9.0.1 for Firefox 3.0.1 today.