Closed Bug 527339 Opened 15 years ago Closed 13 years ago

remedy crashes at [@ nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)][@ nsCycleCollectingAutoRefCnt::unmarkPurple()] caused by threadsafety problems in extensions

Categories

(Core :: XPCOM, defect)

x86
Windows XP
defect
Not set
critical

RESOLVED WONTFIX

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

(Keywords: crash, Whiteboard: [crashkill][crashkill-thirdparty])

Crash Data

I was expecting bug 521750 to fix the crashes at nsGlobalWindow::cycleCollection::UnmarkPurple in addition to the ones that it did, in fact, fix. I think we ought to be able to fix those as well, though I need to think through things some more. (That they're not fixed could even just be a mistake in bug 521750's patch.)

I took a look at the three most recent nsGlobalWindow::cycleCollection::UnmarkPurple crashes in Firefox 3.5.5.

First, I looked at bp-9c6a990d-8b1f-4f22-abf7-22d012091108

The entirety of the method (excluding the out-of-the-main path handling for the argument being null) is:

nsGlobalWindow::cycleCollection::UnmarkPurple:
69318B58  mov  eax,dword ptr [esp+8]
69318B5C  test eax,eax
69318B5E  je   6943D9CB
69318B64  add  eax,0FFFFFFD0h
69318B67  mov  ecx,dword ptr [eax+6Ch]
69318B6A  mov  ecx,dword ptr [ecx+4]    <== CRASH HERE
69318B6D  add  ecx,ecx
69318B6F  or   ecx,1
69318B72  mov  dword ptr [eax+6Ch],ecx
69318B75  ret  8

When the crash happens, ECX=0x00000033 and EAX=0x02286940.

The corresponding code (from NS_IMPL_CYCLE_COLLECTION_CLASS_BODY_NO_UNLINK in nsCycleCollectionParticipant.h) is:

  NS_IMETHOD_(void) UnmarkPurple(nsISupports *s)   \
  {                                                \
    Downcast(s)->UnmarkPurple();                   \
  }                                                \

with UnmarkPurple inlined (from NS_DECL_CYCLE_COLLECTING_ISUPPORTS in nsISupportsImpl.h):

  void UnmarkPurple()                              \
  {                                                \
    mRefCnt.unmarkPurple();                        \
  }                                                \

which in turn has nsCycleCollectingAutoRefCnt::unmarkPurple inlined:

  void unmarkPurple()
  {
    NS_ASSERTION(IsPurple(), "must be purple");
    nsrefcnt refcount = NS_CCAR_TAGGED_TO_PURPLE_ENTRY(mTagged)->mRefCnt;
    mTagged = NS_CCAR_REFCNT_TO_TAGGED(refcount);
  }

This means 0xFFFFFFD0 is presumably the offset from nsGlobalWindow's canonical nsISupports* pointer to its base, 0x6C is presumably the offset from nsGlobalWindow to its mRefCnt (I didn't check either of these), and 0x4 is (easy to check) the offset of mTagged within nsPurpleBufferEntry.
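The address arithmetic in the disassembly above can be checked with a short sketch. The function names here are made up for illustration, and the 0x02286970 value is derived by undoing the adjustment on the reported EAX; it does not appear in the crash report.

```cpp
#include <cassert>
#include <cstdint>

// "add eax,0FFFFFFD0h" in 32-bit arithmetic wraps around, so it is the
// same as subtracting 0x30 from the incoming nsISupports* value.
constexpr uint32_t AdjustToBase(uint32_t canonicalISupports) {
  return canonicalISupports + 0xFFFFFFD0u;
}

// Address of the word loaded by "mov ecx,dword ptr [eax+6Ch]"
// (presumed in the comment above to be mRefCnt; not verified).
constexpr uint32_t RefCntAddress(uint32_t objectBase) {
  return objectBase + 0x6Cu;
}

// 0xFFFFFFD0 is the two's-complement encoding of -0x30.
static_assert(0xFFFFFFD0u == static_cast<uint32_t>(-0x30), "offset is -0x30");
```

With EAX=0x02286940 at the crash, the word being read as mRefCnt would sit at 0x022869AC, and the pointer passed in would have been 0x02286970.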
So the crash is happening because mRefCnt.mTagged is 0x33, which means that the assertion on the first line of unmarkPurple is failing, and that's why it's crashing.

Checking this by looking at the two other crashes:

Second, bp-720ffae8-ba60-4b9e-958a-8e8e82091108 : same crash location, this time ECX=0x000000d9 (and EAX=0x00944420)

Third, bp-c66b9165-2807-4835-861c-b1bd82091108 : same crash location, this time ECX=0x00000109 (and EAX=0x00984420)
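To make the reasoning concrete, here is a minimal sketch of the tagging scheme as it can be inferred from the disassembly ("add ecx,ecx" then "or ecx,1"). It assumes that an odd mTagged holds (refcount << 1) | 1 and that an even mTagged holds a pointer to a purple buffer entry. The helper names are hypothetical stand-ins, not the real NS_CCAR_* macros. NS_ASSERTION is compiled out of release builds, so a non-purple mTagged is not caught there and ends up dereferenced as if it were an entry pointer.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding (inferred from the disassembly, not copied from the tree):
// low bit set   -> the word stores a refcount, shifted left by one
// low bit clear -> the word stores a pointer to a purple buffer entry
constexpr bool IsTaggedRefcount(uint32_t tagged) { return (tagged & 1u) != 0; }
constexpr uint32_t RefcountToTagged(uint32_t refcount) { return (refcount << 1) | 1u; }
constexpr uint32_t TaggedToRefcount(uint32_t tagged) { return tagged >> 1; }

// Address read by "mov ecx,dword ptr [ecx+4]" when mTagged is
// (wrongly) treated as an entry pointer.
constexpr uint32_t FaultAddress(uint32_t tagged) { return tagged + 4u; }

// The three ECX values from the crash reports are all odd, i.e. plain
// tagged refcounts rather than purple-entry pointers.
static_assert(IsTaggedRefcount(0x33u), "0x33 is a tagged refcount");
static_assert(IsTaggedRefcount(0xd9u), "0xd9 is a tagged refcount");
static_assert(IsTaggedRefcount(0x109u), "0x109 is a tagged refcount");
```

Under that assumption, ECX=0x33 decodes to a refcount of 25 and the faulting read is at 0x37, which matches the pattern of small odd crash addresses.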
I looked at a few of the nsCycleCollectingAutoRefCnt::unmarkPurple() crashes as well. Again, most have small odd integer crash addresses.

I looked at one that didn't have a small odd integer crash address, bp-7bbed095-c36f-4e0f-9c14-5d41d2091108 , and it was a crash on effectively the same instruction, except not inlined. The mRefCnt's address was 0x12234b34, and its mTagged was 0x00e800d7.

I also looked at two that did have a small odd integer crash address (the normal case):

bp-9ad19fef-3f0c-41cf-b367-bd59f2091108 : same instruction. The mRefCnt's address was 0x03c8f8f0, and its mTagged was 0x00000023.

bp-bee57f6e-8330-4754-9af7-29e5b2091108 : same instruction. The mRefCnt's address was 0x06df14f0, and its mTagged was 0x00000021.

(In reply to comment #0)
> 0x4 is (easy to check) the offset of mTagged within nsPurpleBufferEntry.

and I clearly meant the offset of *mRefCnt* within nsPurpleBufferEntry.
Summary: remedy crashes at [@ nsGlobalWindow::cycleCollection::UnmarkPurple] caused by threadsafety problems in extensions → remedy crashes at [@ nsGlobalWindow::cycleCollection::UnmarkPurple][@ nsCycleCollectingAutoRefCnt::unmarkPurple()] caused by threadsafety problems in extensions
Whiteboard: [crashkill]
Summary: remedy crashes at [@ nsGlobalWindow::cycleCollection::UnmarkPurple][@ nsCycleCollectingAutoRefCnt::unmarkPurple()] caused by threadsafety problems in extensions → remedy crashes at [@ nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)][@ nsCycleCollectingAutoRefCnt::unmarkPurple()] caused by threadsafety problems in extensions
Whiteboard: [crashkill] → [crashkill][crashkill-thirdparty]
dbaron: There also appear to be a number of crashes with a signature of nsGenericElement::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&). Do you think those crashes are related / the same as this one or a different cycle collector crash we should get on file? (That signature is currently #45 for 3.5.5.)
Those are different (different extension correlations, much less multicore correlation).
I was noticing in 20091214*3.5.5*core* data that these are nearly all multicore now, and it's the top volume signature where nearly all crashes are on multicore systems. I wonder what could explain that one crash on a 0 core system.

nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)|EXCEPTION_ACCESS_VIOLATION (879 crashes)
  0% (1/879)   vs.  1% (659/104396)   x86 with 0 cores
  0% (0/879)   vs. 42% (43876/104396) x86 with 1 cores
 92% (812/879) vs. 53% (55076/104396) x86 with 2 cores
  0% (4/879)   vs.  0% (341/104396)   x86 with 3 cores
  7% (62/879)  vs.  4% (4096/104396)  x86 with 4 cores
  0% (0/879)   vs.  0% (347/104396)   x86 with 8 cores
  0% (0/879)   vs.  0% (1/104396)     x86 with 16 cores
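For reference, the multicore share implied by those buckets can be totalled with a small sketch. The counts are copied from the comment above; the struct and function names are hypothetical.

```cpp
#include <cassert>

// One row of the per-core-count correlation data quoted above.
struct CoreBucket {
  int cores;
  long crashesWithSignature;  // out of 879
  long crashesOverall;        // out of 104396
};

constexpr CoreBucket kBuckets[] = {
    {0, 1, 659},   {1, 0, 43876}, {2, 812, 55076}, {3, 4, 341},
    {4, 62, 4096}, {8, 0, 347},   {16, 0, 1},
};

// Fraction of crashes reported from machines with two or more cores,
// either for this signature only or for all x86 crashes in the sample.
inline double MulticoreShare(bool forSignature) {
  long multi = 0;
  long total = 0;
  for (const CoreBucket& b : kBuckets) {
    const long n = forSignature ? b.crashesWithSignature : b.crashesOverall;
    total += n;
    if (b.cores >= 2) multi += n;
  }
  return static_cast<double>(multi) / static_cast<double>(total);
}
```

MulticoreShare(true) comes to 878/879 (about 99.9%), against 59861/104396 (about 57%) for MulticoreShare(false); the single outlier is the 0-core report.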
Blocks: 534896
There is a similar looking stack with the signature [@ nsPurpleBuffer::SelectPointers(GCGraphBuilder&) ] that also ranks around #64. Stacks look like http://crash-stats.mozilla.com/report/index/2e752450-585b-4762-bd27-8f7222091220

Frame  Module      Signature                          Source
0      xul.dll     nsPurpleBuffer::SelectPointers     xpcom/base/nsCycleCollector.cpp:898
1      xul.dll     nsCycleCollector::BeginCollection  xpcom/base/nsCycleCollector.cpp:2517
2      xul.dll     XPCCycleCollectGCCallback          js/src/xpconnect/src/nsXPConnect.cpp:390
3      js3250.dll  js_GC                              js/src/jsgc.cpp:3534
4      js3250.dll  JS_GC                              js/src/jsapi.cpp:2438
5      xul.dll     nsXPConnect::Collect               js/src/xpconnect/src/nsXPConnect.cpp:477
6      xul.dll     nsCycleCollector::Collect          xpcom/base/nsCycleCollector.cpp:2421

More reports at http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsPurpleBuffer%3A%3ASelectPointers%28GCGraphBuilder%26%29&version=Firefox%3A3.6b5

If this is the same, the two combined might rank this near the top 10.
There is a comment in the recent crash data indicating someone crashed in this stack "Using Google Analytics "Site Overlay" feature" but I have not yet been able to reproduce it on a Windows 7 or Win XP box.
This crash is related to having specific binary addons installed (see bug 521750 and dependencies, of which this is a followup), and the reports at http://people.mozilla.com/crash_analysis/ .
I got this crash sometimes with the loudmo contextual ad assistant, which got installed along with the Chameleon Tom plugin:
http://plugin.chameleontom.com/
http://support.mozilla.com/no/forum/1/532304?forumId=1&comments_threshold=0&comments_parentId=532304&comments_offset=20&comments_per_page=20&thread_style=commentStyle_plain

The loudmo contextual ad assistant is indeed a binary extension that seems to add a proxy that loads the site on their own server or something like that. If wanted, I can attach the extension to the bug.
Blocks: 567949
PLEASE.... See at least the last post from me in the following: https://support.mozilla.com/en-US/forum/1/297901 I am TIRED of Firefox crashing every two minutes with this same error. If it's supposed to be related to some add-on / plugin, HOW DO I FIND THIS? I have gone through each one - disabled, running Firefox for an extended period of time / no crash - reenabled, no crash...but when everything is enabled, CRASH. I don't know where to go next and I am frustrated as hell. I see that we are going on two months since the last update here. Let me supply links to my last 3 crashes: http://crash-stats.mozilla.com/report/index/bp-e9212e89-9c64-40ce-b7d0-86d0c2100602 http://crash-stats.mozilla.com/report/index/992d0c53-fc24-439f-85f2-269e22100602 http://crash-stats.mozilla.com/report/index/bp-9f758508-378f-40fe-b36f-eee062100528 PLEASE ADVISE. Thank you. Andy Cooper
Andy, I just started a build of Firefox 3.6.3 plus the patch in http://hg.mozilla.org/mozilla-central/rev/0d5553264cbf (which is expected to be in Firefox 4.0). This patch will make the build crash *faster*, but in a more useful way -- so that we'd likely be able to figure out from the crash report what is causing the crash. (What the patch does is make us intentionally crash whenever an extension does what was causing these random crashes.) The build should be ready in a few hours; I'll try to remember to post the link here when it is. (Whatever you do -- delete the build after you're done testing, since if you continue using it you won't get Firefox software updates in the future.) When it's ready, you can download the build, unzip it in its own directory, and try running that instance of Firefox (after quitting your other Firefox). When it crashes, try submitting a crash report, and post the ID here. (I'm actually not sure the crash-reporting system will actually work with a build generated this way, but we'll see...)
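As a rough illustration of the "crash faster, but usefully" idea, here is a sketch. It assumes the problem pattern is an extension touching a main-thread-only refcount from another thread, as the bug summary suggests. It is not the patch at the revision linked above, and the class and method names are invented for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <thread>

// Hypothetical stand-in for a main-thread-only refcounted object;
// not Mozilla's real class or macros.
class MainThreadOnlyRefCounted {
 public:
  MainThreadOnlyRefCounted()
      : mOwner(std::this_thread::get_id()), mRefCnt(0) {}

  // True when called on the thread that created the object.
  bool OnOwningThread() const {
    return std::this_thread::get_id() == mOwner;
  }

  unsigned AddRef() {
    AbortIfWrongThread();
    return ++mRefCnt;
  }

  unsigned Release() {
    AbortIfWrongThread();
    return --mRefCnt;
  }

 private:
  // Crash immediately, so the offending caller is on the crashing stack,
  // instead of corrupting the refcount and crashing somewhere unrelated.
  void AbortIfWrongThread() const {
    if (!OnOwningThread()) std::abort();
  }

  std::thread::id mOwner;
  unsigned mRefCnt;
};
```

The design trade-off is the one described in the comment above: the process dies sooner, but the crash report then points at the code making the off-thread call rather than at a later victim such as UnmarkPurple.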
Thanks, David! I will wait for notification of this build, and then I will run it and submit details when it crashes. Andy
The build in question is: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/dbaron@mozilla.com-261460fae2a1/tryserver-win32/firefox-3.6.3.en-US.win32.zip (If you can't deal with .zip, there's also an installer in the same directory... but you don't want to install over your normal installation.)
No, I got it, David. Thanks! It's already running, with all add-ons and plugins intact. Once I get a crash, I will post the link here. Andy
David... Well, so far, NO CRASHES. It appears that somehow, my installation of Firefox 3.5.6 must be corrupt. Is there an easy way to do a complete uninstall / fresh installation, saving my bookmarks, add-ons, etc., in the process? Thanks... Andy
3.5.6? The crashes in comment 9 were from 3.6.3. But anyway... Yes, there certainly is, although I haven't done it for a while, so you're probably better off checking the support site than having me get it wrong. (The basic idea would be to download the installer for 3.6.3, then uninstall the old one and rerun the installer. It's the details of exactly how you should use the uninstaller that you should probably double-check -- and also whether there's anything else you should delete (or, to be safe, move aside) between the uninstall and reinstall.)
Severity: normal → critical
An interesting data point: this crash is also present in 4.0b1, which likely means that it's not due to refcounting on the wrong thread, since 4.0b1 has the fix to bug 549743.
The 4.0b1 crashes seem highly correlated with Internet Download Manager:
 100% (538/540) vs. 50% (8971/17949) idmmzcc.dll
  99% (535/540) vs. 50% (9040/17949) idmmkb.dll
Depends on: 578443
Well, I'm back again. Using 3.6.8, getting CONSTANT crashes with this error / signature when accessing Google GMAIL. Frustrating, to say the least. Andy
#8 ranking crash in 3.6.13, but not showing up in any of the Firefox 4 beta 9 data, so maybe this was fixed along the way.
Crash Signature: [@ nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)] [@ nsCycleCollectingAutoRefCnt::unmarkPurple()]
Still there, but at really low volume (< 10) on releases. Removing the top crash keyword.
Crash Signature: [@ nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)] [@ nsCycleCollectingAutoRefCnt::unmarkPurple()] → [@ nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*)] [@ nsCycleCollectingAutoRefCnt::unmarkPurple()]
Keywords: topcrash
Now that we've taken the approach in bug 549743 this bug is no longer relevant. Other crashes with similar signatures should be filed as separate bugs; crashes with those signatures in Firefox 4 or later are not related to this bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
No longer blocks: 534896