Closed Bug 956284 Opened 10 years ago Closed 10 years ago

Fault in cycle collector: overflowing refcount

Categories

(Core :: DOM: Workers, defect)

28 Branch
x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla29
Tracking Status
firefox26 --- unaffected
firefox27 --- unaffected
firefox28 + verified
firefox29 + verified
firefox-esr24 --- unaffected
b2g18 --- unaffected
b2g-v1.1hd --- unaffected
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: whimboo, Assigned: ttaubert)

References

Details

(5 keywords, Whiteboard: [mozmill])

Attachments

(1 file)

By running our Mozmill tests for Firefox Aurora we noticed a lot of aborts caused by overflowing refcounts in Firefox:

Report:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_functional/16324/console

Message:
Fault in cycle collector: overflowing refcount (ptr: 0x10fd748b0)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226

So far this problem only seems to exist on OS X across different platform versions and always seem to happen on exit.
Whiteboard: [mozmill]
My guess is that we end up having refcnt 0 - 1 for some reason. Don't or didn't we have some
worker related bug for this?
Also happens for holly branch builds with slightly different lines of code:

http://mm-ci-master.qa.scl3.mozilla.com:8080/job/project_functional/233/console

Fault in cycle collector: overflowing refcount (ptr: 0x10de89870)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
Finally I got a crash report for the holly build: bp-fe34fe81-c94a-4e50-a419-62d612140103

Stack:

0 	libmozalloc.dylib 	mozalloc_abort(char const*) 	memory/mozalloc/mozalloc_abort.cpp
1 	XUL 	Abort 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
2 	XUL 	NS_DebugBreak 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
3 	XUL 	GCGraphBuilder::DescribeRefCountedNode(unsigned int, char const*) 	xpcom/base/nsCycleCollector.cpp
4 	XUL 	nsDOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	content/events/src/nsDOMEventTargetHelper.cpp
5 	XUL 	nsXHREventTarget::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	content/base/src/nsXMLHttpRequest.cpp
6 	XUL 	mozilla::dom::workers::XMLHttpRequest::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	dom/workers/XMLHttpRequest.cpp
7 	XUL 	GCGraphBuilder::Traverse(PtrInfo*) 	xpcom/base/nsCycleCollector.cpp
8 	XUL 	nsCycleCollector::MarkRoots(js::SliceBudget&) 	xpcom/base/nsCycleCollector.cpp
9 	XUL 	nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp
10 	XUL 	nsCycleCollector_collect(nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp
11 	XUL 	mozilla::CycleCollectedJSRuntime::OnGC(JSGCStatus) 	xpcom/base/CycleCollectedJSRuntime.cpp
12 	XUL 	Collect 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/js/src/../../../../js/src/jsgc.cpp
13 	XUL 	mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) 	dom/workers/WorkerPrivate.cpp
14 	XUL 	(anonymous namespace)::GarbageCollectRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) 	dom/workers/WorkerPrivate.cpp
15 	XUL 	mozilla::dom::workers::WorkerRunnable::Run() 	dom/workers/WorkerRunnable.cpp
16 	XUL 	mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked() 	dom/workers/WorkerPrivate.cpp
17 	XUL 	mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) 	dom/workers/WorkerPrivate.cpp
18 	XUL 	(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() 	dom/workers/RuntimeService.cpp
19 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
20 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	/builds/slave/m-in-osx64-0000000000000000000/build/xpcom/glue/nsThreadUtils.cpp
21 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/glue/../../../../ipc/glue/MessagePump.cpp
22 	XUL 	MessageLoop::Run() 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
23 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp
24 	libnss3.dylib 	_pt_root 	
25 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x1899 	
26 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x172a 	
27 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x5fc9 	
28 	libnss3.dylib 	null_signal_handler
Keywords: crashreportid
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> There's bug 937220.

So does it mean it is a dupe? Given the stack I would say so. If it helps I could try to find a pattern to get it reproduced.
This actually also looks like it could be a top mac crash, as it shows up overall 24 in the top crash list, which has to be fairly bad as it is OSX only.
  https://crash-stats.mozilla.com/report/index/ba3b3008-31b7-4310-b521-2e9d32140102
Depends on: 937220
Oops, the other bug talks about that already, never mind.
Component: XUL → DOM: Workers
What are the steps to reproduce this?  Maybe that would help with a fix.
Group: core-security
Probably just a dupe, so I'm going to set it as sec-other.
Keywords: sec-other
I can see this on and off during our Mozmill tests. I don't have a pattern yet but I will try to nail it down, and hopefully get manual steps.
Tracking in case it's not a dupe, but if it is we can untrack.
Blocks: 959562
Any luck tracking this down yet, Henrik?
Flags: needinfo?(hskupin)
No,sorry. I'm completely filled up with work on bug 956315. I asked Cosmin to do the investigation on the test side via bug 959562. You might want to trigger him.
Flags: needinfo?(hskupin)
Could this bug in some way be related to bug 937220 which also has been started with Firefox 28?
I think there's a reasonable chance it is the same issue.
See Also: → 937220
Henrik, have you made any headway with the investigation here? If not, can this be offloaded to someone else on your team?
Nope, sorry. I was hoping to get an update from bug 937220. Cosmin, would you be able to continue here in finding a minimzed testcase for our testrun?
Flags: needinfo?(cosmin.malutan)
I tried to bisect this and it looks like https://hg.mozilla.org/mozilla-central/rev/8196cbbcadcd from bug 935032 might be the culprit. Does that sound plausible or is my testcase broken?
That seems a little unlikely to me.
This is almost certainly a regression from bug 928312.
The refcount overflows for a XMLHttpRequest instance that tries to request "resource://gre/modules/workers/lz4_internal.js".
It looks like XMLHttpRequest::Unpin() is getting called even when XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?

MaybePin() bails out after calling mWorkerPrivate->AddFeature() unsuccessfully.
(In reply to Tim Taubert [:ttaubert] from comment #22)
> It looks like XMLHttpRequest::Unpin() is getting called even when
> XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?
> 
> MaybePin() bails out after calling mWorkerPrivate->AddFeature()
> unsuccessfully.

Mmm, that was something I had convinced myself couldn't happen.  AddFeature fails because the worker is shutting down, presumably?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> Mmm, that was something I had convinced myself couldn't happen.  AddFeature
> fails because the worker is shutting down, presumably?

Yes, this is all happening on shutdown.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24)
> I think the fix is probably to change
> http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.
> cpp#1642 to set aRv to a failure code before returning.

Yeah, I wondered why we check aRv even though MaybePin() doesn't touch it.
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Flags: needinfo?(cosmin.malutan)
Blocks: 937220
No longer depends on: 937220
https://hg.mozilla.org/mozilla-central/rev/7393aa7b33ad
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
I checked the latest tinderbox build and passed our tests. It didn't failed in 30 runs so I can say is fixed.
Thanks Cosmin! Lets see how the Nightly build from today will behave. If all is fine please mark this bug as verified, and also update status-firefox29 to verified.
Comment on attachment 8366748 [details] [diff] [review]
0001-Bug-956284-Set-error-code-for-MaybePin-if-AddFeature.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 928312
User impact if declined: Possible crash on shutdown.
Testing completed (on m-c, etc.): First reports after landing are looking good.
Risk to taking this patch (and alternatives if risky): Small patch with a low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #8366748 - Flags: approval-mozilla-aurora?
I ran the tests that failed against Nightly build for 50 times and it passes.
This is where the work actually happened, so I'll set this to sec-high.
Keywords: sec-othersec-high
Keywords: regression
Attachment #8366748 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
No more crashes happened today for our Mozmill testruns for Aurora builds on OS X. So all is working fine!
Status: RESOLVED → VERIFIED
No longer blocks: 937220
Keywords: qawanted
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: