Fault in cycle collector: overflowing refcount

VERIFIED FIXED in Firefox 28, Firefox OS v1.3

Status

()

Core
DOM: Workers
--
critical
VERIFIED FIXED
4 years ago
a year ago

People

(Reporter: whimboo, Assigned: ttaubert)

Tracking

(5 keywords)

28 Branch
mozilla29
x86_64
Mac OS X
crash, crashreportid, csectype-intoverflow, regression, sec-high
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox26 unaffected, firefox27 unaffected, firefox28+ verified, firefox29+ verified, firefox-esr24 unaffected, b2g18 unaffected, b2g-v1.1hd unaffected, b2g-v1.2 unaffected, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Details

(Whiteboard: [mozmill])

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
By running our Mozmill tests for Firefox Aurora we noticed a lot of aborts caused by overflowing refcounts in Firefox:

Report:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_functional/16324/console

Message:
Fault in cycle collector: overflowing refcount (ptr: 0x10fd748b0)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226

So far this problem only seems to exist on OS X across different platform versions and always seem to happen on exit.
(Reporter)

Updated

4 years ago
Whiteboard: [mozmill]
My guess is that we end up having refcnt 0 - 1 for some reason. Don't or didn't we have some
worker related bug for this?
(Reporter)

Comment 2

4 years ago
Also happens for holly branch builds with slightly different lines of code:

http://mm-ci-master.qa.scl3.mozilla.com:8080/job/project_functional/233/console

Fault in cycle collector: overflowing refcount (ptr: 0x10de89870)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
status-firefox27: --- → ?
status-firefox28: --- → affected
status-firefox29: --- → affected
tracking-firefox28: --- → ?
tracking-firefox29: --- → ?
(Reporter)

Comment 4

4 years ago
Finally I got a crash report for the holly build: bp-fe34fe81-c94a-4e50-a419-62d612140103

Stack:

0 	libmozalloc.dylib 	mozalloc_abort(char const*) 	memory/mozalloc/mozalloc_abort.cpp
1 	XUL 	Abort 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
2 	XUL 	NS_DebugBreak 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
3 	XUL 	GCGraphBuilder::DescribeRefCountedNode(unsigned int, char const*) 	xpcom/base/nsCycleCollector.cpp
4 	XUL 	nsDOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	content/events/src/nsDOMEventTargetHelper.cpp
5 	XUL 	nsXHREventTarget::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	content/base/src/nsXMLHttpRequest.cpp
6 	XUL 	mozilla::dom::workers::XMLHttpRequest::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 	dom/workers/XMLHttpRequest.cpp
7 	XUL 	GCGraphBuilder::Traverse(PtrInfo*) 	xpcom/base/nsCycleCollector.cpp
8 	XUL 	nsCycleCollector::MarkRoots(js::SliceBudget&) 	xpcom/base/nsCycleCollector.cpp
9 	XUL 	nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp
10 	XUL 	nsCycleCollector_collect(nsICycleCollectorListener*) 	xpcom/base/nsCycleCollector.cpp
11 	XUL 	mozilla::CycleCollectedJSRuntime::OnGC(JSGCStatus) 	xpcom/base/CycleCollectedJSRuntime.cpp
12 	XUL 	Collect 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/js/src/../../../../js/src/jsgc.cpp
13 	XUL 	mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) 	dom/workers/WorkerPrivate.cpp
14 	XUL 	(anonymous namespace)::GarbageCollectRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) 	dom/workers/WorkerPrivate.cpp
15 	XUL 	mozilla::dom::workers::WorkerRunnable::Run() 	dom/workers/WorkerRunnable.cpp
16 	XUL 	mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked() 	dom/workers/WorkerPrivate.cpp
17 	XUL 	mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) 	dom/workers/WorkerPrivate.cpp
18 	XUL 	(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() 	dom/workers/RuntimeService.cpp
19 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
20 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	/builds/slave/m-in-osx64-0000000000000000000/build/xpcom/glue/nsThreadUtils.cpp
21 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/glue/../../../../ipc/glue/MessagePump.cpp
22 	XUL 	MessageLoop::Run() 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
23 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp
24 	libnss3.dylib 	_pt_root 	
25 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x1899 	
26 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x172a 	
27 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x5fc9 	
28 	libnss3.dylib 	null_signal_handler
Keywords: crashreportid
(Reporter)

Comment 5

4 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> There's bug 937220.

So does it mean it is a dupe? Given the stack I would say so. If it helps I could try to find a pattern to get it reproduced.
This actually also looks like it could be a top mac crash, as it shows up overall 24 in the top crash list, which has to be fairly bad as it is OSX only.
  https://crash-stats.mozilla.com/report/index/ba3b3008-31b7-4310-b521-2e9d32140102
Depends on: 937220
Oops, the other bug talks about that already, never mind.
Component: XUL → DOM: Workers
What are the steps to reproduce this?  Maybe that would help with a fix.
Group: core-security
Probably just a dupe, so I'm going to set it as sec-other.
Keywords: sec-other
(Reporter)

Comment 10

4 years ago
I can see this on and off during our Mozmill tests. I don't have a pattern yet but I will try to nail it down, and hopefully get manual steps.
Tracking in case it's not a dupe, but if it is we can untrack.
tracking-firefox28: ? → +
tracking-firefox29: ? → +
(Reporter)

Updated

4 years ago
Blocks: 959562
Any luck tracking this down yet, Henrik?
Flags: needinfo?(hskupin)
(Reporter)

Comment 13

4 years ago
No,sorry. I'm completely filled up with work on bug 956315. I asked Cosmin to do the investigation on the test side via bug 959562. You might want to trigger him.
Flags: needinfo?(hskupin)
(Reporter)

Comment 14

4 years ago
Could this bug in some way be related to bug 937220 which also has been started with Firefox 28?
I think there's a reasonable chance it is the same issue.
(Reporter)

Updated

4 years ago
See Also: → bug 937220
Henrik, have you made any headway with the investigation here? If not, can this be offloaded to someone else on your team?
(Reporter)

Comment 17

4 years ago
Nope, sorry. I was hoping to get an update from bug 937220. Cosmin, would you be able to continue here in finding a minimzed testcase for our testrun?
Flags: needinfo?(cosmin.malutan)
(Assignee)

Comment 18

4 years ago
I tried to bisect this and it looks like https://hg.mozilla.org/mozilla-central/rev/8196cbbcadcd from bug 935032 might be the culprit. Does that sound plausible or is my testcase broken?
That seems a little unlikely to me.
This is almost certainly a regression from bug 928312.
(Assignee)

Comment 21

4 years ago
The refcount overflows for a XMLHttpRequest instance that tries to request "resource://gre/modules/workers/lz4_internal.js".
(Assignee)

Comment 22

4 years ago
It looks like XMLHttpRequest::Unpin() is getting called even when XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?

MaybePin() bails out after calling mWorkerPrivate->AddFeature() unsuccessfully.
(In reply to Tim Taubert [:ttaubert] from comment #22)
> It looks like XMLHttpRequest::Unpin() is getting called even when
> XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?
> 
> MaybePin() bails out after calling mWorkerPrivate->AddFeature()
> unsuccessfully.

Mmm, that was something I had convinced myself couldn't happen.  AddFeature fails because the worker is shutting down, presumably?
(Assignee)

Comment 25

4 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> Mmm, that was something I had convinced myself couldn't happen.  AddFeature
> fails because the worker is shutting down, presumably?

Yes, this is all happening on shutdown.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24)
> I think the fix is probably to change
> http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.
> cpp#1642 to set aRv to a failure code before returning.

Yeah, I wondered why we check aRv even though MaybePin() doesn't touch it.
(Assignee)

Comment 27

4 years ago
Created attachment 8366748 [details] [diff] [review]
0001-Bug-956284-Set-error-code-for-MaybePin-if-AddFeature.patch

Let's give it a shot.
Attachment #8366748 - Flags: review?(khuey)
(Assignee)

Updated

4 years ago
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Flags: needinfo?(cosmin.malutan)
(Reporter)

Updated

4 years ago
Blocks: 937220
No longer depends on: 937220
https://hg.mozilla.org/mozilla-central/rev/7393aa7b33ad
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-firefox29: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
I checked the latest tinderbox build and passed our tests. It didn't failed in 30 runs so I can say is fixed.
(Reporter)

Comment 31

4 years ago
Thanks Cosmin! Lets see how the Nightly build from today will behave. If all is fine please mark this bug as verified, and also update status-firefox29 to verified.
(Assignee)

Updated

4 years ago
Blocks: 928312
status-firefox26: --- → unaffected
status-firefox27: ? → unaffected
(Assignee)

Comment 32

4 years ago
Comment on attachment 8366748 [details] [diff] [review]
0001-Bug-956284-Set-error-code-for-MaybePin-if-AddFeature.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 928312
User impact if declined: Possible crash on shutdown.
Testing completed (on m-c, etc.): First reports after landing are looking good.
Risk to taking this patch (and alternatives if risky): Small patch with a low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #8366748 - Flags: approval-mozilla-aurora?
I ran the tests that failed against Nightly build for 50 times and it passes.
status-firefox29: fixed → verified
This is where the work actually happened, so I'll set this to sec-high.
Keywords: sec-other → sec-high
Keywords: regression

Updated

4 years ago
Attachment #8366748 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 35

4 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/c6ff9eed5e83
status-firefox28: affected → fixed
status-b2g18: --- → unaffected
status-b2g-v1.1hd: --- → unaffected
status-b2g-v1.2: --- → unaffected
status-b2g-v1.3: --- → fixed
status-b2g-v1.4: --- → fixed
status-firefox-esr24: --- → unaffected
(Reporter)

Comment 36

4 years ago
No more crashes happened today for our Mozmill testruns for Aurora builds on OS X. So all is working fine!
Status: RESOLVED → VERIFIED
status-firefox28: fixed → verified
No longer blocks: 937220
Duplicate of this bug: 937220
Keywords: qawanted
Blocks: 962846
status-b2g-v1.3T: --- → fixed
Group: core-security
Keywords: csectype-intoverflow
You need to log in before you can comment on or make changes to this bug.