Closed Bug 959562 Opened 10 years ago Closed 10 years ago

Testruns on Mac get aborted due to "Fault in cycle collector"

Categories

(Mozilla QA Graveyard :: Infrastructure, defect, P1)

All
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cosmin-malutan, Assigned: whimboo)

References

Details

(Keywords: crash)

Testruns on Mac get aborted due to a cycle collector crash.
It fails more than 10 times per day.
What I observed is that it fails either in:
>restartTests/testAddons_uninstallTheme/test1.js
or a couple of tests after.
So far I couldn't reproduce it.

As a next step I will reduce the test so I can run more testruns.

>02:27:39 TEST-SKIPPED | test5.js | Bug 783484 -  Test failure 'Shutdown expected but none detected before end of test
>02:27:42 TEST-START | restartTests/testAddons_uninstallTheme/test1.js | setupModule
>02:27:42 TEST-START | restartTests/testAddons_uninstallTheme/test1.js | testInstallTheme
>02:27:44 TEST-PASS | restartTests/testAddons_uninstallTheme/test1.js | testInstallTheme
>02:27:44 TEST-START | restartTests/testAddons_uninstallTheme/test1.js | teardownModule
>02:27:44 TEST-END | restartTests/testAddons_uninstallTheme/test1.js | finished in 2382ms
>02:28:45 Fault in cycle collector: overflowing refcount (ptr: 0x110fdf600)
>02:28:45 [62765] ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
>02:28:45 [62765] ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
>02:28:45 RESULTS | Passed: 22
>02:28:45 RESULTS | Failed: 0
>02:28:45 RESULTS | Skipped: 5
>02:28:46 Traceback (most recent call last):
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 349, in run
>02:28:46     self.run_tests()
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 573, in run_tests
>02:28:46     TestRun.run_tests(self)
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 300, in run_tests
>02:28:46     self._mozmill.run(tests, self.options.restart)
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill/__init__.py", line 409, in run
>02:28:46     frame = self.run_test_file(frame or self.start_runner(),
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill/__init__.py", line 326, in start_runner
>02:28:46     self.create_network()
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/mozmill/__init__.py", line 287, in create_network
>02:28:46     self.jsbridge_port)
>02:28:46   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_functional/mozmill-env-mac/python-lib/jsbridge/__init__.py", line 44, in wait_and_create_network
>02:28:46     raise Exception("Cannot connect to jsbridge extension, port %s" % port)
>02:28:46 Exception: Cannot connect to jsbridge extension, port 49256
>02:28:46 Report document created at 'http://mozauto.iriscouch.com/mozmill-daily/4766c96947f50f00f16d44482a17a475'
Depends on: 956284
This crash is happening way more often than I thought. Let's try to get a reproducible testcase so it can be killed. Cosmin, you haven't marked which versions are affected, so please do so.
Priority: -- → P1
This affects only Aurora (Firefox 28.0a2) so far.
Locally I ran 60 testruns today and couldn't reproduce it.
Tomorrow we will try to reproduce it by triggering jobs on staging.
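Since the failure is intermittent, repeated runs are easier to evaluate when each run's console output is scanned for the fault marker from the log above. The following is only a minimal sketch: `countFaults`, `runOnce`, and `fakeRun` are hypothetical names, and the runner here is simulated rather than launching a real testrun.

```javascript
// Marker text taken from the console log quoted in this bug.
const FAULT_MARKER = "Fault in cycle collector";

// Call `runOnce` `attempts` times and record which attempts produced
// output containing the marker. `runOnce` is a hypothetical callback that
// performs one testrun and returns its console output as a string.
function countFaults(runOnce, attempts) {
  const failedAttempts = [];
  for (let i = 1; i <= attempts; i++) {
    const output = String(runOnce(i));
    if (output.includes(FAULT_MARKER)) {
      failedAttempts.push(i);
    }
  }
  return { attempts, failedAttempts, rate: failedAttempts.length / attempts };
}

// Simulated runner (assumption for illustration): only attempt 7 hits the fault.
const fakeRun = (i) =>
  i === 7
    ? "02:28:45 Fault in cycle collector: overflowing refcount"
    : "RESULTS | Failed: 0";

const summary = countFaults(fakeRun, 10);
console.log(summary.failedAttempts, summary.rate);
```

In practice `runOnce` could wrap whatever command starts the testrun and return its captured output; that command line is not given in this bug, so it is left out here.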
On the core bug I have also seen this crash on a project branch, which means a build based on Nightly:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/project_functional/233/console

Are you no longer seeing those crashes for those builds? If that is the case, it could have been fixed on mozilla-central already.
Hi Henrik, that is a good catch.
I didn't see that when I looked over the mails, but the mentioned failure is on the holly branch.
I haven't seen the failure on Nightly. We will look into this tomorrow.
An update here:
I reproduced this once on the mm-osx-109-4 node; I will try to minimize it.
testcase wanted as per bug 937220 comment 29
Keywords: testcase-wanted
So far I'm sure it fails in restart tests, and only in a testrun. Keeping in mind that this is a garbage collector issue, it might be that we keep a reference to objects from Firefox in the persisted object.

I will wrap the deletion of the persisted object in our teardownModule functions in a try/catch.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/delete#Returns
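The try/catch idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual test code: the `persisted` object is a local stand-in for the one the harness provides, and the property name `theme` and the helper `safeDelete` are hypothetical.

```javascript
// Stand-in for the persisted object shared across restarts; in a real
// restart test the harness provides it. The `theme` property is hypothetical.
const persisted = { theme: { id: "example-theme" } };

// Delete a property and report what happened instead of letting an
// exception abort the teardown.
function safeDelete(obj, key) {
  "use strict";
  try {
    // `delete` evaluates to true on success; in strict mode it throws a
    // TypeError when the property is a non-configurable own property.
    return { deleted: delete obj[key], error: null };
  } catch (e) {
    return { deleted: false, error: e };
  }
}

function teardownModule() {
  return safeDelete(persisted, "theme");
}

const result = teardownModule();
console.log(result.deleted, result.error);
```

If the delete neither throws nor returns false, the persisted object itself is unlikely to be what blocks the cleanup.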
An update here:
It doesn't throw when deleting the persisted object.

Here is the crash report, though I saw that the crash signature has been updated in bug 956284.
https://crash-stats.mozilla.com/report/index/923ad5f0-d510-4065-b695-1d16a2140117

I'm still minimizing the testrun.
I minimized the testrun and started to work on a minimized testcase. The bug reproduces with the two restart tests installTheme and uninstallTheme, either in a testrun or in a simple test run with a profile. Running with a profile might not be related, as I haven't tested this enough, but I reproduced the failure once in 10 runs with a profile.
Given that we haven't made much progress here during the last days, I will take this bug. I will have more details in a bit, given that it reproduces quite often with a debug build on OS X. So far I can see an assertion when it happens:

Assertion failure: rc != 0 (destroyed timer off its target thread!), at ../../../xpcom/threads/TimerThread.cpp:259

That might be the cause. I will push further information to the Firefox bug which is marked as dependent.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Severity: normal → major
Keywords: crash
(In reply to Henrik Skupin (:whimboo) from comment #10)
> Assertion failure: rc != 0 (destroyed timer off its target thread!), at
> ../../../xpcom/threads/TimerThread.cpp:259

That assertion actually seems to be a different bug, most likely identical to bug 941751. I will have to investigate that first.
Andrew, I remember we already had other reports of "fault in cycle collector". Is there anything we can do there?
Kyle is looking at it.
The underlying platform bug has been fixed in bug 956284, so this failure should be gone now. Thanks go to ttaubert! Let's keep it open until the fix has finally landed on the aurora branch.
Status: ASSIGNED → NEW
This has been fixed now and is no longer an issue.
Status: NEW → RESOLVED
Closed: 10 years ago
Keywords: testcase-wanted
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard