Closed Bug 1186835 Opened 9 years ago Closed 6 years ago

Jenkins intermittently fails to save report due to a stack overflow on bitbar jobs

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jlorenzo, Unassigned)

Details

No build has been green since bitbar came up, mainly due to:
> 22:28:25 [htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/flame-kk-319.b2g-inbound.tinderbox.ui.functional.sanity.bitbar/workspace/tests/python/gaia-ui-tests/report to /var/lib/jenkins/jobs/flame-kk-319.b2g-inbound.tinderbox.ui.functional.sanity.bitbar/builds/2015-07-22_22-11-26/htmlreports/HTML_Report
> 22:28:25 Recording test results
> 22:28:46 FATAL: null
> 22:28:46 java.lang.StackOverflowError
> 22:28:46 	at hudson.tasks.junit.TestResultAction.load(TestResultAction.java:189)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:128)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:60)
> 22:28:46 	at hudson.tasks.test.AbstractTestResultAction.findCorrespondingResult(AbstractTestResultAction.java:240)
> 22:28:46 	at hudson.tasks.test.TestResult.getPreviousResult(TestResult.java:142)
> 22:28:46 	at hudson.tasks.junit.SuiteResult.getPreviousResult(SuiteResult.java:283)
> 22:28:46 	at hudson.tasks.junit.CaseResult.getPreviousResult(CaseResult.java:444)
> 22:28:46 	at hudson.tasks.junit.CaseResult.freeze(CaseResult.java:573)
> 22:28:46 	at hudson.tasks.junit.SuiteResult.freeze(SuiteResult.java:325)
> 22:28:46 	at hudson.tasks.junit.TestResult.freeze(TestResult.java:627)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.load(TestResultAction.java:192)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:128)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:60)
> 22:28:46 	at hudson.tasks.test.AbstractTestResultAction.findCorrespondingResult(AbstractTestResultAction.java:240)
> 22:28:46 	at hudson.tasks.test.TestResult.getPreviousResult(TestResult.java:142)
> 22:28:46 	at hudson.tasks.junit.SuiteResult.getPreviousResult(SuiteResult.java:283)
> 22:28:46 	at hudson.tasks.junit.CaseResult.getPreviousResult(CaseResult.java:444)
> 22:28:46 	at hudson.tasks.junit.CaseResult.freeze(CaseResult.java:573)
> 22:28:46 	at hudson.tasks.junit.SuiteResult.freeze(SuiteResult.java:325)
> 22:28:46 	at hudson.tasks.junit.TestResult.freeze(TestResult.java:627)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.load(TestResultAction.java:192)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:128)
> 22:28:46 	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:60)
> 22:28:46 	at hudson.tasks.test.AbstractTestResultAction.findCorrespondingResult(AbstractTestResultAction.java:240)
>
> and the stack trace continues, repeating the same pattern.
It looks like Jenkins is trying to determine the result of the previous build, and that lookup keeps recursing...

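The repeating frames form a cycle. `TestResultAction.load` calls `TestResult.freeze`, which descends through `SuiteResult.freeze` and `CaseResult.freeze` into `CaseResult.getPreviousResult`. That method, via `AbstractTestResultAction.findCorrespondingResult`, calls `TestResultAction.getResult`, which loads another result and starts the cycle again. The toy Java model below uses hypothetical classes (`PreviousResultChain`, `Build`) and is not the actual Jenkins JUnit plugin code. It shows why such a chain can exhaust the stack once the build history is long enough, assuming each older result is loaded lazily and nothing stops the walk early:

```java
/**
 * Toy model (hypothetical, NOT Jenkins code) of the cycle in the trace:
 * load() -> freeze() -> previous build's load() -> freeze() -> ...
 */
public class PreviousResultChain {

    /** Hypothetical stand-in for one build's recorded test result. */
    static final class Build {
        final Build previous;     // older build, or null for the first build
        private Boolean loaded;   // null until load() has run once

        Build(Build previous) {
            this.previous = previous;
        }

        // Loading a result "freezes" it; freezing asks for the previous
        // build's result, which triggers that build's load() in turn.
        boolean load() {
            if (loaded == null) {
                freeze();
                loaded = Boolean.TRUE;
            }
            return loaded;
        }

        void freeze() {
            if (previous != null) {
                previous.load(); // adds stack frames for every older build
            }
        }
    }

    /** Creates a linear history of n builds and returns the newest one. */
    static Build history(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one build");
        }
        Build newest = null;
        for (int i = 0; i < n; i++) {
            newest = new Build(newest);
        }
        return newest;
    }

    /** Returns true if loading the newest build's result overflows the stack. */
    public static boolean overflows(int builds) {
        try {
            history(builds).load();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("10 builds overflow: " + overflows(10));
        System.out.println("1,000,000 builds overflow: " + overflows(1_000_000));
    }
}
```

With the JVM's default thread stack size, `overflows(10)` should return `false` and `overflows(1_000_000)` should return `true`. The exact depth at which the overflow happens depends on the stack size (`-Xss`) and on JIT compilation, which is consistent with the failure being intermittent rather than deterministic.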
In the log I see:
19:42:03 [Testdroid] - Releasing device session
19:42:03 [Testdroid] - [ERROR] - Failed to release device session Failed to execute API call: /me/device-sessions/68473/release. Reason: fxos.testdroid.com:443 failed to respond

I guess that's where the problem is.

(In reply to Martijn Wargers [:mwargers] (QA) from comment #2)
> In the log I see:
> 19:42:03 [Testdroid] - Releasing device session
> 19:42:03 [Testdroid] - [ERROR] - Failed to release device session Failed to
> execute API call: /me/device-sessions/68473/release. Reason:
> fxos.testdroid.com:443 failed to respond
> 
> I guess that's where the problem is.

That's actually quite common: the plugin often needs to reauthenticate by the time the test run is complete. You should see that message followed by a successful attempt to release the session. If the second attempt fails, the build should explicitly fail.
Depends on: 1186509
Blocks: 1188360
Because of bug 1188298, I couldn't tell whether this intermittent failure has increased or decreased. One thing I did notice, though: on the same job, some builds manage to generate the results [1] and some don't [2]. So this is indeed intermittent.

Also, I took a look at the 4 other jobs that are not on bitbar. I didn't see the same error in the last 10 runs of any of them, so I suspect this is a bitbar-only error.

[1] http://jenkins1.qa.scl3.mozilla.com/view/UI/job/flame-kk-319.b2g-inbound.tinderbox.ui.functional.sanity.bitbar/2494/console
[2] http://jenkins1.qa.scl3.mozilla.com/view/UI/job/flame-kk-319.b2g-inbound.tinderbox.ui.functional.sanity.bitbar/2495/console
Summary: Jenkins fails to save report due to a stack overflow → Jenkins intermittently fails to save report due to a stack overflow on bitbar jobs
We've recently seen this in Web QA's Jenkins, and I believe I've tracked it down to the Test Stability plugin. I've raised https://issues.jenkins-ci.org/browse/JENKINS-31660 and disabled this plugin for the affected jobs.
Thanks for the heads up. I'll deactivate this plugin too and see what happens.
Jenkins has restarted, with Test Stability v1.0 disabled.
Firefox OS is not being worked on.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX