Closed Bug 1492695 Opened Last year Closed 5 months ago

Intermittent Jit [taskcluster:error] Task aborted - max run time exceeded

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

Tracking

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

Filed by: btara [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=200325471&repo=mozilla-central

https://queue.taskcluster.net/v1/task/GffubWMNQMazJp1tauaEyA/runs/0/artifacts/public/logs/live_backing.log

02:54:40     INFO -  TEST-PASS | tests\jit-test\jit-test\tests\basic\testMathMinMax.js | Success (code 0, args "--no-baseline --no-ion") [18.1 s]
02:54:40     INFO -  {"action": "test_start", "jitflags": "--no-baseline --no-ion", "pid": 4344, "source": "jittests", "test": "basic\\testMathMinMax.js", "thread": "main", "time": 1537412062.613}
02:54:40     INFO -  {"action": "test_end", "extra": {"jitflags": "--no-baseline --no-ion"}, "jitflags": "--no-baseline --no-ion", "message": "Success", "pid": 4344, "source": "jittests", "status": "PASS", "test": "basic\\testMathMinMax.js", "thread": "main", "time": 1537412080.695}
[taskcluster:error] Aborting task...
[taskcluster 2018-09-20T02:54:48.462Z] SUCCESS: The process with PID 1100 (child process of PID 4344) has been terminated.
[taskcluster 2018-09-20T02:54:48.462Z] ERROR: The process with PID 4004 (child process of PID 4344) could not be terminated.
[taskcluster 2018-09-20T02:54:48.462Z] Reason: There is no running instance of the task.
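For anyone triaging similar logs, the structured `test_start` / `test_end` records in the excerpt above are enough to recompute per-test durations and see how much of the run time budget each test consumed. This is only a minimal sketch: the function name `compute_durations` and the approach of pairing records by test name and jitflags are illustrative assumptions, not part of the jit-test harness or Taskcluster.

```python
import json


def compute_durations(log_lines):
    """Pair test_start/test_end records and return {(test, jitflags): seconds}.

    Assumption: each structured record is a JSON object that starts at the
    first "{" on its log line, as in the excerpt above.
    """
    starts = {}
    durations = {}
    for line in log_lines:
        brace = line.find("{")
        if brace == -1:
            continue  # plain-text line such as TEST-PASS or [taskcluster ...]
        try:
            record = json.loads(line[brace:])
        except ValueError:
            continue  # contains a brace but is not a JSON record
        if not isinstance(record, dict):
            continue
        key = (record.get("test"), record.get("jitflags"))
        action = record.get("action")
        if action == "test_start" and "time" in record:
            starts[key] = record["time"]
        elif action == "test_end" and "time" in record and key in starts:
            durations[key] = record["time"] - starts.pop(key)
    return durations


# The two structured lines from the log excerpt, plus one plain-text line.
sample = [
    r'02:54:40     INFO -  {"action": "test_start", "jitflags": "--no-baseline --no-ion", "pid": 4344, "source": "jittests", "test": "basic\\testMathMinMax.js", "thread": "main", "time": 1537412062.613}',
    r'02:54:40     INFO -  {"action": "test_end", "extra": {"jitflags": "--no-baseline --no-ion"}, "jitflags": "--no-baseline --no-ion", "message": "Success", "pid": 4344, "source": "jittests", "status": "PASS", "test": "basic\\testMathMinMax.js", "thread": "main", "time": 1537412080.695}',
    "[taskcluster:error] Aborting task...",
]

durations = compute_durations(sample)
for (test, flags), seconds in durations.items():
    print(test, flags, round(seconds, 1))
```

For the sample above, the computed duration rounds to 18.1 seconds, which agrees with the `[18.1 s]` reported on the TEST-PASS line.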

The latest classifications are all jittest-1proc jobs on android-hw-p2-8-0 failing with "Task aborted - max run time exceeded"; these jobs are now perma-failing: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=android%2C8.0%2Cpixel2%2Caarch64%2Copt%2Cjit&group_state=expanded&tochange=fd2bf318a8b29e7c1ab67b985c19c2c218a76d6e&fromchange=7b326aa4930cad966e0a01d2d3d3d585d35002e2&selectedJob=242138775
All of them time out after 60 minutes, and the failures started somewhere in the range above. The jobs sit in the queue for a long time, which makes it hard to investigate with backfills and retriggers.

Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=242138775&repo=autoland&lineNumber=1818
I also looked through the logcats and couldn't find anything relevant, so this is just a shot in the dark, but maybe this line is related:
DatabaseProcessor: processLocalDevices: failed to get the network info.

Geoff, Bob, could you please take a look at this and see what's going on here? Thank you.

Flags: needinfo?(gbrown)
Flags: needinfo?(bob)
Flags: needinfo?(kwright)

It looks like the mozharness "install" step is hung and tests are not run at all.

This change might be to blame?

https://hg.mozilla.org/mozilla-central/diff/60c512bab3e9/testing/mozharness/scripts/android_hardware_unittest.py#l1.30

I think this was the only thing allowing "--test-suite=jittest-chunked" with suite configuration for "jittest". (Full disclosure: I asked ahal to make this change -- sorry!)
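To illustrate the kind of mismatch suspected here, the sketch below shows how a suite name requested on the command line can stop resolving once an alias mapping is removed. All names in it (`SUITES`, `ALIASES`, `resolve_suite`, and the placeholder config) are hypothetical and are not taken from mozharness; it also only models the failed lookup, not the hang in the "install" step that was actually observed.

```python
# Hypothetical suite table: keys are the only names the harness can run.
SUITES = {"jittest": {"description": "placeholder suite configuration"}}

# Hypothetical alias table standing in for whatever code translated the
# task-level name into the configured suite name.
ALIASES = {"jittest-chunked": "jittest"}


def resolve_suite(requested, aliases):
    """Return the configured suite name for `requested`, or None if unknown."""
    name = aliases.get(requested, requested)
    return name if name in SUITES else None


# With the alias in place, the chunked name maps onto the configured suite.
with_alias = resolve_suite("jittest-chunked", ALIASES)

# With the alias removed, the same request no longer matches any suite.
without_alias = resolve_suite("jittest-chunked", {})

print(with_alias, without_alias)
```

If this is what happened, either restoring the translation or passing the configured suite name directly would make the request resolve again.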

Let's check...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c8809d8ba4359131f96d6172043934c50f937e2

I'm still waiting on that try push, but should be able to sort this out.

Assignee: nobody → gbrown
Flags: needinfo?(kwright)
Flags: needinfo?(gbrown)
Flags: needinfo?(bob)

Yeah, there's a backlog on those jobs and only 16 devices to run them on. Today on autoland I waited around 300 minutes just for the jobs to start. Thanks for the try push.

I still haven't managed to verify this on try, but it seems like the best
explanation for the timeouts.

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1fae6818d4e4
Fix android-hw jittest suite name to avoid timeouts; r=bc
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Regressions: 1546922
Regressions: 1250737