Closed
Bug 1430423
Opened 7 years ago
Closed 7 years ago
mochitest debug task runtime exceeded
Categories
(Taskcluster :: Workers, enhancement)
Taskcluster
Workers
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: apavel, Unassigned)
Details
This task was not killed after 1.5 hours.
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=156174592&repo=autoland
Comment 1•7 years ago
|
||
It looks like it was killed after about 100 minutes. I think we guarantee *at least* maxRunTime. I suspect this is because generic-worker is polling for the timeout, or something? Pete?
Component: General → Generic-Worker
Flags: needinfo?(pmoore)
Comment 2•7 years ago
|
||
Sorry, maybe I misinterpreted the question -- the description doesn't actually spell it out.
Is the issue that this task is taking longer than its maxRunTime (in which case the PI team should probably consider either adjusting chunking or increasing maxRunTime) or is the issue that the task was allowed to run a few minutes longer than the maxRunTime?
Flags: needinfo?(apavel)
| Reporter | ||
Comment 3•7 years ago
|
||
Sorry for not being more precise. The task took more that it's maximum runtime.
From IRC:
Aryx> it should run max for 1.5 hours https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/mochitest.yml#52
9:53 PM but the log shows it runs for 1h 45min.
From the task description:
Started:Sat Jan 13, 16:12:31
Ended:Sat Jan 13, 17:56:05
Duration:104 minute(s)
Let me know if there is any other info I can provide.
Flags: needinfo?(apavel)
| Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
|
||
$ curl -L 'https://public-artifacts.taskcluster.net/IdEw_m-WTJGabhhM1ETaQg/0/public/logs/live_backing.log' | gunzip | grep '^\[taskcluster'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0[taskcluster 2018-01-13T14:12:37.623Z] Worker Type (gecko-t-win7-32) settings:
[taskcluster 2018-01-13T14:12:37.625Z] {
[taskcluster 2018-01-13T14:12:37.625Z] "aws": {
[taskcluster 2018-01-13T14:12:37.627Z] "ami-id": "ami-50221e30",
[taskcluster 2018-01-13T14:12:37.627Z] "availability-zone": "us-west-1c",
[taskcluster 2018-01-13T14:12:37.627Z] "instance-id": "i-0aea2a475a121a6de",
[taskcluster 2018-01-13T14:12:37.628Z] "instance-type": "c4.2xlarge",
[taskcluster 2018-01-13T14:12:37.629Z] "local-ipv4": "10.143.46.30",
[taskcluster 2018-01-13T14:12:37.630Z] "public-hostname": "ec2-54-183-203-97.us-west-1.compute.amazonaws.com",
[taskcluster 2018-01-13T14:12:37.630Z] "public-ipv4": "54.183.203.97"
[taskcluster 2018-01-13T14:12:37.631Z] },
[taskcluster 2018-01-13T14:12:37.632Z] "config": {
[taskcluster 2018-01-13T14:12:37.633Z] "deploymentId": "39813c3c2266",
[taskcluster 2018-01-13T14:12:37.633Z] "runTaskAsCurrentUser": true
[taskcluster 2018-01-13T14:12:37.634Z] },
[taskcluster 2018-01-13T14:12:37.635Z] "generic-worker": {
[taskcluster 2018-01-13T14:12:37.635Z] "go-arch": "386",
[taskcluster 2018-01-13T14:12:37.636Z] "go-os": "windows",
[taskcluster 2018-01-13T14:12:37.637Z] "go-version": "go1.7.5",
[taskcluster 2018-01-13T14:12:37.637Z] "release": "https://github.com/taskcluster/generic-worker/releases/tag/v8.2.0",
[taskcluster 2018-01-13T14:12:37.638Z] "version": "8.2.0"
[taskcluster 2018-01-13T14:12:37.639Z] },
[taskcluster 2018-01-13T14:12:37.640Z] "machine-setup": {
[taskcluster 2018-01-13T14:12:37.640Z] "ami-created": "2017-10-27 11:07:12.659Z",
[taskcluster 2018-01-13T14:12:37.641Z] "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/39813c3c2266ae063ff510b34c9b0b2836032b4b/userdata/Manifest/gecko-t-win7-32.json"
[taskcluster 2018-01-13T14:12:37.643Z] }
[taskcluster 2018-01-13T14:12:37.644Z] }
[taskcluster 2018-01-13T14:12:37.645Z] === Task Starting ===
[taskcluster 2018-01-13T14:12:37.645Z] Executing command 0: c:\mozilla-build\python\python.exe -u mozharness\scripts\desktop_unittest.py --cfg mozharness\configs\unittests\win_taskcluster_unittest.py --mochitest-suite=plain-chunked --e10s --no-read-buildbot-config --installer-url https://queue.taskcluster.net/v1/task/GYeXN-kJSniZnmn51B-V0A/artifacts/public/build/target.zip --test-packages-url https://queue.taskcluster.net/v1/task/GYeXN-kJSniZnmn51B-V0A/artifacts/public/build/target.test_packages.json --download-symbols true --mochitest-suite=plain-chunked --e10s --total-chunk=5 --this-chunk=3
[taskcluster 2018-01-13T15:42:30.319Z] Aborting task - max run time exceeded!
[taskcluster 2018-01-13T15:42:30.633Z] Exit Code: 0
[taskcluster 2018-01-13T15:42:30.634Z] User Time: 46.8003ms
[taskcluster 2018-01-13T15:42:30.634Z] Kernel Time: 93.6006ms
[taskcluster 2018-01-13T15:42:30.636Z] Wall Time: 1h29m52.88s
[taskcluster 2018-01-13T15:42:30.638Z] Peak Memory: 2252800
[taskcluster 2018-01-13T15:42:30.639Z] Result: IDLENESS_LIMIT_EXCEEDED
[taskcluster 2018-01-13T15:42:30.639Z] === Task Finished ===
[taskcluster 2018-01-13T15:42:30.640Z] Task Duration: 1h29m52.994s
100 675k 100 675k 0 0 5459k 0 --:--:-- --:--:-- --:--:-- 5495k
[taskcluster 2018-01-13T15:54:43.513Z] Killing process since task reclaim resulted in exception: Not updating status of task IdEw_m-WTJGabhhM1ETaQg run 0 from Aborted to Reclaimed. This is because you can only update to status Reclaimed if the previous status was one of: [Claimed Reclaimed] (current status: Aborted)
So the worker thinks we had:
[taskcluster 2018-01-13T14:12:37.645Z] === Task Starting ===
[taskcluster 2018-01-13T15:42:30.639Z] === Task Finished ===
[taskcluster 2018-01-13T15:42:30.640Z] Task Duration: 1h29m52.994s
Note, after maxRunTime is exceeded, task is stopped, artifacts are then uploaded, and finally task is resolved. Perhaps artifact upload took 14 mins?
Flags: needinfo?(pmoore)
Comment 6•7 years ago
|
||
Please also note this is a very old generic-worker from April 2017, that will be upgraded soon in bug 1399401. In later workers, you see exactly how long artifact uploads take, so it is easier to diagnose such things.
Comment 7•7 years ago
|
||
Sounds like that's the root of the issue, then..
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
| Assignee | ||
Updated•6 years ago
|
Component: Generic-Worker → Workers
You need to log in
before you can comment on or make changes to this bug.
Description
•