Closed Bug 1430423 Opened 7 years ago Closed 7 years ago

mochitest debug task runtime exceeded

Categories

(Taskcluster :: Workers, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: apavel, Unassigned)

Details

This task was not killed after 1.5 hours. Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=156174592&repo=autoland
It looks like it was killed after about 100 minutes. I think we guarantee *at least* maxRunTime. I suspect this is because generic-worker is polling for the timeout, or something? Pete?
Component: General → Generic-Worker
Flags: needinfo?(pmoore)
Sorry, maybe I misinterpreted the question -- the description doesn't actually spell it out. Is the issue that this task is taking longer than its maxRunTime (in which case the PI team should probably consider either adjusting chunking or increasing maxRunTime) or is the issue that the task was allowed to run a few minutes longer than the maxRunTime?
Flags: needinfo?(apavel)
Sorry for not being more precise. The task took more that it's maximum runtime. From IRC: Aryx> it should run max for 1.5 hours https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/mochitest.yml#52 9:53 PM but the log shows it runs for 1h 45min. From the task description: Started:Sat Jan 13, 16:12:31 Ended:Sat Jan 13, 17:56:05 Duration:104 minute(s) Let me know if there is any other info I can provide.
Flags: needinfo?(apavel)
$ curl -L 'https://public-artifacts.taskcluster.net/IdEw_m-WTJGabhhM1ETaQg/0/public/logs/live_backing.log' | gunzip | grep '^\[taskcluster' % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0[taskcluster 2018-01-13T14:12:37.623Z] Worker Type (gecko-t-win7-32) settings: [taskcluster 2018-01-13T14:12:37.625Z] { [taskcluster 2018-01-13T14:12:37.625Z] "aws": { [taskcluster 2018-01-13T14:12:37.627Z] "ami-id": "ami-50221e30", [taskcluster 2018-01-13T14:12:37.627Z] "availability-zone": "us-west-1c", [taskcluster 2018-01-13T14:12:37.627Z] "instance-id": "i-0aea2a475a121a6de", [taskcluster 2018-01-13T14:12:37.628Z] "instance-type": "c4.2xlarge", [taskcluster 2018-01-13T14:12:37.629Z] "local-ipv4": "10.143.46.30", [taskcluster 2018-01-13T14:12:37.630Z] "public-hostname": "ec2-54-183-203-97.us-west-1.compute.amazonaws.com", [taskcluster 2018-01-13T14:12:37.630Z] "public-ipv4": "54.183.203.97" [taskcluster 2018-01-13T14:12:37.631Z] }, [taskcluster 2018-01-13T14:12:37.632Z] "config": { [taskcluster 2018-01-13T14:12:37.633Z] "deploymentId": "39813c3c2266", [taskcluster 2018-01-13T14:12:37.633Z] "runTaskAsCurrentUser": true [taskcluster 2018-01-13T14:12:37.634Z] }, [taskcluster 2018-01-13T14:12:37.635Z] "generic-worker": { [taskcluster 2018-01-13T14:12:37.635Z] "go-arch": "386", [taskcluster 2018-01-13T14:12:37.636Z] "go-os": "windows", [taskcluster 2018-01-13T14:12:37.637Z] "go-version": "go1.7.5", [taskcluster 2018-01-13T14:12:37.637Z] "release": "https://github.com/taskcluster/generic-worker/releases/tag/v8.2.0", [taskcluster 2018-01-13T14:12:37.638Z] "version": "8.2.0" [taskcluster 2018-01-13T14:12:37.639Z] }, [taskcluster 2018-01-13T14:12:37.640Z] "machine-setup": { [taskcluster 2018-01-13T14:12:37.640Z] "ami-created": "2017-10-27 11:07:12.659Z", [taskcluster 2018-01-13T14:12:37.641Z] "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/39813c3c2266ae063ff510b34c9b0b2836032b4b/userdata/Manifest/gecko-t-win7-32.json" [taskcluster 2018-01-13T14:12:37.643Z] } [taskcluster 2018-01-13T14:12:37.644Z] } [taskcluster 2018-01-13T14:12:37.645Z] === Task Starting === [taskcluster 2018-01-13T14:12:37.645Z] Executing command 0: c:\mozilla-build\python\python.exe -u mozharness\scripts\desktop_unittest.py --cfg mozharness\configs\unittests\win_taskcluster_unittest.py --mochitest-suite=plain-chunked --e10s --no-read-buildbot-config --installer-url https://queue.taskcluster.net/v1/task/GYeXN-kJSniZnmn51B-V0A/artifacts/public/build/target.zip --test-packages-url https://queue.taskcluster.net/v1/task/GYeXN-kJSniZnmn51B-V0A/artifacts/public/build/target.test_packages.json --download-symbols true --mochitest-suite=plain-chunked --e10s --total-chunk=5 --this-chunk=3 [taskcluster 2018-01-13T15:42:30.319Z] Aborting task - max run time exceeded! [taskcluster 2018-01-13T15:42:30.633Z] Exit Code: 0 [taskcluster 2018-01-13T15:42:30.634Z] User Time: 46.8003ms [taskcluster 2018-01-13T15:42:30.634Z] Kernel Time: 93.6006ms [taskcluster 2018-01-13T15:42:30.636Z] Wall Time: 1h29m52.88s [taskcluster 2018-01-13T15:42:30.638Z] Peak Memory: 2252800 [taskcluster 2018-01-13T15:42:30.639Z] Result: IDLENESS_LIMIT_EXCEEDED [taskcluster 2018-01-13T15:42:30.639Z] === Task Finished === [taskcluster 2018-01-13T15:42:30.640Z] Task Duration: 1h29m52.994s 100 675k 100 675k 0 0 5459k 0 --:--:-- --:--:-- --:--:-- 5495k [taskcluster 2018-01-13T15:54:43.513Z] Killing process since task reclaim resulted in exception: Not updating status of task IdEw_m-WTJGabhhM1ETaQg run 0 from Aborted to Reclaimed. This is because you can only update to status Reclaimed if the previous status was one of: [Claimed Reclaimed] (current status: Aborted) So the worker thinks we had: [taskcluster 2018-01-13T14:12:37.645Z] === Task Starting === [taskcluster 2018-01-13T15:42:30.639Z] === Task Finished === [taskcluster 2018-01-13T15:42:30.640Z] Task Duration: 1h29m52.994s Note, after maxRunTime is exceeded, task is stopped, artifacts are then uploaded, and finally task is resolved. Perhaps artifact upload took 14 mins?
Flags: needinfo?(pmoore)
Please also note this is a very old generic-worker from April 2017, that will be upgraded soon in bug 1399401. In later workers, you see exactly how long artifact uploads take, so it is easier to diagnose such things.
Sounds like that's the root of the issue, then..
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Component: Generic-Worker → Workers
You need to log in before you can comment on or make changes to this bug.