Closed
Bug 1394779
Opened 8 years ago
Closed 8 years ago
unable to backfill or add new jobs both TC and BBB
Categories
(Taskcluster :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla57
People
(Reporter: jmaher, Assigned: bstack)
Attachments
(2 files)
We are stuck on a lot of perf regressions because backfill attempts yesterday and today have resulted in no jobs being added (this affects both the 'backfill' and the 'add new jobs' actions from Treeherder).
For a talos job that uses BBB, I see this:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=201c0c94bae0f87ce4b9af5ba21465761b0fc987&selectedJob=126717516&filter-searchStr=action
and for an android job that is 100% TC, I see this:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2932577f253f4d2fe8f459bb281a1d92695c417a&selectedJob=126717261&filter-searchStr=action
The action task failures are identical and show this log text:
[taskcluster 2017-08-29 12:07:47.966Z] === Task Starting ===
[taskcluster:error] Failure to properly start execution environment.
[taskcluster:error] (HTTP code 404) no such container - invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/builds/worker/bin/run-task\\\": stat /builds/worker/bin/run-task: no such file or directory\"\n"
[taskcluster 2017-08-29 12:07:48.291Z] === Task Finished ===
[taskcluster 2017-08-29 12:07:48.353Z] Artifact "public" not found at "/builds/worker/artifacts"
[taskcluster 2017-08-29 12:07:48.687Z] Unsuccessful task run with exit code: -1 completed in 1.371 seconds
Reporter | ||
Comment 1•8 years ago
:garndt, can you find someone on the TC team to look into this and get it resolved? If this isn't a TC issue, would you know which team should be working on it? I assume it is TC, given the use of Go code.
Flags: needinfo?(garndt)
Comment 2•8 years ago
That's an old-style actions.yml task, so that will be going away soon. It's using an old decision task image (0.1.7; the newest is 0.1.10). Wander just moved everything from /home/worker to /builds/worker, but that directory does not exist on this image, or on 0.1.10. So I think the fix is to revert that change to actions.yml.
Flags: needinfo?(garndt)
Reporter | ||
Comment 3•8 years ago
Thanks for the reply, :dustin. Will this work retroactively on the tree? I assume so, since I don't see actions.yml in-tree.
Comment 4•8 years ago
No, it won't, but it's a one-line patch so you could push it to try. The rename only landed yesterday, though.
Comment 5•8 years ago
Wander, as a side-note -- I see that .taskcluster.yml still has /home/worker. Should we fix that up and generate a new decision image, so that everything is consistently /builds/worker?
Flags: needinfo?(wcosta)
Comment hidden (mozreview-request) |
Reporter | ||
Comment 7•8 years ago
Using the trick of editing an action task, applying s/builds/home/, and then creating a new task worked to get green action tasks. I have jobs for the Taskcluster tests, but I do not have jobs for the BBB tests yet; I will try a few more times there.
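The workaround above amounts to a path substitution over the action task definition before it is resubmitted. A minimal sketch of that substitution, assuming the task is available as a parsed JSON/YAML structure. The `rewrite_worker_home` helper and the shape of the sample task are hypothetical; only the two paths come from the log in the description. The sketch replaces the full `/builds/worker` prefix rather than the bare word `builds`, so unrelated strings are left alone.

```python
def rewrite_worker_home(value, old="/builds/worker", new="/home/worker"):
    """Return a copy of `value` with `old` replaced by `new` in every string."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [rewrite_worker_home(item, old, new) for item in value]
    if isinstance(value, dict):
        return {key: rewrite_worker_home(item, old, new) for key, item in value.items()}
    # Numbers, booleans and None are returned unchanged.
    return value


# Hypothetical task definition; the real action task has many more fields.
task = {
    "payload": {
        "command": ["/builds/worker/bin/run-task", "--", "echo", "hello"],
        "artifacts": {"public": {"path": "/builds/worker/artifacts"}},
        "maxRunTime": 1800,
    }
}

fixed = rewrite_worker_home(task)
print(fixed["payload"]["command"][0])
```

The helper builds new containers instead of mutating its input, so the original definition stays available for comparison before the edited task is created.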
Reporter | ||
Comment 8•8 years ago
This trick worked for the BBB jobs as well.
Comment 9•8 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> Wander, as a side-note -- I see that .taskcluster.yml still has
> /home/worker. Should we fix that up and generate a new decision image, so
> that everything is consistently /builds/worker?
I really don't know, that's why I kept it untouched.
Flags: needinfo?(wcosta)
Comment 10•8 years ago
mozreview-review |
Comment on attachment 8902240 [details]
Bug 1394779: decision image still uses /home;
https://reviewboard.mozilla.org/r/173766/#review179116
Attachment #8902240 -
Flags: review?(wcosta) → review+
Comment 11•8 years ago
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6490ba9e0ec7
decision image still uses /home; r=wcosta
Comment 12•8 years ago
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 13•8 years ago
This was only a partial fix. It fixes action tasks that use action.yml, but for backfilling the command is hardcoded here:
https://hg.mozilla.org/integration/autoland/file/6b9d06ba6f769234530ae67d8353377d58a93fd0/taskcluster/taskgraph/actions/registry.py#l243
Either we push a change out for this as well (along with the other references to builds/worker in that file), or we can try to get bug 1394883 landed.
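One way to enumerate the remaining references before pushing such a change is to scan the file for the path fragment. A minimal sketch: the `find_hardcoded_paths` helper and the sample source text are hypothetical and do not reproduce the contents of registry.py.

```python
def find_hardcoded_paths(source_text, needle="builds/worker"):
    """Return (line_number, stripped_line) for every line containing `needle`."""
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if needle in line
    ]


# Hypothetical stand-in for a file that hardcodes the worker home directory.
sample = """\
command = [
    '/builds/worker/bin/run-task',
    '--',
    'bash', '-cx', 'echo hello',
]
cache_dir = '/builds/worker/checkouts'
"""

hits = find_hardcoded_paths(sample)
for number, line in hits:
    print(f"{number}: {line}")
```

Against a real checkout, the same function could be fed the text of the file read from disk.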
Comment 14•8 years ago
I think bug 1394883 is close to landing (of course, it won't help with regressions)
Reporter | ||
Comment 15•8 years ago
this is still broken:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&filter-searchStr=action&tochange=9815926c3bc14b22941415a5e036d1be2bc87fdf&fromchange=192a10e664c726add22ea36d1b050bc6daf81f54&selectedJob=129780086
:garndt, can you help make sure this is resolved ASAP? We are wasting a lot of time backfilling.
Flags: needinfo?(garndt)
Reporter | ||
Updated•8 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 16•8 years ago
As a note, this was from pushes on September 8th; luckily, 'add new jobs' worked!
Comment 17•8 years ago
We took a brief look into this issue but were unsure what the steps to reproduce were. What job was being backfilled? I see some other backfill requests completing successfully.
On the failed actions, we noticed that the action task and action task ID were not filled out in the task payload (shows as "null" whereas a successful run has much more data there).
I am not sure how an action task could be scheduled without providing that information but some STR would help track it down.
Flags: needinfo?(garndt) → needinfo?(jmaher)
Reporter | ||
Comment 18•8 years ago
Typically I am trying to backfill an AWSY job. Here are some repro steps:
1) go to mozilla inbound and filter on awsy: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=awsy
1.5) I narrowed the range down to focus on specific revisions: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=awsy&fromchange=bc1f526a6152eb8a810c78041678b249c0906314&tochange=0e2f9e7b7fd7ab31640383e64c8b7bf4c602d828
2) select linux64 awsy: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=awsy&fromchange=bc1f526a6152eb8a810c78041678b249c0906314&tochange=0e2f9e7b7fd7ab31640383e64c8b7bf4c602d828&selectedJob=130126553
3) from the popup pane, click the '...' and click 'backfill'.
4) verify green bar with text as a dialog popup saying | Request sent to backfill job via actions.json ...|
5) look at the action tasks: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=action&fromchange=bc1f526a6152eb8a810c78041678b249c0906314&tochange=0e2f9e7b7fd7ab31640383e64c8b7bf4c602d828
6) verify we have a red Bk job
As a note, I didn't need to backfill that specific job, but I went through the exercise in detail; it is not a big deal to backfill a random job.
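The filtered Treeherder views used in the steps above differ only in their query parameters, so they can be generated rather than edited by hand. A minimal sketch, assuming the parameter names seen in the URLs above (`repo`, `filter-searchStr`, `fromchange`, `tochange`); the `treeherder_jobs_url` helper is hypothetical and not part of Treeherder.

```python
from urllib.parse import urlencode

# Base of the job view URLs quoted in the repro steps.
TREEHERDER_JOBS = "https://treeherder.mozilla.org/#/jobs?"


def treeherder_jobs_url(repo, search, fromchange=None, tochange=None):
    """Build a job-view URL filtered by search string and optional revision range."""
    params = {"repo": repo, "filter-searchStr": search}
    if fromchange:
        params["fromchange"] = fromchange
    if tochange:
        params["tochange"] = tochange
    return TREEHERDER_JOBS + urlencode(params)


# Reproduces the view from step 5: action tasks within the narrowed range.
url = treeherder_jobs_url(
    "mozilla-inbound",
    "action",
    fromchange="bc1f526a6152eb8a810c78041678b249c0906314",
    tochange="0e2f9e7b7fd7ab31640383e64c8b7bf4c602d828",
)
print(url)
```

Swapping the search string from `action` to `awsy` gives the view from step 1.5 for the same revision range.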
Flags: needinfo?(jmaher)
Comment 19•8 years ago
Assignee | ||
Updated•8 years ago
Assignee: nobody → bstack
Assignee | ||
Updated•8 years ago
Attachment #8907427 -
Flags: review?(cdawson)
Assignee | ||
Comment 20•8 years ago
Thanks for the good repro steps! Figured out what I had missed the first time around. Hopefully this patch fixes it although backfilling is difficult to test.
Comment 21•8 years ago
I've started pulse_actions back again just in case there's something the action tasks are not handling.
Let me know when you think I can shut it off again.
Updated•8 years ago
Attachment #8907427 -
Flags: review?(cdawson) → review+
Comment 22•8 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/9d5515d0626eb552a083d329e309d9ee0a6797f3
Bug 1394779 - Fix backfilling with actions.json tasks (#2768)
Assignee | ||
Updated•8 years ago
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED