Decision task frequently fails with mach try auto
Categories: Firefox Build System :: Task Configuration, defect
Tracking: Not tracked
People: Reporter: sg, Assigned: ahal
References: Blocks 1 open bug
Attachments: 1 file
In the last few days, I have frequently experienced Decision Task failures/timeouts on try pushes. Today, it only worked on the fourth attempt.
all failed, then finally
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fc665bd3fd471d955dd7c816b077c9165685970
succeeded (I changed the commit message on the last one, that's why it shows a different revision, but the content was exactly the same).
This has cost me a lot of time. I'm not sure if others are affected as well.
Comment 1•5 years ago
I've asked the Taskcluster team to look into this.
Treeherder mainly displays what happens on Taskcluster.
Comment 2•5 years ago
Hi marco, ahal,
This seems to be an issue with mach try auto.
Is there a way to make it more obvious which component/repo issues should be filed against?
Assignee
Comment 3•5 years ago
FYI, ./mach try auto is very experimental at the moment (we haven't announced it anywhere yet), so expect issues.
I think what's happening here is that for some reason the bugbug service is failing to compute the results for this push, and then the taskgraph isn't propagating the error properly. It would also help if ./mach try auto enabled verbose logging in the Decision task to help see what's going on.
Reporter
Comment 4•5 years ago
Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)
Assignee
Comment 5•5 years ago
Yikes.. I forgot to increment i in the timeout code:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize/bugbug.py#46
So my guess was correct. I'll fix the timeout so that this doesn't wait 30 minutes to fail. Though the underlying cause seems to be that the service just isn't processing this push (it presumably keeps returning 202).
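For illustration, the kind of bug described here (a polling loop whose attempt counter is never incremented) can be sketched as follows. This is a minimal Python sketch and not the actual code in bugbug.py: the names poll_until_ready, fetch, and PollTimeout are hypothetical, and fetch is assumed to stand in for whatever request is made to the bugbug service, returning a (status_code, payload) pair.

```python
import time


class PollTimeout(Exception):
    """Raised when the service never returns a final result (hypothetical)."""


def poll_until_ready(fetch, max_attempts=5, delay=0.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 202, or give up.

    fetch is an assumed callable returning (status_code, payload).
    A 202 status is treated as "still computing, ask again later".
    """
    attempts = 0
    while attempts < max_attempts:
        status, payload = fetch()
        if status != 202:
            # Any other status ends the polling; a real caller would
            # also want to check for error statuses here.
            return payload
        sleep(delay)
        # Without this increment the loop condition never becomes false,
        # so the timeout below is never reached.
        attempts += 1
    raise PollTimeout(f"no result after {max_attempts} attempts")
```

With the increment in place, a service that keeps answering 202 makes the caller fail after max_attempts polls instead of spinning until some outer limit kills the task.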
Assignee
Comment 6•5 years ago
Comment 8•5 years ago
(In reply to Simon Giesecke [:sg] [he/him] from comment #4)
> Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)
Have you seen failures with specific patches, or generically? I'm going to add more logging in the bugbug service so I can more easily find out what happens when things go wrong.
Comment 9•5 years ago
Just a suggestion, while this is still experimental, instead of pushing again you could retrigger the decision task.
Comment 10•5 years ago
bugherder
Reporter
Comment 11•5 years ago
(In reply to Marco Castelluccio [:marco] from comment #8)
> (In reply to Simon Giesecke [:sg] [he/him] from comment #4)
> > Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)
> Have you seen failures with specific patches, or generically? I'm going to add more logging in the bugbug service so I can more easily find out what happens when things go wrong.
I am not completely sure, but I guess the failed attempts were all changing quite basic things in mfbt or xpcom/ds.
(In reply to Marco Castelluccio [:marco] from comment #9)
> Just a suggestion, while this is still experimental, instead of pushing again you could retrigger the decision task.
Unfortunately, due to an issue with my account, I can't retrigger any tasks at the moment. I hope this will be resolved soon.
Comment 12•5 years ago
I made quite a few improvements in the bugbug HTTP service, so this should be fixed.