Closed Bug 1372510 Opened 3 years ago Closed 3 years ago

running linux64 opt builds on try when not requesting them

Categories

(Taskcluster :: General, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla56

People

(Reporter: jmaher, Assigned: ahal)

References

Details

Attachments

(1 file)

I am pushing to try testing windows specific tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=74a28492c19855a0821bb50ae9187787df83c687

but I keep noticing that linux builds are taking place.  Looking at the gecko decision task output, I don't see the build as showing up:
https://public-artifacts.taskcluster.net/GvhbVBmlQ8-pQDGIQA_qXQ/0/public/target-tasks.json

I verified that flake8 and the 'doc' jobs do not depend on the linux64-opt build by using task inspector and viewing the dependencies.  I also check for tier-3 and excluded jobs.

we really shouldn't be building stuff that is not requested, or at least make it easier to find out why we are running this build (i.e. it should be in the gecko decision task).
What happened here is that one of the jobs (in the `-j` sense) required a build, and was included in the target set because `-j all` is the default. But then that job was optimized away because none of the required files were affected, so it didn't run. However, the optimization process didn't further optimize away the now-un-required build job.

Given the current nature of Try, I don't think we could do that -- we have to allow optimizing away tasks selected by try syntax, since that syntax specifies `-j all` most of the time.  But if we allow that optimization to "cascade" then it would be impossible to run just a build in a try push (`-b o -p win64 -t none -u none`) since that build would have no dependencies and thus be optimized away.

Trying to fix any of that will upset the delicate balance of other behaviors, I think.

I think that the fix here is splitting try into "try this" (run exactly what is requested, meaning *not* all jobs by default) and "just try it" (run what would run for a normal push, based on what has changed).

Andrew, thoughts?
Flags: needinfo?(ahalberstadt)
Doh, I didn't realize this was even happening as all my try pushes for that bug were modifying mochitest.

Unfortunately I don't think we can do your proposal. If we did, then stuff like ESLint wouldn't run on a try push with try syntax, which would lead to lots of backouts. I think once we actually get serious about fixing try server (and dedicating person months of time to it), then we could figure out something along the lines of your proposal, but I don't see any quick and easy fixes along those lines.

Here are some other potential ways to fix this. Most of them are terrible (reasons why in brackets), but I'm just brainstorming out loud:
  1. Add build dependency after we know whether task got optimized (probably not possible, or requires hackiness)
  2. Reverse optimization for job tasks. Start with all job tasks optimized, then add them back in if files-changed matches
     (goes against the very nature of taskcluster, probably invasive change)
  3. Exclude tasks with 'require-build' from the default --jobs in try syntax (tasks won't run when they should on try
     unless explicitly specified, a bit hacky)

I'd like to spend a bit of time looking into 1) to see if it would be possible/how hacky it would be. Failing that, I think we should settle on 3) until we can come up with a better long term solution like you proposed.
Blocks: 1048446
Flags: needinfo?(ahalberstadt)
Hm, I thought we were fairly well agreed on that proposal - can you bring your objections up on the mailing list?

I think #3 will fix the reported issue, but it amounts to just giving up on this particular special case -- there are probably lots of cases where this sort of thing happens.

If we want to keep with the legacy syntax, probably the best fix is to tag tasks with one of three levels of required-ness: user requested it explicitly; added by default (including ridealong jobs); or not requested.  Then we would never optimize those that are requested explicitly, but would optimize the other two, including optimizing away tasks whose dependencies have been optimized away.

But I think that's just doubling down on an already bad design and will inevitably lead to dozens of additional bugs and unexpected behavior, so I'd prefer to leave this relatively minor issue unfixed until we put a better design in place.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> Hm, I thought we were fairly well agreed on that proposal - can you bring
> your objections up on the mailing list?

I think I do agree with your proposal.. But only as long as there's a better "try this" mechanism than try syntax. I think for now, we need to make sure jobs are run no matter what.


> I think #3 will fix the reported issue, but it amounts to just giving up on
> this particular special case -- there are probably lots of cases where this
> sort of thing happens.

I consider this a fairly serious regression as we're now running linux64 builds on every try push no matter what. So yes, you're right it is giving up on this special case (for now). But I think we need to get a fix in quickly rather than implement the perfect solution. We should file a follow-up bug for this (though even then I'd be tempted to punt until we re-implement interacting with try).


> If we want to keep with the legacy syntax, probably the best fix is to tag
> tasks with one of three levels of required-ness: user requested it
> explicitly; added by default (including ridealong jobs); or not requested. 
> Then we would never optimize those that are requested explicitly, but would
> optimize the other two, including optimizing away tasks whose dependencies
> have been optimized away.
> 
> But I think that's just doubling down on an already bad design and will
> inevitably lead to dozens of additional bugs and unexpected behavior, so I'd
> prefer to leave this relatively minor issue unfixed until we put a better
> design in place.

I agree with the first part of this paragraph, but not the last. I think not being able to easily schedule mochitest-harness tasks is a relatively minor issue, and running extraneous builds is the major issue.
Comment on attachment 8877180 [details]
Bug 1372510 - Exclude 'job' tasks that require a build from the default try selection,

https://reviewboard.mozilla.org/r/148550/#review152958

The trychooser UI doesn't seem to know `-j` at all, so I suspect these jobs will (almost) never be run in try with this change, and thus might cause backouts.  Is that OK?
Comment on attachment 8877180 [details]
Bug 1372510 - Exclude 'job' tasks that require a build from the default try selection,

https://reviewboard.mozilla.org/r/148550/#review152972

Hm, I hadn't seen comment 5 when I reviewed.  It sounds like that might be OK, in which case r+..
Attachment #8877180 - Flags: review?(dustin) → review+
Yeah, I think it's OK for now. The only task this affects currently is the mochitest-harness task (aka mochitest selftests). I'm probably one of like 3 people who would ever need to run that task. I'll also find some time to fix bug 1368438 this week so that at least it can be run with -j from ./mach try.

But I do agree this isn't ideal. I'll file a follow-up to tackle this properly, though as mentioned in comment 5, I probably won't bother working on it until we start tackling the try problem more seriously (hopefully in Q3).

Thanks for bearing with me through all this.
Thanks for your help!
I had a flake8 and test error. Latest push (trivial) fixes those:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e36054b3f62f94508d5bfdb64b415fe00c305aa4
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/31ee46d1814d
Exclude 'job' tasks that require a build from the default try selection, r=dustin
https://hg.mozilla.org/mozilla-central/rev/31ee46d1814d
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Duplicate of this bug: 1372720
You need to log in before you can comment on or make changes to this bug.