Closed Bug 1473048 Opened 2 years ago Closed Last year

Don't run coverage builds for when "mach try all"

Categories

(Testing :: Code Coverage, enhancement)

enhancement
Not set

Tracking

(firefox-esr60 fixed, firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr60 --- fixed
firefox63 --- fixed

People

(Reporter: marco, Assigned: sparky)

References

Details

Attachments

(1 file)

Should we run them on "mach try all"? Or are the results mostly equivalent to a debug build?
Flags: needinfo?(jmaher)
the chunking/runtime is different- it seems overkill.  Also the builds will take considerably longer as we can do artifact builds for debug when someone specifies artifact and they are not doing c++/rust coding.

another thing is we have some coverage only failures, but that is minimal.  If we are considering replacing debug proper with coverage, that could be a fun conversation- remember we do things to in mozconfig:
https://searchfox.org/mozilla-central/source/browser/config/mozconfigs/linux64/code-coverage

ac_add_options --disable-install-strip
ac_add_options --disable-elf-hack
ac_add_options --disable-sandbox
ac_add_options --disable-profiling
ac_add_options --disable-warnings-as-errors
ac_add_options --enable-coverage


I am not sure if any of those would have us skip real errors that would be caught by a traditional debug build.
Flags: needinfo?(jmaher)
Here's another bug regarding this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1397722
And another one that was closed but has some messages from when it started again: https://bugzilla.mozilla.org/show_bug.cgi?id=1305242
Blocks: 1397722, 1305242
:sparky- is this a bug you could work on sometime next week?
Flags: needinfo?(gmierz2)
Yup, I'll look into it this next week.
Assignee: nobody → gmierz2
Flags: needinfo?(gmierz2)
Here's another bug about this problem (resolved because we found another solution): https://bugzilla.mozilla.org/show_bug.cgi?id=1391483
This is an interesting issue. From what I see we have a couple problems to deal with here:

(1): Running '-p all' is the only way to schedule linux64-ccov tests with try option syntax.
(2): We can only schedule it with |mach try fuzzy|.
(3): Both seem to depend on the full-task-graph created by the decision taskgraph.

Working on a possible solution by using the build.py transforms here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/build.py
Using build.py almost worked: https://treeherder.mozilla.org/#/jobs?repo=try&revision=09e099ddbd4e8e2e0dcac0342dfc7150b8ef67ea

Through |mach try fuzzy| it was enabled because this command uses 'mozilla-central' as the default project. But through the try option syntax it was disabled since it uses 'try'. Now, because the gecko decision task uses 'try' as the project as well, the scheduled tasks were prevented from running.
I was hoping we could make a little hack to get around this but there's no hope there: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1eb90f94da8ea4d9dfdbd247e04d79f1aa30a71

I tried the patch above, where I disabled it based on the 'project' flag, but it didn't work because |mach try fuzzy| uses 'mozilla-central' to find the tasks, but the decision task on taskcluster then uses the 'try' flag which removes them.

:ahal, is this a bug in `try fuzzy` or expected behaviour? This doesn't block me, I am just curious about why it's different.

I found a way around this issue by using the `config` argument in the transforms to decide based on the `try_mode` flag in parameters.yml: https://dxr.mozilla.org/mozilla-central/source/taskcluster/docs/parameters.rst#82

Here's a working build (just need to add jsdcov there as well): https://treeherder.mozilla.org/#/jobs?repo=try&revision=818a95fb926763c2705a8867a186ee1c4c73de63 

I double checked if everything still works on mozilla-central and it does, or at least I think based on the target_tasks output by this command: 

mach taskgraph decision --base-repository https://hg.mozilla.org/mozilla-central --head-repository https://hg.mozilla.org/mozilla-central --head-ref 000 --head-rev d0d2e0f4b33c --project mozilla-central --message "try: -b do -p all -u none -t none" --owner me --level 1 --pushlog-id 1 --pushdate 11

Swapping `--project mozilla-central` with `--project try` shows that it still works with the 'mozilla-central' project but is completely disabled on try when try option syntax is used. I don't think that's how it's scheduled on mozilla-central but at least we can be sure that this change only prevents try option syntax from triggering any coverage builds (win, osx, linux, and android) on only try.
Flags: needinfo?(ahal)
Sorry, can you clarify the question?

Fwiw, I think the solution to this bug should be entirely in taskcluster/taskgraph/try_option_syntax.py. Manipulating the task definition just to appease try syntax doesn't seem desirable. Unfortunately (or maybe fortunately) try_option_syntax.py is one giant hack so you'll have to do some experimenting. I think this is the if statement where you'll want to return False for ccov builds:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#601
Flags: needinfo?(ahal)
Here's where that task_matches() function gets called:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#110

So any task that returns True there will get scheduled on the try push.
The 'project' entry in the parameters file or object when running |mach try fuzzy| is 'mozilla-central' but when running the regular decision task (D) on try it uses 'try' as the value for the 'project' entry. Is there a reason for the difference?

I think that's around the area where the bug first started when windows 8 was added: https://bug1391483.bmoattachments.org/attachment.cgi?id=8899135

So like you said it is a big hack to play with that so I'd rather use that as a last option. I found a better way of fixing this problem without changing the task definitions: https://hg.mozilla.org/try/rev/e708f408284dcbb87dd75e078f6c70f29f3ee1bc

Or is that the method you are suggesting I don't use?
And if you're wondering, I think the 'mozilla-central' project parameter I mention comes from here: https://dxr.mozilla.org/mozilla-central/source/tools/tryselect/tasks.py?q=path%3Atryselect+project&redirect_type=single#50
(In reply to Greg Mierzwinski [:sparky] from comment #12)
> The 'project' entry in the parameters file or object when running |mach try
> fuzzy| is 'mozilla-central' but when running the regular decision task (D)
> on try it uses 'try' as the value for the 'project' entry. Is there a reason
> for the difference?

Yes, this is expected. When generating the taskgraph with |mach try fuzzy| the end goal is to prevent a backout. So by default we want to select from the same set of tasks that run on mozilla-central (which is typically all the tasks that might get you backed out). If we included the set of tasks that were 'try-only', then we'd either be wasting resources (as they can't cause a backout anyway) or causing confusion (as they are very error prone).


> So like you said it is a big hack to play with that so I'd rather use that
> as a last option. I found a better way of fixing this problem without
> changing the task definitions:
> https://hg.mozilla.org/try/rev/e708f408284dcbb87dd75e078f6c70f29f3ee1bc
> 
> Or is that the method you are suggesting I don't use?

Yes, that is what I meant not to do :) (although it's not like it's a huge deal or anything).

IMO the rules for how try syntax behaves should live in try_option_syntax.py as much as possible. While it is hacky, at least it keeps the hack in one place, rather than spreading the hack throughout the task generation code. It's also a bit of a case of the "tail wagging the dog". Try syntax should adapt to changes in the taskgraph. We shouldn't adapt the taskgraph to modify try syntax.

All that being said, I think fixing this in try_option_syntax.py isn't terribly difficult. Reading the code, I think you just have to add a new clause similar to this one:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#549
Thanks for the explanation! It's a neat way of accomplishing that backout purpose.

Ok sounds good, I'll implement it that way. I'm more worried that any changes in that function will break something else other than ccov, but I think you're right and that it won't be as bad as I expect.
This patch completely disables *ccov, and *jsdcov builds and tests when scheduling them through try option syntax as these build variations use a lot of resources and are rarely needed to be scheduled. The only way to schedule code coverage from now on will be with the |mach try fuzzy| tool.
Here's a try run for the patch showing that it's still possible to use |mach try fuzzy| after these changes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83bb87afa8a95b78f8e1f1411f332c33fa26f10d

I checked that it's ignored in the try option syntax by using the following command (changing the message):

mach taskgraph decision --base-repository https://hg.mozilla.org/mozilla-central --head-repository https://hg.mozilla.org/mozilla-central --head-ref 000 --head-rev d0d2e0f4b33c --project try --message "try: -b do -p all -u all -t none" --owner me --level 1 --pushlog-id 1 --pushdate 11
Comment on attachment 9002920 [details]
Bug 1473048 - Prevent code coverage builds from running when using try option syntax. r?ahal

Andrew Halberstadt [:ahal] has approved the revision.
Attachment #9002920 - Flags: review+
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bdd6d0de763d
Prevent code coverage builds from running when using try option syntax. r=ahal
https://hg.mozilla.org/mozilla-central/rev/bdd6d0de763d
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.