Closed Bug 1648292 Opened 4 years ago Closed 4 years ago

Eliminate vanilla opt tests from trunk (except try)

Categories

(Testing :: General, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1650208

People

(Reporter: egao, Assigned: bhearsum)

References

(Blocks 1 open bug)

Details

Problem Statement

Mozilla currently runs both the vanilla opt and the shippable/opt tests on trunk.

This is true for the following platforms:

  • linux1804-64
  • window7-32
  • windows10-64
  • android-hw-p2-8-0-android-aarch64

This leads to a level of duplication for these platforms between the vanilla opt and the shippable variants.

Update 2020/06/25: the numbers used in this comment is not entirely accurate (incorrect methodologies used). :jmaher was able to arrive at a more accurate number.

Premise

The opt builds behave almost like a subset of the shippable/opt builds in that bugs that manifest in shippable/opt should also manifest itself in opt, with the exception of compiler bugs.

In the past, developer turnaround when running tests on tryserver or locally have been given as justification for keeping the opt build and tests around. But there is no reason why opt should run on non-try branches if that is the reason.

Looking at the current revision of test-platforms.yml, we see that in all cases of platforms listed above, the shippable/opt acts as a superset of the vanilla opt:

linux1804-64/opt:
    build-platform: linux64/opt
    test-sets:
        - awsy
        - desktop-screenshot-capture
        - linux1804-tests
        - mochitest-headless
        - web-platform-tests
        - web-platform-tests-backlog

linux1804-64-shippable/opt:
    build-platform: linux64-shippable/opt
    test-sets:
        - awsy
        - desktop-screenshot-capture
        - linux1804-tests
        - marionette-headless
        - mochitest-headless
        - mochitest-webgpu
        - web-platform-tests
        - web-platform-tests-backlog
        - web-platform-tests-wdspec-headless

So we likely will not be losing coverage by turning off the opt variant.

Metrics

All values are from the Taskcluster Build & Test Suite Report.

These values are for mozilla-central only.

2020/03/01 - 2020/06/01 Extrapolated 2020
platform opt shippable opt shippable
linux1804 $1619 $414 $6000 $1400
windows7-32 $4146 $2087 $16000 $8300
windows10-64 $4662 $1908 $17500 $7800
android-hw-p2-8-0 $2783 $305* $9300 $6800
  • android-hw data does not exist for 2020/03 and 2020/04, so values were extrapolated from the 2020/05 value of ~$580.

Proposal

Except for try, let's restrict the test suites to run on either shippable/opt or opt, not both. This level of duplication is unnecessary.

The exception would be in try where the faster turnaround time of opt is welcome.

Q&A

Q1. Wouldn't developers encounter regressions when they test against opt on try but the patch is run entirely against shippable/opt?

A1. While I do not have hard data on regression to back this statement up, the impression I get from speaking to a couple of people is that regressions on opt will also manifest itself on shippable/opt if it was not addressed. (I may be paraphrasing incorrectly here.) In other words, if problem is fixed on opt, it is also applicable to shippable/opt.
The only regressions unique to shippable/opt likely come from the compiler.

Q2. How much money would we save?

A2. Using the chart above, if opt tests were discontinued for mozilla-central only, we stand to save roughly ~$50,000. This is calculated by summing the extrapolated 2020 totals for the opt variant.

Q3. What about autoland?

A3. autoland consumes a lot more than mozilla-central since it runs a lot more frequently. Turning off the duplication there would yield potential savings of easily double the extrapolated 2020 values for mozilla-central for each platform. I think a ballpark of $90,000-$100,000 is doable.

Q4. What about builds?

A4. Costs a lot...a lot...of money. Will be in a separate bug.

If I'm not mistaken, we are already not running opt on autoland, just shippable.

correct, it runs on central in parallel with shippable and then also runs as primary on try server- so the real duplication is on mozilla-central.

keys to watch out for here, opt includes:

  • asan
  • tsan
  • ccov
  • shippable
  • opt

so we really want to see usage of opt only, which this raw bigquery gets us closer:
select workerType, CAST(sum(execution) as INT64) from taskclusteretl.derived_task_summary where date>'2020-06-20' and date<'2020-06-26' and project='mozilla-central' and kind='test' and collection='opt' and platform not like '%shippable%' and platform not like '%san%' and platform not like '%ccov%' group by workerType

this returns:

workerType	f0_
t-win7-32	829631
t-win10-64	1683005
t-linux-large	1818292
t-win7-32-gpu	235886
t-linux-xlarge	769449
t-win10-64-gpu-s	341875
gecko-t-win10-64-ref-hw	505711
gecko-t-bitbar-gw-perf-p2	70796
gecko-t-bitbar-gw-unit-p2	19567
gecko-t-win64-aarch64-laptop	99231

I verified the hardware (*ref-hw, bitbar, win64-aarch64) are all running shippable, it is just a platform name syntax issue.

doing some estimates with math, I think we might be able to save $1K/month by removing opt specific jobs.

I can look at this. It's a bit trickier than what we did on autoland due to needing to keep support for opt on try - but I'm sure we can find a way.

Assignee: nobody → bhearsum

Thanks to :jmaher whom used better methodologies and pointed out that my use of Google Data Studio's values for platform/opt lumped in all opt type tests, so there was some maths that needed to be done.

While unfortunately it isn't the ~100k/year that I had for, ~12k/year is helpful at least.

(In reply to Edwin Takahashi (:egao) from comment #6)

Thanks to :jmaher whom used better methodologies and pointed out that my use of Google Data Studio's values for platform/opt lumped in all opt type tests, so there was some maths that needed to be done.

While unfortunately it isn't the ~100k/year that I had for, ~12k/year is helpful at least.

Yeah, you're not the first one to hit this. You can make this a bit easier by adding this join to your queries: JOIN (SELECT metadata.name AS name, taskId FROM taskclusteretl.task_definition) USING (taskId) -- then name can be queried, which matches exactly with what you see on Treeherder.

Can we kill opt on release branches too? ie: everywhere except try

is it on release branches? I thought it was only on m-c+ try

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #9)

is it on release branches? I thought it was only on m-c+ try

Heh, maybe they're not! In that case, it sounds like the more generous phrasing here is "only run regular opt builds on try".

If we disable opt builds on central then IMO there is no point in keeping the configs around. Having them be "try-only" means that it's only a matter of time until they break and become useless. Either we care about them (and they should at least run on central as a sanity check) or we don't (and we should just remove them from the configs entirely).

Discussion on newsgroups has consistently shown that developers still find value in them, so to me it seems like we can't remove them on central (maybe it's fine to remove them on other release branches, but I don't know).

(In reply to Andrew Halberstadt [:ahal] from comment #11)

If we disable opt builds on central then IMO there is no point in keeping the configs around. Having them be "try-only" means that it's only a matter of time until they break and become useless. Either we care about them (and they should at least run on central as a sanity check) or we don't (and we should just remove them from the configs entirely).

Discussion on newsgroups has consistently shown that developers still find value in them, so to me it seems like we can't remove them on central (maybe it's fine to remove them on other release branches, but I don't know).

I created this original bug, after discussing with :ajones about the amount of duplication we have at a variety of levels.

My counterpoint is that developers likely find it useful because opt builds quickly and being able to turn around results quickly is a net positive in try. However, for non-try branches, it does not make sense to continue running vanilla opt because it does not lead to the released code/binary. Neither is it unique in the sense of catching regressions, because supposedly regressions would also be caught by shippable-opt tests.

Isn't there precedents in something like this, where the configuration is not run on mozilla-central but developers are permitted to run them elsewhere? Granted, those platforms are typically quite niche or up-and-coming, so there's that.

I think ultimately it makes no sense to run tests on mozilla-central that doesn't lead to released code; doesn't find unique regressions; and costs us to the tune of ~12k a year for no reason.

yes, developers have a need for opt builds for local building/testing and faster turnaround on try server. A shippable build is an opt build + a few extra stages, there is not much that will not be ignored by disabling opt builds/tests on central (we only run opt builds/tests on central, not anywhere else).

It is my understanding that we get zero value from running these builds/tests on mozilla-central. In the current landscape of trimming costs, this could have a slight risk of breakage on try, but it is a very low risk. We are already making optimizations with higher risk, and I would rather exhaust the list of low risk items.

Are there additional reasons to keep these running on m-c? If not, I would vote for doing this and reverting our decision when there is a real problem.

Callek and I were talking elsewhere about this bug and we came up with a couple of alternative ideas to accomplish this:

  1. Combine the current opt builds and the instrumented builds. The latter are the ones that Joel referred to that are essentially the same as opt. This has the side benefit of letting us run unit tests against a faster build even for pushes that want perf tests. (Right now, if your push asks for a perf test, even unit tests will run against the shippable build - which means you wait longer for those results.)

  2. Always do shippable builds on try, but grab a profile from elsewhere (probably central - or maybe the most recent commit that the push is based on that we can find a profile for). This means that all unit tests and perf tests will always run on try against a pgo'ed build (arguably making the unit test results slightly more valid). We're not sure how valid it is to re-use profiles - so that may be a downside.

My comment was mostly directed at the builds themselves, I agree it makes less sense to run the tests there. But we should at least not break the builds. Combining opt and the instrumented builds seems like a good idea to me (though I have no idea how feasible it is).

As for tests, I think the approach we should take is to not put shippable in the test labels. So instead of both:

test-linux1804-64-shippable/opt-mochitest-plain-e10s-1
test-linux1804-64/opt-mochitest-plain-e10s-1

we'd only have:

test-linux1804-64/opt-mochitest-plain-e10s-1

Then what build platform that depends on (shippable or not) could be decided in the configs/transforms on a per project basis. That way we would inherently only have one set of "opt-like" tests on any given project.

The downside to that approach is that there would be no "easy" way to run shippable tests on try. So to get around that we could craft a --enable-pgo flag to |mach try| that turns that back on.

I'll also note that Tom Prince had the idea of hiding this abstraction away in the build tasks themselves, but that is outside my domain of expertise and seems a bit scarier with broader implications.

This is somewhat intertwined with bug 1650208, but I want to try to summarize our desired end state here. Implementation details aside, I think that we want the following:

project         | build type | unit tests? | perf tests?
--------------------------------------------------------
autoland        | opt        | yes         | no
                | shippable* | no          | yes*
mozilla-central | shippable  | yes         | yes
try             | opt        | yes*        | no
                | shippable  | yes*        | yes*

* Doesn't run by default, but can run

For try specifically, ahal's comment #15 about combining the opt/shippable tests is probably something we should do here, too - it should have a notable impact on try push cost by getting rid of duplicated unit tests tasks when perf tests + unit tests are selected.

I know there's some comments about possibly wanting to continue doing just opt builds on mozilla-central, but I'm having trouble finding justification for that when the instrumented builds are already running per push. My understanding (which could be wrong) is that it's exceedingly unlikely that we'll uncaught bustage for the regular opt builds because of that. Additionally, we've decided in bug 1650208 that we're going to run opt instead of shippable on autoland, so any merges from there have a virtually nil chance of being broken for opt builds.

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(egao)
Flags: needinfo?(ahal)

(In reply to Andrew Halberstadt [:ahal] from comment #16)

I'll also note that Tom Prince had the idea of hiding this abstraction away in the build tasks themselves, but that is outside my domain of expertise and seems a bit scarier with broader implications.

I think this is probably not viable given bug 1650208 where we want to run opt by default as well as shippable for backstop pushes/backfills.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #14)

Callek and I were talking elsewhere about this bug and we came up with a couple of alternative ideas to accomplish this:

  1. Combine the current opt builds and the instrumented builds. The latter are the ones that Joel referred to that are essentially the same as opt. This has the side benefit of letting us run unit tests against a faster build even for pushes that want perf tests. (Right now, if your push asks for a perf test, even unit tests will run against the shippable build - which means you wait longer for those results.)

  2. Always do shippable builds on try, but grab a profile from elsewhere (probably central - or maybe the most recent commit that the push is based on that we can find a profile for). This means that all unit tests and perf tests will always run on try against a pgo'ed build (arguably making the unit test results slightly more valid). We're not sure how valid it is to re-use profiles - so that may be a downside.

I think it's probably best to punt on either of these optimizations for now. Both of them are trying to solve the "unit tests on try pushes take longer when perf tests are selected" problem, which is an optimization for a small subset of try pushes.

Table makes sense to me (though I wouldn't call myself an authoritative source, especially w.r.t perf).

Flags: needinfo?(ahal)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #17)

This is somewhat intertwined with bug 1650208, but I want to try to summarize our desired end state here. Implementation details aside, I think that we want the following:

project         | build type | unit tests? | perf tests?
--------------------------------------------------------
autoland        | opt        | yes         | no
                | shippable* | no          | yes*
mozilla-central | shippable  | yes         | yes
try             | opt        | yes*        | no
                | shippable  | yes*        | yes*

* Doesn't run by default, but can run

For try specifically, ahal's comment #15 about combining the opt/shippable tests is probably something we should do here, too - it should have a notable impact on try push cost by getting rid of duplicated unit tests tasks when perf tests + unit tests are selected.

I know there's some comments about possibly wanting to continue doing just opt builds on mozilla-central, but I'm having trouble finding justification for that when the instrumented builds are already running per push. My understanding (which could be wrong) is that it's exceedingly unlikely that we'll uncaught bustage for the regular opt builds because of that. Additionally, we've decided in bug 1650208 that we're going to run opt instead of shippable on autoland, so any merges from there have a virtually nil chance of being broken for opt builds.

This makes sense to me and it achieves the goal that outlined in the initial description. I am in favor.

Flags: needinfo?(egao)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #17)

This is somewhat intertwined with bug 1650208, but I want to try to summarize our desired end state here. Implementation details aside, I think that we want the following:

project         | build type | unit tests? | perf tests?
--------------------------------------------------------
autoland        | opt        | yes         | no
                | shippable* | no          | yes*
mozilla-central | shippable  | yes         | yes
try             | opt        | yes*        | no
                | shippable  | yes*        | yes*

* Doesn't run by default, but can run

For try specifically, ahal's comment #15 about combining the opt/shippable tests is probably something we should do here, too - it should have a notable impact on try push cost by getting rid of duplicated unit tests tasks when perf tests + unit tests are selected.

I know there's some comments about possibly wanting to continue doing just opt builds on mozilla-central, but I'm having trouble finding justification for that when the instrumented builds are already running per push. My understanding (which could be wrong) is that it's exceedingly unlikely that we'll uncaught bustage for the regular opt builds because of that. Additionally, we've decided in bug 1650208 that we're going to run opt instead of shippable on autoland, so any merges from there have a virtually nil chance of being broken for opt builds.

This works for me.
Bug 1636271 would make "opt" on mozilla-central effectively free (as it wouldn't really run on most m-c pushes), but not running "opt" at all on m-c is very likely easier to implement.

Flags: needinfo?(mcastelluccio)

This got fixed as part of bug 1650208.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.