Closed Bug 1591466 Opened 5 years ago Closed 4 years ago

Use a reduced optimal TP6 pageset to reduce testing load

Categories

(Testing :: Raptor, task, P1)

Product:

Testing

Component:

Raptor

Version:

Version 3

Type:

task

Priority:

P1

Severity:

S3

Tracking

(firefox79 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla79

Tracking Flags:

firefox79: tracking ---, status fixed

People

(Reporter: sparky, Assigned: Bebe)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(2 files)

Bug 1591466 - split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky 4 years ago Joel Maher ( :jmaher ) (UTC -8) 47 bytes, text/x-phabricator-request
Bug 1591466 - do not adjust tier and optimization for mobile. r=sparky 4 years ago Joel Maher ( :jmaher ) (UTC -8) 47 bytes, text/x-phabricator-request

Greg Mierzwinski [:sparky]

Reporter

Description

5 years ago

This bug is for adding an optimal, reduced TP6 pageset for desktop and mobile testing in an effort to reduce the test load.

See this document for background information: https://docs.google.com/document/d/1pMn77DzYIRQ8dB1hOjp0YDyNFkZtD51gM81uD9S8DF0/edit

The following graphs show the results of an analysis to find a reduced subset for desktop and mobile (tp6, and tp6m):

TP6: https://mozilla.slack.com/files/U9KF08E14/FPSUG1SH2/tp6_hist_with_uniques_dupes_removed.png

TP6M: https://mozilla.slack.com/files/U9KF08E14/FPTCZVCCV/tp6m_hist_with_uniques_dupes_removed.png

For tp6m, using only the tests that uniquely caught a regression (in red), we can catch 13/16 regressions/improvements (~81%). If we also include the tests that caught improvements, we could catch 15/16 (~94%). Using both warm and cold variants of those tests would let us catch all 16/16 (100%).

For desktop, using the same method (only picking tests with red bars), we can catch ~85% of regressions. Including the ones which caught improvements, and using both warm and cold varieties of all of these tests, we can catch 100% of regressions.
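To make the coverage arithmetic concrete: a regression counts as caught if at least one selected test alerted on it, so coverage is caught/total (13/16 = 0.8125, about 81%; 15/16 = 0.9375, about 94%). The sketch below assumes a hypothetical mapping from an alert id to the set of tests that flagged it; the data is made up for illustration and is not the real tp6m alert set.

```python
def coverage(selected, detections):
    """Return (caught, total): alerts flagged by at least one selected test.

    detections maps an alert id to the set of test names that flagged it.
    """
    selected = set(selected)
    caught = sum(1 for tests in detections.values() if tests & selected)
    return caught, len(detections)


# Hypothetical toy data, for illustration only (not the real tp6m alerts).
detections = {
    "alert-1": {"tp6m-espn-cold"},
    "alert-2": {"tp6m-espn-cold", "tp6m-bing"},
    "alert-3": {"tp6m-amazon-search-cold"},
    "alert-4": {"tp6m-bing-warm"},
}

caught, total = coverage({"tp6m-espn-cold", "tp6m-amazon-search-cold"}, detections)
print(f"{caught}/{total} ~= {round(100 * caught / total)}%")  # 3/4 ~= 75%
```

Adding a warm/cold variant (here `tp6m-bing-warm`) to the selected set is what closes the last gap in this toy example.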

Robert Wood [:rwood]

Updated

5 years ago

Priority: -- → P2

Joel Maher ( :jmaher ) (UTC -8)

Comment 1

4 years ago

using :sparky's tool chain:
https://github.com/gmierz/moz-current-tests/tree/master/high-value-tests

Looking at the specific bugs (53 out of 61) that were determined not to be test-only or infra fixes, I found 19 tests that we consider high value:
['raptor-tp6m-espn-geckoview', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-stylebench-firefox', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-speedometer-firefox', 'raptor-tp6-twitch-firefox-cold', 'raptor-tp6-fandom-firefox', 'raptor-tp6-twitter-firefox', 'raptor-tp6-facebook-firefox-cold', 'raptor-wasm-misc-baseline-firefox', 'raptor-tp6-tumblr-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6-wikia-firefox', 'raptor-assorted-dom-firefox', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-twitch-firefox', 'raptor-tp6-bing-firefox']
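The selection itself was done with :sparky's high-value-tests tooling linked above. As a rough illustration only (not that tool's code), one way to pick a small set of tests that still covers every relevant bug is a greedy set cover: drop bugs that were test-only or infra fixes, then repeatedly take the test that alerted on the most still-uncovered bugs. The `AlertBug` type, its `kind` labels, and the bug ids below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertBug:
    bug_id: int
    kind: str          # hypothetical label: "code", "test-only" or "infra"
    tests: frozenset   # names of tests that alerted for this bug


def pick_high_value_tests(bugs):
    """Greedy set cover over bugs that were real (non test-only, non infra) fixes."""
    relevant = [b for b in bugs if b.kind not in ("test-only", "infra")]
    uncovered = {b.bug_id for b in relevant}
    chosen = []
    while uncovered:
        # For each test, count how many still-uncovered bugs it alerted on.
        gains = {}
        for b in relevant:
            if b.bug_id in uncovered:
                for t in b.tests:
                    gains[t] = gains.get(t, 0) + 1
        if not gains:
            break  # remaining bugs have no associated tests
        # sorted() makes ties resolve alphabetically, so results are deterministic.
        best = max(sorted(gains), key=gains.get)
        chosen.append(best)
        uncovered -= {b.bug_id for b in relevant if best in b.tests}
    return chosen


bugs = [
    AlertBug(1, "code", frozenset({"tp6-bing", "tp6-tumblr"})),
    AlertBug(2, "code", frozenset({"tp6-tumblr"})),
    AlertBug(3, "test-only", frozenset({"tp6-yahoo"})),
    AlertBug(4, "code", frozenset({"speedometer"})),
]
print(pick_high_value_tests(bugs))  # ['tp6-tumblr', 'speedometer']
```

Greedy set cover is not guaranteed to find the smallest possible set, but it is simple and usually close, which fits a list that gets re-evaluated periodically.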

After we validate this and make it easier to update this data, it will be realistic to adjust tier status in the taskcluster .yml files. We can also apply this to talos tests.

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

4 years ago

doing the same analysis on talos, here are the tests that are high value for Talos:
['tabswitch', 'tsvgx', 'displaylist_mutate', 'tscrollx', 'sessionrestore', 'tp5n', 'tart', 'perf_reftest_singletons', 'startup_about_home_paint_realworld_webextensions', 'tp5o', 'kraken', 'ts_paint_webext', 'tsvgr_opacity', 'startup_about_home_paint', 'tp5o_scroll']

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

4 years ago

:davehunt, is it ok to move forward with marking tests as tier1/2 as outlined above?

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 4

4 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

> :davehunt, is it ok to move forward with marking tests as tier1/2 as outlined above?

Yes, but can we limit this to Raptor for now? I would suggest filing a separate bug for Talos.

Flags: needinfo?(dave.hunt)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

4 years ago

yeah, talos should be considered separate (I filed bug 1626045)

Joel Maher ( :jmaher ) (UTC -8)

Comment 6

4 years ago

:bc, can you pick this up in the next week or two?

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Comment 7

4 years ago

sure

Assignee: nobody → bob

Status: NEW → ASSIGNED

Flags: needinfo?(bob)

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

4 years ago

I spent some time sanitizing data and cross-referencing it in detail. There are 24 tests (6 on android) to consider:
['raptor-tp6m-espn-geckoview-cold', 'raptor-speedometer-firefox', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-slides-firefox', 'raptor-tp6-google-mail-firefox-cold', 'raptor-stylebench-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-tp6-google-firefox-cold', 'raptor-tp6-twitch-firefox', 'raptor-tp6-tumblr-firefox', 'raptor-tp6m-espn-geckoview', 'raptor-tp6-fandom-firefox', 'raptor-tp6-bing-firefox', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-wasm-misc-firefox', 'raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6m-bing-geckoview', 'raptor-tp6-reddit-firefox-cold', 'raptor-tp6m-ebay-kleinanzeigen-search-geckoview', 'raptor-tp6-instagram-firefox', 'raptor-assorted-dom-firefox']

This is using a full year of data. Limiting this to 6 months of data, we have 9 tests (1 on android) to consider:
['raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6-slides-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-google-mail-firefox-cold']

Since our goal is to keep all of these running and sheriffed full time, and the tier-2 tests would simply run less frequently (so their alerts would show up a day or two later), there is little risk in this.

:esmyth, do you have concerns or other thoughts?

Flags: needinfo?(esmyth)

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

4 years ago

as discussed in an email thread, we feel that 6 months is a more representative sample, which would be the smaller pageset. This would apply across the board.

The key here is that we would re-evaluate this monthly to ensure that we adjust the tier-1 tests as needed. Since all tests will still be sheriffed, the only difference is that tier-2 tests may be sheriffed up to a couple of days later.

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

See Also: → 1626045

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

Whiteboard: [ci-costs-2020:todo]

Joel Maher ( :jmaher ) (UTC -8)

Comment 10

4 years ago

here are the tests we are going to run less frequently:
['raptor-tp6-outlook-firefox-cold', 'raptor-tp6-netflix-firefox-cold', 'raptor-tp6m-google-restaurants-geckoview-cold', 'raptor-tp6m-booking-geckoview-cold', 'raptor-tp6-yahoo-mail-firefox', 'raptor-tp6-microsoft-firefox-cold', 'raptor-tp6m-bing-restaurants-geckoview-cold', 'raptor-tp6m-wikipedia-geckoview-cold', 'raptor-tp6-yahoo-mail-firefox-cold', 'raptor-tp6m-bing-geckoview-cold', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-tp6-facebook-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6m-instagram-geckoview-cold', 'raptor-tp6-pinterest-firefox', 'raptor-tp6-apple-firefox-cold', 'raptor-tp6-instagram-firefox-cold']

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 11

4 years ago

I've spoken with Eric and he doesn't have any concerns with the identified tests. We discussed some related issues, which I'll follow up with separately and do not block this effort.

Flags: needinfo?(esmyth) → needinfo?(fstrugariu)

Stephen Donner [:stephend] Not actively reading bugmail

Comment 12

4 years ago

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

Assignee: bob → fstrugariu

Greg Mierzwinski [:sparky]

Reporter

Comment 13

4 years ago

:bebe, can you provide an update here? I think this is waiting on the taskcluster split changes right?

Florin Strugariu [:Bebe]

Assignee

Comment 14

4 years ago

This is waiting for Bug 1633874 - Update taskcluster settings to the new raptor file structure

After that we can generate a list of tests and split them into the discussed lists.

Flags: needinfo?(fstrugariu)

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

4 years ago

We can do the split. I think we should make a tp6 job and a tp6-tier2 job that are both scheduled. The difference is that the lower-value (tier-2) pages will be in the test-subtests of the tp6-tier2 job, while the regular tp6 job will have the higher-value test-subtests.

This should make it easier to move tests between tiers, as well as to create a test-sets.yml where we can schedule things, or set a specific raptor-tp6-tier2 to push-interval-25 while raptor-tp6 is push-interval-10.
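A minimal sketch of the proposed split, assuming a hypothetical in-memory job description (the dict keys below are illustrative, not the real taskcluster schema): subtests in the high-value set go to the `raptor-tp6` job with push-interval-10, everything else goes to `raptor-tp6-tier2` with push-interval-25.

```python
def split_tp6_jobs(subtests, high_value):
    """Partition tp6 page subtests into a tier-1 job and a tier-2 job.

    The returned dict shape is hypothetical and only mirrors the idea in the
    comment above; it is not the taskcluster kind schema.
    """
    high_value = set(high_value)
    tier1 = [s for s in subtests if s in high_value]
    tier2 = [s for s in subtests if s not in high_value]
    return {
        "raptor-tp6": {"tier": 1, "optimization": "push-interval-10", "subtests": tier1},
        "raptor-tp6-tier2": {"tier": 2, "optimization": "push-interval-25", "subtests": tier2},
    }


jobs = split_tp6_jobs(["amazon", "bing", "slides", "tumblr"], {"slides", "tumblr"})
print(jobs["raptor-tp6"]["subtests"])        # ['slides', 'tumblr']
print(jobs["raptor-tp6-tier2"]["subtests"])  # ['amazon', 'bing']
```

With this shape, moving a page between tiers only means editing the high-value set; the two job definitions stay fixed.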

Joel Maher ( :jmaher ) (UTC -8)

Comment 16

4 years ago

:bebe, does this way of thinking make sense to you?

Flags: needinfo?(fstrugariu)

Joel Maher ( :jmaher ) (UTC -8)

Comment 17

4 years ago

:davehunt, can you answer this or help get this moving?

Flags: needinfo?(dave.hunt)

Sylvestre Ledru [:Sylvestre]

Updated

4 years ago

Blocks: cost-reduction

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 18

4 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)

> :davehunt, can you answer this or help get this moving?

Your suggestion in comment 15 sounds good to me. I'm going to leave the needinfo open for Bebe, who is the assignee.

Severity: normal → S3

Depends on: 1633874

Flags: needinfo?(dave.hunt)

Priority: P2 → P1

Joel Maher ( :jmaher ) (UTC -8)

Comment 19

4 years ago

queried all raptor tests since December 1, 2019; here is the latest set:

Chosen tests: ['raptor-tp6m-youtube-geckoview-cold', 'raptor-tp6-twitter-firefox-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-google-mail-firefox-cold', 'raptor-tp6-amazon-firefox-cold', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6-imgur-firefox-cold-mozproxy-replay', 'raptor-webaudio-firefox']

Rejected tests: ['raptor-tp6m-google-restaurants-geckoview-cold', 'raptor-tp6m-amazon-geckoview-cold', 'raptor-tp6m-wikipedia-geckoview-cold', 'raptor-tp6-paypal-firefox-cold', 'raptor-tp6-slides-firefox-cold-mozproxy-replay', 'raptor-tp6-fandom-firefox-cold', 'raptor-tp6-imgur-firefox-cold', 'raptor-tp6-pinterest-firefox-cold-mozproxy-replay', 'raptor-tp6-bing-firefox-cold', 'raptor-tp6m-cnn-ampstories-geckoview-cold', 'raptor-tp6m-cnn-geckoview-cold', 'raptor-tp6m-ebay-kleinanzeigen-search-geckoview-cold', 'raptor-tp6-docs-firefox-cold', 'raptor-tp6-binast-instagram-firefox-mozproxy-replay', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-google-geckoview-cold', 'raptor-tp6-linkedin-firefox-cold', 'raptor-tp6-sheets-firefox-cold', 'raptor-tp6m-aframeio-animation-geckoview-cold', 'raptor-tp6-twitter-firefox-cold-mozproxy-replay', 'raptor-tp6-imdb-firefox-cold', 'raptor-tp6m-facebook-cristiano-geckoview-cold', 'raptor-tp6m-facebook-geckoview-cold', 'raptor-tp6-outlook-firefox-cold', 'raptor-tp6-paypal-firefox-cold-mozproxy-replay', 'raptor-tp6-outlook-firefox-cold-mozproxy-replay', 'raptor-tp6m-youtube-watch-geckoview-cold', 'raptor-tp6-reddit-firefox-cold', 'raptor-tp6-amazon-firefox-cold-mozproxy-replay', 'raptor-tp6-tumblr-firefox-cold-mozproxy-replay', 'raptor-tp6-google-firefox-cold-mozproxy-replay', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-netflix-firefox-cold', 'raptor-tp6-instagram-firefox-cold-mozproxy-replay', 'raptor-tp6m-ebay-kleinanzeigen-geckoview-cold', 'raptor-tp6-binast-instagram-firefox', 'raptor-tp6m-booking-geckoview-cold', 'raptor-tp6m-jianshu-geckoview-cold-mozproxy-replay', 'raptor-tp6-microsoft-firefox-cold', 'raptor-tp6-yahoo-news-firefox-cold', 'raptor-tp6m-espn-geckoview-cold', 'raptor-tp6-ebay-firefox-cold', 'raptor-tp6m-jianshu-geckoview-cold', 'raptor-tp6-apple-firefox-cold', 'raptor-tp6m-allrecipes-geckoview-cold', 'raptor-tp6-facebook-firefox-cold', 'raptor-tp6m-stackoverflow-geckoview-cold', 'raptor-tp6-google-firefox-cold', 'raptor-tp6m-microsoft-support-geckoview-cold', 'raptor-tp6m-web-de-geckoview-cold', 'raptor-tp6m-bing-restaurants-geckoview-cold', 'raptor-tp6m-bbc-geckoview-cold', 'raptor-tp6-amazon-firefox-mitm5-cold-mozproxy-replay', 'raptor-tp6-office-firefox-cold', 'raptor-tp6-pinterest-firefox-cold', 'raptor-tp6-google-mail-firefox-cold-mozproxy-replay', 'raptor-tp6m-instagram-geckoview-cold', 'raptor-tp6m-reddit-geckoview-cold', 'raptor-tp6-yahoo-news-firefox-cold-mozproxy-replay', 'raptor-tp6-instagram-firefox-cold', 'raptor-tp6-yahoo-mail-firefox-cold', 'raptor-tp6-youtube-firefox-cold', 'raptor-tp6m-imdb-geckoview-cold', 'raptor-tp6m-bing-geckoview-cold', 'raptor-tp6-yandex-firefox-cold']

Joel Maher ( :jmaher ) (UTC -8)

Comment 20

4 years ago

Attached file Bug 1591466 - split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky

split raptor tests into tier-1 (high value) and tier-2 (lower value)

Joel Maher ( :jmaher ) (UTC -8)

Comment 21

4 years ago

the only thing the above patch doesn't do is run the lower value tests every 25th push.

Comment 22

4 years ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2ed99d13012d
split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky

Bogdan Tara[:bogdan_tara | bogdant]

Comment 23

4 years ago

Backed out changeset 2ed99d13012d (bug 1591466) on ahal's request.

Backout link: https://hg.mozilla.org/integration/autoland/rev/0b769610f6c4a18725f8ea758b1515e189947bf0

As ahal noticed, the backed-out changes caused 300 Rap-t2 tasks to run on every autoland push.

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

4 years ago

a few thoughts:

1. I find a way to keep these with the same name, not tp6-t2. It went this way because it was simple and straightforward, but I could be wrong.
2. I add push-interval-25 to the tier-2 tests to force them to run less frequently.

sparky, do you have thoughts on either of these?

Flags: needinfo?(gmierz2)

Greg Mierzwinski [:sparky]

Reporter

Comment 25

•

4 years ago

:jmaher, we could go with option (1) by using the by-raptor-subtest split:

tier:
    by-app:
        firefox:
            by-raptor-subtest:
                amazon: 1
                ...
                default: 2
        default: 2

I just noticed that you would have to make a change in the transform to do this; we missed this when we made the name changes. You'd have to split out the shorthand-name of the raptor-subtest entry and make some adjustments down the line: (1) https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/raptor.py#179 (2) https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/raptor.py#246-250

I'm fine with adding the push-interval settings to the new tests as well but option (1) would keep a minimal number of task definitions.
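To show how the nested `by-app` / `by-raptor-subtest` tier config in this comment would resolve for a given task, here is a simplified stand-in for taskgraph's keyed-by resolution. It is not the real transform code (which supports more matching rules); the `resolve_keyed` helper is hypothetical. At each level it looks up the task's actual value and falls back to `default`.

```python
def resolve_keyed(config, lookups):
    """Walk nested 'by-<key>' dicts, picking the matching entry or 'default'.

    lookups maps a keyed-by name (e.g. "by-app") to this task's actual value.
    """
    value = config
    while isinstance(value, dict) and len(value) == 1:
        key, options = next(iter(value.items()))
        if key not in lookups:
            break  # not a keyed-by dict we know how to resolve
        actual = lookups[key]
        value = options[actual] if actual in options else options["default"]
    return value


tier_config = {
    "by-app": {
        "firefox": {
            "by-raptor-subtest": {
                "amazon": 1,
                # other high-value subtests elided, as in the comment
                "default": 2,
            },
        },
        "default": 2,
    },
}

print(resolve_keyed(tier_config, {"by-app": "firefox", "by-raptor-subtest": "amazon"}))    # 1
print(resolve_keyed(tier_config, {"by-app": "firefox", "by-raptor-subtest": "bing"}))      # 2
print(resolve_keyed(tier_config, {"by-app": "geckoview", "by-raptor-subtest": "amazon"}))  # 2
```

The appeal of option (1) is visible here: one task definition per app, with the tier decided per subtest, rather than separate tp6/tp6-t2 task names.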

Flags: needinfo?(gmierz2)

Joel Maher ( :jmaher ) (UTC -8)

Comment 26

4 years ago

I started down this path and realized that doing so would mean keeping a list of subtests, then repeating it for tiers, and repeating it again for push-interval. My latest patch uses push-interval-25 by default and push-interval-10 for tier-1.
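A rough way to see what push-interval buys: assuming a simplified model where a task with push-interval-N runs only on pushes whose id is a multiple of N (the real taskgraph optimization may differ in detail), the 300 Rap-t2 tasks per push seen in the backout would average out to 300 / 25 = 12 tasks per push at push-interval-25. The helper names are hypothetical.

```python
def runs_on_push(push_id: int, interval: int) -> bool:
    """Simplified push-interval model: run only when push_id is a multiple of interval."""
    return push_id % interval == 0


def tasks_scheduled(num_pushes: int, tasks_by_interval: dict[int, int]) -> int:
    """Total tasks scheduled over pushes 1..num_pushes.

    tasks_by_interval maps an interval N to how many tasks use push-interval-N
    (interval 1 means "every push").
    """
    return sum(
        count
        for push_id in range(1, num_pushes + 1)
        for interval, count in tasks_by_interval.items()
        if runs_on_push(push_id, interval)
    )


every_push = tasks_scheduled(100, {1: 300})
every_25th = tasks_scheduled(100, {25: 300})
print(every_push, every_25th)  # 30000 1200 (i.e. 12 per push on average)
```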

Comment 27

4 years ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7889537b4ff
split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky

Bogdan Tara[:bogdan_tara | bogdant]

Comment 28

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b7889537b4ff

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79

Joel Maher ( :jmaher ) (UTC -8)

Comment 29

4 years ago

Attached file Bug 1591466 - do not adjust tier and optimization for mobile. r=sparky

do not adjust tier and optimization for mobile.

Comment 30

4 years ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac50886ec03f
do not adjust tier and optimization for mobile. r=sparky

Andreea Pavel [:apavel]

Comment 31

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ac50886ec03f

Florin Strugariu [:Bebe]

Assignee

Comment 32

4 years ago

Is there anything else to do here?

Flags: needinfo?(fstrugariu) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 33

4 years ago

we are all done.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 34

4 years ago

Looking into this, I think we will save ~4800 hours/week of computation time. This is a rough calculation, probably accurate to within ±30%.
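For scale, a ±30% error bar around 4800 hours/week puts the savings somewhere between roughly 3360 and 6240 hours/week. A trivial sketch of that arithmetic (the helper name is hypothetical):

```python
def estimate_range(center: float, rel_error: float) -> tuple[float, float]:
    """Return (low, high) bounds for a rough estimate with a relative error."""
    return center * (1 - rel_error), center * (1 + rel_error)


low, high = estimate_range(4800, 0.30)
print(f"{low:.0f} to {high:.0f} hours/week")  # 3360 to 6240 hours/week
```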

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]
