Use a reduced optimal TP6 pageset to reduce testing load
Categories
(Testing :: Raptor, task, P1)
Tracking
(firefox79 fixed)
Release | Tracking | Status
---|---|---
firefox79 | --- | fixed
People
(Reporter: sparky, Assigned: Bebe)
References
(Blocks 1 open bug)
Details
(Whiteboard: [ci-costs-2020:done])
Attachments
(2 files)
This bug is for adding an optimal, reduced TP6 pageset for desktop and mobile testing in an effort to reduce the test load.
See this document for background information: https://docs.google.com/document/d/1pMn77DzYIRQ8dB1hOjp0YDyNFkZtD51gM81uD9S8DF0/edit
The following graphs show the results of an analysis to find a reduced subset for desktop and mobile (tp6 and tp6m):
TP6: https://mozilla.slack.com/files/U9KF08E14/FPSUG1SH2/tp6_hist_with_uniques_dupes_removed.png
TP6M: https://mozilla.slack.com/files/U9KF08E14/FPTCZVCCV/tp6m_hist_with_uniques_dupes_removed.png
Using only the tests that uniquely caught a regression (red bars) for tp6m, we can catch 13/16 regressions/improvements (~81%). If we also include the tests that caught improvements, we could catch 15/16 (~94%). Using both warm and cold variants of those tests would let us catch 16/16 (100%).
For desktop, using the same method (only picking tests with red bars), we can catch ~85% of regressions. Including the tests that caught improvements, and using both warm and cold variants of all of these tests, we can catch 100% of regressions.
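The percentages above boil down to checking which alerts at least one selected test would have caught. A minimal sketch of that calculation, assuming the analysis data can be reduced to a mapping from each test to the set of alert IDs it caught (the test names and alert IDs below are hypothetical, not the real tp6/tp6m data):

```python
from typing import Dict, Iterable, Set


def coverage(caught_by: Dict[str, Set[int]], selected: Iterable[str], total_alerts: int) -> float:
    """Fraction of all alerts caught by at least one selected test."""
    covered: Set[int] = set()
    for test in selected:
        # Tests missing from the mapping caught nothing.
        covered |= caught_by.get(test, set())
    return len(covered) / total_alerts


# Hypothetical data: 4 alerts and three tests (warm/cold variants of one site).
caught_by = {
    "tp6m-site-a-cold": {1, 2},
    "tp6m-site-a-warm": {2, 3},
    "tp6m-site-b-cold": {4},
}

print(f"{coverage(caught_by, ['tp6m-site-a-cold'], 4):.0%}")  # 50%
print(f"{coverage(caught_by, caught_by.keys(), 4):.0%}")  # 100%
print(f"{13 / 16:.0%}")  # 81%
```

In this toy data, adding the warm variant of site A picks up alert 3 that the cold variant alone misses, which mirrors why running both warm and cold variants closes the remaining gap.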
Comment 1•4 years ago
Using :sparky's toolchain:
https://github.com/gmierz/moz-current-tests/tree/master/high-value-tests
Looking at the specific bugs determined not to be test-only or infra fixes (53 out of 61), I found 19 tests that are high value:
['raptor-tp6m-espn-geckoview', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-stylebench-firefox', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-speedometer-firefox', 'raptor-tp6-twitch-firefox-cold', 'raptor-tp6-fandom-firefox', 'raptor-tp6-twitter-firefox', 'raptor-tp6-facebook-firefox-cold', 'raptor-wasm-misc-baseline-firefox', 'raptor-tp6-tumblr-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6-wikia-firefox', 'raptor-assorted-dom-firefox', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-twitch-firefox', 'raptor-tp6-bing-firefox']
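The filtering step described above (drop bugs that were test-only or infra fixes, then keep the tests linked to the remaining bugs) can be sketched as follows. The `RegressionBug` record, its `fix_type` labels, and the bug-to-test associations are all hypothetical; the real toolchain linked to above may structure its data differently:

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical classification labels; the real toolchain's categories may differ.
EXCLUDED_FIX_TYPES = {"test-only", "infra"}


@dataclass
class RegressionBug:
    bug_id: int
    fix_type: str  # e.g. "product", "test-only", "infra"
    alerting_tests: List[str]  # tests whose alerts led to this bug


def high_value_tests(bugs: List[RegressionBug]) -> Set[str]:
    """Collect the tests linked to bugs that were not test-only or infra fixes."""
    relevant = [b for b in bugs if b.fix_type not in EXCLUDED_FIX_TYPES]
    return {test for bug in relevant for test in bug.alerting_tests}


# Hypothetical bugs; the test names are real Raptor tests, the links are made up.
bugs = [
    RegressionBug(1, "product", ["raptor-tp6-bing-firefox"]),
    RegressionBug(2, "infra", ["raptor-tp6-google-firefox"]),
    RegressionBug(3, "product", ["raptor-speedometer-firefox", "raptor-tp6-bing-firefox"]),
    RegressionBug(4, "test-only", ["raptor-tp6-amazon-firefox"]),
]
print(sorted(high_value_tests(bugs)))
# ['raptor-speedometer-firefox', 'raptor-tp6-bing-firefox']
```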
After we validate this and ensure we can update this data more easily, it will be realistic to adjust tier status in the taskcluster .yml files. We can also apply this to Talos tests.
Comment 2•4 years ago
Doing the same analysis on Talos, here are the high-value Talos tests:
['tabswitch', 'tsvgx', 'displaylist_mutate', 'tscrollx', 'sessionrestore', 'tp5n', 'tart', 'perf_reftest_singletons', 'startup_about_home_paint_realworld_webextensions', 'tp5o', 'kraken', 'ts_paint_webext', 'tsvgr_opacity', 'startup_about_home_paint', 'tp5o_scroll']
Comment 3•4 years ago
:davehunt, is it ok to move forward with marking tests as tier1/2 as outlined above?
Comment 4•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)
:davehunt, is it ok to move forward with marking tests as tier1/2 as outlined above?
Yes, but can we limit this to Raptor for now? I would suggest filing a separate bug for Talos.
Comment 5•4 years ago
Yeah, Talos should be considered separately (I filed bug 1626045).
Comment 8•4 years ago
I spent some time sanitizing the data and cross-referencing it in detail. There are 24 tests (5 on Android) to consider:
['raptor-tp6m-espn-geckoview-cold', 'raptor-speedometer-firefox', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-slides-firefox', 'raptor-tp6-google-mail-firefox-cold', 'raptor-stylebench-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-tp6-google-firefox-cold', 'raptor-tp6-twitch-firefox', 'raptor-tp6-tumblr-firefox', 'raptor-tp6m-espn-geckoview', 'raptor-tp6-fandom-firefox', 'raptor-tp6-bing-firefox', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-wasm-misc-firefox', 'raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6m-bing-geckoview', 'raptor-tp6-reddit-firefox-cold', 'raptor-tp6m-ebay-kleinanzeigen-search-geckoview', 'raptor-tp6-instagram-firefox', 'raptor-assorted-dom-firefox']
This is using a full year of data. Limiting this to 6 months of data, we have 9 tests (1 on android) to consider:
['raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6-slides-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-google-mail-firefox-cold']
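The drop from 24 to 9 tests comes from narrowing the data window. A rough sketch of that step, assuming each alert can be flattened into (alert ID, date raised, test that caught it) records and using the "uniquely caught" criterion from the original analysis; the records, dates, and the `unique_catchers` helper are hypothetical:

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

# Hypothetical record: (alert_id, date the alert was raised, test that caught it).
Alert = Tuple[int, date, str]


def unique_catchers(alerts: List[Alert], end: date, window_days: int) -> Set[str]:
    """Tests that were the only detector of at least one alert inside the window."""
    start = end - timedelta(days=window_days)
    detectors: Dict[int, Set[str]] = defaultdict(set)
    for alert_id, raised, test in alerts:
        if start <= raised <= end:
            detectors[alert_id].add(test)
    # Keep alerts with exactly one detector and return that detector.
    return {next(iter(tests)) for tests in detectors.values() if len(tests) == 1}


alerts = [
    (1, date(2019, 5, 10), "raptor-tp6-bing-firefox"),
    (2, date(2019, 12, 2), "raptor-tp6-slides-firefox"),
    (2, date(2019, 12, 2), "raptor-tp6-slides-firefox-cold"),
    (3, date(2020, 2, 14), "raptor-tp6-tumblr-firefox-cold"),
]
today = date(2020, 3, 31)
print(sorted(unique_catchers(alerts, today, 365)))
# ['raptor-tp6-bing-firefox', 'raptor-tp6-tumblr-firefox-cold']
print(sorted(unique_catchers(alerts, today, 182)))
# ['raptor-tp6-tumblr-firefox-cold']
```

Alert 2 is caught by two tests, so neither counts as a unique catcher; alert 1 falls outside the roughly 6-month window, so its test drops out of the smaller set.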
Our goal is to keep all of these tests running and sheriffed full time; only the tier-2 tests would run less frequently, so their alerts would show up a day or two later. There is little risk in this.
:esmyth, do you have concerns or other thoughts?
Comment 9•4 years ago
As discussed in an email thread, we feel that 6 months is a more representative sample, which yields the smaller pageset. This would apply across the board.
The key here is that we would re-evaluate this monthly to ensure that we adjust the tier-1 tests as needed. Since all tests will be sheriffed, the only difference is that tier-2 tests may be sheriffed up to a couple of days later.
Comment 10•4 years ago
Here are the tests we are going to run less frequently:
['raptor-tp6-outlook-firefox-cold', 'raptor-tp6-netflix-firefox-cold', 'raptor-tp6m-google-restaurants-geckoview-cold', 'raptor-tp6m-booking-geckoview-cold', 'raptor-tp6-yahoo-mail-firefox', 'raptor-tp6-microsoft-firefox-cold', 'raptor-tp6m-bing-restaurants-geckoview-cold', 'raptor-tp6m-wikipedia-geckoview-cold', 'raptor-tp6-yahoo-mail-firefox-cold', 'raptor-tp6m-bing-geckoview-cold', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-tp6-facebook-firefox', 'raptor-tp6-yandex-firefox-cold', 'raptor-tp6m-instagram-geckoview-cold', 'raptor-tp6-pinterest-firefox', 'raptor-tp6-apple-firefox-cold', 'raptor-tp6-instagram-firefox-cold']
Comment 11•4 years ago
I've spoken with Eric and he doesn't have any concerns with the identified tests. We discussed some related issues, which I'll follow up with separately and do not block this effort.
Comment 13•4 years ago (Reporter)
:bebe, can you provide an update here? I think this is waiting on the taskcluster split changes, right?
Comment 14•4 years ago (Assignee)
This is waiting for Bug 1633874 - Update taskcluster settings to the new raptor file structure.
After that, we can generate a list of tests and split them into the discussed lists.
Comment 15•4 years ago
We can do the split. I think we should make a tp6 job and a tp6-tier2 job, both scheduled. The difference is that the lower-value (tier-2) pages will be in the test-subtests of the tp6-tier2 job, while the regular tp6 job will have the higher-value test-subtests.
This should make it easier to move tests between tiers, as well as to create a test-sets.yml where we can schedule things, or set raptor-tp6-tier2 to push-interval-25 while raptor-tp6 is push-interval-10.
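To get a feel for what push-interval-10 versus push-interval-25 means, here is a simplified model in which a task runs only on every Nth push. It assumes the decision is made from a monotonically increasing push ID; the in-tree taskgraph optimization may use different inputs and extra conditions, and `runs_on_push`/`count_runs` are hypothetical helpers:

```python
def runs_on_push(push_id: int, interval: int) -> bool:
    """Simplified push-interval rule: run only when the push ID is a multiple of the interval."""
    return push_id % interval == 0


def count_runs(num_pushes: int, interval: int) -> int:
    """How many of the push IDs 1..num_pushes would schedule the task."""
    return sum(runs_on_push(p, interval) for p in range(1, num_pushes + 1))


# Over 100 pushes: tier-1 (push-interval-10) vs tier-2 (push-interval-25).
print(count_runs(100, 10))  # 10
print(count_runs(100, 25))  # 4
```

Under this model, moving a test from tier-1 to tier-2 cuts its runs from 10 to 4 per 100 pushes (a 60% reduction), at the cost of regressions in that test being detected later.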
Comment 16•4 years ago
:bebe, does this way of thinking make sense to you?
Comment 17•4 years ago
:davehunt, can you answer this or help get this moving?
Comment 18•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)
:davehunt, can you answer this or help get this moving?
Your suggestion in comment 15 sounds good to me. I'm going to leave the needinfo open for Bebe, who is the assignee.
Comment 19•4 years ago
I queried all Raptor tests since December 1, 2019; here is the latest set:
Chosen tests: ['raptor-tp6m-youtube-geckoview-cold', 'raptor-tp6-twitter-firefox-cold', 'raptor-tp6-twitch-firefox-cold', 'raptor-motionmark-animometer-firefox', 'raptor-tp6-google-mail-firefox-cold', 'raptor-tp6-amazon-firefox-cold', 'raptor-tp6-slides-firefox-cold', 'raptor-tp6-tumblr-firefox-cold', 'raptor-tp6m-google-maps-geckoview-cold', 'raptor-tp6-imgur-firefox-cold-mozproxy-replay', 'raptor-webaudio-firefox']
Rejected tests: ['raptor-tp6m-google-restaurants-geckoview-cold', 'raptor-tp6m-amazon-geckoview-cold', 'raptor-tp6m-wikipedia-geckoview-cold', 'raptor-tp6-paypal-firefox-cold', 'raptor-tp6-slides-firefox-cold-mozproxy-replay', 'raptor-tp6-fandom-firefox-cold', 'raptor-tp6-imgur-firefox-cold', 'raptor-tp6-pinterest-firefox-cold-mozproxy-replay', 'raptor-tp6-bing-firefox-cold', 'raptor-tp6m-cnn-ampstories-geckoview-cold', 'raptor-tp6m-cnn-geckoview-cold', 'raptor-tp6m-ebay-kleinanzeigen-search-geckoview-cold', 'raptor-tp6-docs-firefox-cold', 'raptor-tp6-binast-instagram-firefox-mozproxy-replay', 'raptor-tp6m-amazon-search-geckoview-cold', 'raptor-motionmark-htmlsuite-firefox', 'raptor-tp6m-google-geckoview-cold', 'raptor-tp6-linkedin-firefox-cold', 'raptor-tp6-sheets-firefox-cold', 'raptor-tp6m-aframeio-animation-geckoview-cold', 'raptor-tp6-twitter-firefox-cold-mozproxy-replay', 'raptor-tp6-imdb-firefox-cold', 'raptor-tp6m-facebook-cristiano-geckoview-cold', 'raptor-tp6m-facebook-geckoview-cold', 'raptor-tp6-outlook-firefox-cold', 'raptor-tp6-paypal-firefox-cold-mozproxy-replay', 'raptor-tp6-outlook-firefox-cold-mozproxy-replay', 'raptor-tp6m-youtube-watch-geckoview-cold', 'raptor-tp6-reddit-firefox-cold', 'raptor-tp6-amazon-firefox-cold-mozproxy-replay', 'raptor-tp6-tumblr-firefox-cold-mozproxy-replay', 'raptor-tp6-google-firefox-cold-mozproxy-replay', 'raptor-tp6-wikipedia-firefox-cold', 'raptor-tp6-netflix-firefox-cold', 'raptor-tp6-instagram-firefox-cold-mozproxy-replay', 'raptor-tp6m-ebay-kleinanzeigen-geckoview-cold', 'raptor-tp6-binast-instagram-firefox', 'raptor-tp6m-booking-geckoview-cold', 'raptor-tp6m-jianshu-geckoview-cold-mozproxy-replay', 'raptor-tp6-microsoft-firefox-cold', 'raptor-tp6-yahoo-news-firefox-cold', 'raptor-tp6m-espn-geckoview-cold', 'raptor-tp6-ebay-firefox-cold', 'raptor-tp6m-jianshu-geckoview-cold', 'raptor-tp6-apple-firefox-cold', 'raptor-tp6m-allrecipes-geckoview-cold', 'raptor-tp6-facebook-firefox-cold', 
'raptor-tp6m-stackoverflow-geckoview-cold', 'raptor-tp6-google-firefox-cold', 'raptor-tp6m-microsoft-support-geckoview-cold', 'raptor-tp6m-web-de-geckoview-cold', 'raptor-tp6m-bing-restaurants-geckoview-cold', 'raptor-tp6m-bbc-geckoview-cold', 'raptor-tp6-amazon-firefox-mitm5-cold-mozproxy-replay', 'raptor-tp6-office-firefox-cold', 'raptor-tp6-pinterest-firefox-cold', 'raptor-tp6-google-mail-firefox-cold-mozproxy-replay', 'raptor-tp6m-instagram-geckoview-cold', 'raptor-tp6m-reddit-geckoview-cold', 'raptor-tp6-yahoo-news-firefox-cold-mozproxy-replay', 'raptor-tp6-instagram-firefox-cold', 'raptor-tp6-yahoo-mail-firefox-cold', 'raptor-tp6-youtube-firefox-cold', 'raptor-tp6m-imdb-geckoview-cold', 'raptor-tp6m-bing-geckoview-cold', 'raptor-tp6-yandex-firefox-cold']
Comment 20•4 years ago
split raptor tests into tier-1 (high value) and tier-2 (lower value)
Comment 21•4 years ago
The only thing the above patch doesn't do is run the lower-value tests every 25th push.
Comment 22•4 years ago
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2ed99d13012d split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky
Comment 23•4 years ago
Backed out changeset 2ed99d13012d (bug 1591466) on ahal's request.
Backout link: https://hg.mozilla.org/integration/autoland/rev/0b769610f6c4a18725f8ea758b1515e189947bf0
As ahal noticed, the backed-out changes caused 300 Rap-t2 tasks to run on every autoland push.
Comment 24•4 years ago
A few thoughts:
- I could find a way to keep these under the same name instead of tp6-t2. The patch went this way because it was simple and straightforward, but I could be wrong.
- I could add push-interval-25 to the tier-2 tests to force them to run less frequently.
sparky, do you have thoughts on either of these?
Comment 25•4 years ago (Reporter)
:jmaher, we could go with option (1) by using the by-raptor-subtest split:
    tier:
        by-app:
            firefox:
                by-raptor-subtest:
                    amazon: 1
                    ...
                    default: 2
            default: 2
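To see how a keyed-by block like the one above resolves for a given task, here is a minimal, hypothetical resolver. It is not taskgraph's own keyed-by implementation; it only walks nested `by-<field>` dicts with a `default` fallback, which is enough to show that `amazon` on Firefox gets tier 1 while everything else falls through to tier 2:

```python
from typing import Any, Dict


def resolve(value: Any, attrs: Dict[str, str]) -> Any:
    """Walk nested {'by-<field>': {...}} dicts until a plain value is reached."""
    while isinstance(value, dict) and len(value) == 1:
        key, options = next(iter(value.items()))
        if not key.startswith("by-"):
            break
        field = key[len("by-"):]
        # Fall back to 'default' when the task's attribute has no explicit entry.
        value = options.get(attrs.get(field), options.get("default"))
    return value


# The tier config from the comment above, with the elided subtests left out.
tier = {
    "by-app": {
        "firefox": {
            "by-raptor-subtest": {
                "amazon": 1,
                "default": 2,
            },
        },
        "default": 2,
    },
}

print(resolve(tier, {"app": "firefox", "raptor-subtest": "amazon"}))  # 1
print(resolve(tier, {"app": "firefox", "raptor-subtest": "bing"}))  # 2
print(resolve(tier, {"app": "chrome", "raptor-subtest": "amazon"}))  # 2
```

The appeal of this shape, as noted in the comment, is that a single task definition carries the tier choice per subtest instead of duplicating task definitions per tier.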
I just noticed that you would have to make a change in the transform to do this; we missed this when we made the name changes. You'd have to split out the shorthand-name of the raptor-subtest entry and make some adjustments further down: (1) https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/raptor.py#179 (2) https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/raptor.py#246-250
I'm fine with adding the push-interval settings to the new tests as well, but option (1) would keep the number of task definitions to a minimum.
Comment 26•4 years ago
I started down this path and realized that, by doing so, I would have a list of subtests, then repeat it for tiers, and repeat it again for push-interval. My latest patch uses push-interval-25 by default and push-interval-10 for tier-1.
Comment 27•4 years ago
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b7889537b4ff split raptor tests into tier-1 (high value) and tier-2 (lower value). r=sparky
Comment 28•4 years ago
bugherder
Comment 29•4 years ago
do not adjust tier and optimization for mobile.
Comment 30•4 years ago
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ac50886ec03f do not adjust tier and optimization for mobile. r=sparky
Comment 31•4 years ago
bugherder
Comment 32•4 years ago (Assignee)
Is there anything else to do here?
Comment 34•4 years ago
Looking into this, I think we will save ~4800 hours/week of computation time. This is a rough calculation, probably within ±30% of that number.
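For reference, the ±30% band around that estimate works out as follows. This is only arithmetic on the numbers quoted in the comment; the `estimate_range` helper is illustrative and does not reproduce how the ~4800 hours/week figure was derived:

```python
from typing import Tuple


def estimate_range(estimate: float, error_pct: int) -> Tuple[float, float]:
    """Lower and upper bounds for an estimate with a symmetric percentage error."""
    low = estimate * (100 - error_pct) / 100
    high = estimate * (100 + error_pct) / 100
    return low, high


print(estimate_range(4800, 30))  # (3360.0, 6240.0)
```

So the expected savings fall somewhere between roughly 3360 and 6240 compute hours per week.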