Reduce noise by increasing the browser settle time
Categories
(Testing :: Raptor, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: davehunt, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(1 file)
In bug 1525017 :acreskey found that increasing the time for the browser to settle from 30 to 90 seconds made a significant improvement in the noise in the results. This bug is for identifying a suitable settle time for Raptor tests that balances noise improvements with test run times. Let's experiment with various settle times to see how the noise is impacted on each platform. If we can reduce noise significantly then we may also be able to reduce the number of page cycles, which could help to keep the job run times down.
Comment 1•6 years ago
A related benefit: developers often have to trigger repeat jobs in order to raise the confidence level on performance differences.
With reduced noise fewer repeat jobs would be required.
I'm capturing samples in order to determine what components are active after the current 30-second delay in Raptor. (So far I see telemetry being submitted, BHMgr Processor, a large GC major, and a few smaller tasks.) Ideally we could prevent these as well.
Comment 2•6 years ago
I did a round of pushes to try with different browser settle times.
When these have finished I will be able to compare the results in Perfherder.
In the meantime I have started to measure the metrics on my local setup and I will
put the data in this document:
https://docs.google.com/spreadsheets/d/1BKmUphvrCWoDuz0Tih5b83znOpLkqPS0vXeAVLq3_RE/edit?usp=sharing
Treeherder URLs:
Browser settle time = 35000
./mach try fuzzy -q tp6m -e
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fcf5cca33e6e5e131162ea66338534c2fa1d390
=====================================================================
Browser settle time = 35000
./mach try fuzzy -q tp6 -e
https://treeherder.mozilla.org/#/jobs?repo=try&revision=282d41a29c45148974225e7a2318bee248e87e62
=====================================================================
Browser settle time = 40000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a4f22bb13b99708296c7c35d450f88a3606d8e4
=====================================================================
Browser settle time = 50000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=344c322c312b42f27aace9ba799e49ca416940fe
=====================================================================
Browser settle time = 60000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bd7167ecc9747f74c25444ce175aa1a96562a79b
=====================================================================
Browser settle time = 70000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6577fc994924dab6e17559c64a90c819b4d99523
=====================================================================
Browser settle time = 80000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dfd534ed6a9af11b7a15879c0dffee1c26d9544
=====================================================================
Browser settle time = 90000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b732e7b74b8feaff15a76533451a657bda3d515
=====================================================================
Browser settle time = 100000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6eb59375c524bd46a946ca5838656c9123a9dae
PLATFORMS
=====================================================================
Browser settle time = 100000
./mach try chooser --full
- select all tests for tp6 and tp6m
android, android-aarch, win 32, win 64, linux, linux 64, mac 64 : normal and nightly
https://treeherder.mozilla.org/#/jobs?repo=try&revision=16a46e31cbff26bda9f42d62a272c1c706b437bf
=====================================================================
Browser settle time = 90000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0196b4cb97c5fb4df1e84311471a7859baca8342
=====================================================================
Browser settle time = 80000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c86e3c9f2bc143f53e3735cfc6d22a1177d90cfd
=====================================================================
Browser settle time = 70000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a87db66bea0fd1f541951a7742ae14d0a3ff8c0f
=====================================================================
Browser settle time = 60000
./mach try chooser --full
- select all tests for tp6 and tp6m
https://treeherder.mozilla.org/#/jobs?repo=try&revision=91c77a44121dc521449a9f002832594d0d269f90
Comment 3•6 years ago
Here is a set of measurements for a few tests running on local Raptor:
https://docs.google.com/spreadsheets/d/1BKmUphvrCWoDuz0Tih5b83znOpLkqPS0vXeAVLq3_RE/edit#gid=0
Comment 4•6 years ago
Marian, can you tell me:
for each of the experiments in Comment 3 (e.g. Browser Settle time = 30000), how many times was the job run?
Also, I'm curious, is there a reason why only the mobile tp6m tests were run?
Comment 5•6 years ago
Hi Andrew,
For each test I ran the job once (to be more specific, 15 page cycles per job).
For the second question: no, there is no reason. Let me know if I should start measuring the desktop tests.
Thanks!
Comment 6•6 years ago
Marian, thanks, that's interesting - I didn't realize that the tp6m tests were 15 pagecycles and not 25.
Because we're working with results that are very noisy, I highly recommend running the jobs numerous times to increase n.
I would also suggest testing on desktop as well because there are different background tasks running when the desktop browser starts up vs on android (kinto download of intermediate certs is one that's fresh in my mind).
I personally find it's also much faster to get results using desktop on the try server.
Reporter | ||
Comment 7•6 years ago
acreskey: settle times for 25 page-cycles are now available on the spreadsheet, could you take a look?
Reporter | ||
Comment 8•6 years ago
marian: could you perform the same experiment on desktop?
Comment 9•6 years ago
Hi Dave,
Here are the pushes to try for the desktop websites:
160 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aed1e9993ae6383460c58061bef6b91e3e6a63e2
140 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc4088c1bd6f2b926e5003e6c25562817fc0b440
120 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b94dd25bf4206a11bb524386039b53efe747a64
100 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ef51c3d0f987587ca0ee9b44207dba775d7339f
90 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=05de7e37202f39d4ad150c1efb1b88d785052dcd
80 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83da21b9a0eb9881e6bed00be17fa7d69f8aadaa
70 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b112e20b46ba8b2b250a9e105128e43fd7a9dfdb
60 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=570d2d2402ccfc8e61b689002b14e670ba8ed236
50 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d4d5757a917930402eaef67e6930b6038d1b712
40 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9edc0c659faef8c38346db9b5e5101dc964d3f34
30 s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83981a0af0b33970ff26943d92830fa65e750d76
Comment 10•6 years ago
Those desktop experiments should be interesting, we can compare them with the Perfherder Compare feature.
e.g.
160s settle vs 30s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=aed1e9993ae6383460c58061bef6b91e3e6a63e2&newProject=try&newRevision=83981a0af0b33970ff26943d92830fa65e750d76&framework=10
Marian, thank you for increasing the pagecycle count.
I'm not sure I explained myself well but I think it's also very important that each test be run multiple times. And when I say multiple I mean a lot, e.g. 20+, until the values start to converge.
Maybe this was done in the spreadsheet? Not clear to me.
If you run one of those jobs again I think you will see that the results will be wildly different from the previous iteration.
The settle time is not the only variable affecting the noise (unfortunately), so we need to collect a large number of results to effectively reduce the standard error.
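To illustrate the point about standard error, here is a minimal sketch. The sample values are made up for illustration and are not taken from these try pushes; the helper name `standard_error` is likewise just an assumption for this example. It shows that with the same spread of results, going from 4 runs to 20 runs shrinks the standard error of the mean considerably.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample stdev divided by sqrt(n)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical loadtime medians (ms) reported by repeated runs of one job.
few_runs = [890.0, 955.0, 1010.0, 870.0]
# Same spread of values, but 20 runs instead of 4.
many_runs = few_runs * 5

print(f"n={len(few_runs):2d}  standard error = {standard_error(few_runs):.1f} ms")
print(f"n={len(many_runs):2d}  standard error = {standard_error(many_runs):.1f} ms")
```

The run-to-run spread stays the same in both cases; only the uncertainty of the mean drops, which is what makes a comparison between two settle times trustworthy.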
Reporter | ||
Comment 11•6 years ago
Marian: I believe you should be able to retrigger the jobs in Treeherder to build up results and show the relative noise. I think there's also a way to trigger rebuilds via the command line when pushing to try. jmaher: could you confirm or direct marian to the relevant docs?
Comment 12•6 years ago
./mach try fuzzy -q '...' --rebuild X
where X is the number of rebuilds you need.
You can also retrigger the jobs in the Treeherder UI.
Comment 13•6 years ago
I have retriggered all the jobs for this platform : "Windows 10 x64 Shippable opt"
on the URLs from comment 9:
https://bugzilla.mozilla.org/show_bug.cgi?id=1536090#c9
Comment 14•6 years ago
I finished the cold-measurements.
https://docs.google.com/spreadsheets/d/1BKmUphvrCWoDuz0Tih5b83znOpLkqPS0vXeAVLq3_RE/edit#gid=0
In the “loadtime” column I have highlighted the two smallest values for each test.
Darker green marks the smallest value and lighter green the next smallest.
I cannot pick one browser settle time that fits all the tests, because the results are not conclusive.
acreskey: What's your opinion on the results?
Comment 15•6 years ago
I have updated the spreadsheet with colors for each standard deviation column and loadtime field.
Green - smallest value
Pink / Magenta: next smallest value (but bigger than the green one)
https://docs.google.com/spreadsheets/d/1BKmUphvrCWoDuz0Tih5b83znOpLkqPS0vXeAVLq3_RE/edit#gid=0
Comment 16•6 years ago
Hi, Marian. Sorry for the delay.
To be clear, when looking at the results in the spreadsheet from Comment 15,
for instance
Browser Settle Time:
30000
Test: raptor-tp6m-5
raptor-tp6m-amazon-search-geckoview 2344.5 1088 227.2476985 1298.5 239.066603 1198 235.7633848 4448.5 1677.659432 9401.5 2479.269198
Are those the results from a single run of the given test?
Comment 17•6 years ago
To get more datapoints for this I kicked off two tests, one where the raptor post_startup_delay
was increased from 30s to 90s and another where it was reduced to 15s.
I'm comparing against an android baseline revision that I made last week:
Increased Raptor settle time from 30 seconds to 90 seconds
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c059749d487e8668fb006b1f247de5f34edc5897&newProject=try&newRevision=7d4ad5440b37b53f8901ba65eae638649657542d&framework=10
Reduced Raptor settle time from 30 seconds to 15 seconds
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c059749d487e8668fb006b1f247de5f34edc5897&newProject=try&newRevision=c68660a907b65fa44aecef508c860fea3271e9d3&framework=10
So there are still jobs to complete.
I'm only looking at results where there are 20 retries, because otherwise I feel the single job results are too noisy.
So far I'm not seeing anything conclusive here. Strangely the performance of amazon.com increases by 10% and 8% on each test (over 20 retries...)
Comment 18•6 years ago
Hi Andrew,
Yes, a single run of the main test with post_startup_delay set to 30000.
There were 15 page cycles for each subtest.
On the following tabs of the spreadsheet I measured the same tests but with 25 page cycles.
Comment 19•6 years ago
Marian,
As a quick test, can you redo one of the tests and share the results?
e.g.
30000
Test: raptor-tp6m-1
For example, the one with the 25 pagecycles.
My concern is that the results will be very different from what was recorded on the spreadsheet.
It's not uncommon to see the raptor medians vary by 20% one run to another.
You can see a good example of how the std dev for these metrics will vary run to run from Rob's results here.
I think this would be valuable to see here, otherwise it's hard to draw any conclusions.
Comment 20•6 years ago
Hi Andrew,
Here are the results:
Test:
raptor-tp6m-1 : 25 pagecycles, browser settle time : 30000
raptor-tp6m-amazon-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
792.22, 716.5, 170.9966469287324, 825.5, 173.31381753289634, 747.0, 173.81497325471196, 891.5, 356.40194445947225
raptor-tp6m-facebook-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
996.5, 1051.5, 97.66148833776714, 681.5, 72.88793015526535, 657.5, 69.01605305803966, 2092.0, 943.2073365438508
raptor-tp6m-google-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
165.89, 166.0, 29.481724872355677, 190.5, 33.25690513457972, 153.5, 30.64133326648542, 156.0, 27.521796368738702
raptor-tp6m-youtube-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
427.96, 606.5, 95.78076377929405, 485.0, 61.79067046506081, 159.5, 30.540748498368202, 713.5, 287.6285345946112
raptor-tp6m-amazon-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
806.6, 730.0, 259.2328111222012, 803.0, 273.1071583234981, 756.5, 263.7955263270113, 954.5, 1741.0053312399125
raptor-tp6m-facebook-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
1004.67, 1074.0, 651.7242014706901, 681.5, 656.4975895956721, 656.0, 650.9279607951463, 2121.0, 1219.565698416294
raptor-tp6m-google-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
157.02, 160.5, 15.763652927643022, 182.0, 21.756891295416718, 144.0, 19.27710777212108, 144.5, 18.20609668549904
raptor-tp6m-youtube-geckoview
geomean, dcf, dcf-stdev, fcp, fcp-stdev, fnbpaint, fnbpaint-stdev, loadtime, loadtime-stdev
417.4, 565.5, 337.33996742631666, 495.0, 313.3756832311335, 163.0, 17.665521565295293, 664.0, 377.3860803251604
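As a rough way to read the numbers above, the following sketch compares the loadtime median and standard deviation between the two listings. It assumes the first four entries are one run and the second four are the repeat run of the same test; the values are copied from this comment and rounded to two decimals, and the helper name `percent_change` is only illustrative.

```python
# (loadtime median, loadtime stdev) per page, copied from the two listings above.
run1 = {
    "amazon": (891.5, 356.40),
    "facebook": (2092.0, 943.21),
    "google": (156.0, 27.52),
    "youtube": (713.5, 287.63),
}
run2 = {
    "amazon": (954.5, 1741.01),
    "facebook": (2121.0, 1219.57),
    "google": (144.5, 18.21),
    "youtube": (664.0, 377.39),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for page in run1:
    median1, stdev1 = run1[page]
    median2, stdev2 = run2[page]
    print(
        f"{page:9s} median {percent_change(median1, median2):+6.1f}%  "
        f"stdev x{stdev2 / stdev1:.2f}"
    )
```

Even with identical settings, the medians move by several percent and the standard deviations change by large factors, which is the run-to-run variation being discussed.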
Comment 21•6 years ago
Thanks Marian.
I think that with the large variations from those two runs you would agree that we will need to collect data from many runs in order to have confidence in our conclusions.
I'm comparing your initial patches against each other:
30s settle (left) to 40s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=9edc0c659faef8c38346db9b5e5101dc964d3f34&framework=10
Looking at the standard deviation for windows10-64-shippable metrics (the only one with more than 1 retry), I don't see any consistent improvements.
30s settle (left) to 50s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=5d4d5757a917930402eaef67e6930b6038d1b712&framework=10
In this case the std dev looks to be worse in general...
30s settle (left) to 60s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=570d2d2402ccfc8e61b689002b14e670ba8ed236&framework=10
This one is interesting because it's showing a high-confidence improvement of 18.84% on raptor-tp6-reddit-firefox opt
Otherwise not particularly helpful for noise.
30s settle (left) to 70s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=b112e20b46ba8b2b250a9e105128e43fd7a9dfdb&framework=10
Still seeing the improvement on raptor-tp6-reddit-firefox opt
(it's actually a ~42% improvement of loadtime)
And in general I would say there's a small improvement in std dev.
30s settle (left) to 80s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=83da21b9a0eb9881e6bed00be17fa7d69f8aadaa&framework=10
Similar to 60s, 70s
30s settle (left) to 90s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=05de7e37202f39d4ad150c1efb1b88d785052dcd&framework=10
This one adds a high-confidence ~5% performance improvement on raptor-tp6-instagram-firefox opt
30s settle (left) to 100s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=9ef51c3d0f987587ca0ee9b44207dba775d7339f&framework=10
Improvement on raptor-tp6-instagram-firefox opt still present.
30s settle (left) to 120s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=0b94dd25bf4206a11bb524386039b53efe747a64&framework=10
Similar
30s settle (left) to 140s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=cc4088c1bd6f2b926e5003e6c25562817fc0b440&framework=10
Similar
30s settle (left) to 160s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=83981a0af0b33970ff26943d92830fa65e750d76&newProject=try&newRevision=aed1e9993ae6383460c58061bef6b91e3e6a63e2&framework=10
Similar
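The comparison links in this comment all share one URL pattern, so they can be generated from a baseline revision and a list of candidate revisions. This is only a sketch that reproduces the link format as it appears in this thread; the function name `compare_url` is made up, and it does not assume anything about how Treeherder handles these URLs beyond what the links above show.

```python
from urllib.parse import urlencode

# Base of the Perfherder compare links as they appear in this thread.
BASE = "https://treeherder.mozilla.org/perf.html#/compare"


def compare_url(original_rev, new_rev, project="try", framework=10):
    """Build a compare link in the same format as the ones pasted above."""
    params = {
        "originalProject": project,
        "originalRevision": original_rev,
        "newProject": project,
        "newRevision": new_rev,
        "framework": framework,
    }
    return f"{BASE}?{urlencode(params)}"


# Revisions taken from comment 9: 30s baseline against the 40s and 50s pushes.
baseline = "83981a0af0b33970ff26943d92830fa65e750d76"
candidates = {
    "40s": "9edc0c659faef8c38346db9b5e5101dc964d3f34",
    "50s": "5d4d5757a917930402eaef67e6930b6038d1b712",
}

for label, revision in candidates.items():
    print(f"30s settle (left) to {label} settle")
    print(compare_url(baseline, revision))
```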
Comment 22•6 years ago
So overall, at least on windows10-64-shippable, I see almost no evidence that increasing the browser settle time reduces noise in any significant way.
It might be useful to add jobs to the reference laptop tests (windows10-64-ux) as we know that the hardware is strained and the results could be different there.
What I find most interesting is that there is a consistent loadtime improvement on raptor-tp6-reddit-firefox opt by 40-45% once the settle time has increased. And a lesser improvement of perhaps 3-5% on instagram.
This I will log and profile to find out what's happening.
Comment 23•6 years ago
Reddit loadtime significantly improved by browser settle time: Bug 1549594
Reporter | ||
Comment 24•6 years ago
:marauder could you schedule jobs testing 30s, 60s and 90s settle times on windows10-64-ux?
Comment 25•6 years ago
Hi Dave, Andrew,
Push to try for windows10-64-ux :
Desktop websites running with post_startup_delay set to 90s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd1b2b027592f817d4f07f0ddff00f5fcd776a03
Desktop websites running with post_startup_delay set to 60s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdd475d25db60ff05925c13d5cee7c2b04916355
Desktop websites running with post_startup_delay set to 30s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ed5f45529e9619cc4fc9a3b74f22b5bd3b97184
Comment 26•6 years ago
Hi Marian, for reasons unknown to me the tp6 jobs on those pushes all failed with exception?
Comment 27•6 years ago
Hi Andrew,
I pushed to try again. Let's see how these go:
Platform: windows10-64-ux
Desktop websites running with post_startup_delay set to 30s (default value)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6fb140a572c11c7261d9732345c8096fb192ba15
Desktop websites running with post_startup_delay set to 60s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55b37916ae7439380e2681b5e45b06060d171501
(2nd push to try for 60s but with the post_startup_delay modified from cmdline.py)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb4e5d4d2da1ba48993ff699648db62d453efb6c
Desktop websites running with post_startup_delay set to 90s (from cmdline.py)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62e2e2a13aacd49c2ae2c50e1ad754cded21fd1c
Comment 28•6 years ago
I added retriggers and set up the comparison views:
30s settle (left) to 60s settle
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6fb140a572c11c7261d9732345c8096fb192ba15&newProject=try&newRevision=55b37916ae7439380e2681b5e45b06060d171501&framework=10
30s settle (left) to 60s settle via cmdline.py
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6fb140a572c11c7261d9732345c8096fb192ba15&newProject=try&newRevision=cb4e5d4d2da1ba48993ff699648db62d453efb6c&framework=10
30s settle (left) to 90s settle via cmdline.py
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6fb140a572c11c7261d9732345c8096fb192ba15&newProject=try&newRevision=62e2e2a13aacd49c2ae2c50e1ad754cded21fd1c&framework=10
60s settle (left) to 60s settle via cmdline.py
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=55b37916ae7439380e2681b5e45b06060d171501&newProject=try&newRevision=cb4e5d4d2da1ba48993ff699648db62d453efb6c&framework=10
Comment 29•6 years ago
Hmm... those retry jobs completed as "exception" ... soft freeze?
Comment 30•6 years ago
Marian, I don't know why, but your try jobs from Comment 27 have all failed. Can you trigger repeat runs on them?
Comment 31•6 years ago
Hi Andrew,
I have retriggered a few jobs:
- 10 retriggers for tp6-5 on the 1st URL,
- 5 for each test on the 2nd URL,
- 3 for each test on the 3rd URL,
- 2 for each test on the 4th URL.
I tried small numbers because these tests fail very often.
Maybe there are only a few machines running windows10-64-ux and they get blocked quickly.
Comment 32•6 years ago
I added 5 to each set so we can get a better view.
Comment 33•5 years ago
I'm looking at the comparisons from comment 28.
Is the Noise Metric valid when the number of jobs for each revision is different? I'm not sure about that.
But, test for test, I see a general drop in noise as the settle time is increased to 60 seconds.
Not clear that it's improved at all going from 60 to 90 seconds.
I created this bug a while back. Since then there are other possible solutions that I've learnt about:
• Increase the browser settle time, as in the bug title
• Use a conditioned profile: A new profile is made, waits for a long settle time, e.g. 2 minutes, and then is copied so that it can be used as the basis for each test. We've had a lot of success with this route in local browsertime testing. It also reduces the amount of time it takes to run tests since the browser only has to "settle" once.
• Profile the early tp6 pageloads on the reference laptop and find more root causes of early noise. One problem is that these cannot always be solved -- e.g. the GC that tends to run during the first loads.
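The conditioned-profile option in the list above can be sketched as follows. This is not Raptor's or browsertime's implementation: the function names, the directory layout, and the placeholder `prefs.js` file are all hypothetical, and the step that would launch the browser and wait for it to settle is replaced by a stand-in so the example runs anywhere. The point is only the shape of the idea: settle once, then hand each test its own copy.

```python
import shutil
import tempfile
from pathlib import Path


def make_conditioned_profile(root: Path) -> Path:
    """Stand-in for: start the browser on a fresh profile, wait for the
    long settle time (e.g. 2 minutes), then shut it down."""
    profile = root / "conditioned"
    profile.mkdir()
    # Placeholder content; a real profile would hold the settled browser state.
    (profile / "prefs.js").write_text("// settled profile placeholder\n")
    return profile


def profile_for_test(conditioned: Path, root: Path, test_name: str) -> Path:
    """Copy the settled profile so a test cannot dirty the shared baseline."""
    return Path(shutil.copytree(conditioned, root / f"profile-{test_name}"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    base = make_conditioned_profile(root)  # the settle cost is paid once here
    for name in ("raptor-tp6-1", "raptor-tp6-2"):
        copy = profile_for_test(base, root, name)
        print(copy.name, sorted(p.name for p in copy.iterdir()))
```

Because the settle wait happens only when the baseline profile is built, lengthening it adds a fixed cost rather than one that grows with the number of tests.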
From these try experiments I would say that increasing the browser settle time would help noise on the UX hardware, but I think it would have to be part of an overall strategy to improve the runtime/test configuration for these devices.
From looking at the two 60-second settle comparisons (which should be the same effective code), we see that many tests differ in reported geomeans by 6+%, even over 7+ runs.
60s settle (left) to 60s settle via cmdline.py
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=55b37916ae7439380e2681b5e45b06060d171501&newProject=try&newRevision=cb4e5d4d2da1ba48993ff699648db62d453efb6c&framework=10
Reporter | ||
Comment 34•5 years ago
Let's implement the conditioned profiles (bug 1537944) first, and then revisit the settle time so that increasing it doesn't substantially increase the run time of these tests.
Marian, are you still working on this bug? If not please unassign yourself and reset the priority.
Comment 37•5 years ago
I finished the investigation on this a while ago.
As I remember, the plan was to retest these measurements once conditioned profiles had landed.
- details here : https://bugzilla.mozilla.org/show_bug.cgi?id=1536090#c34
I will do those pushes to try and generate the compare links to see how they look.
Thanks!
Reporter | ||
Comment 38•5 years ago
(In reply to Marian Raiciof [:marauder] from comment #37)
I finished the investigation on this a while ago.
As i remember it was a plan to retest these measurements when conditioned profile is landed.
- details here : https://bugzilla.mozilla.org/show_bug.cgi?id=1536090#c34
I will perform those push to try and generate the compare links to see how it looks.
The change here would now be to the conditioned profiles themselves. Currently the "settled" profile waits 30 seconds. We would want to experiment with variations on this, but we should wait for bug 1602657 to land.
Comment 39•5 years ago
The base is a push to try for the default value : post_startup_delay = 1000ms
The latest results:
Comment 40•5 years ago
Comment 41•5 years ago
These are the try runs using specific scenarios for conditioned profile:
BASE (30s):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8700ae99043713883eaa2ae00bcc08676af38131
60s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=05c85c4249df92408a01a73cdc2777c955717e75
90s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f337ca8da43606ae2a93359c8be62cd24a3c2680
120s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6374c073ae4d447ac663691dd4405f107e881986
300s
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d3b3f170c1e04d1487bbbcb70872815c5118afb
Comment 42•5 years ago
Reporter | ||
Comment 43•5 years ago
Marian: Could you retrigger base and new so we have at least 5 runs to improve the confidence of the comparison? At the moment there doesn't appear to be any clear signal.
Comment 44•5 years ago
A few updates here:
- there were some issues with the conditioned profiles and the try jobs didn't find the corresponding tar files.
- Treeherder sometimes ran the tp6 and tp6m tests before the condprof jobs were triggered, causing failures.
Tarek fixed those issues and now we're using a hardcoded list of task IDs to get the profiles.
For any work here we might want to wait until bug 1626604 has been fixed. As we know conditioned profile usage is broken right now.
Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)
Looks like we should wait for the stabilization of conditioned profiles.
Comment 48•5 years ago
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:marauder, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 49•5 years ago
Tarek, what do you think about Henrik's comment https://bugzilla.mozilla.org/show_bug.cgi?id=1536090#c47?
Is the patch good for landing or should we wait?
Thanks!
Comment 51•4 years ago
@tarek is this patch still valid? If not, maybe we can remove it from the review queue.
Comment 52•4 years ago
Blocking on getting more reproducibility for conditioned profiles (or hardening them): https://bugzilla.mozilla.org/show_bug.cgi?id=1626604
Comment 53•2 years ago
Clear a needinfo that is pending on an inactive user.
Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE.
For more information, please visit auto_nag documentation.
Comment 54•2 years ago
Adding the triage flag to discuss if we can investigate this some more now given that we create conditioned profiles on the fly during the test.
Comment 55•2 years ago
The bug assignee is inactive on Bugzilla, so the assignee is being reset.