[LINUX] linux1804-64-shippable-qr on try runs unexpectedly fast every now and then
Category: Core :: Performance, defect
Reporter: jstutte; Assignee: Unassigned
Attachments: 1 file (21.73 KB, image/png)
### Basic information
Steps to Reproduce:
Run the Speedometer 3 tests many times on Linux shippable builds with the same revision.
Expected Results:
Deviations in results should be reasonably small.
Actual Results:
Every 15-30 runs we see a (positive) spike in execution performance, with scores roughly 20-25% better than the rest of the runs.
Reporter
Comment 1•2 months ago
An example perf run where this happens.
My best guess is that those runs hit a cold physical CPU that is otherwise idle and has more room in its thermal budget for overclocking.
If this is unavoidable (which might be the case), could we exclude extreme outliers, both faster and slower, from our calculations when we have enough runs?
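One way to exclude extreme outliers in both directions is a robust filter based on the median absolute deviation (MAD). This is a hypothetical sketch, not what Perfherder or the perf test harness actually does; the `filter_outliers` name, the 3.5 cutoff (a commonly used value for modified z-scores) and the sample scores are all assumptions for illustration.

```python
from statistics import median


def filter_outliers(scores, threshold=3.5):
    """Return the replicates whose modified z-score is within `threshold`.

    Hypothetical helper: uses the median absolute deviation (MAD), which is
    itself robust to the outliers being removed, and treats unusually fast
    and unusually slow runs symmetrically.
    """
    scores = list(scores)
    if len(scores) < 3:
        # Too few runs to tell an outlier apart from ordinary noise.
        return scores
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        # Most runs are identical, so the modified z-score is undefined;
        # keep everything rather than guess.
        return scores
    # 0.6745 scales the MAD so the score is comparable to a standard
    # z-score for normally distributed data.
    return [s for s in scores if abs(0.6745 * (s - med) / mad) <= threshold]


# Made-up sp3 scores with one unusually fast replicate (higher is faster).
runs = [10.1, 10.3, 9.9, 10.0, 10.2, 10.1, 12.6]
print(filter_outliers(runs))  # [10.1, 10.3, 9.9, 10.0, 10.2, 10.1]
```

Filtering like this only hides the symptom. As later comments in this bug show, the outliers here came from a single machine, so it is worth checking where outliers originate before discarding them.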
Comment 2•2 months ago
It would be nice to investigate that improvement outlier, but I'm not sure we have the infrastructure set up to properly profile it yet. I plan on continuing the work in bug 1893493, which should help collect this sort of data in the future.
Sparky, I recall some earlier discussions about trying to filter out these outliers. Do you remember where we landed on that?
Comment 3•2 months ago
Also, are we clamping the CPU frequency on the moonshot devices so every test uses the same frequency?
Comment 4•2 months ago
There is no CPU tuning (such as frequency clamping) done on the CI machines. Regarding outliers, there isn't much we can do on the test side; we'd like to handle them better on the analysis side instead, perhaps through better detection techniques or something else. We're currently looking into alternative detection techniques.
This seems to be a machine-specific issue, though. Here's a graph showing that the noisy sp3 scores all sit above 11: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&replicates=0&series=try,4569401,1,13&timerange=1209600&zoom=1733205310825,1733301950374,8.438664737088072,12.704742188068465
I queried Redash to find which machine is producing them: https://sql.telemetry.mozilla.org/queries/104251/source
All of those noisy score values are coming from a single machine: t-linux64-ms-055
:aerickson, could we remove the Linux machine t-linux64-ms-055 and replace it with a new one?
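The idea behind that query, grouping replicates by the machine that produced them and looking for a host whose scores are consistently out of range, can be sketched in Python. The actual Redash query is at the link in this comment and is not reproduced here; the `(machine, score)` record format, the `noisy_machines` name, the host names and the scores below are all hypothetical, and the default threshold of 11 is taken from the graph described in this comment.

```python
from collections import defaultdict


def noisy_machines(results, threshold=11.0):
    """Rank machines by how many of their scores exceed `threshold`.

    `results` is an iterable of (machine_name, score) pairs (hypothetical
    schema). Returns (machine, high_count, total_count) tuples for machines
    with at least one high score, noisiest first.
    """
    high = defaultdict(int)
    total = defaultdict(int)
    for machine, score in results:
        total[machine] += 1
        if score > threshold:
            high[machine] += 1
    # Sort by number of high scores (descending), then by name for stability.
    return sorted(
        ((m, high[m], total[m]) for m in total if high[m] > 0),
        key=lambda row: (-row[1], row[0]),
    )


# Made-up rows for illustration only.
rows = [
    ("host-a", 12.4),
    ("host-b", 10.1),
    ("host-a", 12.1),
    ("host-c", 9.8),
    ("host-a", 9.9),
]
print(noisy_machines(rows))  # [('host-a', 2, 3)]
```

If every high score maps to one host, as happened with t-linux64-ms-055 here, the problem is most likely the machine rather than the browser build.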
Comment 5•2 months ago
Not sure why this one blade would be so much faster. I've quarantined the host.
We haven't done blade swapping on the Moonshots yet, so we'll need to develop a plan and procedure. We don't have any spares online.
Comment 6•2 months ago
Thanks :aerickson! That sounds good to me.
:denispal/:jstutte, could either of you do some try runs for sp3 to see if that outlier is still there?
Reporter
Comment 7•2 months ago
(In reply to Greg Mierzwinski [:sparky] from comment #6)
> :denispal/:jstutte, could either of you do some try runs for sp3 to see if that outlier is still there?
> All of those noisy score values are coming from a single machine
Surprising, but obviously a much better explanation for such a consistent difference. Thanks for finding it!
Reporter
Comment 8•2 months ago
After ~80 runs I see no more extreme outliers. Thanks!