Closed
Bug 1416501
Opened 7 years ago
Closed 6 years ago
Try to reduce noise on taskcluster linux talos hardware/vms
Categories
(Testing :: Talos, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rwood, Unassigned)
References
Details
(Whiteboard: [PI:January])
Attachments
(9 files)
In Bug 1412031 some new Ubuntu machines (dedicated hardware running 16.04 VMs, for running talos via taskcluster) were stood up and made available for testing. Initial talos results look quite noisy, making it very difficult (if not impossible) for our talos/Perfherder algorithms to automatically detect regressions. Using a loaner, we need to figure out whether it is possible to reduce the noise in the talos test data on these machines/VMs.
Reporter
Comment 1•7 years ago
talos.json output from running perf-reftest-singletons-e10s on the tc linux loaner machine/vm
Reporter
Comment 2•7 years ago
talos.json from running talos tp6-e10s suite on the loaner tc linux hw/vm
Reporter
Updated•7 years ago
Whiteboard: [PI:November]
Reporter
Comment 3•7 years ago
Screenshot of 'top' utility running on tc linux hw/loaner during the talos perf-reftest-singletons-e10s suite run
Comment 4•7 years ago
It is odd that Firefox is chewing up so much CPU, although that is probably expected. We might need to look at I/O counters and memory; top doesn't suggest that memory is exhausted.
Reporter
Comment 5•7 years ago
Using 'atop' for a little more detail during talos run
Reporter
Comment 6•7 years ago
A bit of disk i/o info via 'atop' during talos run
Reporter
Comment 7•7 years ago
(In reply to Robert Wood [:rwood] from comment #6)
> Created attachment 8927872 [details]
> tc-linux-loaner-during-talos-atop-2.png
>
> A bit of disk i/o info via 'atop' during talos run

Not sure what 121% 'ACPU' for "Web Content" means vs Firefox
Reporter
Comment 8•7 years ago
Demonstration of 'noisy' data.

Recent run from existing linux x64 bb hardware:

name "bloom-basic.html"
replicates
0 89.84
1 90.63499999999999
2 88.42999999999999
3 92.66499999999999
4 91.58000000000001
5 92.14
6 88.355
7 96.63000000000001
8 89.29499999999999
9 90.51
10 89.945
11 90.06000000000002
12 90.565
13 86.74
14 87.595
unit "ms"
value 90.0025

Run on new linux tc hw / vm loaner (ssh'd into the machine and running talos from terminal mirroring production):

name "bloom-basic.html"
replicates
0 54.789999999999964
1 214.23000000000002
2 143.695
3 201.41500000000002
4 194.84500000000003
5 43.035
6 212.43
7 40.07000000000002
8 165.12
9 47.35499999999999
10 146.09999999999997
11 187.78000000000003
12 40.670000000000016
13 59.150000000000034
14 49.670000000000016
unit "ms"
value 54.410000000000025
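One way to put a number on the difference between the two runs above is the coefficient of variation (sample standard deviation divided by the mean) of each replicate list. The following is a minimal Python sketch using the replicates quoted in this comment. The variable and function names are illustrative, and it is not meant to reproduce how talos or Perfherder derive the reported "value" field.

```python
from statistics import mean, stdev

# Replicates for bloom-basic.html (ms), copied from the two runs in this comment.
bb_replicates = [
    89.84, 90.63499999999999, 88.42999999999999, 92.66499999999999,
    91.58000000000001, 92.14, 88.355, 96.63000000000001, 89.29499999999999,
    90.51, 89.945, 90.06000000000002, 90.565, 86.74, 87.595,
]
tc_replicates = [
    54.789999999999964, 214.23000000000002, 143.695, 201.41500000000002,
    194.84500000000003, 43.035, 212.43, 40.07000000000002, 165.12,
    47.35499999999999, 146.09999999999997, 187.78000000000003,
    40.670000000000016, 59.150000000000034, 49.670000000000016,
]


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (unitless)."""
    return stdev(values) / mean(values)


bb_cv = coefficient_of_variation(bb_replicates)
tc_cv = coefficient_of_variation(tc_replicates)

print(f"existing bb hardware: CV = {bb_cv:.1%}")
print(f"new tc hw/vm loaner:  CV = {tc_cv:.1%}")
```

A larger coefficient of variation means the replicates are spread more widely around their mean, which is the kind of noise that makes automatic regression detection harder.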
Reporter
Comment 9•7 years ago
Unsure if this helps, but it's a netstat capture taken during the perf-reftest-singletons talos suite.
Reporter
Comment 10•7 years ago
On the existing buildbot linux hardware, cpu usage during talos is similar, so I don't believe that's an issue on the new hw.
Reporter
Comment 11•7 years ago
Sample run on existing buildbot linux hardware loaner (initiated via terminal & mozharness mirroring production):

"name": "bloom-basic.html",
"replicates": [
  106.31500000000003,
  101.89500000000001,
  101.21000000000001,
  93.385,
  97.60500000000002,
  92.215,
  105.52000000000001,
  93.55,
  95.795,
  94.21000000000001,
  95.57,
  101.30999999999999,
  93.255,
  100.57,
  96.195
],
"unit": "ms",
"value": 95.6825
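For a quick look at how tightly these replicates cluster, the fragment above can be parsed and summarized in a few lines of Python. This is only a sketch: the surrounding braces are added here so the fragment parses as a standalone JSON object (the full talos.json layout is not shown in this bug), and the variable names are illustrative.

```python
import json
from statistics import median

# The subtest fragment from this comment, wrapped in braces (an assumption)
# so that it parses as a standalone JSON object.
fragment = """
{
  "name": "bloom-basic.html",
  "replicates": [
    106.31500000000003, 101.89500000000001, 101.21000000000001, 93.385,
    97.60500000000002, 92.215, 105.52000000000001, 93.55, 95.795,
    94.21000000000001, 95.57, 101.30999999999999, 93.255, 100.57, 96.195
  ],
  "unit": "ms",
  "value": 95.6825
}
"""

subtest = json.loads(fragment)
replicates = subtest["replicates"]

# Spread of the replicates relative to their median (unitless).
spread = max(replicates) - min(replicates)
relative_spread = spread / median(replicates)

print(f'{subtest["name"]}: min={min(replicates):.2f} max={max(replicates):.2f} '
      f'{subtest["unit"]}, range/median={relative_spread:.1%}')
```

The same summary can be applied to any other replicate list quoted in this bug to compare machines on equal terms.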
Reporter
Comment 12•7 years ago
about:support for existing talos buildbot linux hw
Reporter
Comment 13•7 years ago
about:support for new talos taskcluster linux hw (vm)
Comment 14•7 years ago
It is interesting that WebRender is not enabled on the new machines; possibly this is by design. :milan, can you look at the about:support from comment 13 and verify this looks right for what we should be testing with talos performance and graphics?
Flags: needinfo?(milan)
Reporter
Comment 15•7 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14)
> it is interesting that webrender is not enabled on the new machines-
> possibly this is by design.
>
> :milan, can you look at the about:support from comment 13 and verify this
> looks right for what we should be testing with talos performance and
> graphics?

I think it's just because I'm using an older mozharness release URL/download package (Firefox 58) on the new hw (comment 13), whereas on the existing bb hardware (comment 12) I used a new release package today (Firefox 59).
Comment 16•7 years ago
That might be what is going on, assuming we changed that in the last week.
Comment 17•7 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14)
> it is interesting that webrender is not enabled on the new machines-
> possibly this is by design.
>
> :milan, can you look at the about:support from comment 13 and verify this
> looks right for what we should be testing with talos performance and
> graphics?

You'd need to force-enable acceleration on Linux in order for WebRender to be available (and then enable WebRender if you actually want to use it).
Flags: needinfo?(milan)
Comment 18•7 years ago
I am not sure if that makes a difference in our reliability; I just thought I would point it out.
Comment 19•7 years ago
Looking at https://bug1416501.bmoattachments.org/attachment.cgi?id=8928211 it looks like we're a bit behind on the Intel graphics driver. Currently reporting 12.0.6, but the 2017Q3 Intel graphics stack (https://01.org/linuxgraphics/downloads/2017q3-intel-graphics-stack-recipe) has 17.1.0. I suspect other things there are also older. Is there a requirement for the older version, or should we update that?
Comment 20•7 years ago
I don't think there is a specific version required. :milan, do you have a specific version or set of features of the Intel graphics driver that we should use on the new hardware for performance?
Flags: needinfo?(milan)
Comment 21•7 years ago
We don't currently have a minimum version, but that may show up if we start discovering issues specific to some driver versions. I'd keep as much up to date as practical. I don't know how old what we currently have is, but it's probably good to be less than a year out of date, and up to date is the best.
Flags: needinfo?(milan)
Updated•7 years ago
Whiteboard: [PI:November] → [PI:December]
Comment 22•6 years ago
I think we can mark this as done?
Reporter
Comment 23•6 years ago
Yes! It was resolved in Bug 1424465
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED