Closed
Bug 1220804
Opened 10 years ago
Closed 8 years ago
remote-tsvgx on autophone is bimodal for the composite tests, resulting in overall noise
Categories
(Testing :: Talos, defect)
RESOLVED
WORKSFORME
People
(Reporter: jmaher, Unassigned)
there are 4 tests (composite*.svg):
https://git.mozilla.org/?p=automation/ep1.git;a=tree;f=talos/tsvg;h=40f83cb02172a3d5c1716a331287369e53470b61;hb=579fa0b401f717f6449cb3ce656b3ed2746d1629
these are bi-modal on autophone:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,9ba98452b2b91a601962e875e73d8cff588a6957,1]&series=[mozilla-inbound,e4037106cbd2f46d3c803ced3e77d2214df41af8,1]&series=[mozilla-inbound,3e6c927bdeb7946e4388129f60629a93db186472,1]&series=[mozilla-inbound,871bd63c399fee019f4ed291b0861bb18610e226,1]
This results in the overall score of svgx being much noisier than that of the panda version; we should investigate and fix this.
Reporter
Comment 1 • 10 years ago
some data from log files:
1;composite-scale.svg;451;373;136;117;132;132
1;composite-scale.svg;223;145;365;131;114;356
1;composite-scale.svg;427;129;136;105;124;113
1;composite-scale.svg;231;119;118;113;363;163
We actually drop the first value and take the geometric mean of the remaining values. If we took a median of the remaining values instead, it would be more consistent. Maybe we should investigate why we have a few 300+ numbers among mostly <150 numbers.
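The difference between the two summaries can be seen directly on the sample rows above. This is a minimal sketch, not the actual Talos code; the `summarize` helper and the row parsing are hypothetical, but the rule it models (drop the first replicate, then aggregate) is the one described in this comment:

```python
import math
import statistics

# Sample replicates from the log excerpts above; format is
# index;page;run1;run2;...;run6
rows = [
    "1;composite-scale.svg;451;373;136;117;132;132",
    "1;composite-scale.svg;223;145;365;131;114;356",
    "1;composite-scale.svg;427;129;136;105;124;113",
    "1;composite-scale.svg;231;119;118;113;363;163",
]

def summarize(row):
    """Drop the first replicate, then return (geometric mean, median)."""
    values = [int(v) for v in row.split(";")[2:]]
    kept = values[1:]  # first value is discarded as warm-up
    geo = math.exp(sum(math.log(v) for v in kept) / len(kept))
    med = statistics.median(kept)
    return geo, med

for row in rows:
    geo, med = summarize(row)
    print(f"geomean={geo:6.1f}  median={med:6.1f}")
```

The occasional 300+ outliers pull the geometric mean well above the typical ~110-140 range, while the median stays stable, which is why the per-row medians here would produce a much less bimodal summary.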
Reporter
Comment 2 • 10 years ago
For both the pandas and autophone we are loading the pages from a remote server. I wonder if the autophone server has issues with file serving when other traffic is going over the network? Since tp4m is so reliable, I would think not.
Reporter
Comment 3 • 10 years ago
I see similar noise on the nexus-s for the summary; it is just a noisy set of composite tests rather than truly bi-modal. This suggests the issue is device specific.
Comment 4 • 10 years ago
A couple of those files are pretty big. I did notice some GC activity in logcat, but there didn't appear to be markers there to tell what was going on.
522K gearflowers.svg
389K hixie-007.xml
Reporter
Comment 5 • 10 years ago
I looked around for signs in the logcat; sadly, the signs I see are the same ones I see on the panda boards. One thought I had is that the large files (gearflowers.svg, hixie-001.xml, and hixie-007.xml) might be having side effects on the other load times.
Right now we have a 250ms delay between each cycle (currently we do 6 cycles), so maybe we could put the larger pages at the end of the cycle and increase the delay to 500ms?
Likewise, we could try removing the really large pages and see what that does. I am open to trying a few things; I just don't know why this is so unique to the nexus-7. It makes me wonder, if we switch devices in the future, whether we would play this same game.
Comment 6 • 10 years ago
The tests don't take an inordinate amount of time. Perhaps we could put them on some of the other devices which don't have too much load, so we can get a better picture. What is the story on reporting for different devices/Android versions? How would Perfherder handle the Nexus S, Nexus 4, and Nexus 5 in addition to the Nexus 7?
Reporter
Comment 7 • 10 years ago
Right now it will be keyed off of the platform, which is currently "android-4-3-armv7-api11". There is a bug on file to make treeherder use more than the raw platform name in the backend; sadly, I cannot find it after 20 minutes in bugzilla, so I filed a new one, bug 1224571.
If we have a device like the Nexus 4, which has a different version of Android, then it will probably be a simple fix to run remote-tsvgx on there and compare. Honestly, that seems like the right approach here: determine whether it is device specific, then figure out how we want to handle the pattern of data on that device.
:bc:, thoughts?
Comment 8 • 10 years ago
Our breakdown is currently:
Nexus S: Android 2.3, API 9
Nexus 4: Android 4.2, API 11+
Nexus 7: Android 4.3, API 11+
Nexus 5: Android 4.4, API 11+
That should be sufficient to distinguish them for now?
Reporter
Comment 9 • 10 years ago
Yes, I say let's go for it and get a week's worth of data!
Comment 10 • 10 years ago
Yes, Perfherder specifically looks at the machine platform to distinguish platforms, so autophone can set that to whatever it likes if you want the data organized differently. But as Joel said, the fact that the devices are running different versions of Android should be enough to distinguish them for now.
I'd prefer to see the outcome of your experiments, as well as to determine the requirements of Android perf testing in general, before committing time to modifying Perfherder itself. I have a lot of other things to do right now...
Comment 11 • 8 years ago
:bc, is this bug still a valid concern that should be kept open? Thanks :)
Flags: needinfo?(bob)
Comment 12 • 8 years ago
While investigating the current status of our tests, I ran a couple of try runs on the production servers and autophone-4:
production:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b42f3165db903d7bb137f6611e7399a82794fee9&group_state=expanded
autophone-4:
https://treeherder.allizom.org/#/jobs?repo=try&revision=b42f3165db903d7bb137f6611e7399a82794fee9&group_state=expanded
production tp4m:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,02c8bd1f934b0336c3a45f1c863313ef6aae77b6,1,3%5D&selected=%5Btry,02c8bd1f934b0336c3a45f1c863313ef6aae77b6,260638,134678930%5D
staging tp4m:
https://treeherder.allizom.org/perf.html#/graphs?series=%5Btry,02c8bd1f934b0336c3a45f1c863313ef6aae77b6,1,3%5D&selected=%5Btry,02c8bd1f934b0336c3a45f1c863313ef6aae77b6,388161,127835726%5D
production tsvg:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,b651e33205845624deea16fa9a2b9cfe9bcb9e0d,1,3%5D&selected=%5Btry,b651e33205845624deea16fa9a2b9cfe9bcb9e0d,260638,134678931%5D
staging tsvg:
https://treeherder.allizom.org/perf.html#/graphs?series=%5Btry,b651e33205845624deea16fa9a2b9cfe9bcb9e0d,1,3%5D&selected=%5Btry,b651e33205845624deea16fa9a2b9cfe9bcb9e0d,388161,127835725%5D
One thing to remember is that these results are from pairs of devices, which can contribute to a bimodal result. But overall, these don't seem too bad, and certainly not as bad as jmaher originally detected.
I say: this isn't much of a concern today, but we can probably improve the noise by using only one device instead of two for the talos tests. jmaher?
Flags: needinfo?(bob) → needinfo?(jmaher)
Reporter
Comment 13 • 8 years ago
I don't see this as a concern; it is noisy, but not bi-modal. I think a single device would help reduce noise. Should we make that change?
Flags: needinfo?(jmaher)
Comment 14 • 8 years ago
Filed Bug 1405707 to reduce the number of devices to one.
I'll go ahead and resolve this as wfm.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME