Open Bug 1675458 Opened 4 years ago Updated 9 months ago

tp5n main_normal_netio (windows10-64-shippable, windows10-64-shippable-qr) alert that looks like an improvement

Categories

(Testing :: Performance, defect, P3)

Tracking

(firefox84 wontfix)

Tracking Status
firefox84 --- wontfix

People

(Reporter: alexandrui, Unassigned, NeedInfo)

Details

(4 keywords)

The alert below looks like an improvement. The tp5n main_normal_netio tests measure the interaction between the browser and files stored on the local system. Lower values are better, but not 0, because 0 means there is no interaction at all.
This alert flags the test results changing from > 0 to == 0 for jobs scheduled to run against autoland. The values > 0 between b631c11d4acfb and 6df4ac02675fb (the revisions of the 2 alerts) in this graph are the results of backfilling and retriggering. Without those jobs, which the sheriffs triggered manually, the graph shows all zeroes (look at the graph after 6df4ac02675fb).
I would understand the values sometimes being zero, but it is a bit odd that only the manually triggered jobs are > 0.

Mike, can you please look into this?

https://treeherder.mozilla.org/perfherder/alerts?id=27402

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
100% tp5n main_normal_netio windows10-64-shippable e10s stylo 1,678,413.17 -> 0.00
100% tp5n main_normal_netio windows10-64-shippable-qr e10s stylo webrender-sw 3,061,531.58 -> 0.00

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Flags: needinfo?(mconley)
Priority: -- → P3

I'm not sure what to do about this bug. But if we're not planning to do anything, we should close it WONTFIX and move on.

It looks like this is the only alert for main_normal_netio in the last 12 months, and it's invalid because it's a drop to zero when the majority of data points are also zero. I would question if there's value running this test at all. The xperf tests are not well documented, which makes it difficult to make an informed decision. :sparky :igoldan any thoughts on the value of continuing to report/alert on these results?
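
To make the "drop to zero when the majority of data points are also zero" argument concrete, here is a minimal sketch of such a check. It is not Perfherder's alerting logic; the function name, the threshold, and the sample values are all made up for illustration.

```python
# Hypothetical helper: not part of Perfherder or Talos.
def mostly_zero(values, threshold=0.5):
    """Return True if more than `threshold` of the data points are exactly 0."""
    if not values:
        return False
    zeros = sum(1 for v in values if v == 0)
    return zeros / len(values) > threshold


# Made-up series: mostly zeroes with a few non-zero points, roughly the shape
# described in this bug (non-zero only on manually triggered jobs).
series = [0.0, 0.0, 1500000.0, 0.0, 0.0, 3000000.0, 0.0, 0.0]
print(mostly_zero(series))
```

A series that passes this check makes a "100% improvement to 0.00" alert uninformative, since zero is already the typical value.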

Flags: needinfo?(igoldan)
Flags: needinfo?(gmierz2)

I can say that trying to maintain these tests is like going down the rabbit hole. We can uncover Windows infra dependency issues that are out of our expertise. (I experienced this once or twice.)

Flags: needinfo?(igoldan)

I agree with :igoldan, this can turn into a rabbit hole. We should disable it, but we should also file a follow-up bug so we can figure out exactly what we're losing from this test (or the purpose of this test). Maybe there's a different, more reliable method we could use for measuring fileio.

Flags: needinfo?(gmierz2) → needinfo?(dave.hunt)

Let's disable alerts for the test and raise a P3 bug to review our xperf/fileio tests.

Flags: needinfo?(dave.hunt)
Flags: needinfo?(aionescu)

Ionut, how can the tp5n main_normal_netio test be disabled?

Flags: needinfo?(aionescu) → needinfo?(igoldan)

So this subtest's data is collected by the xperf package, which is integrated in Talos.
I don't think there's an explicit config for it that we can simply toggle off.
Nor is there any literal mention of it anywhere in Talos' code base.

Rather, xperf is dynamically collecting a bunch of metrics such as the one above & dumping them in the logs.
I think you should identify the data structure that's collecting these metrics & filter out tp5n main_normal_netio. That way, this subtest's data won't be dumped. Thus, it won't be reported, making the test appear disabled.
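
A rough sketch of that filtering idea, assuming the collected metrics end up in a dict keyed by counter name before they are dumped. That layout is an assumption, and `EXCLUDED_COUNTERS`, `filter_counters`, and the sample values are hypothetical; none of them come from the Talos code base.

```python
# Hypothetical sketch: these names do not exist in Talos.
# (suite, counter) pairs whose data should not be dumped or reported.
EXCLUDED_COUNTERS = {("tp5n", "main_normal_netio")}


def filter_counters(suite, counters, excluded=EXCLUDED_COUNTERS):
    """Return a copy of `counters` without the excluded (suite, counter) pairs."""
    return {
        name: values
        for name, values in counters.items()
        if (suite, name) not in excluded
    }


# Made-up data standing in for whatever xperf actually collects.
collected = {
    "main_normal_netio": [0.0, 0.0, 0.0],
    "some_other_counter": [1234.0, 1250.0],
}
reported = filter_counters("tp5n", collected)
print(sorted(reported))
```

Keying the exclusion on the (suite, counter) pair means the same counter would still be reported for other suites, which keeps the change as narrow as the decision in this bug.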

Flags: needinfo?(igoldan)