Closed Bug 1672794 Opened 4 years ago Closed 3 years ago

Stop summarising page load as geometric mean

Tracking

(firefox89 fixed)

Status:

RESOLVED FIXED

Milestone:

89 Branch

Tracking Flags:

Tracking

Status

firefox89

---

fixed

People

(Reporter: davehunt, Assigned: davehunt)

Details

Attachments

(1 file)

Bug 1672794 - Stop summarising page load as geometric mean 3 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 48 bytes, text/x-phabricator-request		Details \| Review

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Description

•

4 years ago

We have questioned in the past the value of summarising the page load results as a geometric mean. When we started measuring loadtime, we also started alerting on several subtests in addition to the suite level geomean (bug 1504227).

By removing the suite level value, all subtests will become tests in their own right. This means they will appear when calling Perfherder endpoints without needing to explicitly request subtests, which may impact dashboards such as health.graphics and arewefastyet.com.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 1

•

4 years ago

:igoldan can you think of any other side-effects of removing the suite-level value for page load tests?

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 2

•

4 years ago

•

Edited

Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collect new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).

As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.

Flags: needinfo?(igoldan)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 3

•

4 years ago

(In reply to Ionuț Goldan [:igoldan] from comment #2)

Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collected new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).

As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.

In that case, it would seem to make sense to do this in Raptor-browsertime before the desktop migration is complete, to minimise the impact on sheriffing. We could have the sheriffs manually check the mobile page load results for 1-2 days after this change.

Apologies for all the needinfos, but considering the impact of this change I wanted to make sure we're all in agreement that this is the correct course of action.

Flags: needinfo?(rjesup)

Flags: needinfo?(gmierz2)

Flags: needinfo?(dpalmeiro)

Flags: needinfo?(acreskey)

Denis Palmeiro [:denispal]

Comment 4

•

4 years ago

Sounds reasonable to me.

Flags: needinfo?(dpalmeiro)

Greg Mierzwinski [:sparky]

Comment 5

•

4 years ago

I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?

Flags: needinfo?(gmierz2)

Andrew Creskey [:acreskey]

Comment 6

•

4 years ago

This seems good to me.

(In reply to Greg Mierzwinski [:sparky] from comment #5)

I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?

Intuitively I think that we may catch more regressions/improvements via the individual subtests rather than the suite-level geomean.

I don't have a specific example to point at, but I have seen cases where significant changes in one metric were effectively masked via the geomean.

Flags: needinfo?(acreskey)

Randell Jesup [:jesup] (needinfo me)

Comment 7

•

4 years ago

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

Flags: needinfo?(rjesup)

Greg Mierzwinski [:sparky]

Comment 8

•

4 years ago

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

I don't see why this would matter if we already know this happens and that we're looking to removing the geomean in any case. Furthermore, both the subtests and the geomean have alerting enabled. So even if the geomean didn't change and trigger an alert, the subtests would have. I disagree with spending time on any analysis there.

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

:jesup, can you file a separate bug for this one? We should check to make sure that fnbpaint and fcp always alert together before removing one or the other.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 9

•

4 years ago

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

We currently only enable alerting on fcp and loadtime subtests, so as I understand there's little to no cost in measuring fnbpaint. When these subtests become standard tests (once the geomean summary is removed) we should ensure the alerting remains the same.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 10

•

4 years ago

Bumping this to P2 as this will improve the performance workflow by reducing the noise in alert summaries and compare view.

Priority: P3 → P2

Kimberly Sereduck :kimberlythegeek

Updated

•

3 years ago

Assignee: nobody → ksereduck

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Updated

•

3 years ago

Assignee: ksereduck → dave.hunt

Status: NEW → ASSIGNED

Priority: P2 → P1

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 11

•

3 years ago

Attached file Bug 1672794 - Stop summarising page load as geometric mean — Details

Pulsebot

Comment 12

•

3 years ago

Pushed by dhunt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6b49aa18592
Stop summarising page load as geometric mean r=perftest-reviewers,Bebe,sparky,igoldan

Andreea Pavel [:apavel]

Comment 13

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e6b49aa18592

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox89: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 89 Branch

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Stop summarising page load as geometric mean

Categories

(Testing :: Raptor, task, P1)

Tracking

(firefox89 fixed)

People

(Reporter: davehunt, Assigned: davehunt)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated

Comment 11

Comment 12

Comment 13

Attachment

General

Description

File Name

Content Type