Closed Bug 1672794 Opened 4 years ago Closed 3 years ago

Stop summarising page load as geometric mean

Categories

(Testing :: Raptor, task, P1)

task

Tracking

(firefox89 fixed)

RESOLVED FIXED
89 Branch
Tracking Status
firefox89 --- fixed

People

(Reporter: davehunt, Assigned: davehunt)

Details

Attachments

(1 file)

We have questioned in the past the value of summarising the page load results as a geometric mean. When we started measuring loadtime, we also started alerting on several subtests in addition to the suite level geomean (bug 1504227).

By removing the suite level value, all subtests will become tests in their own right. This means they will appear when calling Perfherder endpoints without needing to explicitly request subtests, which may impact dashboards such as health.graphics and arewefastyet.com.

:igoldan can you think of any other side-effects of removing the suite-level value for page load tests?

Flags: needinfo?(igoldan)

Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collect new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).

As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.

Flags: needinfo?(igoldan)

(In reply to Ionuț Goldan [:igoldan] from comment #2)

Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collected new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).

As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.

In that case, it would seem to make sense to do this in Raptor-browsertime before the desktop migration is complete, to minimise the impact on sheriffing. We could have the sheriffs manually check the mobile page load results for 1-2 days after this change.

Apologies for all the needinfos, but considering the impact of this change I wanted to make sure we're all in agreement that this is the correct course of action.

Flags: needinfo?(rjesup)
Flags: needinfo?(gmierz2)
Flags: needinfo?(dpalmeiro)
Flags: needinfo?(acreskey)

Sounds reasonable to me.

Flags: needinfo?(dpalmeiro)

I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?

Flags: needinfo?(gmierz2)

This seems good to me.

(In reply to Greg Mierzwinski [:sparky] from comment #5)

I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?

Intuitively I think that we may catch more regressions/improvements via the individual subtests rather than the suite-level geomean.

I don't have a specific example to point at, but I have seen cases where significant changes in one metric were effectively masked via the geomean.

Flags: needinfo?(acreskey)

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

Flags: needinfo?(rjesup)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

I don't see why this would matter if we already know this happens and that we're looking to removing the geomean in any case. Furthermore, both the subtests and the geomean have alerting enabled. So even if the geomean didn't change and trigger an alert, the subtests would have. I disagree with spending time on any analysis there.

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

:jesup, can you file a separate bug for this one? We should check to make sure that fnbpaint and fcp always alert together before removing one or the other.

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).

Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.

We currently only enable alerting on fcp and loadtime subtests, so as I understand there's little to no cost in measuring fnbpaint. When these subtests become standard tests (once the geomean summary is removed) we should ensure the alerting remains the same.

Bumping this to P2 as this will improve the performance workflow by reducing the noise in alert summaries and compare view.

Priority: P3 → P2
Assignee: nobody → ksereduck
Assignee: ksereduck → dave.hunt
Status: NEW → ASSIGNED
Priority: P2 → P1
Pushed by dhunt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6b49aa18592
Stop summarising page load as geometric mean r=perftest-reviewers,Bebe,sparky,igoldan
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: