Stop summarising page load as geometric mean
Categories
(Testing :: Raptor, task, P1)
Tracking
(firefox89 fixed)
Tracking | Status | |
---|---|---|
firefox89 | --- | fixed |
People
(Reporter: davehunt, Assigned: davehunt)
Details
Attachments
(1 file)
We have questioned in the past the value of summarising the page load results as a geometric mean. When we started measuring loadtime, we also started alerting on several subtests in addition to the suite level geomean (bug 1504227).
By removing the suite level value, all subtests will become tests in their own right. This means they will appear when calling Perfherder endpoints without needing to explicitly request subtests, which may impact dashboards such as health.graphics and arewefastyet.com.
Assignee | ||
Comment 1•4 years ago
|
||
:igoldan can you think of any other side-effects of removing the suite-level value for page load tests?
Comment 2•4 years ago
•
|
||
Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collect new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).
As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.
Assignee | ||
Comment 3•4 years ago
|
||
(In reply to Ionuț Goldan [:igoldan] from comment #2)
Looking closer over the ingestion code, it seems there is at least one noticeable issue: the subtest signatures get decoupled. New signatures will be created for them, which will collected new data points separately.
All old signatures (both parent & child) will eventually be deleted (in approximately a year).As we already know, decoupling perf data leaves our perf sheriffs blind for at least 24h. Given that we run our jobs more rarely now, this blind period could be higher, up to a couple of days.
In that case, it would seem to make sense to do this in Raptor-browsertime before the desktop migration is complete, to minimise the impact on sheriffing. We could have the sheriffs manually check the mobile page load results for 1-2 days after this change.
Apologies for all the needinfos, but considering the impact of this change I wanted to make sure we're all in agreement that this is the correct course of action.
Comment 5•4 years ago
|
||
I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?
Comment 6•4 years ago
|
||
This seems good to me.
(In reply to Greg Mierzwinski [:sparky] from comment #5)
I think this should be fine, but I'm wondering if anyone has checked if any valid regressions/improvements were caught by the suite-level value but not by the subtests?
Intuitively I think that we may catch more regressions/improvements via the individual subtests rather than the suite-level geomean.
I don't have a specific example to point at, but I have seen cases where significant changes in one metric were effectively masked via the geomean.
Comment 7•4 years ago
|
||
Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).
Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.
Comment 8•4 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)
Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).
I don't see why this would matter if we already know this happens and that we're looking to removing the geomean in any case. Furthermore, both the subtests and the geomean have alerting enabled. So even if the geomean didn't change and trigger an alert, the subtests would have. I disagree with spending time on any analysis there.
Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.
:jesup, can you file a separate bug for this one? We should check to make sure that fnbpaint and fcp always alert together before removing one or the other.
Assignee | ||
Comment 9•4 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)
Agreed: We should check how many new alerts would fire that didn't due to the effect of the geomean (and if these were actual regressions/improvements we'd care about at all).
Also, as has been mentioned before, fcp and fnbpaint are almost perfectly correlated, typically just with a small offset; if we're updating these I suggest dropping one.
We currently only enable alerting on fcp and loadtime subtests, so as I understand there's little to no cost in measuring fnbpaint. When these subtests become standard tests (once the geomean summary is removed) we should ensure the alerting remains the same.
Assignee | ||
Comment 10•4 years ago
|
||
Bumping this to P2 as this will improve the performance workflow by reducing the noise in alert summaries and compare view.
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 11•3 years ago
|
||
Comment 12•3 years ago
|
||
Pushed by dhunt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e6b49aa18592 Stop summarising page load as geometric mean r=perftest-reviewers,Bebe,sparky,igoldan
Comment 13•3 years ago
|
||
bugherder |
Description
•