Closed Bug 1756123 Opened 2 years ago Closed 2 years ago

4.39 - 0.14% tart / tart (Windows) regression on Fri February 11 2022

Categories

(Testing :: Performance, defect, P3)

Desktop
Windows 10
defect

Tracking

(firefox-esr91 unaffected, firefox97 unaffected, firefox98 unaffected, firefox99 affected)

RESOLVED WONTFIX
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- unaffected
firefox99 --- affected

People

(Reporter: afinder, Unassigned)

References

(Regression)

Details

(4 keywords)

Perfherder has detected a talos performance regression from push b247e781ba93444c892e2e45f919600b82af8891. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
4% tart windows10-64-shippable-qr e10s fission stylo webrender-sw 2.36 -> 2.47
0.14% tart windows10-64-shippable-qr e10s fission stylo webrender-sw 2.48 -> 2.48

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Regressed by: 1748759
Priority: -- → P3
Priority: -- → P3

Hi Randell, could you please help with any updates on this regression?

Flags: needinfo?(rjesup)
Has Regression Range: --- → yes

Mike, Nika - any idea how this patch would cause this? Nika indicated in the reviews that Unsound_NumQueuedMessages always returned 0, and so it's removed in the patch. Perhaps this isn't the case? (Mike, could this impact the tart test this way if NumQueuedMessages isn't returning 0?)

None of the subtests are above "low" confidence, but the final value is high confidence... also, close to half the subtests are improvements (though perhaps slightly less large than average regressions), and noise is down a lot.

Flags: needinfo?(rjesup)
Flags: needinfo?(nika)
Flags: needinfo?(mconley)

I can't imagine that this regression has anything to do with Unsound_NumQueuedMessages. There should genuinely be no way that the value returned there would be changed, and the performance of doing a couple fewer virtual calls for it should have no impact.

The only differences I'm seeing here are that we now acquire the MessageChannel's mutex when asking IsClosed() (but I wouldn't expect us to be collecting memory reports at all during this particular test, and there should be no other callers of that method), and that we no longer update the atomic output_queue_length_ member of IPC::Channel when queueing/dequeueing messages (though I would expect that to only lead to performance improvements, and extremely slight ones at that).

Interestingly, looking at the graph it appears that the regression is for a similar-ish amount to the recent improvement reported in bug 1753436, although I have no idea how they could be related. Perhaps there's some optimizer threshold which we happen to be very near right now, and is causing small swings on this particular test?

Flags: needinfo?(nika)

I'm no data scientist, but looking at the subtests and the noise metric, I suspect what's happening here is that we're simply reducing the noise and that's causing a "truer" sense of the score here to emerge, and that truer sense is at a slightly higher score.

The actual absolute value of the regression in milliseconds is a fraction of a millisecond. I don't think it's worth the engineering time to really dig into what happened here. We should take the win on the noise metric, and take the apparent regression, and close this as WONTFIX.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(mconley)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.