Closed Bug 1794282 Opened 2 years ago Closed 1 year ago

Regression on AWFY Speedometer Inferno-TodoMVC and VueJS around October7

Categories

(Core :: IPC, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
109 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox105 --- unaffected
firefox106 --- unaffected
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- fixed

People

(Reporter: mayankleoboy1, Assigned: nika)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Go to https://arewefastyet.com/win10/benchmarks/raptor-desktop-speedometer?numDays=60

See the following tests :

Inferno-TodoMVC
Inferno-TodoMVC/DeletingItems
Inferno-TodoMVC/DeletingItems/Sync
VueJS-TodoMVC
VueJS-TodoMVC/Adding100Items
VueJS-TodoMVC/Adding100Items/Async
VueJS-TodoMVC/CompletingAllItems
VueJS-TodoMVC/CompletingAllItems/Async

Regression range is this : https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=0f03945e3b362a0e5df2962c9ec093c5d1e03d1c&tochange=1807a36ff99f01abca1c37442fb5b344465bfbdf

Summary: Regression on AWFY Speedometer Inferno-TodoMVC around October7 → Regression on AWFY Speedometer Inferno-TodoMVC and VueJS around October7

What is the reasoning for using the JavaScript JIT component for this bug?

Flags: needinfo?(mayankleoboy1)
Flags: needinfo?(nika)
Flags: needinfo?(emilio)
Flags: needinfo?(aethanyc)

Fairly confident https://hg.mozilla.org/integration/autoland/rev/e336fca5b6fd811258aeee43e69d429d7b1e169b is not the source of the regression here.

Flags: needinfo?(emilio)

(In reply to Nicolas B. Pierron [:nbp] from comment #1)

What is the reasoning for using the JavaScript JIT component for this bug?

No particular reason, except that Speedometer is generally considered as "JS benchmark" and usually any improvements in it are caused by JS engine improvements.
For future, is there a component where regressions in Speedometer should be filed under?

Edit: more generally, is it worthwhile to file regression bugs from AWFY graphs? As in, if results from AWFY graphs are important enough, they would be monitored by the perf sheriffs or the JS team (for the "JS benchmarks").

Flags: needinfo?(mayankleoboy1) → needinfo?(nicolas.b.pierron)

(In reply to Mayank Bansal from comment #3)

(In reply to Nicolas B. Pierron [:nbp] from comment #1)

What is the reasoning for using the JavaScript JIT component for this bug?

No particular reason, except that Speedometer is generally considered as "JS benchmark" and usually any improvements in it are caused by JS engine improvements.
For future, is there a component where regressions in Speedometer should be filed under?

https://browserbench.org/Speedometer2.0/ :

Speedometer is a browser benchmark that measures the responsiveness of Web applications.

I do not know if we have any "browser" component apart from Firefox General component. In which case, the problem should be identified before moving this bug to the correct component.

Edit: more generally, is it worthwhile to file regression bugs from AWFY graphs? As in, if results from AWFY graphs are important enough, they would be monitored by the perf sheriffs or the JS team (for the "JS benchmarks").

Yes, the sooner regressions are identified, the better in general.

Flags: needinfo?(nicolas.b.pierron)

My changes in bug 1794035 are not expected to change performance metrics. I pushed a try run reverting my changes in Bug 1794035, but the performance of the tests listed in comment 0 didn't change much.

Here is comparison of the commit on top of the regression range vs my try run. (If I clicked "Show only important changes", there's no result.)
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=autoland&newProject=try&newRevision=f25a10e96dae2bff9ad5cb07dbf9b379d11b2c0e&originalSignature=3912818&newSignature=3451139&framework=13&originalRevision=1807a36ff99f01abca1c37442fb5b344465bfbdf&page=1

Flags: needinfo?(aethanyc)

I triggered some jobs on autoland, let's see if that tells us something.

Nika, retriggers point to bug 1792474. Any idea what's going on here?

Regressed by: 1792474

Set release status flags based on info from the regressing bug 1792474

Component: JavaScript Engine: JIT → IPC

It looks like we're somehow blocking for a significant-enough amount of time that it shows up in profiles in the overlapped NtWriteFile call (small profile segment: https://share.firefox.dev/3fZjW70). This is somewhat surprising to me, as it's very similar to what Chromium does in their mojo backend, and I generally wouldn't expect that WriteFile call (https://searchfox.org/mozilla-central/rev/f118dae98073bc17efb8604a32abfcb7b4e05880/ipc/chromium/src/chrome/common/ipc_channel_win.cc#542-543) to block for a significant amount of time.

This will require more investigation to figure out what's going on.

:jld could the priority and severity on this be triaged?
Wondering what the impact is for the 107 release

Flags: needinfo?(jld)

Set release status flags based on info from the regressing bug 1792474

Nika, do you have an idea of how severe this might be for real-world non-benchmark use?

Flags: needinfo?(jld)
Assignee: nobody → nika
Flags: needinfo?(nika)

Marking as S3, as it mostly comes up when there's a very rapid spam of ipc messages such as in an artificial test case. I am optimistic it will not come up as a major impact on real-world code.

Severity: -- → S3
Priority: -- → P3

In profiles of the regressing benchmarks, it appears that there is some amount
of locking contention when large numbers of small messages messages over IPC.
While the bulk of this contention is probably caused by the OnIOCompleted
callbacks on the IO thread, there's not much we can do about those wake-ups.

This patch tries to reduce the contention by avoiding acquiring the mutex when
receiving IPC messages using a MutexSingleWriter-like pattern. By combining the
send mutex and IOThread capability, we can have members of the IPC channel
which are only able to be modified when both on the IO thread and holding the
send mutex, meaning they can be safely read when holding only one of those
capabilities.

I am hopeful that this will slightly improve performance, however it seems
likely that there will still be slowdowns due to other parts (like NtWriteFile)
not being easy to optimize.

Future changes such as using a shared memory ring buffer for IPC on windows, or
an approach like ipcz which uses shared memory to avoid signalling messages in
some situations, may improve performance here in the future.

Depends on D160531

Attachment #9300598 - Attachment description: Bug 1794282 - Reduce locking contention in IPC::Channel, r=#ipc-reviewers → Bug 1794282 - Part 1: Introduce IPC::ChannelCapability, r=#ipc-reviewers

In profiles of the regressing benchmarks, it appears that there is some amount
of locking contention when large numbers of small messages messages over IPC.
While the bulk of this contention is probably caused by the OnIOCompleted
callbacks on the IO thread, there's not much we can do about those wake-ups.

This patch tries to reduce the contention by avoiding acquiring the mutex when
receiving IPC messages using the ChannelCapability introduced in part 1.

I am hopeful that this will slightly improve performance, however it seems
likely that there will still be slowdowns due to other parts (like NtWriteFile)
not being easy to optimize.

Future changes such as using a shared memory ring buffer for IPC on windows, or
an approach like ipcz which uses shared memory to avoid signalling messages in
some situations, may improve performance here in the future.

Depends on D160532

Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/45a45932cc6f Part 1: Introduce IPC::ChannelCapability, r=ipc-reviewers,mccr8 https://hg.mozilla.org/integration/autoland/rev/477fc05de84f Part 2: Reduce locking contention in IPC::Channel, r=ipc-reviewers,mccr8
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch

The patch landed in nightly and beta is affected.
:nika, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox108 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(nika)

This is a fairly large change which might have small performance improvements for some specific speedometer benchmarks. I'm inclined to not uplift the change unless we see substantial performance improvements from it.

Flags: needinfo?(nika)
Duplicate of this bug: 1804335

I'm reopening this because the patches to mitigate it don't appear to have had an effect. Speedometer is a huge priority for us going into 2023 and I don't think we can afford to lose this much on it if it's at all avoidable.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Almost all of these messages are sent here which was added to address an intermittent failure in bug 1618386. Sending an IPC message for every single HTML Element focus (?) seems like a rather heavy fix for such an intermittent. I'm wondering if the direction we need to be looking is there.

See Also: → 1804485

Some initial testing for my ideas in bug 1804485 seems to be improving the Inferno benchmark, by avoiding the issues which caused this regression. I'll try to post patches there soon, but that's the direction I'm currently looking to fix this regression better.

Should this bug be closed now?

Flags: needinfo?(nika)

Yeah, if there's anything left here I don't think we're going to change it at this point. Let's close for now.

Status: REOPENED → RESOLVED
Closed: 2 years ago1 year ago
Flags: needinfo?(nika)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: