Regression on AWFY Speedometer Inferno-TodoMVC and VueJS around October7
Categories
(Core :: IPC, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox105 | --- | unaffected |
firefox106 | --- | unaffected |
firefox107 | --- | wontfix |
firefox108 | --- | wontfix |
firefox109 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: nika)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
Go to https://arewefastyet.com/win10/benchmarks/raptor-desktop-speedometer?numDays=60
See the following tests :
Inferno-TodoMVC
Inferno-TodoMVC/DeletingItems
Inferno-TodoMVC/DeletingItems/Sync
VueJS-TodoMVC
VueJS-TodoMVC/Adding100Items
VueJS-TodoMVC/Adding100Items/Async
VueJS-TodoMVC/CompletingAllItems
VueJS-TodoMVC/CompletingAllItems/Async
Regression range is this : https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=0f03945e3b362a0e5df2962c9ec093c5d1e03d1c&tochange=1807a36ff99f01abca1c37442fb5b344465bfbdf
Reporter | ||
Updated•2 years ago
|
Comment 1•2 years ago
|
||
What is the reasoning for using the JavaScript JIT component for this bug?
Updated•2 years ago
|
Comment 2•2 years ago
|
||
Fairly confident https://hg.mozilla.org/integration/autoland/rev/e336fca5b6fd811258aeee43e69d429d7b1e169b is not the source of the regression here.
Reporter | ||
Comment 3•2 years ago
•
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
What is the reasoning for using the JavaScript JIT component for this bug?
No particular reason, except that Speedometer is generally considered as "JS benchmark" and usually any improvements in it are caused by JS engine improvements.
For future, is there a component where regressions in Speedometer should be filed under?
Edit: more generally, is it worthwhile to file regression bugs from AWFY graphs? As in, if results from AWFY graphs are important enough, they would be monitored by the perf sheriffs or the JS team (for the "JS benchmarks").
Comment 4•2 years ago
|
||
(In reply to Mayank Bansal from comment #3)
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
What is the reasoning for using the JavaScript JIT component for this bug?
No particular reason, except that Speedometer is generally considered as "JS benchmark" and usually any improvements in it are caused by JS engine improvements.
For future, is there a component where regressions in Speedometer should be filed under?
https://browserbench.org/Speedometer2.0/ :
Speedometer is a browser benchmark that measures the responsiveness of Web applications.
I do not know if we have any "browser" component apart from Firefox General component. In which case, the problem should be identified before moving this bug to the correct component.
Edit: more generally, is it worthwhile to file regression bugs from AWFY graphs? As in, if results from AWFY graphs are important enough, they would be monitored by the perf sheriffs or the JS team (for the "JS benchmarks").
Yes, the sooner regressions are identified, the better in general.
Comment 5•2 years ago
|
||
My changes in bug 1794035 are not expected to change performance metrics. I pushed a try run reverting my changes in Bug 1794035, but the performance of the tests listed in comment 0 didn't change much.
Here is comparison of the commit on top of the regression range vs my try run. (If I clicked "Show only important changes", there's no result.)
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=autoland&newProject=try&newRevision=f25a10e96dae2bff9ad5cb07dbf9b379d11b2c0e&originalSignature=3912818&newSignature=3451139&framework=13&originalRevision=1807a36ff99f01abca1c37442fb5b344465bfbdf&page=1
Comment 6•2 years ago
|
||
I triggered some jobs on autoland, let's see if that tells us something.
Comment 7•2 years ago
|
||
Nika, retriggers point to bug 1792474. Any idea what's going on here?
Comment 8•2 years ago
|
||
Set release status flags based on info from the regressing bug 1792474
Updated•2 years ago
|
Assignee | ||
Comment 9•2 years ago
|
||
It looks like we're somehow blocking for a significant-enough amount of time that it shows up in profiles in the overlapped NtWriteFile
call (small profile segment: https://share.firefox.dev/3fZjW70). This is somewhat surprising to me, as it's very similar to what Chromium does in their mojo backend, and I generally wouldn't expect that WriteFile
call (https://searchfox.org/mozilla-central/rev/f118dae98073bc17efb8604a32abfcb7b4e05880/ipc/chromium/src/chrome/common/ipc_channel_win.cc#542-543) to block for a significant amount of time.
This will require more investigation to figure out what's going on.
Comment 10•2 years ago
|
||
:jld could the priority and severity on this be triaged?
Wondering what the impact is for the 107 release
Comment 11•2 years ago
|
||
Set release status flags based on info from the regressing bug 1792474
Comment 12•2 years ago
|
||
Nika, do you have an idea of how severe this might be for real-world non-benchmark use?
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 13•2 years ago
|
||
Marking as S3, as it mostly comes up when there's a very rapid spam of ipc messages such as in an artificial test case. I am optimistic it will not come up as a major impact on real-world code.
Updated•2 years ago
|
Assignee | ||
Comment 14•2 years ago
|
||
In profiles of the regressing benchmarks, it appears that there is some amount
of locking contention when large numbers of small messages messages over IPC.
While the bulk of this contention is probably caused by the OnIOCompleted
callbacks on the IO thread, there's not much we can do about those wake-ups.
This patch tries to reduce the contention by avoiding acquiring the mutex when
receiving IPC messages using a MutexSingleWriter-like pattern. By combining the
send mutex and IOThread capability, we can have members of the IPC channel
which are only able to be modified when both on the IO thread and holding the
send mutex, meaning they can be safely read when holding only one of those
capabilities.
I am hopeful that this will slightly improve performance, however it seems
likely that there will still be slowdowns due to other parts (like NtWriteFile)
not being easy to optimize.
Future changes such as using a shared memory ring buffer for IPC on windows, or
an approach like ipcz which uses shared memory to avoid signalling messages in
some situations, may improve performance here in the future.
Depends on D160531
Updated•2 years ago
|
Assignee | ||
Comment 15•2 years ago
|
||
In profiles of the regressing benchmarks, it appears that there is some amount
of locking contention when large numbers of small messages messages over IPC.
While the bulk of this contention is probably caused by the OnIOCompleted
callbacks on the IO thread, there's not much we can do about those wake-ups.
This patch tries to reduce the contention by avoiding acquiring the mutex when
receiving IPC messages using the ChannelCapability introduced in part 1.
I am hopeful that this will slightly improve performance, however it seems
likely that there will still be slowdowns due to other parts (like NtWriteFile)
not being easy to optimize.
Future changes such as using a shared memory ring buffer for IPC on windows, or
an approach like ipcz which uses shared memory to avoid signalling messages in
some situations, may improve performance here in the future.
Depends on D160532
Comment 16•2 years ago
|
||
Comment 17•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/45a45932cc6f
https://hg.mozilla.org/mozilla-central/rev/477fc05de84f
Updated•2 years ago
|
Comment 18•2 years ago
|
||
The patch landed in nightly and beta is affected.
:nika, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox108
towontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 19•2 years ago
|
||
This is a fairly large change which might have small performance improvements for some specific speedometer benchmarks. I'm inclined to not uplift the change unless we see substantial performance improvements from it.
Comment 21•2 years ago
|
||
I'm reopening this because the patches to mitigate it don't appear to have had an effect. Speedometer is a huge priority for us going into 2023 and I don't think we can afford to lose this much on it if it's at all avoidable.
Comment 22•2 years ago
|
||
Almost all of these messages are sent here which was added to address an intermittent failure in bug 1618386. Sending an IPC message for every single HTML Element focus (?) seems like a rather heavy fix for such an intermittent. I'm wondering if the direction we need to be looking is there.
Assignee | ||
Comment 23•2 years ago
|
||
Some initial testing for my ideas in bug 1804485 seems to be improving the Inferno
benchmark, by avoiding the issues which caused this regression. I'll try to post patches there soon, but that's the direction I'm currently looking to fix this regression better.
Assignee | ||
Comment 25•1 year ago
|
||
Yeah, if there's anything left here I don't think we're going to change it at this point. Let's close for now.
Description
•