Closed Bug 1363383 Opened 8 years ago Closed 4 months ago

Channel switching on Slack is slow

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Performance Impact ?

People

(Reporter: eoger, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(4 keywords)

Attachments

(2 files)

Switching channels on Slack feels slow.

Here is a profile where I switch channels:
https://perf-html.io/from-addon/calltree/?range=1.3431_2.2739&thread=3

I didn't assign a component, I wasn't really sure where this bug belonged.
Whiteboard: [qf]
Component: Untriaged → General
Edouard, how well does channel switching perform in Chrome?

btw, your link points to the profile viewer URL, not your specific profile.
Wow I feel like an idiot, here is a fresh profiler capture: https://perfht.ml/2psEe8Z
I'll try to make a video capture to compare both (but it's definitely faster on chrome).
Attached video firefox.webm
I realize that the chronometer is useless since I don't show any visual indication that I clicked.
Attached video chrome.webm
I just tested Firefox and Chrome using my iPhone's slo-mo camera. Chrome's channel switches are almost 2x faster than Firefox's. The Slack channel switch has a fade out/in transition. Most of the Firefox delay seems to happen before the fade in.
Component: General → JavaScript Engine
Whiteboard: [qf] → [qf:investigate:p1]
Assignee: nobody → kvijayan
Not a QRC bug so p2
Whiteboard: [qf:investigate:p1] → [qf:p2]
Keywords: perf
Kannan will attempt to deconstruct Slack switches into sub bugs.
Flags: needinfo?(kvijayan)
Whiteboard: [qf:p2] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
This is starting to get to me so I profiled it and a cursory glance seems to indicate, quelle surprise, it's spending a lot of time in page scripts.
Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]
Hey eoger,

Are you still using Slack in a browser tab and still experiencing this? I think we need an up-to-date profile here.
Flags: needinfo?(eoger)
Whiteboard: [qf:f61][qf:p1] → [qf:f64][qf:p1]
Updated profile from my machine: https://perfht.ml/2jYtmz7
Thanks, overholt.

I think I've isolated a single channel switch here:

https://perfht.ml/2jWLglM
Flags: needinfo?(eoger)
Whiteboard: [qf:f64][qf:p1] → [qf:p1:f64]
Assignee: kvijayan → sstangl
On Fedora Linux x86_64, Firefox channel switching on Slack does indeed still take about twice the time that Chrome takes. On my machine, that's about 400ms. Chrome takes an estimated 200ms -- but even that still feels janky, and if you press alt+down too quickly, it drops keys. So even if we reach Chrome's performance, it still doesn't really feel good.

Here's an uploaded profile on my machine: https://perfht.ml/2O83uOG

Of the 387ms recorded there, it's broken down as follows:

- 180ms for running Slack scripts. They are interpreter-only, so this is basically benchmarking interpreter performance. Chrome's interpreter is much faster than ours, so it's possible that this makes up much of the difference.

- 92ms resolving Promises tied to DOM mutation events, caused by keydown. Calls out to DOM. Runs deeply-nested (React?) JS code, also interpreter-only. DOM methods that stood out a bit are History.pushState(), dom::Element_Binding::get_ScrollTop(), and HTMLElement_Binding::focus().

- ~110ms executing some synthetic mouse-move scroll event, more interpreter-only scripts and DOM code.

Next I'll see how these segments line up with the equivalent events in Chrome.
Moving QF Target to qf:p1:f67, we may also be able to break this up into multiple bugs, some which could tart FF64.  Sean is still in the process of assessing this bug.
Whiteboard: [qf:p1:f64] → [qf:p1:f67]
Whiteboard: [qf:p1:f67] → [qf:p1:f67][qf:js:investigate]
Looking at Chrome's profile, for each of the phases, they take about half the time that we do. Nothing in particular stands out in their profile, but in ours, we have a lot of heavy DOM calls.

Fortunately, Slack ships the same code to Chrome and Firefox, so if you look at their stack charts, they match up exactly. It's very easy to tell where there's room for improvement.

The following things stand out:

1. The heaviest in one phase seems to be a getter of Element.scrollTop() which causes a reflow that takes 12ms -- Chrome also does a reflow but it takes negligible time.

2. Similarly a call to getBoundingClientRect() in the profile takes ~11ms in Firefox, while Chrome takes 2.5ms.

3. In general, in the first part (responding to the key press event), it looks like the leaf nodes of the chart are all DOM code, and our DOM code is just slower across-the-board compared to Chrome's.

4. The second part seems mostly dominated by JS code, where we take 7x the execution time of Chrome.

5. There are some heavy calls to mprotect() for GetPropICs that end up eating a bunch of time cumulatively.

6. There are multiple methods that Chrome finishes nearly-instantly (0.15ms) in which we spend a lot of time (~6.7ms) linking JitCode and calling ReprotectRegion() (mprotect again). CodeGenerator::link() alone on this function takes 3x the time that Chrome takes for the function's full execution.

Unfortunately or fortunately, depending on your view, these are the same categories that profiles usually show:

1. DOM code likely missing some caching opportunity compared to Chrome's implementation.
2. Excessive or inopportune Ion compilation heuristics.
3. On-main-thread linking and mprotect.

None of this is really Slack-specific, but Slack may be a useful testcase because it's so easy to reproduce and compare across browsers.
Since this bug has wound up being sort of a microcosm of everything that's possibly wrong with the compiler, I've set it to depend on Bug 1490849. When we wind up resolving Baseline compilation issues, it's worth re-profiling to see how the world has changed.

I still think this is a particularly useful test case, because it's so clear to compare the timing of each function call to the time taken by the equivalent function call in Chrome.
Depends on: 1490849
Whiteboard: [qf:p1:f67][qf:js:investigate] → [qf:js:investigate]

This should be re-measured after Jan's landing of changes to ion tuning and compilation times.

Flags: needinfo?(kvijayan)

I agree with Kannan's comment about re-measuring once Jan lands his changes. But for now this is not a priority for us and will mark a P3, and we change the priority as necessary once we re-measure.

Priority: -- → P3

Here is a profile captured today: https://perfht.ml/33AkYdT
Between the click even that triggers the channel selection change and the channel content getting painted, it takes 1.5s on my fast macbook pro. This feels pretty terrible.

Whiteboard: [qf:js:investigate] → [qf:js:investigate][qf:p1:responsiveness]

Here's a profile captured today: https://share.firefox.dev/3tR6ioB

Could someone from the js team take a look at this again. Most of the time is spent in JS.

Assignee: sstangl → nobody
Flags: needinfo?(sdetar)

This is another case where the hot code is megamorphic property access and ReactDOM code.

Iain will be looking into move these bugs in the near future.

Flags: needinfo?(sdetar)
Performance Impact: --- → P1
Whiteboard: [qf:js:investigate][qf:p1:responsiveness]

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: [x] Windows [x] macOS [x] Linux
Impact on site: Causes noticeable jank
[x] Affects major website
[x] Able to reproduce locally
[x] Multiple reporters

Keywords: reproducible, top50
Severity: normal → S3

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:sdetar, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)
Flags: needinfo?(sdetar)

Slack is a React-heavy website, and we did a lot of work to improve React performance during the sp3 push. Is this still a problem?

Performance Impact: high → ?

:florian would you be willing to provide a new profile for us to investigate if this is still an issue?

Flags: needinfo?(florian)
Performance Impact: ? → pending-needinfo

(In reply to Frank Doty [:fdoty] from comment #27)

:florian would you be willing to provide a new profile for us to investigate if this is still an issue?

https://share.firefox.dev/4a7BWCc

Flags: needinfo?(florian)

Visually comparing the size/frequency of the angry red bars in that profile vs the older profiles, it looks like there's a lot less jank here now. Not none, but a definite improvement. Anecdotally, clicking around in slack also feels faster than I remember it feeling a couple years ago, although memory is fallible.

Somebody with a better eye than me should figure out whether anything here still counts as "noticeable jank".

Performance Impact: pending-needinfo → ?

(In reply to Iain Ireland [:iain] from comment #29)

Visually comparing the size/frequency of the angry red bars in that profile vs the older profiles, it looks like there's a lot less jank here now. Not none, but a definite improvement.

The M1 Max CPU of my current macbook is also dramatically faster than the CPU I had in my Intel Macbook 3 years ago.

Looking at slack on Firefox vs Chrome I get the same experience on both. It seems like this isn't an issue anymore.

This bug doesn’t seem to happen anymore in current versions of Firefox. Please reopen or file a new bug if you see it again.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: